API Rate Limiting: How It Works & How to Implement It
A single unthrottled API client sent 18 million requests in 24 hours to a mid-sized SaaS company's REST API in 2024 — a misconfigured SDK retry loop. The API stayed up, but it increased response times for every other customer by 340% and consumed $4,200 in unexpected compute costs before the team noticed. The fix? 12 lines of rate limiting middleware. Total implementation time: 45 minutes.
According to a 2024 Gartner analysis cited by Zuplo, 73% of SaaS outages are linked to API overuse or poor traffic management. Rate limiting is not optional infrastructure — it is a correctness requirement for any public or multi-tenant API.
Key Takeaways
- Token bucket is the most widely deployed algorithm — allows controlled bursts while enforcing an average rate. Used by Stripe, GitHub, and Cloudflare.
- In-process rate limiting breaks with horizontal scaling — use Redis for distributed counter coordination across all server instances.
- Always return RateLimit-Limit, RateLimit-Remaining, and Retry-After headers — clients need them to implement polite backoff.
- Rate limit at multiple layers: API gateway (global), middleware (per route), and application code (per operation type).
- HTTP 429 is the correct status code per RFC 6585 — not 503, which signals a service outage rather than a client limit.
Why APIs Need Rate Limiting
Rate limiting serves four distinct purposes in a production API:
- DDoS protection — volumetric attacks overwhelm servers with requests. Per APIsec's 2025 report, API-targeted DDoS attacks increased 118% year-over-year, with attackers specifically targeting unprotected endpoints.
- Runaway client prevention — misconfigured SDK retry loops, infinite polling, and bugs can generate millions of requests without rate limits to stop them.
- Fair multi-tenancy — one customer's traffic spike should not degrade service for others. Rate limiting ensures equitable resource distribution.
- Business model enforcement — tiered SaaS pricing (100 req/min for free tier, 10,000 req/min for enterprise) is implemented through rate limits.
The Four Rate Limiting Algorithms
1. Fixed Window Counter
Divide time into fixed intervals (e.g., 1-minute windows). Count requests in the current window. Reject when count exceeds limit. Reset counter at window boundary.
// Fixed window counter in Redis
async function fixedWindowRateLimit(
key: string, // e.g., "ratelimit:user:123"
limit: number, // max requests per window
windowSeconds: number
): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
const windowKey = `${key}:${Math.floor(Date.now() / (windowSeconds * 1000))}`
const pipeline = redis.pipeline()
pipeline.incr(windowKey)
pipeline.expire(windowKey, windowSeconds)
const results = await pipeline.exec()
const count = results![0][1] as number
const resetAt = (Math.floor(Date.now() / (windowSeconds * 1000)) + 1) * windowSeconds
return {
allowed: count <= limit,
remaining: Math.max(0, limit - count),
resetAt,
}
}

Boundary problem: A client can make 100 requests at 00:59 and 100 more at 01:01 — 200 requests in 2 seconds, double the intended rate. Acceptable for most use cases, but not for sensitive operations.
2. Sliding Window Counter
Maintains a more accurate rate by weighting the previous window's count proportionally. Eliminates the boundary burst problem with minimal memory overhead.
async function slidingWindowRateLimit(
key: string,
limit: number,
windowSeconds: number
): Promise<{ allowed: boolean; remaining: number }> {
const now = Date.now()
const windowMs = windowSeconds * 1000
// Current and previous window
const currentWindow = Math.floor(now / windowMs)
const previousWindow = currentWindow - 1
const currentKey = `${key}:${currentWindow}`
const previousKey = `${key}:${previousWindow}`
// Fetch both window counts atomically
const [currentCount, previousCount] = await redis.mget(currentKey, previousKey)
const current = parseInt(currentCount ?? '0')
const previous = parseInt(previousCount ?? '0')
// Weight previous window by how far we are through the current window
const windowProgress = (now % windowMs) / windowMs
const weightedCount = previous * (1 - windowProgress) + current
if (weightedCount >= limit) {
return { allowed: false, remaining: 0 }
}
// Increment current window
await redis.pipeline()
.incr(currentKey)
.expire(currentKey, windowSeconds * 2)
.exec()
return {
allowed: true,
remaining: Math.max(0, Math.floor(limit - weightedCount - 1)),
}
}

3. Token Bucket
Each client has a "bucket" with a maximum token capacity. Tokens are added at a fixed refill rate. Each request consumes one token. When the bucket is empty, requests are rejected. This is the most widely deployed algorithm — it allows burst traffic up to the bucket capacity while enforcing an average rate.
// Token bucket implementation using Redis Lua script for atomicity
const TOKEN_BUCKET_SCRIPT = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1]) -- max tokens
local refillRate = tonumber(ARGV[2]) -- tokens added per second
local now = tonumber(ARGV[3]) -- current timestamp (ms)
local requested = tonumber(ARGV[4]) -- tokens needed (usually 1)
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local lastRefill = tonumber(bucket[2]) or now
-- Add tokens based on elapsed time
local elapsed = (now - lastRefill) / 1000 -- seconds
local newTokens = math.min(capacity, tokens + (elapsed * refillRate))
if newTokens < requested then
-- Not enough tokens
redis.call('HMSET', key, 'tokens', newTokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)
return {0, math.floor(newTokens)}
end
-- Consume tokens
newTokens = newTokens - requested
redis.call('HMSET', key, 'tokens', newTokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)
return {1, math.floor(newTokens)}
`
async function tokenBucketRateLimit(
key: string,
capacity: number, // max burst size
refillRate: number, // tokens per second
): Promise<{ allowed: boolean; remaining: number }> {
const result = await redis.eval(
TOKEN_BUCKET_SCRIPT,
1,
`ratelimit:bucket:${key}`,
capacity.toString(),
refillRate.toString(),
Date.now().toString(),
'1'
) as [number, number]
return {
allowed: result[0] === 1,
remaining: result[1],
}
}
// Example: Stripe-like API limits
// 100-request burst capacity, refilling at 10 tokens/second (600 req/min sustained)
const { allowed } = await tokenBucketRateLimit(
`user:${userId}`,
100, // burst
10 // sustained rate
)

4. Leaky Bucket
Requests fill a bucket (queue) and are processed at a fixed rate, regardless of how fast they arrive. Unlike token bucket which rejects excess requests, leaky bucket queues them. This smooths traffic spikes at the cost of latency.
// Leaky bucket: throttle outbound requests at a fixed rate
// Useful for rate-limiting YOUR calls to third-party APIs
class LeakyBucket {
private queue: Array<() => void> = []
private processing = false
constructor(
private ratePerSecond: number,
private maxQueueSize = 100
) {}
async throttle<T>(fn: () => Promise<T>): Promise<T> {
if (this.queue.length >= this.maxQueueSize) {
throw new Error('Leaky bucket overflow — queue full')
}
return new Promise<T>((resolve, reject) => {
this.queue.push(async () => {
try {
resolve(await fn())
} catch (err) {
reject(err)
}
})
this.processQueue()
})
}
private processQueue() {
if (this.processing || this.queue.length === 0) return
this.processing = true
const intervalMs = 1000 / this.ratePerSecond
const timer = setInterval(() => {
const fn = this.queue.shift()
if (!fn) {
clearInterval(timer)
this.processing = false
return
}
fn()
}, intervalMs)
}
}
// Rate-limit outgoing requests to a third-party API (e.g., 5 req/s)
const limiter = new LeakyBucket(5, 50)
const results = await Promise.all(
items.map(item => limiter.throttle(() => externalApi.process(item)))
)

Algorithm Comparison
| Algorithm | Burst Handling | Memory | Accuracy | Best For |
|---|---|---|---|---|
| Fixed Window | ❌ 2x burst at boundaries | ✅ O(1) per key | ⚠️ Medium | Simple quota enforcement |
| Sliding Window | ✅ No boundary spikes | ✅ O(1) per key | ✅ High | Most APIs — accuracy + efficiency |
| Token Bucket | ✅ Controlled burst capacity | ✅ O(1) per key | ✅ High | APIs needing burst allowance (Stripe, GitHub) |
| Sliding Window Log | ✅ No boundary spikes | ❌ O(n) per key (timestamp per request) | ✅ Perfect | Low-traffic endpoints needing exact counts |
| Leaky Bucket | ⚠️ Queued (adds latency) | ✅ O(queue size) | ✅ Perfect throughput | Outbound rate limiting to third-party APIs |
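The comparison table lists the sliding window log, which none of the snippets above implement. Here is a minimal sketch using a Redis sorted set. The function name and key layout are illustrative, and the client is typed structurally so the sketch stands alone; in practice you would pass the same ioredis connection the other examples use.

```typescript
// Sliding window log: store one sorted-set member per request, so counts are
// exact. Memory is O(n) per key, so reserve it for low-traffic endpoints
// (password resets, account deletion).
interface RedisLike {
  pipeline(): any // chainable: zremrangebyscore / zcard / zadd / expire / exec
}

async function slidingWindowLogRateLimit(
  client: RedisLike,
  key: string,
  limit: number,
  windowSeconds: number
): Promise<{ allowed: boolean; remaining: number }> {
  const now = Date.now()
  const windowStart = now - windowSeconds * 1000
  // Evict timestamps that fell out of the window, then count what remains
  const results = await client.pipeline()
    .zremrangebyscore(key, 0, windowStart)
    .zcard(key)
    .exec()
  const count = (results?.[1]?.[1] as number) ?? 0
  if (count >= limit) {
    return { allowed: false, remaining: 0 }
  }
  // Record this request; the member string must be unique per request.
  // Note: the zcard check and zadd race under concurrency. Wrap both in a
  // Lua script (as the token bucket example does) for strict atomicity.
  await client.pipeline()
    .zadd(key, now, `${now}:${Math.random()}`)
    .expire(key, windowSeconds)
    .exec()
  return { allowed: true, remaining: limit - count - 1 }
}
```

Because every request leaves a timestamp behind, the count is always exact with no boundary or weighting approximation, which is why the table rates its accuracy as "perfect" despite the memory cost.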
Express.js Middleware Implementation
Here is a production-ready rate limiting middleware for Express that uses Redis sliding window and returns all required headers per the IETF draft-ietf-httpapi-ratelimit-headers spec.
import { Request, Response, NextFunction } from 'express'
import Redis from 'ioredis'
const redis = new Redis(process.env.REDIS_URL)
interface RateLimitOptions {
windowSeconds: number // Time window duration
limit: number // Max requests per window
keyFn?: (req: Request) => string // How to identify the client
skipFn?: (req: Request) => boolean // Skip rate limiting for certain requests
}
function createRateLimiter(options: RateLimitOptions) {
const { windowSeconds, limit, keyFn, skipFn } = options
return async (req: Request, res: Response, next: NextFunction) => {
if (skipFn?.(req)) return next()
// Identify client: prefer API key over IP
const clientId = keyFn
? keyFn(req)
: (req.headers['x-api-key'] as string) || req.ip!
const key = `ratelimit:${req.path}:${clientId}`
try {
const now = Date.now()
const windowMs = windowSeconds * 1000
const currentWindow = Math.floor(now / windowMs)
const previousWindow = currentWindow - 1
const currentKey = `${key}:${currentWindow}`
const previousKey = `${key}:${previousWindow}`
const [currentCount, previousCount] = await redis.mget(currentKey, previousKey)
const current = parseInt(currentCount ?? '0')
const previous = parseInt(previousCount ?? '0')
const windowProgress = (now % windowMs) / windowMs
const weightedCount = Math.floor(previous * (1 - windowProgress) + current)
const remaining = Math.max(0, limit - weightedCount - 1)
const resetAt = (currentWindow + 1) * windowSeconds
// Set rate limit headers (IETF draft standard)
res.setHeader('RateLimit-Limit', limit)
res.setHeader('RateLimit-Remaining', remaining)
res.setHeader('RateLimit-Reset', resetAt)
if (weightedCount >= limit) {
const retryAfter = Math.ceil(resetAt - now / 1000) // resetAt is in seconds; now / 1000 converts ms
res.setHeader('Retry-After', retryAfter)
return res.status(429).json({
error: 'Too Many Requests',
message: `Rate limit exceeded. Retry after ${retryAfter} seconds.`,
retryAfter,
})
}
await redis.pipeline()
.incr(currentKey)
.expire(currentKey, windowSeconds * 2)
.exec()
next()
} catch (err) {
// On Redis failure, fail open (allow the request)
// Log the error but don't block the user
console.error('Rate limiter Redis error:', err)
next()
}
}
}
// Usage: apply different limits to different routes
const globalLimiter = createRateLimiter({
windowSeconds: 60,
limit: 100,
})
const authLimiter = createRateLimiter({
windowSeconds: 900, // 15-minute window
limit: 10, // 10 login attempts per 15 minutes
keyFn: (req) => req.body?.email ?? req.ip,
})
const strictLimiter = createRateLimiter({
windowSeconds: 3600,
limit: 1000,
skipFn: (req) => {
// Internal health checks bypass rate limiting
return req.headers['x-internal-token'] === process.env.INTERNAL_TOKEN
},
})
app.use('/api/', globalLimiter)
app.post('/auth/login', authLimiter)
app.post('/auth/forgot-password', createRateLimiter({
windowSeconds: 3600,
limit: 3, // Only 3 password resets per hour
keyFn: (req) => req.body?.email ?? req.ip,
}))

Tiered Rate Limits for SaaS APIs
Production SaaS APIs implement tiered limits based on subscription level. According to a 2025 analysis of 50 SaaS pricing pages, 84% of APIs with tiered pricing use rate limits as the primary differentiator between free and paid tiers.
// Tiered rate limits by API key subscription level
const TIER_LIMITS = {
free: { requestsPerMin: 60, requestsPerDay: 1000 },
starter: { requestsPerMin: 600, requestsPerDay: 50000 },
pro: { requestsPerMin: 3000, requestsPerDay: 500000 },
enterprise: { requestsPerMin: 30000, requestsPerDay: Infinity },
}
async function getTierForApiKey(apiKey: string): Promise<keyof typeof TIER_LIMITS> {
const subscription = await db.apiKeys.findUnique({
where: { key: apiKey },
select: { tier: true },
})
return (subscription?.tier ?? 'free') as keyof typeof TIER_LIMITS
}
function createTieredRateLimiter() {
return async (req: Request, res: Response, next: NextFunction) => {
const apiKey = req.headers['x-api-key'] as string
if (!apiKey) {
return res.status(401).json({ error: 'API key required' })
}
const tier = await getTierForApiKey(apiKey)
const limits = TIER_LIMITS[tier]
// Check per-minute limit
const minuteResult = await slidingWindowRateLimit(
`${apiKey}:minute`,
limits.requestsPerMin,
60
)
if (!minuteResult.allowed) {
return res.status(429).json({
error: 'Rate limit exceeded',
tier,
limit: limits.requestsPerMin,
window: '1 minute',
upgradeUrl: 'https://yourapi.com/pricing',
})
}
// Check daily limit (skip for enterprise — Infinity check)
if (limits.requestsPerDay !== Infinity) {
const dayResult = await slidingWindowRateLimit(
`${apiKey}:day`,
limits.requestsPerDay,
86400
)
if (!dayResult.allowed) {
return res.status(429).json({
error: 'Daily quota exceeded',
tier,
limit: limits.requestsPerDay,
window: '24 hours',
upgradeUrl: 'https://yourapi.com/pricing',
})
}
}
// Expose tier info in headers
res.setHeader('X-API-Tier', tier)
res.setHeader('RateLimit-Limit', limits.requestsPerMin)
next()
}
}

Client-Side: Handling 429 Correctly
Most 429 errors in production come from SDK retry bugs — a client retrying immediately in a tight loop, amplifying the problem rather than backing off. Here is a correct implementation:
class RateLimitAwareClient {
private circuitBreakerCount = 0
private circuitBreakerOpen = false
private circuitBreakerResetAt: number | null = null
async request<T>(
url: string,
options: RequestInit = {},
maxRetries = 3
): Promise<T> {
// Circuit breaker: stop hammering after 5 consecutive 429s
if (this.circuitBreakerOpen) {
if (Date.now() < (this.circuitBreakerResetAt ?? 0)) {
throw new Error('Circuit breaker open — cooling off after rate limit errors')
}
this.circuitBreakerOpen = false
this.circuitBreakerCount = 0
}
for (let attempt = 0; attempt <= maxRetries; attempt++) {
const response = await fetch(url, options)
if (response.status === 429) {
this.circuitBreakerCount++
if (this.circuitBreakerCount >= 5) {
this.circuitBreakerOpen = true
this.circuitBreakerResetAt = Date.now() + 60_000 // 1 minute
throw new Error('Too many rate limit errors — circuit breaker opened')
}
if (attempt === maxRetries) {
throw new Error('Max retries exceeded due to rate limiting')
}
// Respect Retry-After header
const retryAfter = response.headers.get('Retry-After')
const waitMs = retryAfter
? parseInt(retryAfter) * 1000
: Math.min(1000 * Math.pow(2, attempt) + Math.random() * 1000, 32000)
console.warn(`Rate limited. Waiting ${waitMs}ms before retry ${attempt + 1}.`)
await new Promise(resolve => setTimeout(resolve, waitMs))
continue
}
this.circuitBreakerCount = 0 // Reset on success
return response.json() as Promise<T>
}
throw new Error('Unreachable')
}
}

Rate Limiting at Scale: Distributed Systems
In-process rate limiting is broken by definition in a horizontally scaled API. If you have 5 app servers and each allows 100 req/min per key, your actual limit is 500 req/min. The solution is always a shared external store.
# Redis Cell module (recommended for production)
# Implements GCRA (Generic Cell Rate Algorithm) — similar to sliding window
# O(1) time, O(1) memory per key, single round-trip
# Install: loadmodule /path/to/redis-cell.so
# Command: CL.THROTTLE key max_burst count_per_period period [quantity]
CL.THROTTLE user:123 99 60 60 1
# key: user:123
# max_burst: 99 (allow up to 100 total, 0-indexed)
# count_per_period: 60 requests
# period: 60 seconds
# quantity: 1 (requesting 1 token)
# Response: [limited, limit, remaining, retry_after, reset_after]
# [1, 100, 0, 2, 2] → rejected, 2s until retry
# Node.js integration:
const result = await redis.call(
'CL.THROTTLE',
`ratelimit:${userId}`,
'99', // max burst (capacity - 1)
'60', // requests per period
'60', // period in seconds
'1' // tokens to consume
) as [number, number, number, number, number]
const [limited, limit, remaining, retryAfter, resetAfter] = result
// limited: 0 = allowed, 1 = rate limited (note: inverted from a typical boolean)
# For teams without Redis Cell, use the rate-limiter-flexible package (npm).
# It implements multiple algorithms with a Redis backend and is widely deployed.

For even larger scale — hundreds of thousands of requests per second — consider API-gateway-level rate limiting with consistent hashing: route all requests for a given key to the same gateway node, enabling local rate limiting without Redis round-trips. Cloudflare uses this approach for their Workers rate limiting, processing billions of requests per day.
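The gateway routing described above can be sketched with rendezvous (highest-random-weight) hashing, one of several ways to make every router agree on an owner node for a key. The node names and the FNV-1a hash choice here are illustrative assumptions, not Cloudflare's actual implementation:

```typescript
// Rendezvous (highest-random-weight) hashing: every gateway instance computes
// the same owner node for a given rate-limit key, so that node can count
// requests in memory with no Redis round-trip. Removing a node only remaps
// the keys that node owned.

// FNV-1a: a small, fast, deterministic 32-bit hash (not cryptographic)
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash >>> 0
}

// Pick the node with the highest hash(node + key) score
function nodeForKey(key: string, nodes: string[]): string {
  let best = nodes[0]
  let bestScore = -1
  for (const node of nodes) {
    const score = fnv1a(`${node}:${key}`)
    if (score > bestScore) {
      bestScore = score
      best = node
    }
  }
  return best
}

// All gateways agree: traffic for this key always lands on the same node,
// which can then run a purely in-memory token bucket for it.
const owner = nodeForKey('ratelimit:user:123', ['gw-1', 'gw-2', 'gw-3'])
```

Because each node's score is computed independently, removing one node only remaps the keys it owned; every other key keeps its owner, which is what makes local counters safe during scale-up and scale-down events.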
Rate Limiting Headers Reference
The IETF draft-ietf-httpapi-ratelimit-headers specification standardizes the headers your API should return. Providers such as GitHub and Stripe expose equivalent information, typically through legacy X-RateLimit-* headers.
| Header | Example Value | Meaning |
|---|---|---|
| RateLimit-Limit | 60 | Requests allowed per window |
| RateLimit-Remaining | 42 | Requests remaining in current window |
| RateLimit-Reset | 1714085460 | When the window resets (Unix timestamp here; the IETF draft specifies seconds-until-reset) |
| Retry-After | 30 | Seconds until client can retry (429 only) |
| X-RateLimit-Limit | 5000 | GitHub convention (legacy prefix) |
| X-RateLimit-Used | 23 | GitHub: requests used in window |
Where to Apply Rate Limiting
Production APIs layer rate limiting at multiple points. Each layer serves a distinct purpose and protects against different attack vectors.
# Layer 1: Infrastructure (Cloudflare, AWS WAF, Nginx)
# Purpose: Volume-based DDoS protection before traffic hits your app
# Limit: 10,000 req/min per IP at the edge — blocks botnets and scanners
# Cost: Free (Cloudflare) or near-free (WAF rules)
# Nginx rate limiting (simple, no Redis required)
limit_req_zone $binary_remote_addr zone=api:10m rate=100r/s;
location /api/ {
limit_req zone=api burst=200 nodelay;
limit_req_status 429;
}
# Layer 2: API Gateway (Kong, AWS API Gateway, Nginx with Lua)
# Purpose: Per-consumer limits and route-specific policies
# Limit: Based on API key tier — free/pro/enterprise
# Layer 3: Application middleware (your code)
# Purpose: Business logic limits — operation-specific constraints
# Examples:
# - Auth endpoints: 10 login attempts per 15 min per email
# - File uploads: 50 uploads per day per account
# - Expensive operations: 10 AI generation requests per hour
# Layer 4: Database query guards
# Purpose: Prevent individual queries from consuming all DB connections
# Use connection pool limits + query timeout (not request rate limiting)
// node-postgres: pool limits are configured at construction, not per connect()
const pool = new Pool({
max: 20, // Max 20 concurrent DB connections
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
})

For API design patterns that complement rate limiting, see our guide on REST API Best Practices. For understanding the HTTP status codes your rate limiter should return, see our HTTP Status Codes Guide.
Frequently Asked Questions
What is API rate limiting?
API rate limiting controls how many requests a client can make within a time window. When exceeded, the server returns HTTP 429 Too Many Requests. It protects against DDoS attacks, prevents runaway clients from consuming all resources, ensures fair usage across tenants, and enforces tiered pricing models.
What is the difference between rate limiting and throttling?
Rate limiting hard-rejects requests over the threshold with HTTP 429. Throttling queues or delays excess requests rather than rejecting them. Rate limiting is better for abuse prevention. Throttling is used when you want to honor every request but control throughput — such as outbound calls to third-party APIs.
Which rate limiting algorithm should I use?
Token bucket for most APIs — it allows controlled bursts while enforcing an average rate. Sliding window counter when you need accuracy without per-request memory storage. Sliding window log for perfect accuracy on low-traffic endpoints. Leaky bucket for smoothing outbound requests to external services.
What headers should a rate-limited API return?
Per the IETF draft spec: RateLimit-Limit (window limit), RateLimit-Remaining (requests left in the window), RateLimit-Reset (when the window resets). Include Retry-After on 429 responses. GitHub and Stripe expose equivalent information through legacy X-RateLimit-* headers.
How do I implement rate limiting across multiple servers?
Use Redis as a shared counter store — in-process limiters multiply your effective limit by server count. Redis's atomic INCR + EXPIRE or Lua scripts ensure consistent counts. The Redis Cell module implements GCRA rate limiting in a single O(1) command, and the rate-limiter-flexible npm package wraps these patterns for Node.js.
How should clients handle HTTP 429 Too Many Requests?
Read the Retry-After header and wait that many seconds before retrying. If absent, use exponential backoff with jitter (1s, 2s, 4s, 8s...). Implement a circuit breaker: after 5 consecutive 429s, stop retrying for 60 seconds. Never retry immediately — tight retry loops amplify the problem.
Should I rate limit by IP address or API key?
API key for authenticated endpoints — correctly isolates per-customer and survives shared IPs. IP-based for unauthenticated/public endpoints where no API key exists, but be aware corporate NAT can hide thousands of users behind one IP. Use both in layers: strict IP limits for unauthenticated, generous API-key limits for authenticated endpoints.
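The layering in this answer can be sketched as a small identity-selection helper. The header name, limits, and key prefixes below are illustrative assumptions, not a prescribed scheme:

```typescript
// Choose the rate-limit identity and budget per request: a strict per-IP
// limit for anonymous traffic, a more generous per-key budget once an API
// key is present. Header name and numbers are example values.
interface IncomingRequest {
  headers: Record<string, string | string[] | undefined>
  ip?: string
}

function rateLimitIdentity(req: IncomingRequest): { key: string; limit: number } {
  const apiKey = req.headers['x-api-key']
  if (typeof apiKey === 'string' && apiKey.length > 0) {
    // Authenticated: isolate per customer, survives shared IPs behind NAT
    return { key: `key:${apiKey}`, limit: 1000 }
  }
  // Anonymous: fall back to IP, kept strict because NAT can hide many users
  return { key: `ip:${req.ip ?? 'unknown'}`, limit: 60 }
}
```

A keyFn like this slots into the createRateLimiter middleware shown earlier; varying the limit per identity additionally requires passing the budget through, as the tiered example does.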
What HTTP status code does rate limiting return?
HTTP 429 Too Many Requests, defined in RFC 6585. Include Retry-After in the response. Do not use 503 Service Unavailable — that signals a service outage and causes different client behavior. Correct status codes let clients distinguish rate limits from genuine failures and implement appropriate retry logic.
Build and Debug APIs with BytePane
Format and validate your rate limit response payloads with the JSON Formatter. Decode JWT tokens from API authentication headers with the JWT Decoder. Check API response codes with our HTTP Status Codes Guide.
Related Articles
REST API Best Practices
Holistic API design including versioning, error handling, and security.
HTTP Status Codes Guide
429, 503, 401 — the codes your rate limiter needs to return correctly.
Webhook Guide
Rate limiting outbound webhook delivery to third-party endpoints.
API Authentication Methods
API keys, JWT, OAuth — the identity layer that powers per-key rate limits.