API Rate Limiting: How It Works & How to Implement It
A single unthrottled API client sent 18 million requests in 24 hours to a mid-sized SaaS company's REST API in 2024 — a misconfigured SDK retry loop. The API stayed up, but it increased response times for every other customer by 340% and consumed $4,200 in unexpected compute costs before the team noticed. The fix? 12 lines of rate limiting middleware. Total implementation time: 45 minutes.
According to a 2024 Gartner analysis cited by Zuplo, 73% of SaaS outages are linked to API overuse or poor traffic management. Rate limiting is not optional infrastructure — it is a correctness requirement for any public or multi-tenant API.
Key Takeaways
- Token bucket is the most widely deployed algorithm — allows controlled bursts while enforcing an average rate. Used by Stripe, GitHub, and Cloudflare.
- In-process rate limiting breaks with horizontal scaling — use Redis for distributed counter coordination across all server instances.
- Always return RateLimit-Limit, RateLimit-Remaining, and Retry-After headers — clients need them to implement polite backoff.
- Rate limit at multiple layers: API gateway (global), middleware (per route), and application code (per operation type).
- HTTP 429 is the correct status code per RFC 6585 — not 503, which signals a service outage rather than a client limit.
Why APIs Need Rate Limiting
Rate limiting serves four distinct purposes in a production API:
- DDoS protection — volumetric attacks overwhelm servers with requests. Per APIsec's 2025 report, API-targeted DDoS attacks increased 118% year-over-year, with attackers specifically targeting unprotected endpoints.
- Runaway client prevention — misconfigured SDK retry loops, infinite polling, and bugs can generate millions of requests without rate limits to stop them.
- Fair multi-tenancy — one customer's traffic spike should not degrade service for others. Rate limiting ensures equitable resource distribution.
- Business model enforcement — tiered SaaS pricing (100 req/min for free tier, 10,000 req/min for enterprise) is implemented through rate limits.
The Four Rate Limiting Algorithms
1. Fixed Window Counter
Divide time into fixed intervals (e.g., 1-minute windows). Count requests in the current window. Reject when count exceeds limit. Reset counter at window boundary.
// Fixed window counter in Redis
async function fixedWindowRateLimit(
key: string, // e.g., "ratelimit:user:123"
limit: number, // max requests per window
windowSeconds: number
): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
const windowKey = `${key}:${Math.floor(Date.now() / (windowSeconds * 1000))}`
const pipeline = redis.pipeline()
pipeline.incr(windowKey)
pipeline.expire(windowKey, windowSeconds)
const results = await pipeline.exec()
const count = results![0][1] as number
const resetAt = (Math.floor(Date.now() / (windowSeconds * 1000)) + 1) * windowSeconds
return {
allowed: count <= limit,
remaining: Math.max(0, limit - count),
resetAt,
}
}

Boundary problem: A client can make 100 requests at 00:59 and 100 more at 01:01 — 200 requests in 2 seconds, double the intended rate. Acceptable for most use cases, but not for sensitive operations.
2. Sliding Window Counter
Maintains a more accurate rate by weighting the previous window's count proportionally. Eliminates the boundary burst problem with minimal memory overhead.
async function slidingWindowRateLimit(
key: string,
limit: number,
windowSeconds: number
): Promise<{ allowed: boolean; remaining: number }> {
const now = Date.now()
const windowMs = windowSeconds * 1000
// Current and previous window
const currentWindow = Math.floor(now / windowMs)
const previousWindow = currentWindow - 1
const currentKey = `${key}:${currentWindow}`
const previousKey = `${key}:${previousWindow}`
// Fetch both window counts atomically
const [currentCount, previousCount] = await redis.mget(currentKey, previousKey)
const current = parseInt(currentCount ?? '0')
const previous = parseInt(previousCount ?? '0')
// Weight previous window by how far we are through the current window
const windowProgress = (now % windowMs) / windowMs
const weightedCount = previous * (1 - windowProgress) + current
if (weightedCount >= limit) {
return { allowed: false, remaining: 0 }
}
// Increment current window
await redis.pipeline()
.incr(currentKey)
.expire(currentKey, windowSeconds * 2)
.exec()
return {
allowed: true,
remaining: Math.max(0, Math.floor(limit - weightedCount - 1)),
}
}

3. Token Bucket
Each client has a "bucket" with a maximum token capacity. Tokens are added at a fixed refill rate. Each request consumes one token. When the bucket is empty, requests are rejected. This is the most widely deployed algorithm — it allows burst traffic up to the bucket capacity while enforcing an average rate.
// Token bucket implementation using Redis Lua script for atomicity
const TOKEN_BUCKET_SCRIPT = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1]) -- max tokens
local refillRate = tonumber(ARGV[2]) -- tokens added per second
local now = tonumber(ARGV[3]) -- current timestamp (ms)
local requested = tonumber(ARGV[4]) -- tokens needed (usually 1)
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local lastRefill = tonumber(bucket[2]) or now
-- Add tokens based on elapsed time
local elapsed = (now - lastRefill) / 1000 -- seconds
local newTokens = math.min(capacity, tokens + (elapsed * refillRate))
if newTokens < requested then
-- Not enough tokens
redis.call('HMSET', key, 'tokens', newTokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)
return {0, math.floor(newTokens)}
end
-- Consume tokens
newTokens = newTokens - requested
redis.call('HMSET', key, 'tokens', newTokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)
return {1, math.floor(newTokens)}
`
async function tokenBucketRateLimit(
key: string,
capacity: number, // max burst size
refillRate: number, // tokens per second
): Promise<{ allowed: boolean; remaining: number }> {
const result = await redis.eval(
TOKEN_BUCKET_SCRIPT,
1,
`ratelimit:bucket:${key}`,
capacity.toString(),
refillRate.toString(),
Date.now().toString(),
'1'
) as [number, number]
return {
allowed: result[0] === 1,
remaining: result[1],
}
}
// Example: Stripe-like API limits
// 100-request burst capacity, refilling at 10 tokens/second (600 req/min sustained)
const { allowed } = await tokenBucketRateLimit(
`user:${userId}`,
100, // burst
10 // sustained rate
)

4. Leaky Bucket
Requests fill a bucket (queue) and are processed at a fixed rate, regardless of how fast they arrive. Unlike token bucket which rejects excess requests, leaky bucket queues them. This smooths traffic spikes at the cost of latency.
// Leaky bucket: throttle outbound requests at a fixed rate
// Useful for rate-limiting YOUR calls to third-party APIs
class LeakyBucket {
private queue: Array<() => void> = []
private processing = false
constructor(
private ratePerSecond: number,
private maxQueueSize = 100
) {}
async throttle<T>(fn: () => Promise<T>): Promise<T> {
if (this.queue.length >= this.maxQueueSize) {
throw new Error('Leaky bucket overflow — queue full')
}
return new Promise<T>((resolve, reject) => {
this.queue.push(async () => {
try {
resolve(await fn())
} catch (err) {
reject(err)
}
})
this.processQueue()
})
}
private processQueue() {
if (this.processing || this.queue.length === 0) return
this.processing = true
const intervalMs = 1000 / this.ratePerSecond
const timer = setInterval(() => {
const fn = this.queue.shift()
if (!fn) {
clearInterval(timer)
this.processing = false
return
}
fn()
}, intervalMs)
}
}
// Rate-limit outgoing requests to a third-party API (e.g., 5 req/s)
const limiter = new LeakyBucket(5, 50)
const results = await Promise.all(
items.map(item => limiter.throttle(() => externalApi.process(item)))
)

Algorithm Comparison
| Algorithm | Burst Handling | Memory | Accuracy | Best For |
|---|---|---|---|---|
| Fixed Window | ❌ 2x burst at boundaries | ✅ O(1) per key | ⚠️ Medium | Simple quota enforcement |
| Sliding Window | ✅ No boundary spikes | ✅ O(1) per key | ✅ High | Most APIs — accuracy + efficiency |
| Token Bucket | ✅ Controlled burst capacity | ✅ O(1) per key | ✅ High | APIs needing burst allowance (Stripe, GitHub) |
| Sliding Window Log | ✅ No boundary spikes | ❌ O(n) per key (timestamp per request) | ✅ Perfect | Low-traffic endpoints needing exact counts |
| Leaky Bucket | ⚠️ Queued (adds latency) | ✅ O(queue size) | ✅ Perfect throughput | Outbound rate limiting to third-party APIs |
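The comparison table lists the sliding window log, which none of the snippets above implement. Here is a minimal sketch using a Redis sorted set. The function name and key layout are illustrative, and the client is typed structurally so the sketch stands alone; in practice you would pass the same ioredis connection the other examples use.

```typescript
// Sliding window log: store one sorted-set member per request, so counts are
// exact. Memory is O(n) per key, so reserve it for low-traffic endpoints
// (password resets, account deletion).
interface RedisLike {
  pipeline(): any // chainable: zremrangebyscore / zcard / zadd / expire / exec
}

async function slidingWindowLogRateLimit(
  client: RedisLike,
  key: string,
  limit: number,
  windowSeconds: number
): Promise<{ allowed: boolean; remaining: number }> {
  const now = Date.now()
  const windowStart = now - windowSeconds * 1000
  // Evict timestamps that fell out of the window, then count what remains
  const results = await client.pipeline()
    .zremrangebyscore(key, 0, windowStart)
    .zcard(key)
    .exec()
  const count = (results?.[1]?.[1] as number) ?? 0
  if (count >= limit) {
    return { allowed: false, remaining: 0 }
  }
  // Record this request; the member string must be unique per request.
  // Note: the zcard check and zadd race under concurrency. Wrap both in a
  // Lua script (as the token bucket example does) for strict atomicity.
  await client.pipeline()
    .zadd(key, now, `${now}:${Math.random()}`)
    .expire(key, windowSeconds)
    .exec()
  return { allowed: true, remaining: limit - count - 1 }
}
```

Because every request leaves a timestamp behind, the count is always exact with no boundary or weighting approximation, which is why the table rates its accuracy as "perfect" despite the memory cost.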
Express.js Middleware Implementation
Here is a production-ready rate limiting middleware for Express that uses Redis sliding window and returns all required headers per the IETF draft-ietf-httpapi-ratelimit-headers spec.
import { Request, Response, NextFunction } from 'express'
import Redis from 'ioredis'
const redis = new Redis(process.env.REDIS_URL)
interface RateLimitOptions {
windowSeconds: number // Time window duration
limit: number // Max requests per window
keyFn?: (req: Request) => string // How to identify the client
skipFn?: (req: Request) => boolean // Skip rate limiting for certain requests
}
function createRateLimiter(options: RateLimitOptions) {
const { windowSeconds, limit, keyFn, skipFn } = options
return async (req: Request, res: Response, next: NextFunction) => {
if (skipFn?.(req)) return next()
// Identify client: prefer API key over IP
const clientId = keyFn
? keyFn(req)
: (req.headers['x-api-key'] as string) || req.ip!
const key = `ratelimit:${req.path}:${clientId}`
try {
const now = Date.now()
const windowMs = windowSeconds * 1000
const currentWindow = Math.floor(now / windowMs)
const previousWindow = currentWindow - 1
const currentKey = `${key}:${currentWindow}`
const previousKey = `${key}:${previousWindow}`
const [currentCount, previousCount] = await redis.mget(currentKey, previousKey)
const current = parseInt(currentCount ?? '0')
const previous = parseInt(previousCount ?? '0')
const windowProgress = (now % windowMs) / windowMs
const weightedCount = Math.floor(previous * (1 - windowProgress) + current)
const remaining = Math.max(0, limit - weightedCount - 1)
const resetAt = (currentWindow + 1) * windowSeconds
// Set rate limit headers (IETF draft standard)
res.setHeader('RateLimit-Limit', limit)
res.setHeader('RateLimit-Remaining', remaining)
res.setHeader('RateLimit-Reset', resetAt)
if (weightedCount >= limit) {
const retryAfter = Math.ceil(resetAt - now / 1000) // resetAt is in seconds; now / 1000 converts ms
res.setHeader('Retry-After', retryAfter)
return res.status(429).json({
error: 'Too Many Requests',
message: `Rate limit exceeded. Retry after ${retryAfter} seconds.`,
retryAfter,
})
}
await redis.pipeline()
.incr(currentKey)
.expire(currentKey, windowSeconds * 2)
.exec()
next()
} catch (err) {
// On Redis failure, fail open (allow the request)
// Log the error but don't block the user
console.error('Rate limiter Redis error:', err)
next()
}
}
}
// Usage: apply different limits to different routes
const globalLimiter = createRateLimiter({
windowSeconds: 60,
limit: 100,
})
const authLimiter = createRateLimiter({
windowSeconds: 900, // 15-minute window
limit: 10, // 10 login attempts per 15 minutes
keyFn: (req) => req.body?.email ?? req.ip,
})
const strictLimiter = createRateLimiter({
windowSeconds: 3600,
limit: 1000,
skipFn: (req) => {
// Internal health checks bypass rate limiting
return req.headers['x-internal-token'] === process.env.INTERNAL_TOKEN
},
})
app.use('/api/', globalLimiter)
app.post('/auth/login', authLimiter)
app.post('/auth/forgot-password', createRateLimiter({
windowSeconds: 3600,
limit: 3, // Only 3 password resets per hour
keyFn: (req) => req.body?.email ?? req.ip,
}))

Tiered Rate Limits for SaaS APIs
Production SaaS APIs implement tiered limits based on subscription level. According to a 2025 analysis of 50 SaaS pricing pages, 84% of APIs with tiered pricing use rate limits as the primary differentiator between free and paid tiers.
// Tiered rate limits by API key subscription level
const TIER_LIMITS = {
free: { requestsPerMin: 60, requestsPerDay: 1000 },
starter: { requestsPerMin: 600, requestsPerDay: 50000 },
pro: { requestsPerMin: 3000, requestsPerDay: 500000 },
enterprise: { requestsPerMin: 30000, requestsPerDay: Infinity },
}
async function getTierForApiKey(apiKey: string): Promise<keyof typeof TIER_LIMITS> {
const subscription = await db.apiKeys.findUnique({
where: { key: apiKey },
select: { tier: true },
})
return (subscription?.tier ?? 'free') as keyof typeof TIER_LIMITS
}
function createTieredRateLimiter() {
return async (req: Request, res: Response, next: NextFunction) => {
const apiKey = req.headers['x-api-key'] as string
if (!apiKey) {
return res.status(401).json({ error: 'API key required' })
}
const tier = await getTierForApiKey(apiKey)
const limits = TIER_LIMITS[tier]
// Check per-minute limit
const minuteResult = await slidingWindowRateLimit(
`${apiKey}:minute`,
limits.requestsPerMin,
60
)
if (!minuteResult.allowed) {
return res.status(429).json({
error: 'Rate limit exceeded',
tier,
limit: limits.requestsPerMin,
window: '1 minute',
upgradeUrl: 'https://yourapi.com/pricing',
})
}
// Check daily limit (skip for enterprise — Infinity check)
if (limits.requestsPerDay !== Infinity) {
const dayResult = await slidingWindowRateLimit(
`${apiKey}:day`,
limits.requestsPerDay,
86400
)
if (!dayResult.allowed) {
return res.status(429).json({
error: 'Daily quota exceeded',
tier,
limit: limits.requestsPerDay,
window: '24 hours',
upgradeUrl: 'https://yourapi.com/pricing',
})
}
}
// Expose tier info in headers
res.setHeader('X-API-Tier', tier)
res.setHeader('RateLimit-Limit', limits.requestsPerMin)
next()
}
}

Client-Side: Handling 429 Correctly
Most 429 errors in production come from SDK retry bugs — a client retrying immediately in a tight loop, amplifying the problem rather than backing off. Here is a correct implementation:
class RateLimitAwareClient {
private circuitBreakerCount = 0
private circuitBreakerOpen = false
private circuitBreakerResetAt: number | null = null
async request<T>(
url: string,
options: RequestInit = {},
maxRetries = 3
): Promise<T> {
// Circuit breaker: stop hammering after 5 consecutive 429s
if (this.circuitBreakerOpen) {
if (Date.now() < (this.circuitBreakerResetAt ?? 0)) {
throw new Error('Circuit breaker open — cooling off after rate limit errors')
}
this.circuitBreakerOpen = false
this.circuitBreakerCount = 0
}
for (let attempt = 0; attempt <= maxRetries; attempt++) {
const response = await fetch(url, options)
if (response.status === 429) {
this.circuitBreakerCount++
if (this.circuitBreakerCount >= 5) {
this.circuitBreakerOpen = true
this.circuitBreakerResetAt = Date.now() + 60_000 // 1 minute
throw new Error('Too many rate limit errors — circuit breaker opened')
}
if (attempt === maxRetries) {
throw new Error('Max retries exceeded due to rate limiting')
}
// Respect Retry-After header
const retryAfter = response.headers.get('Retry-After')
const waitMs = retryAfter
? parseInt(retryAfter) * 1000
: Math.min(1000 * Math.pow(2, attempt) + Math.random() * 1000, 32000)
console.warn(`Rate limited. Waiting ${waitMs}ms before retry ${attempt + 1}.`)
await new Promise(resolve => setTimeout(resolve, waitMs))
continue
}
this.circuitBreakerCount = 0 // Reset on success
return response.json() as Promise<T>
}
throw new Error('Unreachable')
}
}

Rate Limiting at Scale: Distributed Systems
In-process rate limiting is broken by definition in a horizontally scaled API. If you have 5 app servers and each allows 100 req/min per key, your actual limit is 500 req/min. The solution is always a shared external store.
# Redis Cell module (recommended for production)
# Implements GCRA (Generic Cell Rate Algorithm) — similar to sliding window
# O(1) time, O(1) memory per key, single round-trip
# Install: loadmodule /path/to/redis-cell.so
# Command: CL.THROTTLE key max_burst count_per_period period [quantity]
CL.THROTTLE user:123 99 60 60 1
# key: user:123
# max_burst: 99 (allow up to 100 total, 0-indexed)
# count_per_period: 60 requests
# period: 60 seconds
# quantity: 1 (requesting 1 token)
# Response: [limited, limit, remaining, retry_after, reset_after]
# [1, 100, 0, 2, 2] → rejected, 2s until retry
# Node.js integration:
const result = await redis.call(
'CL.THROTTLE',
`ratelimit:${userId}`,
'99', // max burst (capacity - 1)
'60', // requests per period
'60', // period in seconds
'1' // tokens to consume
) as [number, number, number, number, number]
const [limited, limit, remaining, retryAfter, resetAfter] = result
// limited: 0 = allowed, 1 = rate limited (note: inverted from a typical boolean)
# For teams without Redis Cell, use the rate-limiter-flexible package (npm).
# It implements multiple algorithms with a Redis backend and is widely deployed.

For even larger scale — hundreds of thousands of requests per second — consider API-gateway-level rate limiting with consistent hashing: route all requests for a given key to the same gateway node, enabling local rate limiting without Redis round-trips. Cloudflare uses this approach for their Workers rate limiting, processing billions of requests per day.
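The gateway routing described above can be sketched with rendezvous (highest-random-weight) hashing, one of several ways to make every router agree on an owner node for a key. The node names and the FNV-1a hash choice here are illustrative assumptions, not Cloudflare's actual implementation:

```typescript
// Rendezvous (highest-random-weight) hashing: every gateway instance computes
// the same owner node for a given rate-limit key, so that node can count
// requests in memory with no Redis round-trip. Removing a node only remaps
// the keys that node owned.

// FNV-1a: a small, fast, deterministic 32-bit hash (not cryptographic)
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash >>> 0
}

// Pick the node with the highest hash(node + key) score
function nodeForKey(key: string, nodes: string[]): string {
  let best = nodes[0]
  let bestScore = -1
  for (const node of nodes) {
    const score = fnv1a(`${node}:${key}`)
    if (score > bestScore) {
      bestScore = score
      best = node
    }
  }
  return best
}

// All gateways agree: traffic for this key always lands on the same node,
// which can then run a purely in-memory token bucket for it.
const owner = nodeForKey('ratelimit:user:123', ['gw-1', 'gw-2', 'gw-3'])
```

Because each node's score is computed independently, removing one node only remaps the keys it owned; every other key keeps its owner, which is what makes local counters safe during scale-up and scale-down events.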
Rate Limiting Headers Reference
The IETF draft-ietf-httpapi-ratelimit-headers specification standardizes the headers your API should return. Providers such as GitHub and Stripe expose equivalent information, typically through legacy X-RateLimit-* headers.
| Header | Example Value | Meaning |
|---|---|---|
| RateLimit-Limit | 60 | Requests allowed per window |
| RateLimit-Remaining | 42 | Requests remaining in current window |
| RateLimit-Reset | 1714085460 | When the window resets (Unix timestamp here; the IETF draft specifies seconds-until-reset) |
| Retry-After | 30 | Seconds until client can retry (429 only) |
| X-RateLimit-Limit | 5000 | GitHub convention (legacy prefix) |
| X-RateLimit-Used | 23 | GitHub: requests used in window |
Where to Apply Rate Limiting
Production APIs layer rate limiting at multiple points. Each layer serves a distinct purpose and protects against different attack vectors.
# Layer 1: Infrastructure (Cloudflare, AWS WAF, Nginx)
# Purpose: Volume-based DDoS protection before traffic hits your app
# Limit: 10,000 req/min per IP at the edge — blocks botnets and scanners
# Cost: Free (Cloudflare) or near-free (WAF rules)
# Nginx rate limiting (simple, no Redis required)
limit_req_zone $binary_remote_addr zone=api:10m rate=100r/s;
location /api/ {
limit_req zone=api burst=200 nodelay;
limit_req_status 429;
}
# Layer 2: API Gateway (Kong, AWS API Gateway, Nginx with Lua)
# Purpose: Per-consumer limits and route-specific policies
# Limit: Based on API key tier — free/pro/enterprise
# Layer 3: Application middleware (your code)
# Purpose: Business logic limits — operation-specific constraints
# Examples:
# - Auth endpoints: 10 login attempts per 15 min per email
# - File uploads: 50 uploads per day per account
# - Expensive operations: 10 AI generation requests per hour
# Layer 4: Database query guards
# Purpose: Prevent individual queries from consuming all DB connections
# Use connection pool limits + query timeout (not request rate limiting)
// node-postgres: pool limits are configured at construction, not per connect()
const pool = new Pool({
max: 20, // Max 20 concurrent DB connections
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
})

For API design patterns that complement rate limiting, see our guide on REST API Best Practices. For understanding the HTTP status codes your rate limiter should return, see our HTTP Status Codes Guide.
Frequently Asked Questions
What is API rate limiting?
API rate limiting controls how many requests a client can make within a time window. When exceeded, the server returns HTTP 429 Too Many Requests. It protects against DDoS attacks, prevents runaway clients from consuming all resources, ensures fair usage across tenants, and enforces tiered pricing models.
What is the difference between rate limiting and throttling?
Rate limiting hard-rejects requests over the threshold with HTTP 429. Throttling queues or delays excess requests rather than rejecting them. Rate limiting is better for abuse prevention. Throttling is used when you want to honor every request but control throughput — such as outbound calls to third-party APIs.
Which rate limiting algorithm should I use?
Token bucket for most APIs — it allows controlled bursts while enforcing an average rate. Sliding window counter when you need accuracy without per-request memory storage. Sliding window log for perfect accuracy on low-traffic endpoints. Leaky bucket for smoothing outbound requests to external services.
What headers should a rate-limited API return?
Per the IETF draft spec: RateLimit-Limit (window limit), RateLimit-Remaining (requests left in the window), RateLimit-Reset (when the window resets). Include Retry-After on 429 responses. GitHub and Stripe expose equivalent information through legacy X-RateLimit-* headers.
How do I implement rate limiting across multiple servers?
Use Redis as a shared counter store — in-process limiters multiply your effective limit by server count. Redis's atomic INCR + EXPIRE or Lua scripts ensure consistent counts. The Redis Cell module implements GCRA rate limiting in a single O(1) command, and the rate-limiter-flexible npm package wraps these patterns for Node.js.
How should clients handle HTTP 429 Too Many Requests?
Read the Retry-After header and wait that many seconds before retrying. If absent, use exponential backoff with jitter (1s, 2s, 4s, 8s...). Implement a circuit breaker: after 5 consecutive 429s, stop retrying for 60 seconds. Never retry immediately — tight retry loops amplify the problem.
Should I rate limit by IP address or API key?
API key for authenticated endpoints — correctly isolates per-customer and survives shared IPs. IP-based for unauthenticated/public endpoints where no API key exists, but be aware corporate NAT can hide thousands of users behind one IP. Use both in layers: strict IP limits for unauthenticated, generous API-key limits for authenticated endpoints.
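The layering in this answer can be sketched as a small identity-selection helper. The header name, limits, and key prefixes below are illustrative assumptions, not a prescribed scheme:

```typescript
// Choose the rate-limit identity and budget per request: a strict per-IP
// limit for anonymous traffic, a more generous per-key budget once an API
// key is present. Header name and numbers are example values.
interface IncomingRequest {
  headers: Record<string, string | string[] | undefined>
  ip?: string
}

function rateLimitIdentity(req: IncomingRequest): { key: string; limit: number } {
  const apiKey = req.headers['x-api-key']
  if (typeof apiKey === 'string' && apiKey.length > 0) {
    // Authenticated: isolate per customer, survives shared IPs behind NAT
    return { key: `key:${apiKey}`, limit: 1000 }
  }
  // Anonymous: fall back to IP, kept strict because NAT can hide many users
  return { key: `ip:${req.ip ?? 'unknown'}`, limit: 60 }
}
```

A keyFn like this slots into the createRateLimiter middleware shown earlier; varying the limit per identity additionally requires passing the budget through, as the tiered example does.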
What HTTP status code does rate limiting return?
HTTP 429 Too Many Requests, defined in RFC 6585. Include Retry-After in the response. Do not use 503 Service Unavailable — that signals a service outage and causes different client behavior. Correct status codes let clients distinguish rate limits from genuine failures and implement appropriate retry logic.
Build and Debug APIs with BytePane
Format and validate your rate limit response payloads with the JSON Formatter. Decode JWT tokens from API authentication headers with the JWT Decoder. Check API response codes with our HTTP Status Codes Guide.
Related Articles
REST API Best Practices
Holistic API design including versioning, error handling, and security.
HTTP Status Codes Guide
429, 503, 401 — the codes your rate limiter needs to return correctly.
Webhook Guide
Rate limiting outbound webhook delivery to third-party endpoints.
API Authentication Methods
API keys, JWT, OAuth — the identity layer that powers per-key rate limits.