Rate limiting vs Throttling

Rate limiting caps how many requests are allowed in a time window (e.g. 100 per minute per user); excess requests are rejected, typically with HTTP 429 Too Many Requests. Throttling slows down or queues requests so they are processed at a maximum rate rather than rejected; the client may wait or receive a delayed response.

Rate limiting: hard cap, reject excess

flowchart LR
    R[Requests] --> C{Counter < Limit?}
    C -->|Yes| A[Allow 200]
    C -->|No| B[Reject 429]
    A --> Inc[Increment counter]
    Inc --> Window[Per window e.g. 1 min]
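The flow in the diagram can be sketched as a per-key counter that resets when a new window starts. This is a minimal, in-memory illustration rather than a production limiter: the class name `FixedWindowLimiter`, the `check` method, and the injectable `clock` parameter are all assumptions made for the example, and the counters are neither thread-safe nor shared across processes.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each key."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the behaviour can be tested without waiting
        # key -> (window index, requests counted in that window)
        self._counters: Dict[str, Tuple[int, int]] = {}

    def check(self, key: str) -> int:
        """Return the HTTP status to send: 200 if allowed, 429 if over the limit."""
        window = int(self.clock() // self.window_s)
        seen_window, count = self._counters.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so the counter resets
        if count < self.limit:
            self._counters[key] = (window, count + 1)  # allow and increment
            return 200
        self._counters[key] = (window, count)
        return 429  # hard cap reached: reject instead of waiting
```

A caller would create one limiter, for example `FixedWindowLimiter(limit=100, window_s=60)`, and call `check(user_id)` on every request, returning the resulting status immediately. The key point is that a rejected request never waits: it fails fast.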

Throttling: slow down, don’t reject

flowchart LR
    R[Requests] --> Q[Queue]
    Q --> R2[Release at max rate]
    R2 --> P[Process]
    Note[Client may wait / get delayed response]

| Aspect | Rate limiting | Throttling |
| --- | --- | --- |
| Excess traffic | Rejected (429) | Queued or delayed |
| Client experience | Clear “too many requests” | Slower response or backpressure |
| Typical use | API quotas, abuse prevention | Protect backend, smooth load |
| Implementation | Token bucket, sliding window, fixed window | Queue + worker rate, leaky bucket |
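To contrast with rejection, the throttling column can be sketched as a pacer that makes each caller wait for its turn instead of returning an error. The names `Throttle` and `acquire`, and the injectable `clock` and `sleep` parameters, are hypothetical choices for this example. The sketch also assumes an unbounded wait line; a real system would usually cap the queue length or the maximum delay.

```python
import threading
import time
from typing import Callable


class Throttle:
    """Delay callers so work is released at no more than `rate_per_s` per second."""

    def __init__(self, rate_per_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / rate_per_s  # minimum spacing between releases
        self.clock = clock
        self.sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = clock()  # earliest time the next request may run

    def acquire(self) -> float:
        """Block until this caller's slot arrives; return the seconds waited."""
        with self._lock:
            now = self.clock()
            slot = max(now, self._next_slot)  # queue behind earlier callers
            self._next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            self.sleep(delay)  # the request is delayed, never rejected
        return delay
```

Calling `throttle.acquire()` before each backend call smooths a burst into an evenly spaced stream. The cost is latency: under sustained overload the waits keep growing, which is the backpressure the table refers to.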

Common rate-limit strategies

flowchart TB
    subgraph Fixed["Fixed window"]
        F1[0-60s: count]
        F2[61-120s: new count]
    end
    subgraph Sliding["Sliding window"]
        S1[Last 60s from now]
    end
    subgraph Token["Token bucket"]
        T1[Tokens refill at rate]
        T2[Request consumes 1 token]
    end
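Two of the strategies in the diagram, token bucket and sliding window, can be sketched as follows. The class names, the `allow` method, and the `clock` parameter are illustrative assumptions. The sliding window is implemented here as a timestamp log, which is one of several possible variants, and both classes are single-process and not thread-safe.

```python
import time
from collections import deque
from typing import Callable, Deque


class TokenBucket:
    """Tokens refill at `rate_per_s` up to `capacity`; each request spends one."""

    def __init__(self, capacity: float, rate_per_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate_per_s = rate_per_s
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate_per_s)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class SlidingWindowLog:
    """Allow at most `limit` requests in the last `window_s` seconds."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._stamps: Deque[float] = deque()  # timestamps of allowed requests

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have slid out of the window ending at `now`.
        while self._stamps and self._stamps[0] <= now - self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False
```

The design difference shows up at window boundaries. A fixed window can admit up to twice the limit across a boundary (the end of one window plus the start of the next), the sliding log never exceeds the limit in any trailing window at the cost of storing one timestamp per allowed request, and the token bucket permits bursts up to `capacity` while holding the long-run average to `rate_per_s`.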

Use rate limiting when you want to enforce quotas and fail fast. Use throttling when you want to absorb bursts and degrade gracefully instead of rejecting.