Rate limiting vs Throttling

Rate limiting caps how many requests are allowed in a time window (e.g. 100 per minute per user); requests over the cap are rejected, typically with HTTP 429 Too Many Requests. Throttling instead slows down or queues requests so that they are processed at a maximum rate rather than rejected; the client may wait or receive a delayed response.

Rate limiting: hard cap, reject excess

flowchart LR
    R[Requests] --> C{Counter < Limit?}
    C -->|Yes| A[Allow 200]
    C -->|No| B[Reject 429]
    A --> Inc[Increment counter]
    Inc --> Window[Per window e.g. 1 min]
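The counter-and-reject flow can be sketched in a few lines of Python. This is a minimal, single-process illustration rather than a production limiter: the class name `FixedWindowLimiter`, the `check` method, and the injectable `clock` parameter are assumptions made for this example, and the counters live in memory only.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed window."""

    def __init__(self, limit=100, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # hypothetical hook: injectable so tests can control time
        # client id -> (window index, requests counted in that window)
        self._counters = defaultdict(lambda: (0, 0))

    def check(self, client_id):
        """Return 200 if the request is allowed, 429 if it exceeds the cap."""
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self._counters[client_id]
        if seen_window != window:
            count = 0  # a new window has started, so the counter resets
        if count < self.limit:
            self._counters[client_id] = (window, count + 1)
            return 200
        self._counters[client_id] = (window, count)
        return 429
```

With `limit=3`, the fourth call to `check("user-1")` inside the same window returns 429, and calls succeed again once the clock moves into the next window. A shared deployment would likely keep the counters in an external store so that all servers see the same counts.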

Throttling: slow down, don’t reject

flowchart LR
    R[Requests] --> Q[Queue]
    Q --> R2[Release at max rate]
    R2 --> P[Process]
    Note[Client may wait / get delayed response]
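One way to picture "release at max rate" is a pacer that gives each request the next free time slot and makes the caller wait for it. The sketch below is single-threaded and assumes hypothetical names (`Throttle`, `reserve`, `run`) plus injectable `clock` and `sleep` hooks; the waiting callers play the role of the queue in the diagram.

```python
import time


class Throttle:
    """Space requests at least 1/max_rate seconds apart instead of rejecting."""

    def __init__(self, max_rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_rate  # seconds between released requests
        self.clock = clock
        self.sleep = sleep
        self._next_free = None  # earliest time the next request may run

    def reserve(self):
        """Claim the next slot and return how long the caller must wait."""
        now = self.clock()
        if self._next_free is None or self._next_free < now:
            self._next_free = now  # idle period: the next request runs at once
        delay = self._next_free - now
        self._next_free += self.interval
        return delay

    def run(self, func, *args, **kwargs):
        """Delay the call if needed, then process it; nothing is rejected."""
        delay = self.reserve()
        if delay > 0:
            self.sleep(delay)
        return func(*args, **kwargs)
```

With `max_rate=2`, three requests arriving at the same instant are released 0, 0.5, and 1.0 seconds later. Every request is eventually processed, which is the contrast with the 429 path above; a real service would usually also bound the wait so that the backlog cannot grow without limit.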

Aspect	Rate limiting	Throttling
Excess traffic	Rejected (429)	Queued or delayed
Client experience	Clear “too many requests”	Slower response or backpressure
Typical use	API quotas, abuse prevention	Protect backend, smooth load
Implementation	Token bucket, sliding window, fixed window	Queue + worker rate, leaky bucket

Common rate-limit strategies

flowchart TB
    subgraph Fixed["Fixed window"]
        F1[0-60s: count]
        F2[60-120s: new count]
    end
    subgraph Sliding["Sliding window"]
        S1[Last 60s from now]
    end
    subgraph Token["Token bucket"]
        T1[Tokens refill at rate]
        T2[Request consumes 1 token]
    end
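The sliding window and token bucket strategies can be sketched as follows. Both classes are illustrative assumptions, not a reference implementation: the sliding window is the simple "log of timestamps" variant, and the names `SlidingWindowLimiter`, `TokenBucket`, `allow`, and `clock` are invented for this example.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in the last `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._timestamps = deque()  # times of accepted requests, oldest first

    def allow(self):
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.limit:
            self._timestamps.append(now)
            return True
        return False


class TokenBucket:
    """Tokens refill at `refill_rate` per second, up to `capacity`."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self):
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity.
        earned = (now - self._last) * self.refill_rate
        self._tokens = min(self.capacity, self._tokens + earned)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1  # each request consumes one token
            return True
        return False
```

The sliding window avoids the fixed window's boundary effect, where a client can send a full quota just before a window ends and another just after it starts. The token bucket permits bursts up to `capacity` while holding the long-run average to `refill_rate`.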

Use rate limiting when you want to enforce quotas and fail fast. Use throttling when you want to absorb bursts and degrade gracefully instead of rejecting.