Cache aggressively — Static and cacheable API responses at edge and app layer to reduce origin load.
Rate limiting — Per user/IP/key to cap abuse and give fair share; return 429 when over limit.
Queue non-critical work — E.g. send email, analytics; process from queue so web tier stays responsive.
Auto-scaling — Scale out app (and optionally DB read replicas) on CPU/request count; scale in when load drops.
Graceful degradation — When overloaded, serve reduced features or cached content instead of failing everything.
sequenceDiagram
participant U as Users
participant LB as Load Balancer
participant App as App (scaled)
participant Cache as Cache
U->>LB: Spike
LB->>App: Forward (within limit)
App->>Cache: Hit cache when possible
Cache-->>App: Reduce DB load
App-->>U: 200 or 429 if over limit
Design for spikes: stateless app tier, horizontal scaling, caching, and queues. Use rate limiting and circuit breakers to protect downstream. Plan for “viral” events with load tests and runbooks.