Read replicas & write scaling

A primary (master) accepts writes and replicates them to read replicas. Reads can be served from replicas to scale read capacity; writes go to the primary. Write scaling usually means sharding the write path (multiple primaries, each owning a shard) since a single primary has a write limit.

Primary + read replicas

flowchart TB W[Writes] --> P[(Primary)] P --> |Replication| R1[(Replica 1)] P --> |Replication| R2[(Replica 2)] R1[Replica 1] --> Read1[Reads] R2[Replica 2] --> Read2[Reads]

Replication lag

Replicas are eventually consistent: there is a short delay (lag) between write on primary and visibility on replica. Read-after-write consistency requires reading from primary or waiting for replica to catch up.

sequenceDiagram participant C as Client participant P as Primary participant R as Replica C->>P: Write P-->>C: OK C->>R: Read (may not see write yet - lag) Note over R: Replication lag: ms to seconds
GoalApproach
Scale readsAdd read replicas; route SELECT to replicas
Scale writesShard (multiple primaries); each shard takes a subset of writes
High availabilityFailover: promote replica to primary if primary fails

Use read replicas when read load is high and slight staleness is acceptable. Use sharding (multiple primaries) when write load exceeds one primary’s capacity.