Sharding & partitioning

Partitioning splits data into subsets (e.g. by range or hash); it can be within one DB (tables/partitions) or across nodes. Sharding is horizontal partitioning across multiple DB instances: each shard holds a subset of data so load and storage are distributed.

Shards = data split across nodes

flowchart TB
    App[Application]
    App --> S1[Shard 1\nuser_id 0-333]
    App --> S2[Shard 2\nuser_id 334-666]
    App --> S3[Shard 3\nuser_id 667-999]
    S1 --> D1[(DB1)]
    S2 --> D2[(DB2)]
    S3 --> D3[(DB3)]

Shard key strategies

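Range: e.g. user_id 1–1,000,000 on shard 1, 1,000,001–2,000,000 on shard 2. Simple, and range scans touch few shards; a sequential key creates a hot spot because all new writes land on the last shard.
Hash: shard = hash(user_id) % N. Even distribution; range queries must fan out to every shard, and changing N moves most keys to a different shard.
Directory: a lookup table maps key → shard. Flexible for rebalancing; costs an extra lookup per request, and the table itself must stay available.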
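
The three strategies can be sketched as small routing functions. This is a minimal illustration, not a production router: the range bounds, the shard count, the directory contents, and the function names (`range_shard`, `hash_shard`, `directory_shard`) are all assumptions made up for the example.

```python
import bisect
import hashlib

NUM_SHARDS = 3  # assumed shard count for illustration

# Range: inclusive upper bound of each shard, sorted ascending (hypothetical values).
RANGE_UPPER_BOUNDS = [333, 666, 999]


def range_shard(user_id: int) -> int:
    """Return the index of the first shard whose upper bound covers user_id."""
    if user_id < 0:
        raise KeyError(f"user_id {user_id} is below every configured range")
    idx = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if idx == len(RANGE_UPPER_BOUNDS):
        raise KeyError(f"user_id {user_id} is above every configured range")
    return idx


def hash_shard(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Hash the key with a stable digest, then take it modulo the shard count.

    A cryptographic digest is used here only because it is stable across
    processes and machines; any stable, well-mixed hash would do.
    """
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Directory: explicit key -> shard mapping (hypothetical entries).
# In practice this would live in a replicated store, not in process memory.
DIRECTORY = {42: 2, 7: 0}


def directory_shard(user_id: int) -> int:
    """Look the key up in the directory; unknown keys raise KeyError."""
    return DIRECTORY[user_id]
```

With these assumed bounds, `range_shard(334)` returns 1 (the second shard), while `hash_shard` spreads neighbouring IDs across shards and `directory_shard` returns whatever the table says, which is what makes moving a single key cheap.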

flowchart LR
    subgraph Range["Range"]
        R1[Shard A: A-M]
        R2[Shard B: N-Z]
    end
    subgraph Hash["Hash"]
        H1["hash(id) % 3 = 0"]
        H2["hash(id) % 3 = 1"]
        H3["hash(id) % 3 = 2"]
    end
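
One consequence of the `hash(id) % 3` placement in the diagram is worth seeing in numbers: if a fourth shard is added, a key stays put only when its hash gives the same remainder modulo 3 and modulo 4. The sketch below measures that on sample keys. The helper names (`stable_hash`, `moved_fraction`) and the key set are assumptions for illustration.

```python
import hashlib


def stable_hash(key: int) -> int:
    """Process-independent hash of an integer key (first 8 bytes of SHA-256)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    keys = list(keys)
    moved = sum(
        1 for k in keys if stable_hash(k) % old_n != stable_hash(k) % new_n
    )
    return moved / len(keys)


# Sample workload: 10,000 sequential user IDs (hypothetical).
fraction = moved_fraction(range(10_000), old_n=3, new_n=4)
print(f"{fraction:.0%} of keys change shard when going from 3 to 4 shards")
```

For a well-mixed hash, roughly three quarters of the keys should move, since only hashes with remainder 0, 1, or 2 modulo 12 keep their shard. That data movement is one reason a directory, or a scheme such as consistent hashing, is often preferred when shards are expected to be added.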

Sharding increases write/read capacity and storage but adds complexity: cross-shard queries, rebalancing, and global uniqueness (e.g. IDs) need care. Use when a single node can’t hold the data or serve the load.
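
For the global-uniqueness point, one common approach is to let each shard generate IDs locally and embed the shard number in the ID, so no cross-shard coordination is needed. The sketch below is an illustration under assumed parameters: the bit widths, the class name `ShardIdGenerator`, and the helper `shard_of` are hypothetical, and the layout (timestamp, shard, sequence) is just one possible choice.

```python
import threading
import time

SHARD_BITS = 10     # assumed: room for 1024 shards
SEQUENCE_BITS = 12  # assumed: 4096 IDs per shard per millisecond


class ShardIdGenerator:
    """Generates IDs that are unique across shards without coordination."""

    def __init__(self, shard_id: int):
        if not 0 <= shard_id < (1 << SHARD_BITS):
            raise ValueError("shard_id out of range")
        self.shard_id = shard_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now_ms = time.time_ns() // 1_000_000
            # Never go backwards, even if the wall clock does.
            now_ms = max(now_ms, self._last_ms)
            if now_ms == self._last_ms:
                self._sequence += 1
                if self._sequence >= (1 << SEQUENCE_BITS):
                    # Sequence exhausted: borrow the next millisecond.
                    now_ms += 1
                    self._sequence = 0
            else:
                self._sequence = 0
            self._last_ms = now_ms
            return (
                (now_ms << (SHARD_BITS + SEQUENCE_BITS))
                | (self.shard_id << SEQUENCE_BITS)
                | self._sequence
            )


def shard_of(id_value: int) -> int:
    """Recover the shard number embedded in an ID."""
    return (id_value >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1)
```

Because the shard number is part of the ID, two shards can never produce the same value, and a router can call `shard_of(some_id)` to find the owning shard without a lookup. The trade-off is that the ID then pins a row to its original shard unless a directory is layered on top.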