Partitioning splits a dataset into subsets, for example by key range or by hash. It can happen inside a single database (table partitions) or across nodes. Sharding is horizontal partitioning across multiple database instances: each shard holds a subset of the rows, so both load and storage are spread across machines.
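Range: e.g. user_id 1–1,000,000 on shard 1, 1,000,001–2,000,000 on shard 2. Simple, and a range scan usually touches only one or a few shards; can create hot spots if the key is sequential, because all new writes land on the last shard.
Hash: shard = hash(user_id) % N. Spreads keys evenly; range queries have to fan out to every shard, and changing N remaps most keys.
Directory: a lookup table maps key → shard. Flexible for rebalancing, since a key moves by updating its entry; costs an extra lookup per request.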
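The three strategies above can be sketched as routing functions. This is a minimal illustration, not a production router: the shard names, the range bounds, the directory contents, and the choice of SHA-256 as the hash are all assumptions made for the example.

```python
import bisect
import hashlib

# Hypothetical cluster of three shards (names are illustrative only).
SHARDS = ["shard-0", "shard-1", "shard-2"]

# Range: inclusive upper bound of the user_id range owned by each shard.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def range_shard(user_id: int) -> str:
    """Route by key range: find the first shard whose upper bound covers the key."""
    idx = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if user_id < 1 or idx == len(SHARDS):
        raise KeyError(f"user_id {user_id} is outside every configured range")
    return SHARDS[idx]


def hash_shard(user_id: int) -> str:
    """Route by hash(user_id) % N, using a hash that is stable across processes."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Directory: explicit key -> shard table (in practice this would live in a
# replicated store; a dict stands in for it here).
DIRECTORY: dict[int, str] = {42: "shard-2", 43: "shard-0"}


def directory_shard(user_id: int) -> str:
    """Route by lookup; moving a key only requires updating its entry."""
    return DIRECTORY[user_id]


print(range_shard(1_000_000), range_shard(1_000_001))
print(hash_shard(42))
print(directory_shard(42))
```

Note how the range router keeps neighbouring keys together, while the directory router can place any key on any shard at the price of maintaining the table.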
flowchart LR
subgraph Range["Range"]
R1[Shard A: A-M]
R2[Shard B: N-Z]
end
subgraph Hash["Hash"]
H1["hash(id) % 3 = 0"]
H2["hash(id) % 3 = 1"]
H3["hash(id) % 3 = 2"]
end
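The hash panel in the diagram fixes the shard count at 3. As a rough sketch of why modulo hashing makes rebalancing expensive, the snippet below counts how many keys change shard when a fourth shard is added. The helper names and the use of SHA-256 are assumptions for illustration; for a uniform hash, about three quarters of keys are expected to move in this case.

```python
import hashlib


def stable_hash(key: int) -> int:
    """Deterministic 64-bit hash derived from SHA-256 (an illustrative choice)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard index differs between old_n and new_n shards."""
    keys = list(keys)
    moved = sum(
        1 for k in keys if stable_hash(k) % old_n != stable_hash(k) % new_n
    )
    return moved / len(keys)


fraction = moved_fraction(range(10_000), old_n=3, new_n=4)
print(f"{fraction:.0%} of keys change shard when going from 3 to 4 shards")
```

A key stays put only when its hash gives the same remainder modulo 3 and modulo 4, which is the case for 3 out of every 12 hash values. This is part of why a directory, which can move keys individually, is easier to rebalance.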
Sharding increases read/write capacity and total storage, but it adds complexity: cross-shard queries, rebalancing, and globally unique values (e.g. IDs) all need care. Use it when a single node can’t hold the data or serve the load.
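One common way to keep IDs globally unique without coordinating between shards is to embed the shard number in each ID. The sketch below is one possible scheme, not the only one: the 10-bit shard field, the class name `ShardIdGenerator`, and the in-memory counter are all assumptions for illustration (a real system would persist the sequence).

```python
import itertools

SHARD_BITS = 10                      # assumption: at most 1024 shards
SHARD_MASK = (1 << SHARD_BITS) - 1


class ShardIdGenerator:
    """Issues IDs of the form (local sequence << SHARD_BITS) | shard_id."""

    def __init__(self, shard_id: int) -> None:
        if not 0 <= shard_id <= SHARD_MASK:
            raise ValueError(f"shard_id must be in 0..{SHARD_MASK}")
        self.shard_id = shard_id
        self._seq = itertools.count(1)   # in-memory only; not durable

    def next_id(self) -> int:
        # The low bits identify the shard, so two shards can never collide.
        return (next(self._seq) << SHARD_BITS) | self.shard_id


def shard_of(global_id: int) -> int:
    """Recover the owning shard from an ID without any lookup."""
    return global_id & SHARD_MASK


gen_a = ShardIdGenerator(0)
gen_b = ShardIdGenerator(1)
ids_a = [gen_a.next_id() for _ in range(3)]
ids_b = [gen_b.next_id() for _ in range(3)]
print(ids_a, ids_b, shard_of(ids_b[0]))
```

Because the shard number is part of the ID, routing a lookup by ID needs no directory. The trade-off is that moving a row to another shard means its ID no longer reflects where it lives.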