Handling duplicate records

Duplicates come from retries, double submissions, or multiple sources claiming the same entity. Handling them takes three steps: prevent duplicates (unique constraints, idempotency keys), detect them (dedup keys, matching rules), and resolve them (merge, keep one, or flag for review).

Prevention first

flowchart LR
    R[Request] --> K{Idempotency key\nseen?}
    K -->|No| P[Process & store key]
    K -->|Yes| Return[Return stored result]
    P --> DB[(DB)]
    DB --> U[UNIQUE on business key]
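The flow above can be sketched in a few lines. This is a minimal illustration, not a production design: it assumes an in-memory SQLite database, and the table names (`idempotency_keys`, `payments`), the `handle_request` function, and the `payment:<id>` result format are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE idempotency_keys (
        key    TEXT PRIMARY KEY,   -- one stored result per key
        result TEXT NOT NULL
    );
    CREATE TABLE payments (
        id          INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL,
        external_id TEXT NOT NULL,
        amount      INTEGER NOT NULL,
        UNIQUE (user_id, external_id)  -- business key as a second line of defence
    );
""")


def handle_request(key, user_id, external_id, amount):
    """Process a request once per idempotency key; replay the stored result on retries."""
    row = conn.execute(
        "SELECT result FROM idempotency_keys WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]  # key already seen: return the stored result, no second insert

    # Store the record and the key in one transaction so they commit or roll back together.
    with conn:
        cur = conn.execute(
            "INSERT INTO payments (user_id, external_id, amount) VALUES (?, ?, ?)",
            (user_id, external_id, amount),
        )
        result = f"payment:{cur.lastrowid}"
        conn.execute(
            "INSERT INTO idempotency_keys (key, result) VALUES (?, ?)",
            (key, result),
        )
    return result
```

Calling `handle_request` twice with the same key returns the same result and leaves a single row in `payments`. The primary key on `idempotency_keys` also means that two concurrent first attempts cannot both commit.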

Strategies

| Strategy | How |
| --- | --- |
| Unique constraint | DB UNIQUE on (user_id, external_id) so duplicate insert fails |
| Idempotency key | Store (key → result); retries return same result, no second insert |
| Upsert | INSERT ... ON CONFLICT DO UPDATE so “duplicate” becomes update |
| Dedup in batch | Before insert, query or hash to detect existing; skip or merge |
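To make the unique-constraint and upsert strategies concrete, here is a small sketch using Python's built-in `sqlite3` module. The `contacts` table, its `email` column, and the helper names `insert_strict` and `upsert` are assumptions for illustration. The `ON CONFLICT ... DO UPDATE` form shown is the one SQLite (3.24 or later) and PostgreSQL accept; other databases use different upsert syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts (
        user_id     INTEGER NOT NULL,
        external_id TEXT NOT NULL,
        email       TEXT,
        UNIQUE (user_id, external_id)
    )
""")


def insert_strict(user_id, external_id, email):
    """Plain insert: returns False when the UNIQUE constraint rejects a duplicate."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO contacts (user_id, external_id, email) VALUES (?, ?, ?)",
                (user_id, external_id, email),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def upsert(user_id, external_id, email):
    """Insert, or update the existing row when the business key already exists."""
    with conn:
        conn.execute(
            """
            INSERT INTO contacts (user_id, external_id, email) VALUES (?, ?, ?)
            ON CONFLICT (user_id, external_id) DO UPDATE SET email = excluded.email
            """,
            (user_id, external_id, email),
        )
```

The strict insert suits cases where a duplicate is an error the caller should see. The upsert suits feeds where the newest version of a record should win.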

Detect and resolve

flowchart TB
    In[Incoming record] --> Match{Match existing?}
    Match -->|No| Insert[Insert]
    Match -->|Yes| Decide{Policy}
    Decide -->|Keep first| Ignore[Ignore new]
    Decide -->|Merge| Merge[Merge fields]
    Decide -->|Flag| Flag[Flag for review]
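The decision flow above might look like this in code. This is a sketch under stated assumptions: records are plain dictionaries, the matching rule is a normalized `email` field, and the store is an in-memory dict. The names `Policy`, `dedup_key`, and `ingest` are hypothetical, and the merge rule (existing values win, gaps are filled from the new record) is only one of several reasonable choices.

```python
from enum import Enum


class Policy(Enum):
    KEEP_FIRST = "keep_first"
    MERGE = "merge"
    FLAG = "flag"


def dedup_key(record):
    """Hypothetical matching rule: case-insensitive, whitespace-trimmed email."""
    return record["email"].strip().lower()


def ingest(record, store, review_queue, policy):
    """Insert a new record, or resolve a match according to the chosen policy."""
    key = dedup_key(record)
    existing = store.get(key)

    if existing is None:
        store[key] = dict(record)  # no match: insert
        return "inserted"

    if policy is Policy.KEEP_FIRST:
        return "ignored"  # keep the first record, drop the new one

    if policy is Policy.MERGE:
        # Existing values win; only missing or empty fields are filled in.
        for field, value in record.items():
            if existing.get(field) in (None, ""):
                existing[field] = value
        return "merged"

    # Policy.FLAG: leave the stored record alone and queue the pair for a human.
    review_queue.append((existing, record))
    return "flagged"
```

The same `dedup_key` function can also drive the batch strategy: compute the key for each incoming row, check it against the keys already stored, and skip or merge before inserting.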

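Define a business identity (e.g. email, external_id, or a composite key) and enforce it with unique constraints or idempotency keys. For duplicates that already exist, run a one-off dedup (merge or delete) first, because a unique constraint cannot be added while violating rows remain, then enforce uniqueness going forward.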
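As a rough sketch of the one-off cleanup, the example below deletes all but the oldest row per email and then adds a unique index, all in one transaction. It assumes SQLite, a hypothetical `users` table with `email` as the business identity, and a "keep the lowest id" policy. A merge policy would need to copy fields into the surviving row before the delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    INSERT INTO users (email)
    VALUES ('a@example.com'), ('b@example.com'), ('a@example.com');
""")


def dedup_and_enforce(conn):
    """Delete all but the lowest-id row per email, then add a unique index."""
    with conn:  # both steps commit together, or neither does
        deleted = conn.execute(
            """
            DELETE FROM users
            WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY email)
            """
        ).rowcount
        # This would fail if any duplicate emails were still present.
        conn.execute("CREATE UNIQUE INDEX users_email_uq ON users (email)")
    return deleted
```

After `dedup_and_enforce(conn)` runs, a later insert of an already-present email raises `sqlite3.IntegrityError`, so new duplicates are rejected at write time.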