03 · Normalization — Eliminate Redundant Data
Foundations · Topic 3 of 4
Why Normalize?
Redundant data causes update anomalies — inconsistencies that arise when you update one copy but miss others. Normalization structures your schema to have one authoritative place for each fact.
Normal Forms
1NF — First Normal Form
- Each column holds atomic (indivisible) values
- No repeating groups or arrays in a single column
-- ❌ Violates 1NF
customer_id | name | phones
1 | Alice | "555-1234, 555-5678"
-- ✅ 1NF
customer_phones(customer_id, phone)
2NF — Second Normal Form
- Must be in 1NF
- Every non-key attribute is fully functionally dependent on the entire primary key (no partial dependencies)
3NF — Third Normal Form
- Must be in 2NF
- No transitive dependencies (non-key attribute depends on another non-key attribute)
BCNF — Boyce-Codd Normal Form
- Stricter version of 3NF: every determinant is a candidate key
Denormalization
In high-read systems, intentional denormalization trades storage for query speed:
- Precompute joins and store derived columns
- Used heavily in BigQuery (columnar, denormalized star/snowflake)
- Cassandra schema is designed query-first — tables are naturally denormalized
Cloud Comparisons
| DB | Normalization Strategy |
|---|---|
| PostgreSQL | Full normalization + views for convenience |
| Spanner | Interleaved tables to co-locate related rows |
| BigQuery | Denormalized wide tables; nested STRUCT/ARRAY instead of joins |
| Cassandra | One table per query pattern; intentional duplication |
| MongoDB | Embed vs. reference tradeoff per access pattern |
| DynamoDB | Single-table design; all access patterns in one table |