03 · Normalization — Eliminate Redundant Data

Foundations · Topic 3 of 4

Why Normalize?

Redundant data causes update anomalies — inconsistencies that arise when you update one copy but miss others. Normalization structures your schema to have one authoritative place for each fact.

Normal Forms

1NF — First Normal Form

Each column holds atomic (indivisible) values
No repeating groups or arrays in a single column

-- ❌ Violates 1NF
customer_id | name  | phones
1           | Alice | "555-1234, 555-5678"

-- ✅ 1NF
customer_phones(customer_id, phone)

2NF — Second Normal Form

Must be in 1NF
Every non-key attribute is fully functionally dependent on the entire primary key (no partial dependencies)

3NF — Third Normal Form

Must be in 2NF
No transitive dependencies (non-key attribute depends on another non-key attribute)

BCNF — Boyce-Codd Normal Form

Stricter version of 3NF: every determinant is a candidate key

Denormalization

In high-read systems, intentional denormalization trades storage for query speed:

Precompute joins and store derived columns
Used heavily in BigQuery (columnar, denormalized star/snowflake)
Cassandra schema is designed query-first — tables are naturally denormalized

Cloud Comparisons

DB	Normalization Strategy
PostgreSQL	Full normalization + views for convenience
Spanner	Interleaved tables to co-locate related rows
BigQuery	Denormalized wide tables; nested STRUCT/ARRAY instead of joins
Cassandra	One table per query pattern; intentional duplication
MongoDB	Embed vs. reference tradeoff per access pattern
DynamoDB	Single-table design; all access patterns in one table