Skip to content

03 · Normalization — Eliminate Redundant Data

Foundations · Topic 3 of 4


Why Normalize?

Redundant data causes update anomalies — inconsistencies that arise when you update one copy but miss others. Normalization structures your schema to have one authoritative place for each fact.


Normal Forms

1NF — First Normal Form

  • Each column holds atomic (indivisible) values
  • No repeating groups or arrays in a single column
-- ❌ Violates 1NF
customer_id | name  | phones
1           | Alice | "555-1234, 555-5678"

-- ✅ 1NF
customer_phones(customer_id, phone)

2NF — Second Normal Form

  • Must be in 1NF
  • Every non-key attribute is fully functionally dependent on the entire primary key (no partial dependencies)

3NF — Third Normal Form

  • Must be in 2NF
  • No transitive dependencies (non-key attribute depends on another non-key attribute)

BCNF — Boyce-Codd Normal Form

  • Stricter version of 3NF: every determinant is a candidate key

Denormalization

In high-read systems, intentional denormalization trades storage for query speed:

  • Precompute joins and store derived columns
  • Used heavily in BigQuery (columnar, denormalized star/snowflake)
  • Cassandra schema is designed query-first — tables are naturally denormalized

Cloud Comparisons

DB Normalization Strategy
PostgreSQL Full normalization + views for convenience
Spanner Interleaved tables to co-locate related rows
BigQuery Denormalized wide tables; nested STRUCT/ARRAY instead of joins
Cassandra One table per query pattern; intentional duplication
MongoDB Embed vs. reference tradeoff per access pattern
DynamoDB Single-table design; all access patterns in one table