Skip to content

04 · Database Indexing — Speed Up Read Query Performance

Foundations · Topic 4 of 4


What is an Index?

An index is a data structure that allows the database to find rows without scanning the entire table. Trading write overhead and storage for read speed.


How Indexes Work (B-Tree)

Most relational databases use a B-Tree index by default. See B-Tree vs LSM Tree for internals.

graph TD
    Root["Root: [50]"] --> L["Left: [10, 20, 30]"]
    Root --> R["Right: [60, 70, 80]"]
    L --> D1["Data Pages"]
    R --> D2["Data Pages"]

Index Types

Type Use Case Example
B-Tree Equality and range queries WHERE age > 30
Hash Exact equality only WHERE id = 42
GIN Full-text search, JSONB, arrays WHERE doc @@ 'keyword'
BRIN Very large sorted tables (time-series) WHERE created_at > '2024-01-01'
Composite Multi-column queries WHERE (user_id, created_at)
Covering Index-only scans INCLUDE (email)
Partial Filtered subset WHERE active = true

Index Design Rules

Column order matters for composite indexes

For (a, b, c) — queries on a, (a, b), and (a, b, c) use the index. Queries on b alone do not.

-- ✅ Uses index (a, b)
SELECT * FROM orders WHERE user_id = 1 AND status = 'active';

-- ❌ Won't use index (a, b) — leading column missing
SELECT * FROM orders WHERE status = 'active';

Cloud Index Implementations

  • B-Tree (default), GIN, GiST, BRIN, Hash
  • EXPLAIN ANALYZE to verify index usage
  • Partial indexes: CREATE INDEX ON orders(user_id) WHERE status = 'active'
  • Secondary indexes stored as separate interleaved tables
  • INTERLEAVE IN PARENT for locality
  • Storing columns in index to avoid back-joins: STORING (email)
  • GSI (Global Secondary Index): different partition key, eventual consistency
  • LSI (Local Secondary Index): same partition key, different sort key, strongly consistent
  • Sparse indexes: only items with the attribute appear in the index
  • Primary index = partition key (hash ring distribution)
  • Secondary indexes: avoid on high-cardinality columns
  • Prefer materialized views or duplicate tables for different access patterns
  • B-Tree indexes, compound, text, geospatial, hashed
  • explain() to check index usage
  • Covered queries (all returned fields in index)
  • No traditional indexes — uses columnar storage + partition pruning + clustering
  • Partition by date/timestamp column; cluster by frequently filtered columns

Common Pitfalls

Too many indexes

Every index slows down writes. Audit with pg_stat_user_indexes (Postgres) for unused indexes.

Index on low-cardinality column

An index on a boolean column is rarely useful — the optimizer will prefer a full scan.