04 · Database Indexing — Speed Up Read Query Performance
Foundations · Topic 4 of 4
What is an Index?
An index is a data structure that allows the database to find rows without scanning the entire table. Trading write overhead and storage for read speed.
How Indexes Work (B-Tree)
Most relational databases use a B-Tree index by default. See B-Tree vs LSM Tree for internals.
graph TD
Root["Root: [50]"] --> L["Left: [10, 20, 30]"]
Root --> R["Right: [60, 70, 80]"]
L --> D1["Data Pages"]
R --> D2["Data Pages"]
Index Types
| Type | Use Case | Example |
|---|---|---|
| B-Tree | Equality and range queries | WHERE age > 30 |
| Hash | Exact equality only | WHERE id = 42 |
| GIN | Full-text search, JSONB, arrays | WHERE doc @@ 'keyword' |
| BRIN | Very large sorted tables (time-series) | WHERE created_at > '2024-01-01' |
| Composite | Multi-column queries | WHERE (user_id, created_at) |
| Covering | Index-only scans | INCLUDE (email) |
| Partial | Filtered subset | WHERE active = true |
Index Design Rules
Column order matters for composite indexes
For (a, b, c) — queries on a, (a, b), and (a, b, c) use the index. Queries on b alone do not.
-- ✅ Uses index (a, b)
SELECT * FROM orders WHERE user_id = 1 AND status = 'active';
-- ❌ Won't use index (a, b) — leading column missing
SELECT * FROM orders WHERE status = 'active';
Cloud Index Implementations
- B-Tree (default), GIN, GiST, BRIN, Hash
EXPLAIN ANALYZEto verify index usage- Partial indexes:
CREATE INDEX ON orders(user_id) WHERE status = 'active'
- Secondary indexes stored as separate interleaved tables
INTERLEAVE IN PARENTfor locality- Storing columns in index to avoid back-joins:
STORING (email)
- GSI (Global Secondary Index): different partition key, eventual consistency
- LSI (Local Secondary Index): same partition key, different sort key, strongly consistent
- Sparse indexes: only items with the attribute appear in the index
- Primary index = partition key (hash ring distribution)
- Secondary indexes: avoid on high-cardinality columns
- Prefer materialized views or duplicate tables for different access patterns
- B-Tree indexes, compound, text, geospatial, hashed
explain()to check index usage- Covered queries (all returned fields in index)
- No traditional indexes — uses columnar storage + partition pruning + clustering
- Partition by date/timestamp column; cluster by frequently filtered columns
Common Pitfalls
Too many indexes
Every index slows down writes. Audit with pg_stat_user_indexes (Postgres) for unused indexes.
Index on low-cardinality column
An index on a boolean column is rarely useful — the optimizer will prefer a full scan.