NoSQL & MongoDB

MongoDB is a document-oriented NoSQL database built for flexible schemas, horizontal scaling, and high-throughput workloads. This module introduces NoSQL concepts and explains where MongoDB fits in the data landscape.

Definitions & Core Concepts

NoSQL Databases

Definition: NoSQL (Not Only SQL) databases are non-relational data stores designed for distributed, high-volume workloads with flexible or schema-less data models.

What it means:
- No fixed schema: add fields without migrations
- Horizontal scaling: add more servers rather than bigger ones
- Built for high availability, usually with replication built in
- Many systems relax strict ACID guarantees (for example, accepting eventual consistency) in exchange for performance and availability during network partitions
- Many types: Document, Key-Value, Column-Family, Graph, Search
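Horizontal scaling usually means partitioning data across servers by a key. The sketch below is a toy, assuming a simple FNV-1a hash with modulo placement; the helper names `fnv1a` and `shardFor` are hypothetical. Real systems, including MongoDB's hashed sharding, use their own hash functions and move data in chunks rather than by plain modulo.

```javascript
// Toy model of horizontal scaling: spread keys across N servers ("shards")
// by hashing a shard key. Illustrative only; not MongoDB's algorithm.

// FNV-1a 32-bit hash over the string's UTF-16 code units (fine for ASCII keys).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply modulo 2^32
  }
  return hash >>> 0;
}

// Pick the shard that owns a given key (hypothetical placement rule).
function shardFor(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

// Assign 1,000 user ids to 3 shards and count how many land on each.
const counts = [0, 0, 0];
for (let id = 0; id < 1000; id++) {
  counts[shardFor(`user-${id}`, 3)]++;
}
console.log(counts); // ids owned by each shard
```

Plain modulo placement is simple but remaps most keys whenever the shard count changes, which is one reason production systems assign key ranges (chunks) to shards instead.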

Types of NoSQL Databases

| Type | Examples | Best For |
|---|---|---|
| Document | MongoDB, Couchbase | Flexible JSON-like records, content, catalogs |
| Key-Value | Redis, DynamoDB | Sessions, caching, simple lookups |
| Column-Family | Cassandra, HBase | Time-series, IoT, wide-row analytics |
| Graph | Neo4j, Amazon Neptune | Relationships, social networks, fraud detection |
| Search | Elasticsearch, Solr | Full-text search, log analytics |

MongoDB: Document Database

Definition: MongoDB stores data as BSON (Binary JSON) documents in collections, rather than rows in tables.

What it means:
- Document ≈ a JSON object (like a row in a table, but flexible)
- Collection ≈ a table (but no schema is enforced unless you add validation)
- Database ≈ a namespace that groups collections
- Documents can contain nested objects and arrays natively
- Documents in the same collection can have different fields (polymorphic)
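To make the mapping concrete, here is a toy in-memory model that treats a collection as an array of plain objects. The helpers `getPath` and `find` are hypothetical and support only equality matching; they mimic the dot notation MongoDB uses for nested fields (as in `db.users.find({ "address.city": "New York" })`), not how the server actually stores or evaluates anything.

```javascript
// Toy "collection": an array of documents with different fields,
// nested objects, and arrays. Not MongoDB's storage format.
const users = [
  { _id: 1, name: "Alice", address: { city: "New York", zip: "10001" }, tags: ["admin"] },
  { _id: 2, name: "Bob", phone: "+1-555-0101" }, // no address, extra phone field
];

// Resolve a dotted path such as "address.city", returning undefined
// as soon as any step along the path is missing.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Equality-only filter, e.g. { "address.city": "New York" }.
function find(collection, filter) {
  return collection.filter((doc) =>
    Object.entries(filter).every(([path, expected]) => getPath(doc, path) === expected)
  );
}

console.log(find(users, { "address.city": "New York" }).map((u) => u.name)); // [ 'Alice' ]
console.log(find(users, { phone: "+1-555-0101" }).map((u) => u.name)); // [ 'Bob' ]
```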

MongoDB vs Relational Database

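| Aspect | Relational (PostgreSQL) | MongoDB |
|---|---|---|
| Data unit | Row in a table | BSON document |
| Schema | Fixed (DDL required) | Flexible (optional validation) |
| Joins | `JOIN` across tables | Embed, or `$lookup` (use sparingly) |
| Transactions | ACID transactions across any tables | Single-document writes are atomic; multi-document ACID transactions since 4.0 (replica sets) and 4.2 (sharded clusters) |
| Scaling | Primarily vertical (bigger server), plus read replicas | Horizontal (sharding) |
| Query language | SQL | MQL (MongoDB Query Language) |
| Aggregation | `GROUP BY`, window functions | Aggregation Pipeline |
| Schema changes | `ALTER TABLE` migration (can be costly on large tables) | New fields can appear on the next write; no migration required |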
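The Aggregation row is easiest to see side by side. Assuming a hypothetical `orders` collection, the SQL `SELECT customer, SUM(amount) FROM orders WHERE status = 'shipped' GROUP BY customer` corresponds to the pipeline `[{ $match: { status: "shipped" } }, { $group: { _id: "$customer", total: { $sum: "$amount" } } }]`. The plain-JavaScript sketch below (helper names `match` and `groupSum` are illustrative) computes the same result in memory to show how each stage's output feeds the next; it is not MongoDB's implementation.

```javascript
// Hypothetical sample documents.
const orders = [
  { customer: "alice", status: "shipped", amount: 30 },
  { customer: "bob", status: "pending", amount: 15 },
  { customer: "alice", status: "shipped", amount: 12 },
  { customer: "bob", status: "shipped", amount: 8 },
];

// Stage 1 (like $match): keep documents that satisfy an equality filter.
const match = (docs, filter) =>
  docs.filter((d) => Object.entries(filter).every(([k, v]) => d[k] === v));

// Stage 2 (like $group with $sum): one output document per distinct key.
const groupSum = (docs, keyField, sumField) => {
  const totals = new Map();
  for (const d of docs) {
    totals.set(d[keyField], (totals.get(d[keyField]) ?? 0) + d[sumField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, total }));
};

// Stages run in order, each consuming the previous stage's output.
const result = groupSum(match(orders, { status: "shipped" }), "customer", "amount");
console.log(result); // totals: alice 42, bob 8
```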

Key Differences in Practice

1. No Fixed Schema

```javascript
// SQL: must define schema first
// CREATE TABLE users (id INT, name VARCHAR(100), email VARCHAR(200));

// MongoDB: just insert; the collection is created on first write and no schema is required
db.users.insertOne({ name: "Alice", email: "alice@example.com", age: 30 });
db.users.insertOne({ name: "Bob", email: "bob@example.com", phone: "+1-555-0101" });
// Bob has 'phone', Alice doesn't: perfectly valid
```
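Because nothing guarantees that a field exists, reading code has to handle absent fields. A minimal plain-JavaScript sketch using the two documents above (the `contactLine` helper is hypothetical). On the server, a filter such as `{ phone: { $exists: true } }` selects only the documents that have the field.

```javascript
// The two documents from the example above, as plain objects.
const users = [
  { name: "Alice", email: "alice@example.com", age: 30 },
  { name: "Bob", email: "bob@example.com", phone: "+1-555-0101" },
];

// No schema promises 'phone' or 'age', so supply fallbacks when they are missing.
function contactLine(user) {
  const phone = user.phone ?? "no phone on file";
  const age = user.age !== undefined ? `, age ${user.age}` : "";
  return `${user.name} <${user.email}> (${phone})${age}`;
}

users.forEach((u) => console.log(contactLine(u)));
// Alice <alice@example.com> (no phone on file), age 30
// Bob <bob@example.com> (+1-555-0101)
```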

2. Embed Instead of Joining

```javascript
// SQL: store addresses in a separate table, JOIN at query time
// SELECT u.name, a.city FROM users u JOIN addresses a ON u.id = a.user_id;

// MongoDB: embed the address inside the user document
const { insertedId } = db.users.insertOne({
  name: "Alice",
  address: { city: "New York", zip: "10001" }  // embedded, no join needed
});
// Look up by _id so an earlier "Alice" without an address isn't returned instead
const user = db.users.findOne({ _id: insertedId });
print(user.address.city); // "New York": single query, no join
```
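An embedded document can be thought of as a join performed once at write time instead of on every read. This toy sketch, assuming exactly one address per user and using hypothetical names (`embedAddresses`, the Boston sample row), converts normalized rows into embedded documents:

```javascript
// Normalized "relational" shape: two lists linked by userId (hypothetical data).
const users = [{ id: 1, name: "Alice" }, { id: 2, name: "Bob" }];
const addresses = [
  { userId: 1, city: "New York", zip: "10001" },
  { userId: 2, city: "Boston", zip: "02108" },
];

// Embedded shape: resolve the relationship once and store the result.
// Assumes one address per user; a later row for the same user would overwrite.
function embedAddresses(users, addresses) {
  const byUser = new Map(addresses.map((a) => [a.userId, { city: a.city, zip: a.zip }]));
  return users.map((u) => ({ name: u.name, address: byUser.get(u.id) }));
}

const docs = embedAddresses(users, addresses);
console.log(docs[0].address.city); // New York
```

If a user could have many addresses, the embedded field would become an array of address objects instead.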

3. Arrays are First-Class

```javascript
// SQL: order line items stored in a separate table (e.g., order_items)
// MongoDB: embed the line items directly in the order document
db.orders.insertOne({
  userId: ObjectId("..."),  // placeholder: the ordering user's _id
  items: [
    { product: "Keyboard", qty: 1, price: 89.99 },
    { product: "Mouse",    qty: 2, price: 49.99 }
  ],
  total: 189.97  // 1 × 89.99 + 2 × 49.99
});
```
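One caution with the order above: JavaScript numbers (and BSON doubles) are binary floating point, so summing prices such as 89.99 can pick up rounding error. A common workaround, sketched here under the assumption that prices are stored as integer cents in a hypothetical `priceCents` field, is to do the arithmetic in integers. MongoDB also offers a Decimal128 type (`NumberDecimal("189.97")` in mongosh) for exact decimal values.

```javascript
// Line items with prices in integer cents (hypothetical field name).
const items = [
  { product: "Keyboard", qty: 1, priceCents: 8999 },
  { product: "Mouse", qty: 2, priceCents: 4999 },
];

// Derive the order total from the embedded array using exact integer math.
const totalCents = items.reduce((sum, item) => sum + item.qty * item.priceCents, 0);

// Convert to a display string only at the edge.
console.log((totalCents / 100).toFixed(2)); // 189.97
```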

The CAP Theorem

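The CAP Theorem states that a distributed data store can guarantee at most two of the following three properties at once. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability: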

Consistency (C)

Every read returns the most recent write.
Example: You deposit $100; your next balance check always shows the updated amount.

Availability (A)

The system responds to every request, even if some nodes are down.
Example: Your app works even when a data center goes offline.

Partition Tolerance (P)

The system keeps running despite network partitions between nodes.
Required in any real distributed system — networks do fail.

MongoDB's Position

MongoDB is CP by default:
- Writes go to the primary (single source of truth)
- Reads go to the primary by default (readPreference: primary) and see its latest writes
- If the primary fails, writes are unavailable while the replica set elects a new primary (typically around 10 to 12 seconds with default settings)
- Reads from secondaries (readPreference: secondary) can return stale data, so the system behaves more like AP
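The stale-read case can be illustrated with a toy replica set in which the secondary applies the primary's oplog only when told to. Everything here (`ToyReplicaSet`, `replicate`, the `readPreference` argument) is a hypothetical simulation; in mongosh the real setting is applied per query, for example `db.accounts.find().readPref("secondary")`.

```javascript
// Toy replica set: one primary and one secondary that copies the primary's
// operation log (oplog) only when replicate() is called. Illustrative only.
class ToyReplicaSet {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.oplog = [];  // ordered list of writes made on the primary
    this.applied = 0; // number of oplog entries the secondary has applied
  }
  write(key, value) {
    this.primary.set(key, value);
    this.oplog.push({ key, value });
  }
  replicate() {
    for (; this.applied < this.oplog.length; this.applied++) {
      const { key, value } = this.oplog[this.applied];
      this.secondary.set(key, value);
    }
  }
  read(key, readPreference = "primary") {
    const node = readPreference === "secondary" ? this.secondary : this.primary;
    return node.get(key);
  }
}

const rs = new ToyReplicaSet();
rs.write("balance", 100);
rs.replicate();
rs.write("balance", 200);                     // not yet replicated
console.log(rs.read("balance"));              // 200 (primary: latest write)
console.log(rs.read("balance", "secondary")); // 100 (secondary: stale)
rs.replicate();
console.log(rs.read("balance", "secondary")); // 200 (caught up)
```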

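Tunable via readConcern, writeConcern, and readPreference:
- writeConcern w: "majority" + readConcern: "majority" = stronger consistency: reads return only data acknowledged by a majority of members, which cannot be rolled back after a failover
- readPreference: secondary + readConcern: local = higher read availability, with possibly stale reads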
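Why majority acknowledgement survives failover comes down to arithmetic: any two majorities of the same member set overlap, so at least one member holding a majority-acknowledged write takes part in any later majority vote. A minimal sketch, assuming every member votes and bears data (arbiters and non-voting members change the count); the helper names are hypothetical. In mongosh the real option looks like `db.accounts.insertOne(doc, { writeConcern: { w: "majority" } })`.

```javascript
// Smallest number of members that forms a majority.
function majority(members) {
  return Math.floor(members / 2) + 1;
}

// A w: "majority"-style write is acknowledged once a majority has applied it.
function isAcknowledged(ackCount, members) {
  return ackCount >= majority(members);
}

// Two majorities always share at least one member.
function majoritiesOverlap(members) {
  return 2 * majority(members) > members;
}

console.log(majority(3));          // 2
console.log(majority(5));          // 3
console.log(isAcknowledged(1, 3)); // false: only the primary has the write
console.log(isAcknowledged(2, 3)); // true: primary plus one secondary
```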

When to Use MongoDB

✅ Good Use Cases

Content management: Articles, product catalogs, user profiles (flexible schema)
Real-time analytics: Event logging, clickstreams, activity feeds
Mobile & IoT: Variable device data, sensor readings
E-commerce: Product catalogs with varied attributes, order management
Gaming: Player state, leaderboards, session data
Search & recommendations: Faceted search, personalization

❌ Not Suitable For

Complex multi-table joins: Highly normalized relational data with many foreign keys
Strict ACID across many collections: Financial ledgers where partial failures are catastrophic
Already-relational data: If your data is naturally tabular and highly normalized, stay with SQL
Small datasets: Under a few GB, a simple PostgreSQL or SQLite is easier to operate

Real-World Examples

Example 1: E-Commerce Product Catalog

Products vary in attributes (a book has ISBN, a TV has resolution). MongoDB's flexible schema handles this naturally without NULL columns:

```javascript
db.products.insertMany([
  // Book
  { type: "book", name: "MongoDB Guide", isbn: "978-...", pages: 514, price: 39.99 },
  // TV
  { type: "electronics", name: "OLED TV", resolution: "4K", hz: 120, price: 1299.99 }
]);
```
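To see the NULL columns this avoids, a small sketch counts what a single relational table holding both products would need: one column per distinct field, and a NULL wherever a product lacks that field. The computation is illustrative and uses the two documents above.

```javascript
// The two product documents from the catalog example.
const products = [
  { type: "book", name: "MongoDB Guide", isbn: "978-...", pages: 514, price: 39.99 },
  { type: "electronics", name: "OLED TV", resolution: "4K", hz: 120, price: 1299.99 },
];

// One table would need the union of all fields as columns...
const columns = [...new Set(products.flatMap((p) => Object.keys(p)))];

// ...and each row would hold NULL for every column its product lacks.
const nullCells = products.reduce(
  (n, p) => n + columns.filter((c) => !(c in p)).length,
  0
);

console.log(columns.length); // 7
console.log(nullCells);      // 4
```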

Example 2: What NOT to Do

❌ WRONG: Using MongoDB for a bank's double-entry ledger
   - Every debit must match a credit
   - Multi-document ACID needed for every transaction
   - Better: PostgreSQL with proper constraints

✅ RIGHT: Using MongoDB for transaction audit logs
   - Append-only event documents
   - Flexible event schema per transaction type
   - Fast writes, easy time-based querying
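A minimal in-memory sketch of the append-only audit pattern, with hypothetical event shapes and timestamps. The `between` helper mirrors the kind of range filter you would write in MongoDB, such as `{ ts: { $gte: start, $lt: end } }`, which an index on `ts` can serve efficiently.

```javascript
// Append-only log; each event type carries its own fields (hypothetical data).
const auditLog = [];
function append(event) {
  // Shallow-freeze each entry so later code cannot edit it in place.
  auditLog.push(Object.freeze({ ...event, ts: new Date(event.ts) }));
}

append({ type: "deposit", account: "A1", amountCents: 10000, ts: "2024-01-01T09:00:00Z" });
append({ type: "transfer", from: "A1", to: "B2", amountCents: 2500, ts: "2024-01-01T12:30:00Z" });
append({ type: "login", user: "alice", ip: "203.0.113.7", ts: "2024-01-02T08:15:00Z" });

// Half-open time range [start, end), like { ts: { $gte: start, $lt: end } }.
function between(log, start, end) {
  return log.filter((e) => e.ts >= start && e.ts < end);
}

const jan1 = between(auditLog, new Date("2024-01-01T00:00:00Z"), new Date("2024-01-02T00:00:00Z"));
console.log(jan1.map((e) => e.type)); // [ 'deposit', 'transfer' ]
```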

Architecture Diagram

```mermaid
graph TD
  A["Application"] --> B["MongoDB Driver"]
  B --> C["mongos / Connection"]
  C --> D["Primary Node<br/>(Writes + Reads)"]
  D -->|"Replication<br/>(oplog)"| E["Secondary 1<br/>(Read scaling)"]
  D -->|"Replication<br/>(oplog)"| F["Secondary 2<br/>(HA failover)"]
  G["CAP Theorem"] --> H["Consistency"]
  G --> I["Availability"]
  G --> J["Partition<br/>Tolerance"]
  D -->|"Primary = CP"| H
  D -->|"Secondary read = AP"| I
```

Summary

MongoDB stores flexible BSON documents in collections with no required schema (validation is optional), so no ALTER TABLE is needed
Prefer embedding for frequently accessed related data; use $lookup sparingly for runtime joins
CAP: MongoDB is CP by default (primary reads); tunable toward AP with secondary reads
Best for: flexible schemas, high write throughput, nested/array data, horizontal scaling
Not for: complex multi-table joins, small relational datasets, strict ACID everywhere
