Advanced Topics
This module covers advanced MongoDB features: replica sets for high availability, security, monitoring and performance optimization, schema validation, and sharding.
Replica Sets
A replica set is a group of MongoDB instances that maintain the same data for redundancy and high availability.
Replica Set Architecture
┌──────────────────────────────────────────────────┐
│               MongoDB Replica Set                │
├──────────────────────────────────────────────────┤
│                                                  │
│  PRIMARY         SECONDARY 1     SECONDARY 2     │
│  (Read+Write)    (Read Only)     (Read Only)     │
│                                                  │
│  Sync → Replication Lag (usually < 1sec)         │
│                                                  │
│  Elections: If primary fails, secondaries elect  │
│  new primary automatically (failover)            │
│                                                  │
└──────────────────────────────────────────────────┘
Setting Up a Replica Set
// Initialize replica set with 3 members (primary + 2 secondaries)
// In each mongod config:
// replication:
// replSetName: "rs0"
// Connect to one member (no primary exists until the set is initiated) and initiate:
rs.initiate({
_id: "rs0", // Replica set name
members: [
{ _id: 0, host: "mongo1:27017", priority: 1 }, // Preferred primary (highest priority)
{ _id: 1, host: "mongo2:27017", priority: 0.5 }, // Secondary
{ _id: 2, host: "mongo3:27017", priority: 0.5 } // Secondary (an arbiter would need arbiterOnly: true)
]
});
// Check replica set status
rs.status();
// Current primary:
db.hello();
// Returns: { primary: "mongo1:27017", ... }
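The configuration document passed to rs.initiate() is plain data, so it can be generated from a host list. The following is a minimal sketch under stated assumptions: buildReplicaSetConfig is a hypothetical helper (not part of mongosh), and the priority scheme simply mirrors the example above.

```javascript
// Hypothetical helper: builds the document passed to rs.initiate().
// The host names and the priority scheme are illustrative assumptions.
function buildReplicaSetConfig(name, hosts) {
  // A replica set allows at most 7 voting members; this sketch only
  // handles sets where every member votes.
  if (hosts.length === 0 || hosts.length > 7) {
    throw new Error("this sketch supports 1 to 7 voting members");
  }
  return {
    _id: name,
    members: hosts.map((host, i) => ({
      _id: i,
      host,
      // Give the first host a higher priority so it is preferred as primary.
      priority: i === 0 ? 1 : 0.5
    }))
  };
}

const config = buildReplicaSetConfig("rs0", [
  "mongo1:27017",
  "mongo2:27017",
  "mongo3:27017"
]);
console.log(JSON.stringify(config, null, 2));
// In mongosh, connected to one of the members: rs.initiate(config)
```

An odd number of voting members is usually preferred, so that a majority can still form after a single failure.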
Replication & Write Concern
// Data is written to primary, then replicated to secondaries
// Write Concern: How many replicas must ACK
db.collection.insertOne(
{ data: "..." },
{ writeConcern: { w: 1 } } // Return after primary ACKs (fast)
);
db.collection.insertOne(
{ data: "..." },
{ writeConcern: { w: "majority" } } // Return after majority ACKs (safe)
);
// With write concern "majority":
// 1. Client sends write to primary
// 2. Primary writes to memory + journal
// 3. Primary replicates to secondaries
// 4. Secondaries ACK
// 5. Primary returns ACK to client once majority ACKs received
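// If the primary fails before a majority has acknowledged the write, the write
// may be rolled back when the old primary rejoins the set
// A write acknowledged with "majority" survives a failover, because it already
// exists on a majority of members (it is not protected against losing a majority of them)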
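The acknowledgement rule can be sketched as simple arithmetic. This is a simplified model that assumes every member is a voting, data-bearing member (no arbiters); majorityOf and isAcknowledged are hypothetical helper names.

```javascript
// Acknowledgements needed for w: "majority", assuming all members
// are voting, data-bearing members (no arbiters).
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// True once enough members (the primary included) have acknowledged the write.
function isAcknowledged(w, acks, votingMembers) {
  const needed = w === "majority" ? majorityOf(votingMembers) : w;
  return acks >= needed;
}

console.log(majorityOf(3));                    // 2
console.log(majorityOf(5));                    // 3
console.log(isAcknowledged(1, 1, 3));          // true: primary alone is enough
console.log(isAcknowledged("majority", 1, 3)); // false: still waiting for one secondary
console.log(isAcknowledged("majority", 2, 3)); // true
```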
Failover & Elections
// Secondaries continuously monitor primary
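// If the primary is unreachable for electionTimeoutMillis (default 10s):
// An eligible secondary calls an election to choose a new primary
// Votes:
// - Each voting member casts one vote; a candidate needs a majority of the votes
// - Members do not vote for a candidate whose data is older than their own
// - Priority biases the result: higher-priority members call elections sooner,
//   and priority 0 members can never become primary
// After election: existing connections to old primary fail
// Applications must retry (current drivers retry eligible reads and writes once by default)
rs.stepDown(); // Force primary to step down (for maintenance)
// After stepDown, primary becomes secondary and triggers election
// No writes possible during election (typically about 12 seconds or less with default settings)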
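For the driver to find the new primary after an election, the connection string should name the replica set and list its members rather than a single host. A minimal sketch, assuming the host names used earlier; replicaSetUri is a hypothetical helper, while replicaSet, retryWrites and w are standard connection string options.

```javascript
// Hypothetical helper: builds a replica set connection string.
function replicaSetUri(hosts, replicaSet, dbName = "") {
  const params = new URLSearchParams({
    replicaSet,          // lets the driver discover members and track the primary
    retryWrites: "true", // retry eligible writes once after a failover
    w: "majority"        // default write concern for this connection
  });
  return `mongodb://${hosts.join(",")}/${dbName}?${params}`;
}

const uri = replicaSetUri(
  ["mongo1:27017", "mongo2:27017", "mongo3:27017"],
  "rs0",
  "myapp"
);
console.log(uri);
// mongodb://mongo1:27017,mongo2:27017,mongo3:27017/myapp?replicaSet=rs0&retryWrites=true&w=majority
```

The automatic retry happens only once, so application-level retry logic may still be needed for longer elections.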
Security
MongoDB security layers from network isolation to encryption and access control.
Authentication & Authorization
// Create user with password (the user is stored in the current database,
// which becomes its authentication database)
db.createUser({
  user: "alice",
  pwd: "securePassword123", // Or passwordPrompt() in mongosh to avoid a plain-text password
  roles: [
    { role: "readWrite", db: "myapp" }, // Already includes everything "read" grants
    { role: "dbAdmin", db: "admin" } // dbAdmin on the admin DB (there is no built-in "admin" role)
  ]
});
// Update user password
db.changeUserPassword("alice", "newPassword456");
// Grant additional role
db.grantRolesToUser(
  "alice",
  [{ role: "backup", db: "admin" }]
);
// Revoke role
db.revokeRolesFromUser(
  "alice",
  [{ role: "dbAdmin", db: "admin" }]
);
// Create custom role
db.createRole({
  role: "dataAnalyst",
  privileges: [
    {
      resource: { db: "analytics", collection: "reports" },
      actions: ["find"] // "find" also covers read-only aggregate; "aggregate" is not a privilege action
    }
  ],
  roles: []
});
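How a privilege list maps to an allow/deny decision can be illustrated with a small model. This is a deliberately simplified sketch, not MongoDB's actual authorization code: allows is a hypothetical function, and the role object is modeled on the dataAnalyst role above with the "find" action.

```javascript
// Plain-object model of a custom role (modeled on dataAnalyst above).
const dataAnalyst = {
  role: "dataAnalyst",
  privileges: [
    { resource: { db: "analytics", collection: "reports" }, actions: ["find"] }
  ]
};

// Simplified check: a privilege applies when the database matches, the
// collection matches (an empty string stands for every collection in the
// database), and the action is listed.
function allows(role, db, collection, action) {
  return role.privileges.some(({ resource, actions }) =>
    resource.db === db &&
    (resource.collection === "" || resource.collection === collection) &&
    actions.includes(action)
  );
}

console.log(allows(dataAnalyst, "analytics", "reports", "find"));   // true
console.log(allows(dataAnalyst, "analytics", "reports", "insert")); // false
console.log(allows(dataAnalyst, "analytics", "raw", "find"));       // false
```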
Built-in Roles
// Database Roles (per DB):
// - read: Read data (find, aggregate, listIndexes, etc.)
// - readWrite: Read + insert/update/delete
// - dbAdmin: Administrative tasks (index management, profiling, etc.)
// - userAdmin: User/role management
// Admin Roles (defined on the admin DB):
// - root: Superuser access to all databases
// - backup, restore: Back up data and restore data (two separate roles)
// - clusterAdmin: Cluster administration (replica sets, sharding)
// - dbAdminAnyDatabase: dbAdmin on all databases
// - readAnyDatabase: read on all databases
// - readWriteAnyDatabase: readWrite on all databases
// - userAdminAnyDatabase: User management on all databases
Network Security: TLS/SSL
// mongod.conf with TLS enabled:
// net:
// tls:
// mode: requireTLS
// certificateKeyFile: /etc/mongodb/certs/server.pem
// CAFile: /etc/mongodb/certs/ca.pem
// Connection with TLS from client (shell command):
mongosh "mongodb://alice:password@mongo1:27017/?tls=true&tlsCAFile=/path/to/ca.pem"
// TLS encrypts data in transit only; encryption at rest is configured separately
// TLS certificates can also be used for x.509 certificate-based authentication (mTLS)
Field-Level Encryption (Automatic)
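// MongoDB Client-Side Field Level Encryption (CSFLE), automatic mode
// Encrypts fields in the driver before sending to server
// (automatic encryption requires MongoDB Enterprise or Atlas)
// Assumes kmsProviders and keyId (a data encryption key UUID) are already defined
const client = new MongoClient(uri, {
  autoEncryption: {
    keyVaultNamespace: "encryption.__keyVault", // Example key vault location
    kmsProviders,
    schemaMap: {
      "myapp.users": {
        "bsonType": "object",
        "properties": {
          "ssn": {
            "encrypt": {
              "bsonType": "string",
              "keyId": [keyId],
              "algorithm": "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
            }
          }
        }
      }
    }
  }
});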
// When you write:
db.users.insertOne({
name: "Alice",
ssn: "123-45-6789" // Automatically encrypted before sending
});
// Encrypted data in MongoDB:
// { name: "Alice", ssn: BinData(...) }
// When you read:
let user = db.users.findOne({ name: "Alice" });
// SSN automatically decrypted by client
print(user.ssn); // "123-45-6789"
Monitoring & Performance
Monitor MongoDB health and optimize query performance.
Server Metrics
// Get server stats
db.serverStatus();
// Returns: {
// host: "mongo1:27017",
// version: "7.0.0",
// opcounters: { insert: 1000, query: 5000, update: 2000, delete: 100 },
// connections: { current: 50, totalCreated: 200 },
// mem: { resident: 512, virtual: 2048 }, // in MiB
// locks: { Global: { acquireCount: { r: 10000, w: 5000 } } },
// ...
// }
// Replica set status
rs.status();
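// Shows each member: stateStr (PRIMARY, SECONDARY, ARBITER),
// health, lastHeartbeat, syncSourceHost, optimeDate
// Replication lag is the gap between a secondary's optimeDate and the primary's;
// rs.printSecondaryReplicationInfo() prints it directly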
// Current operations
db.currentOp();
// Shows all running operations with details on duration, lock waiting, etc.
// Slow queries (profiling)
// Enable profiling:
db.setProfilingLevel(1, { slowms: 100 }); // Log ops > 100ms
// Query profiling data:
db.system.profile.find({ millis: { $gt: 100 } }).limit(10);
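Profiler output is easier to act on once it is grouped. The sketch below works on hand-written sample documents that use the op, ns and millis fields of system.profile entries; the values are made up and slowestByNamespace is a hypothetical helper.

```javascript
// Sample documents shaped like db.system.profile entries (values are made up).
const profileDocs = [
  { op: "query", ns: "myapp.users", millis: 250 },
  { op: "query", ns: "myapp.users", millis: 120 },
  { op: "update", ns: "myapp.orders", millis: 40 },
  { op: "query", ns: "myapp.orders", millis: 900 }
];

// For each namespace, keep the duration of its slowest operation above slowms.
function slowestByNamespace(docs, slowms) {
  const worst = new Map();
  for (const { ns, millis } of docs) {
    if (millis <= slowms) continue;
    worst.set(ns, Math.max(worst.get(ns) ?? 0, millis));
  }
  // Sort namespaces by their slowest operation, descending.
  return [...worst.entries()].sort((a, b) => b[1] - a[1]);
}

console.log(slowestByNamespace(profileDocs, 100));
// [ [ 'myapp.orders', 900 ], [ 'myapp.users', 250 ] ]
```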
Explain for Query Performance
// Detailed query execution plan
let plan = db.users.find({ email: "alice@example.com" })
.explain("executionStats");
// Key metrics:
// - executionStats.executionStages: stage tree; "IXSCAN" (good) vs "COLLSCAN" (bad).
//   An indexed query usually shows FETCH on top with IXSCAN as its inputStage
// - executionStats.nReturned: Documents returned
// - executionStats.totalDocsExamined: Documents scanned
// - executionStats.totalKeysExamined: Index entries scanned
// - executionStats.executionTimeMillis: Time taken
// Rule of thumb:
// totalDocsExamined ≈ nReturned (efficient)
// totalDocsExamined >> nReturned (inefficient, need index)
if (plan.executionStats.executionStages.stage === "COLLSCAN") {
print("WARNING: Full collection scan! Create an index.");
}
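Because stages nest, checking only the top-level stage can miss a collection scan further down the tree. The sketch below walks the tree of a hand-written sample in the shape of explain("executionStats") output; collectStages and summarize are hypothetical helpers.

```javascript
// Collect stage names from an explain() stage tree.
// Stages nest via inputStage (one child) or inputStages (several children).
function collectStages(stage, out = []) {
  if (!stage) return out;
  out.push(stage.stage);
  collectStages(stage.inputStage, out);
  (stage.inputStages || []).forEach((child) => collectStages(child, out));
  return out;
}

function summarize(plan) {
  const stats = plan.executionStats;
  const stages = collectStages(stats.executionStages);
  return {
    stages,
    collectionScan: stages.includes("COLLSCAN"),
    // Documents examined per document returned; close to 1 is efficient.
    docsPerResult:
      stats.nReturned === 0
        ? stats.totalDocsExamined
        : stats.totalDocsExamined / stats.nReturned
  };
}

// Hand-written sample; real output contains many more fields.
const samplePlan = {
  executionStats: {
    nReturned: 1,
    totalKeysExamined: 1,
    totalDocsExamined: 1,
    executionStages: { stage: "FETCH", inputStage: { stage: "IXSCAN" } }
  }
};

console.log(summarize(samplePlan));
// { stages: [ 'FETCH', 'IXSCAN' ], collectionScan: false, docsPerResult: 1 }
```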
Index Usage & Optimization
// Find which indexes are being used
db.collection.aggregate([
{ $indexStats: {} }
]);
// Returns: { name: "idx_email_1", accesses: { ops: 50000, since: Date } }
// Unused indexes ($match is a separate stage after $indexStats):
db.collection.aggregate([
  { $indexStats: {} },
  { $match: { "accesses.ops": 0 } }
]);
// Drop unused indexes:
db.users.dropIndex("idx_unused_1");
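Before dropping an index it helps to check how long its counter has been running, since the counters restart when the server restarts. A minimal sketch over sample documents in the shape returned by $indexStats; the values are made up and dropCandidates is a hypothetical helper.

```javascript
// Sample documents in the shape returned by the $indexStats stage (values are made up).
const indexStats = [
  { name: "_id_", accesses: { ops: 0, since: new Date("2024-01-01") } },
  { name: "idx_email_1", accesses: { ops: 50000, since: new Date("2024-01-01") } },
  { name: "idx_unused_1", accesses: { ops: 0, since: new Date("2024-01-01") } }
];

// Indexes with zero accesses whose counter has been running for at least minDays.
function dropCandidates(stats, now, minDays) {
  const minAgeMs = minDays * 24 * 60 * 60 * 1000;
  return stats
    .filter((s) => s.name !== "_id_")            // the _id index cannot be dropped
    .filter((s) => Number(s.accesses.ops) === 0) // ops may arrive as a 64-bit Long
    .filter((s) => now - s.accesses.since >= minAgeMs)
    .map((s) => s.name);
}

console.log(dropCandidates(indexStats, new Date("2024-03-01"), 30));
// [ 'idx_unused_1' ]
```

In a replica set the statistics are per member, so an index that looks unused on one node may still serve reads on another.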
// Find slow queries and add indexes:
db.system.profile.find({ millis: { $gt: 100 } })
.sort({ millis: -1 })
.limit(10);
// For each slow query, explain it and add appropriate index
Connection Pooling & Concurrency
// Connection pool settings in driver:
// maxPoolSize: 100 // Max connections to maintain
// minPoolSize: 10 // Min connections to maintain
// maxIdleTimeMS: 45000 // Close idle connections after this
// waitQueueTimeoutMS: 5000 // Fail if can't get connection in 5s
// High concurrency best practices:
// 1. Use appropriate pool size (typically 50-100)
// 2. Use write/read preferences correctly
// 3. Avoid blocking operations (use async/await)
// 4. Monitor serverStatus.connections
// Example with Node.js driver:
const client = new MongoClient(uri, {
maxPoolSize: 100,
minPoolSize: 10
});
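Pool size is worth checking against the whole deployment, because each application instance keeps its own pool per server. The following is a back-of-the-envelope sketch; the instance count and connection budget are made-up numbers, and both functions are hypothetical helpers.

```javascript
// Worst-case connections that all application instances can open to one server.
function peakConnections(appInstances, maxPoolSize) {
  return appInstances * maxPoolSize;
}

// Largest maxPoolSize that keeps the fleet within a per-server connection budget.
function poolSizeFor(appInstances, connectionBudget) {
  return Math.floor(connectionBudget / appInstances);
}

console.log(peakConnections(20, 100)); // 2000
console.log(poolSizeFor(20, 1000));    // 50
```

Drivers also open a small number of monitoring connections per server, so the real total is slightly higher than this estimate.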
Schema Validation
// Create collection with JSON schema validation
db.createCollection("users", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["email", "name"],
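properties: {
_id: { bsonType: "objectId" }, // Required here: with additionalProperties: false, an unlisted _id rejects every document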
email: {
bsonType: "string",
pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
},
name: {
bsonType: "string",
minLength: 1
},
age: {
bsonType: "int",
minimum: 18,
maximum: 150
},
createdAt: {
bsonType: "date"
}
},
additionalProperties: false
}
}
});
// Insert must match schema:
db.users.insertOne({ email: "alice@example.com", name: "Alice", age: 30 });
// OK
db.users.insertOne({ email: "invalid", name: "Bob" });
// Error: Document failed validation
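The same rules can be mirrored on the client to give users an error before the insert is attempted. This is a hand-written approximation of the schema above, not a JSON Schema engine; validateUser is a hypothetical helper, and _id is allowed because the server adds it to every document.

```javascript
const EMAIL = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
const ALLOWED = new Set(["_id", "email", "name", "age", "createdAt"]);

// Returns a list of problems; an empty list means the document should pass.
function validateUser(doc) {
  const errors = [];
  for (const field of ["email", "name"]) {
    if (!(field in doc)) errors.push(`missing required field: ${field}`);
  }
  if ("email" in doc && !(typeof doc.email === "string" && EMAIL.test(doc.email))) {
    errors.push("email does not match pattern");
  }
  if ("name" in doc && !(typeof doc.name === "string" && doc.name.length >= 1)) {
    errors.push("name must be a non-empty string");
  }
  if ("age" in doc && !(Number.isInteger(doc.age) && doc.age >= 18 && doc.age <= 150)) {
    errors.push("age must be an integer between 18 and 150");
  }
  if ("createdAt" in doc && !(doc.createdAt instanceof Date)) {
    errors.push("createdAt must be a date");
  }
  for (const key of Object.keys(doc)) {
    if (!ALLOWED.has(key)) errors.push(`unexpected field: ${key}`);
  }
  return errors;
}

console.log(validateUser({ email: "alice@example.com", name: "Alice", age: 30 }));
// []
console.log(validateUser({ email: "invalid", name: "Bob" }));
// [ 'email does not match pattern' ]
```

Server-side validation remains the source of truth; a client-side check like this only improves error messages.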
Sharding (Horizontal Scaling)
Distribute data across multiple servers for massive scale.
┌──────────────────────────────────────────────────────────┐
│                 Sharded MongoDB Cluster                  │
├──────────────────────────────────────────────────────────┤
│                                                          │
│   Config Servers                 Router (mongos)         │
│   (Shard balance, metadata)      (Route queries)         │
│          │                              │                │
│          └───────────────┬──────────────┘                │
│                          │                               │
│           ┌──────────────┼──────────────┐                │
│           │              │              │                │
│        SHARD 1        SHARD 2        SHARD 3             │
│       (Keys A-H)     (Keys I-P)     (Keys Q-Z)           │
│                                                          │
│   Each shard is a replica set                            │
│   Data distributed by shard key                          │
│                                                          │
└──────────────────────────────────────────────────────────┘
Sharding Setup
// 1. Enable sharding on database (required before MongoDB 6.0, optional since)
sh.enableSharding("myapp");
// 2. Create index on shard key (required if the collection already contains data)
db.users.createIndex({ email: 1 });
// 3. Shard collection
sh.shardCollection("myapp.users", { email: 1 });
// Ranges of email values go to different shards
// 4. Check shard distribution
db.users.getShardDistribution();
// Shows which shard holds what ranges of data
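Range-based routing can be pictured as a lookup over sorted chunk boundaries. The sketch below is a toy model of what the router does, with made-up boundaries that follow the diagram; real chunk ranges are maintained by the cluster, and routeByRange is a hypothetical function.

```javascript
// Chunks sorted by exclusive upper bound; null stands in for MaxKey.
// The boundaries are made up to match the A-H / I-P / Q-Z diagram.
const chunks = [
  { max: "i", shard: "shard1" }, // keys below "i"
  { max: "q", shard: "shard2" }, // keys from "i" up to, not including, "q"
  { max: null, shard: "shard3" } // everything else
];

// Return the shard owning the first chunk whose upper bound is above the key.
function routeByRange(key) {
  return chunks.find((c) => c.max === null || key < c.max).shard;
}

console.log(routeByRange("alice@example.com")); // shard1
console.log(routeByRange("ivan@example.com"));  // shard2
console.log(routeByRange("zoe@example.com"));   // shard3
```

A query that includes the shard key can be sent to a single shard this way; a query without it has to be broadcast to all shards.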
Shard Key Considerations
// Good shard key:
// - High cardinality (many unique values)
// - Good distribution (no hotspots)
// - Commonly queried (avoid queries that hit all shards)
// Bad shard key:
// - Low cardinality (few unique values) → imbalanced shards
// - Status field (e.g., "active", "inactive") → most data in 1 shard
// - Ascending date → new data always writes to 1 shard (a hashed shard key avoids this)
// - Random UUID → writes spread evenly, but queries that do not filter on it hit every shard
// Example: Choose wisely (two alternatives; a collection is sharded on one key only)
// Option A:
db.orders.createIndex({ customerId: 1 });
sh.shardCollection("myapp.orders", { customerId: 1 }); // ✅ Good: many customers, distributes well
// Option B:
db.orders.createIndex({ status: 1 });
sh.shardCollection("myapp.orders", { status: 1 }); // ❌ Bad: few statuses, data imbalanced
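The difference between the two keys can be estimated before sharding by measuring how much of the data shares the most common key value. A minimal sketch on synthetic data; the order documents are generated for illustration and largestShare is a hypothetical helper.

```javascript
// Share of documents that carry the single most common value of a field.
// A large share means one chunk (and one shard) would hold most of the data.
function largestShare(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) ?? 0) + 1);
  }
  return Math.max(...counts.values()) / docs.length;
}

// Synthetic orders: 1000 documents, 200 customers, 90% "active".
const orders = Array.from({ length: 1000 }, (_, i) => ({
  customerId: i % 200,
  status: i % 10 === 0 ? "inactive" : "active"
}));

console.log(largestShare(orders, "customerId")); // 0.005
console.log(largestShare(orders, "status"));     // 0.9
```

On a real collection the same figure could be computed with a $group aggregation on a sample of documents.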
Summary
Replica Sets:
- High availability via automatic failover
- Replication lag: data eventually consistent across secondaries
- Write concern controls how many replicas must ACK
- Elections: secondaries elect new primary if primary fails
Security:
- Authentication: Username/password + role-based authorization
- Network: TLS/SSL for encryption in transit
- Field encryption: Automatic client-side encryption for sensitive data
- Audit: Monitor access with profiling and audit logs
Monitoring:
- db.serverStatus(): Memory, connections, operation counters, locks
- db.currentOp(): Active operations
- explain(): Query execution plan
- $indexStats: Index usage metrics
Performance:
- Use explain() to verify index usage
- Drop unused indexes
- Create indexes for frequent queries (ESR rule: Equality, Sort, Range)
- Connection pooling for high concurrency
- Schema validation to prevent invalid data
Sharding:
- Horizontal scaling by distributing data across shards
- Shard key determines data distribution
- Good shard key: high cardinality, even distribution, commonly queried
- Router (mongos) transparently routes queries to shards