Cassandra Interview Prep Guide (Q&A)
This guide provides concise, high-yield Q&A for quick revision. For deeper reading, see the referenced theory topics.
1. NoSQL and Cassandra Basics
Q: What is NoSQL?
A: Non-relational, schema-less databases designed for distributed, scalable workloads. Read more
Q: How does Cassandra differ from RDBMS?
A: Decentralized, flexible schema, optimized for high write throughput and horizontal scaling. No joins or multi-row transactions. Read more
Q: What is the CAP theorem?
A: Distributed systems can only guarantee two of Consistency, Availability, and Partition tolerance. Cassandra is AP with tunable consistency. Read more
2. Core Cassandra Concepts
Q: What is a partition key?
A: Determines data distribution across nodes. Good choice ensures even distribution and avoids hot spots. Read more
Q: What are clustering columns?
A: Define row order within a partition, enabling efficient range queries. Read more
Q: What is replication factor?
A: Number of data copies per data center. Read more
Q: What are consistency levels?
A: Control how many replicas must acknowledge a read/write. Read more
3. Data Modeling and Querying
Q: Why query-first modeling?
A: Tables are designed for specific queries to ensure fast, predictable reads. Read more
Q: When do you denormalize?
A: To support multiple access patterns efficiently, create separate tables for each query. Read more
Q: What is the ALLOW FILTERING anti-pattern?
A: Can cause full table scans and poor performance. Read more
4. Indexes and Materialized Views
Q: When use a secondary index?
A: Only for low-cardinality columns and small partitions. Read more
Q: What are materialized views?
A: Denormalized copies for different query patterns, but can lag or become inconsistent. Read more
5. Consistency, LWT, and Batching
Q: What is a lightweight transaction (LWT)?
A: Paxos-based, provides linearizable consistency for conditional updates. Slower than normal writes. Read more
Q: When use batches?
A: For atomic multi-table writes. Avoid large batches. Read more
6. TTL, Tombstones, and Deletes
Q: What is TTL?
A: Automatically expires data after a set period. Read more
Q: What are tombstones?
A: Markers for deleted/expired data; too many degrade performance. Read more
7. Aggregation, Filtering, and Counters
Q: Why is aggregation limited?
A: Only efficient within a partition; cluster-wide requires a full scan. Read more
Q: How do counters work?
A: Distributed, eventually consistent, can be inaccurate under contention. Read more
8. Advanced Topics
Q: What is a hot partition?
A: Receives disproportionate traffic; avoid with high-cardinality, well-distributed keys. Read more
Q: How does multi-DC replication work?
A: Data is replicated to multiple data centers using NetworkTopologyStrategy. Read more
Q: How do you monitor and repair a cluster?
A: Use nodetool, logs, and monitoring tools. Read more
For deeper explanations, diagrams, and examples, see the linked theory topics.