Testing Strategies — Deep Dive
Level: Intermediate · Pre-reading: 09 · Deployment & Infrastructure
The Testing Pyramid
The testing pyramid is a framework for balancing test types by speed, cost, and coverage:
          /\
         /  \
        / E2E \         [Slow, Expensive, Few]
       /--------\
      / Contract \      [Medium, Moderate, More]
     /------------\
    / Integration  \    [Faster, Cheaper, More]
   /----------------\
  /    Unit Tests    \  [Fast, Cheap, Many]
 /--------------------\
| Type | Speed | Cost | Coverage | Ratio |
|---|---|---|---|---|
| Unit | < 100ms | Cheap | Narrow (1 function) | 70% |
| Integration | 100ms–1s | Moderate | Medium (service) | 20% |
| Contract | < 1s | Moderate | API boundaries | 5% |
| E2E | 1s–10s | Expensive | Full system | 5% |
Anti-Pattern: Ice Cream Cone
This inverts the pyramid: many slow E2E tests and few unit tests, which means expensive runs, slow feedback, and high maintenance. Result: the test suite becomes a burden; people skip tests locally; quality drops.
Unit Testing
Definition: Test a single function or class in isolation.
Characteristics
- Scope: One function, one behavior
- Speed: < 100ms per test
- Isolation: Mock all dependencies; no I/O, no network
- Framework: JUnit, Mockito (Java); Jest (JavaScript); pytest (Python)
Example: Order Service Discount Calculator
class DiscountCalculatorTest {
@Test
void applyDiscount_validCustomer_returnsDiscountedPrice() {
// Arrange
DiscountCalculator calc = new DiscountCalculator();
double price = 100.0;
Customer customer = new Customer("gold", 5); // gold tier, 5 years
// Act
double result = calc.applyDiscount(price, customer);
// Assert
assertEquals(85.0, result, 0.001); // 15% off for gold tier; delta guards against floating-point rounding
}
@Test
void applyDiscount_newCustomer_noDiscount() {
DiscountCalculator calc = new DiscountCalculator();
Customer customer = new Customer("bronze", 0);
assertEquals(100.0, calc.applyDiscount(100.0, customer));
}
}
Best Practices
- One assertion per test (or logically related assertions)
- AAA pattern: Arrange, Act, Assert
- Descriptive names: applyDiscount_validCustomer_returnsDiscountedPrice(), not test1()
- No test interdependencies: each test should pass in isolation
- Mock external dependencies: Database, HTTP clients, message queues
When NOT to Unit Test
- Trivial getters/setters (unless enforcing validation)
- Code that only configures frameworks (Spring beans)
- Code already covered by integration tests
Integration Testing
Definition: Test multiple components working together; includes I/O and slow operations.
Characteristics
- Scope: Service-to-database, service-to-service via HTTP
- Speed: 100ms–1s per test
- Setup: Real database, in-memory test database, or testcontainers
- Framework: JUnit + Testcontainers (Java); pytest fixtures (Python); Jest + mock HTTP (JavaScript)
Example: Order Service Integration Test
@SpringBootTest
@Testcontainers
class OrderServiceIntegrationTest {
@Autowired
private OrderService orderService;
@Autowired
private OrderRepository orderRepository; // needed for the persistence check below
@Container
@ServiceConnection // Spring Boot 3.1+; wires the container's JDBC URL into the context
static PostgreSQLContainer<?> postgres =
new PostgreSQLContainer<>("postgres:15");
@Test
void createOrder_validData_persistsToDatabase() throws Exception {
// Arrange
OrderRequest request = new OrderRequest("PROD-123", 5);
// Act
Order order = orderService.createOrder(request);
// Assert
assertNotNull(order.getId());
assertEquals("PROD-123", order.getProductId());
// Verify persisted
Order fetched = orderRepository.findById(order.getId()).orElseThrow();
assertEquals(5, fetched.getQuantity());
}
@Test
void createOrder_insufficientInventory_throwsException() {
OrderRequest request = new OrderRequest("OUT-OF-STOCK", 1000);
assertThrows(InsufficientInventoryException.class,
() -> orderService.createOrder(request));
}
}
Testcontainers Benefits
- Real database: Postgres, MySQL, Redis, Kafka in Docker
- Parallel execution: each test class gets a fresh container (declare the container non-static for one per test)
- No external test DB: Safer; no data pollution across teams
- Auto-cleanup: Container stopped and removed after test
Contract Testing (Consumer-Driven)
Definition: Test the API contract between services before integration.
Characteristics
- Scope: Service A expects Service B to return specific JSON shape
- Speed: < 1s per contract
- Tool: Pact (JVM, JavaScript, Python, Go, .NET)
- Key insight: Consumer defines the contract; provider proves compliance
Problem Solved
Without contract tests:
Timeline:
Day 1: Service A deployed; calls Payment API
Day 2: Payment team refactors response: {price} → {amount}
Day 3: A's calls to Payment start failing in prod 🔥
With contract tests:
Timeline:
Day 1: A and Payment define contract (pact)
Day 2: Payment refactors; Pact tests fail in CI ✓ Caught!
Day 2b: Payment team updates contract; A's PR updated
Day 3: Both deployed; no surprises
Example: Order Service ↔ Payment Service Contract
Consumer (Order Service) defines:
@ExtendWith(PactConsumerTestExt.class)
@PactTestFor(providerName = "PaymentService", port = "8888")
class PaymentServicePactTest {
@Pact(consumer = "OrderService")
public RequestResponsePact createPact(PactDslWithProvider builder) {
return builder
.uponReceiving("a payment request")
.path("/api/v1/payments")
.method("POST")
.body(new PactDslJsonBody()
.stringValue("orderId", "ORD-123")
.numberValue("amount", 99.99)
)
.willRespondWith()
.status(200)
.body(new PactDslJsonBody()
.stringValue("transactionId", "TXN-abc")
.stringValue("status", "SUCCESS")
)
.toPact();
}
@Test
void orderService_callsPayment_succeeds(MockServer mockServer) {
PaymentClient client = new PaymentClient(mockServer.getUrl());
PaymentResponse resp = client.charge("ORD-123", 99.99);
assertEquals("SUCCESS", resp.getStatus());
}
}
Provider (Payment Service) verifies:
# Run in Payment Service CI
mvn pact:verify
# Loads pact from consumer; verifies Payment API satisfies it
When to Use Contract Tests
- Multiple teams; service A and B evolve independently
- Preventing silent API breaking changes
- Building confidence in service boundaries
End-to-End (E2E) Testing
Definition: Test the full user journey through the entire system (UI → all backends → database).
Characteristics
- Scope: Browser + all services + real databases
- Speed: 1s–10s per test (slow!)
- Tool: Selenium, Cypress, Playwright (browser automation)
- Cost: Expensive to maintain; brittle (UI changes break tests)
Example: E2E Checkout Flow
import { test, expect } from '@playwright/test';

test.describe('Checkout flow', () => {
test('User completes purchase end-to-end', async ({ page }) => {
// Navigate to store
await page.goto('https://store.example.com');
// Search for product
await page.fill('[data-testid="search"]', 'laptop');
await page.click('button:has-text("Search")');
// Add to cart
await page.click('button:has-text("Add to Cart")');
// Checkout
await page.click('button:has-text("Proceed to Checkout")');
// Enter payment
await page.fill('[data-testid="card-number"]', '4111-1111-1111-1111');
await page.fill('[data-testid="expiry"]', '12/26');
await page.click('button:has-text("Pay Now")');
// Verify order confirmation
await expect(page).toHaveURL(/\/order-confirmation\/\d+/);
await expect(page.locator('text=Order Confirmed')).toBeVisible();
});
});
Best Practices
- Minimize E2E tests: Only happy path + critical flows
- Use test data: Pre-seed database; don't rely on manual setup
- Parallel execution: Spin up multiple browser instances
- Retry flaky tests: Network hiccups happen; retry 2–3 times
- Screenshots on failure: Capture UI state for debugging
Anti-Patterns
- E2E tests for every code path (too slow; use unit tests)
- Tests dependent on UI HTML structure (use data-testid attributes)
- Tests that rely on exact timing (add waits for element visibility)
Load Testing
Definition: Verify system behaves correctly under high load (high RPS, high concurrency).
Types
| Type | Load | Purpose | Tool |
|---|---|---|---|
| Load Test | Realistic traffic volume | Baseline perf; identify bottleneck | JMeter, Locust, k6 |
| Stress Test | Push until breaking point | Find failure threshold | Same tools |
| Spike Test | Sudden load increase | Test auto-scaling response | Same tools |
| Soak Test | Sustained load for hours | Detect memory leaks, connection pool exhaustion | Same tools |
Example: JMeter Load Test
Test Plan:
- Ramp-up: 0–100 concurrent users over 5 min
- Sustained: 100 users for 10 min
- Ramp-down: 100–0 over 2 min
Assertions:
- P95 latency < 500ms
- Error rate < 0.1%
- Throughput > 500 RPS
Endpoints:
- GET /api/products (40%)
- GET /api/products/{id} (40%)
- POST /api/orders (20%)
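To make the latency assertion concrete: P95 means 95% of requests completed at or below that time. A minimal nearest-rank computation (class name and sample data are illustrative, not part of any load-testing tool):

```java
import java.util.Arrays;

public class LatencyPercentile {
    // Nearest-rank percentile: smallest sample with at least p% of values at or below it
    static double percentile(double[] samplesMs, double p) {
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil((p / 100.0) * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // 100 fake samples: 95 fast requests (100ms) and 5 slow outliers (900ms)
        double[] samples = new double[100];
        Arrays.fill(samples, 0, 95, 100.0);
        Arrays.fill(samples, 95, 100, 900.0);
        double p95 = percentile(samples, 95);
        System.out.println("P95 = " + p95 + "ms");
        // The assertion from the test plan above:
        if (p95 >= 500.0) throw new AssertionError("P95 latency exceeds 500ms target");
    }
}
```

Note how P95 hides the 5 slowest requests entirely; that is why plans often also assert on error rate and throughput.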
Observations
Results:
Throughput: 400 RPS
P95 latency: 800ms ← exceeds 500ms target
Bottleneck: Database connection pool (max 20, depleted at ~300 RPS)
Action: Increase pool size from 20 → 50; re-test
Post-fix:
Throughput: 600 RPS ✓
P95 latency: 300ms ✓
Error rate: 0.05% ✓
Tools Comparison
| Tool | Best For | Learning Curve |
|---|---|---|
| JMeter | Enterprise; complex scenarios | High |
| Locust | Pythonic; large-scale | Medium |
| k6 | Developer-friendly; cloud-native | Low |
| Artillery | Quick baseline tests | Low |
Chaos Engineering
Definition: Intentionally inject failures to verify system handles them gracefully.
Principles
- Steady state: Define normal system behavior (latency, error rate)
- Hypothesis: Assume system remains in steady state despite failure
- Experiment: Inject failure (kill pod, add network latency)
- Observe: Measure if steady state is maintained
- Learn: If hypothesis breaks, fix it
Example: Chaos Experiment — Pod Crash
Hypothesis: If one pod crashes, traffic failover to remaining replicas; error rate stays < 0.1%.
Experiment:
# Tool: Gremlin, Chaos Monkey, Chaos Mesh
1. Baseline: Order service has 3 replicas, P95 latency 100ms, error rate 0.001%
2. Kill 1 pod
3. Monitor: Does error rate stay < 0.1%? Does latency spike > 200ms?
4. Observation: Error rate jumps to 5% for 10 seconds ✗ Hypothesis broken
5. Root cause: No graceful shutdown (SIGTERM handler); requests drop mid-flight
6. Fix: Add SIGTERM handler; drain in-flight requests before exit
7. Retest: Error rate stays 0.001% ✓ Hypothesis confirmed
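The fix in step 6 can be sketched with a JVM shutdown hook (class and field names are hypothetical; Kubernetes sends SIGTERM before SIGKILL, and the JVM runs shutdown hooks on SIGTERM):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class GracefulShutdown {
    private static final AtomicInteger inFlight = new AtomicInteger(0);
    private static volatile boolean accepting = true;

    static int inFlightCount() { return inFlight.get(); }

    public static void main(String[] args) {
        // Runs on SIGTERM: stop accepting work, then drain before the process exits
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            accepting = false; // readiness probe should start failing here
            // Stay under Kubernetes terminationGracePeriodSeconds (default 30s)
            long deadline = System.currentTimeMillis() + 25_000;
            while (inFlight.get() > 0 && System.currentTimeMillis() < deadline) {
                try { TimeUnit.MILLISECONDS.sleep(100); } catch (InterruptedException e) { break; }
            }
            System.out.println("Drained; " + inFlight.get() + " requests still in flight");
        }));
        handleRequest(); // request handlers wrap their work as below
    }

    static void handleRequest() {
        if (!accepting) return; // reject new work during shutdown
        inFlight.incrementAndGet();
        try {
            // ... handle the request ...
        } finally {
            inFlight.decrementAndGet(); // always decrement, even on failure
        }
    }
}
```

In a real service the drain loop would also close the server socket so the load balancer stops routing new traffic.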
Chaos Experiments to Run (Order of Importance)
| Experiment | What Breaks | Fix |
|---|---|---|
| Kill random pod | Graceful shutdown, PDB settings | Add SIGTERM handler; update PodDisruptionBudget |
| Latency injection (500ms) | Timeouts, cascading failure | Increase timeout; add bulkhead/circuit breaker |
| Packet loss (10%) | Network retries, connection pooling | Tune retry backoff; increase pool size |
| Database slowness (5s queries) | Query queue, thread starvation | Add query timeout; optimize slow queries |
| Service dependency unavailable | Fallback logic | Implement fallback; test it regularly |
Tools
| Tool | Scope | Ease |
|---|---|---|
| Gremlin | Cloud-agnostic; SaaS dashboard | Easy; paid |
| Chaos Monkey | AWS-native; random termination | Easy; basic |
| Chaos Mesh | Kubernetes-native; CRD-based experiments | Medium; free |
| Locust | Load + chaos in code | Medium; flexible |
| Pumba | Docker; random container killing | Easy; local |
Test Coverage Metrics
Code coverage (% of lines executed by tests) is a useful signal but not a goal:
| Coverage | Interpretation |
|---|---|
| < 30% | Tests are afterthought; likely missing critical paths |
| 30–60% | Decent; focus on critical business logic paths |
| 60–80% | Good; add E2E for critical user flows; don't obsess over edge cases |
| > 80% | Diminishing returns; focus on mutation testing instead |
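A coverage floor is better enforced in CI than chased by hand. A sketch using the JaCoCo Maven plugin (the version and the 60% threshold are illustrative choices, not recommendations from this document):

```xml
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <version>0.8.11</version>
  <executions>
    <execution>
      <goals><goal>prepare-agent</goal></goals>
    </execution>
    <execution>
      <id>coverage-check</id>
      <goals><goal>check</goal></goals>
      <configuration>
        <rules>
          <rule>
            <element>BUNDLE</element>
            <limits>
              <limit>
                <counter>LINE</counter>
                <value>COVEREDRATIO</value>
                <minimum>0.60</minimum>
              </limit>
            </limits>
          </rule>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The build fails when line coverage drops below the floor, which catches regressions without encouraging coverage-chasing.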
Mutation Testing
Instead of "did we run this line?", ask "does changing this line break a test?"
// Original
public int calculateDiscount(int years) {
return years > 5 ? 10 : 0;
}
// Mutation 1
public int calculateDiscount(int years) {
return years > 4 ? 10 : 0; // Changed 5 → 4
}
// Mutation 2
public int calculateDiscount(int years) {
return years >= 5 ? 10 : 0; // Changed > to >=
}
If tests don't catch these mutations, coverage is false confidence.
Tool: PIT (Java), Stryker (JavaScript).
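A single boundary test at exactly years = 5 kills both mutants above (the original returns 0, both mutants return 10), and a second case pins the happy path. A sketch, with the method reproduced so the example is self-contained:

```java
public class DiscountBoundaryTest {
    // Same method as above, reproduced for a self-contained example
    static int calculateDiscount(int years) {
        return years > 5 ? 10 : 0;
    }

    public static void main(String[] args) {
        // Kills Mutation 1 (5 → 4) and Mutation 2 (> → >=): exactly 5 years gets 0
        if (calculateDiscount(5) != 0) throw new AssertionError("5 years should get no discount");
        // Pins the other side of the boundary
        if (calculateDiscount(6) != 10) throw new AssertionError("6 years should get 10%");
        System.out.println("Both mutants killed");
    }
}
```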
TDD vs BDD
Test-Driven Development (TDD)
Red-Green-Refactor cycle:
- Red: Write a failing test for the feature you want to build
- Green: Write minimal code to make the test pass
- Refactor: Improve code quality without changing behavior
Example: TDD for Discount Calculator
// Step 1: Red — Write test first (test fails)
@Test
void applyDiscount_goldCustomer_returns15Percent() {
DiscountCalculator calc = new DiscountCalculator();
Customer customer = new Customer("gold", 5);
double result = calc.applyDiscount(100.0, customer);
assertEquals(85.0, result); // Test fails: method doesn't exist yet
}
// Step 2: Green — Write minimal code to pass
public class DiscountCalculator {
public double applyDiscount(double price, Customer customer) {
if ("gold".equals(customer.getTier())) {
return price * 0.85; // Minimal code: hardcoded for gold
}
return price;
}
}
// Step 3: Refactor — Add more tests; improve implementation
@Test
void applyDiscount_silverCustomer_returns10Percent() {
// Test for silver tier
}
// Refactor: Extract to method
private double getDiscount(String tier) {
return switch(tier) {
case "gold" -> 0.15;
case "silver" -> 0.10;
default -> 0.0;
};
}
Benefits of TDD:
- Forces you to think about API design before implementation
- High test coverage comes naturally (every line exists to pass a test)
- Refactoring confidence: tests catch regressions
- Documents expected behavior via tests
Behavior-Driven Development (BDD)
BDD extends TDD by writing tests in business language, not code language.
Key principle: Tests describe behavior (what the system should do), not implementation (how it works).
Format: Given-When-Then (Gherkin syntax)
Feature: Order checkout discount
Scenario: Gold customer gets 15% discount
Given a gold-tier customer with 5 years tenure
When they checkout with $100 order
Then they pay $85
Scenario: New customer gets no discount
Given a new customer with no history
When they checkout with $100 order
Then they pay $100
Automated with Cucumber/Behave/Gherkin:
public class OrderSteps {
private Customer customer;
private double totalPrice;
private double discountedPrice;
@Given("a gold-tier customer with {int} years tenure")
public void createGoldCustomer(int years) {
customer = new Customer("gold", years);
}
@When("they checkout with ${double} order")
public void checkout(double price) {
totalPrice = price;
DiscountCalculator calc = new DiscountCalculator();
discountedPrice = calc.applyDiscount(price, customer);
}
@Then("they pay ${double}")
public void verifyPrice(double expected) {
assertEquals(expected, discountedPrice, 0.001); // delta for floating-point comparison
}
}
BDD Benefits:
- Business-readable: Non-technical stakeholders understand tests
- Living documentation: Tests describe current behavior
- Collaboration: Developers, QA, and product managers write scenarios together
- Catches misunderstandings: Ambiguities surface in Given-When-Then discussions
TDD vs BDD:
| Aspect | TDD | BDD |
|---|---|---|
| Focus | How to implement | What behavior matters |
| Language | Code (Assert, etc) | Business English (Gherkin) |
| Test level | Unit/Integration | Scenario-based (varies) |
| Audience | Developers | Developers + QA + Business |
| Granularity | Fine (single function) | Coarse (user story) |
Fundamental Testing Types (By Scope)
The testing pyramid shows types (unit, integration, E2E). But there are other dimensions:
Component Testing
Definition: Test a single component (service, module, class cluster) in isolation.
Scope: Larger than unit; smaller than integration.
Example: Payment Service Component Test
@SpringBootTest
class PaymentServiceComponentTest {
@Autowired
private PaymentService paymentService;
@MockBean
private PaymentGatewayClient gatewayClient; // Mock external API
@Autowired
private PaymentRepository repository;
@Test
void processPayment_validCard_recordsTransaction() {
// Test the entire Payment Service; external calls mocked
when(gatewayClient.charge(any())).thenReturn(ChargeResponse.success());
Payment payment = paymentService.process(
new PaymentRequest("4111-1111-1111-1111", 99.99)
);
assertNotNull(payment.getId());
assertTrue(repository.findById(payment.getId()).isPresent()); // verify persisted; repository is a real bean, so Mockito verify() would fail
}
}
What's tested:
- Service logic: discount logic + payment flow
- Database: transactions are persisted
- Mocked: external payment gateway
What's NOT tested:
- How payment gateway actually works (mocked)
- How other services integrate (mocked)
Functional Testing
Definition: Verify that a feature works as specified; any test that checks functionality qualifies.
Scope: Can be unit, component, integration, or E2E.
Example: Functional Test for Order Discount
@Test
void orderCheckout_appliesLoyaltyDiscount_correctly() {
// Functional: Does the discount feature work end-to-end?
// Setup
Customer customer = new Customer("gold", 5);
Order order = new Order(100.0);
// Execute
OrderProcessor processor = new OrderProcessor();
double finalPrice = processor.checkout(customer, order);
// Verify
assertEquals(85.0, finalPrice); // 15% discount applied
}
Key point: Functional testing doesn't care about implementation details; only that the feature works.
Regression Testing
Definition: Verify that changes don't break existing features.
Scope: Usually re-run existing test suite after code changes.
Example:
You add a new feature: "Silver customers get 10% discount"
Regression tests:
✓ Gold customers still get 15% (didn't break)
✓ New customers get 0% (didn't break)
✓ Silver customers get 10% (new feature)
When to run: After every code change, before release.
Tools: Automated test suites (unit, integration, E2E).
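The three regression checks above fit naturally into one table-driven test, so adding a tier means adding a row, not a test method. A sketch (discount rates taken from the earlier examples; the inlined applyDiscount stands in for the real calculator):

```java
public class DiscountRegressionTest {
    // Discount rules as described earlier: gold 15%, silver 10%, others 0%
    static double applyDiscount(double price, String tier) {
        double rate = switch (tier) {
            case "gold" -> 0.15;
            case "silver" -> 0.10;
            default -> 0.0;
        };
        return price * (1 - rate);
    }

    public static void main(String[] args) {
        // Regression table: {tier, expected price for a $100 order}
        Object[][] cases = {
            {"gold", 85.0},    // existing behavior: still 15% off
            {"bronze", 100.0}, // existing behavior: still no discount
            {"silver", 90.0},  // new feature: 10% off
        };
        for (Object[] c : cases) {
            double actual = applyDiscount(100.0, (String) c[0]);
            if (Math.abs(actual - (double) c[1]) > 0.001)
                throw new AssertionError(c[0] + ": expected " + c[1] + ", got " + actual);
        }
        System.out.println("All regression cases pass");
    }
}
```

In JUnit 5 the same shape is expressed with @ParameterizedTest and @CsvSource.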
Smoke Testing
Definition: Quick sanity check; verify the system is basically working after deployment.
Scope: Small subset of critical paths; fast to run.
Example: Smoke Tests for E-commerce
# Smoke test suite (runs in < 2 minutes)
POST /api/login → 200 OK
GET /api/products → 200 OK (not empty)
POST /api/orders → 201 Created (order created)
GET /api/orders/{id} → 200 OK (order retrieved)
When to run: After deploying to staging/prod; before running full test suite.
Typical time: < 5 minutes for entire smoke test suite.
Example in code:
@SpringBootTest
class SmokeTests {
@Autowired
private WebTestClient webClient;
@Test
void systemIsUp_returnsHealthOK() {
webClient.get()
.uri("/actuator/health")
.exchange()
.expectStatus().isOk();
}
@Test
void canCreateOrder() {
webClient.post()
.uri("/api/orders")
.bodyValue(new CreateOrderRequest("PROD-123", 5))
.exchange()
.expectStatus().isCreated();
}
}
Manual Testing
Definition: Human testers interact with the system; not automated.
When used:
- Exploratory testing (finding unexpected issues)
- UX testing (does the UI feel right?)
- Complex scenarios that are hard to automate
- Ad-hoc testing after major changes
Differs from automation:
| Aspect | Manual | Automated |
|---|---|---|
| Speed | Slow; limited coverage | Fast; broad coverage |
| Cost | High (tester time) | High (initial); amortized |
| Maintenance | Low (no scripts to maintain) | High (tests break with UI changes) |
| Repeatability | Variable (human error) | Consistent (exact same steps) |
| Insight | Creative; finds edge cases | Deterministic; tests spec |
Best for:
- New products (scenarios unknown)
- UI/UX validation (feels good?)
- Complex edge cases (hard to automate)
- Accessibility testing (screen readers, keyboard nav)
Test Scope Examples by Layer
Here's a concrete example showing different test types for an Order Service:
┌─────────────────────────────────────────────────────┐
│ E2E Test: Full checkout flow (UI → API → DB) │
│ "Customer adds item, checks out, sees confirmation"│
│ Tool: Selenium, Cypress │
│ Speed: 5–10 seconds │
└─────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Contract Test: Order Service ↔ Payment Service │
│ "Order expects Payment to return {txnId, status}" │
│ Tool: Pact │
│ Speed: < 1 second │
└─────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Component Test: Order Service (Payment mocked) │
│ "Given order with gold customer, discount applied" │
│ Tool: Spring Boot Test │
│ Speed: 100ms–1s │
└─────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Unit Test: Discount calculation │
│ "Gold tier + $100 = $85" │
│ Tool: JUnit + Mockito │
│ Speed: < 100ms │
└─────────────────────────────────────────────────────┘
Each layer tests a different scope:
- Unit: Single function
- Component: Service + internal logic
- Contract: Service boundaries
- E2E: Entire user flow
Test Strategy by Architecture Style
Monolith
- High % unit tests (70–80%)
- Integration tests for critical flows (15–20%)
- E2E for user journeys (5–10%)
- No contract tests needed (single codebase)
Microservices
- Unit tests: 60–70% (smaller services)
- Integration tests: 15–20% (database + local dependencies)
- Contract tests: 5–10% (enforce API boundaries)
- E2E tests: 3–5% (expensive; only critical paths)
- Load tests: Baseline for each service
- Smoke tests: Before releases (quick sanity check)
FAQ
What's the difference between TDD and BDD?
TDD writes code-level tests first (red-green-refactor). BDD extends TDD by writing business-readable scenarios (Given-When-Then) that non-technical people understand. TDD is developer-focused; BDD is team-focused (devs + QA + business).
When should I use BDD vs regular unit tests?
Use BDD for complex business logic and user-facing features where clarity matters. Use unit tests for utility functions and low-level logic. Often use both: BDD at scenario level; unit tests for implementation details.
What's the difference between component and integration tests?
Component tests mock external dependencies (payment gateway, other services). Integration tests use real dependencies (real database). Component tests run faster; integration tests catch more issues. Use component tests first; integration tests for critical paths.
How much test coverage should we aim for?
60–80% of critical business logic. Avoid chasing 100%; diminishing returns after 80%. Use mutation testing to validate quality over quantity.
Who writes tests: developers or QA?
Developers write unit, integration, and contract tests. QA writes E2E, load, and exploratory tests. With good unit/integration tests, QA focus shifts from regression to edge cases and UX.
Should we run all tests in CI or just fast ones?
Run unit + integration (< 5 min) on every commit. Run E2E + load on release branches or nightly. Parallel execution helps; aim for total CI time < 10 min. Always run smoke tests after deployment.
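That split might look like this in a pipeline (a GitHub Actions-style sketch; job names, Maven profiles, and the smoke script are assumptions, not a prescribed setup):

```yaml
name: ci
on:
  push:                      # every commit: fast feedback
  schedule:
    - cron: "0 2 * * *"      # nightly: expensive suites
jobs:
  fast-tests:                # unit + integration, target < 5 min
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: mvn test -Punit,integration    # profile names are assumptions
  slow-tests:                # E2E + load, nightly only
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: mvn verify -Pe2e,load          # profile names are assumptions
  smoke:                     # post-deploy sanity check
    needs: fast-tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./smoke-tests.sh               # hypothetical script
```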