Testing Strategies — Deep Dive

Level: Intermediate
Pre-reading: 09 · Deployment & Infrastructure


The Testing Pyramid

The testing pyramid is a framework for balancing test types by speed, cost, and coverage:

         /\
        /  \
       /E2E \        [Slow, Expensive, Few]
      /------\
     /Contract\      [Medium, Moderate, Some]
    /----------\
   / Integration\    [Faster, Cheaper, More]
  /--------------\
 /   Unit Tests   \  [Fast, Cheap, Many]
/------------------\
Type        | Speed    | Cost      | Coverage            | Ratio
Unit        | < 100ms  | Cheap     | Narrow (1 function) | 70%
Integration | 100ms–1s | Moderate  | Medium (service)    | 20%
Contract    | < 1s     | Moderate  | API boundaries      | 5%
E2E         | 1s–10s   | Expensive | Full system         | 5%

Anti-Pattern: Ice Cream Cone

 \------------------/
  \      E2E       /    [Many slow tests]
   \--------------/
    \ Integration/      [Few; expensive, brittle]
     \----------/
      \  Unit  /        [Fewest]
       \------/

This inverts the pyramid: expensive tests, slow feedback, high maintenance. Result: test suite becomes a burden; people skip tests locally; quality drops.


Unit Testing

Definition: Test a single function or class in isolation.

Characteristics

  • Scope: One function, one behavior
  • Speed: < 100ms per test
  • Isolation: Mock all dependencies; no I/O, no network
  • Framework: JUnit, Mockito (Java); Jest (JavaScript); pytest (Python)

Example: Order Service Discount Calculator

class DiscountCalculatorTest {

    @Test
    void applyDiscount_validCustomer_returnsDiscountedPrice() {
        // Arrange
        DiscountCalculator calc = new DiscountCalculator();
        double price = 100.0;
        Customer customer = new Customer("gold", 5); // gold tier, 5 years

        // Act
        double result = calc.applyDiscount(price, customer);

        // Assert
        assertEquals(85.0, result); // 15% off for gold tier
    }

    @Test
    void applyDiscount_newCustomer_noDiscount() {
        DiscountCalculator calc = new DiscountCalculator();
        Customer customer = new Customer("bronze", 0);
        assertEquals(100.0, calc.applyDiscount(100.0, customer));
    }
}
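The tests above assume a DiscountCalculator and Customer that aren't shown. A minimal sketch consistent with those assertions might look like this (the tier names and rates are assumptions taken from the test expectations, not a spec):

```java
// Hypothetical implementation consistent with the tests above.
// Tier names and discount rates are assumptions, not a spec.
public class DiscountCalculator {

    public double applyDiscount(double price, Customer customer) {
        double rate = switch (customer.getTier()) {
            case "gold" -> 0.15;   // 15% off: $100 becomes $85
            case "silver" -> 0.10;
            default -> 0.0;        // bronze / new customers: no discount
        };
        return price * (1 - rate);
    }
}

class Customer {
    private final String tier;
    private final int yearsTenure;

    Customer(String tier, int yearsTenure) {
        this.tier = tier;
        this.yearsTenure = yearsTenure;
    }

    String getTier() { return tier; }
    int getYearsTenure() { return yearsTenure; }
}
```

Because the calculator has no I/O and no dependencies, both tests above run in well under 100ms against it.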

Best Practices

  • One assertion per test (or logically related assertions)
  • AAA pattern: Arrange, Act, Assert
  • Descriptive names: applyDiscount_validCustomer_returnsDiscountedPrice() not test1()
  • No test interdependencies: Each test should pass in isolation
  • Mock external dependencies: Database, HTTP clients, message queues

When NOT to Unit Test

  • Trivial getters/setters (unless enforcing validation)
  • Code that only configures frameworks (Spring beans)
  • Code already covered by integration tests

Integration Testing

Definition: Test multiple components working together; includes I/O and slow operations.

Characteristics

  • Scope: Service-to-database, service-to-service via HTTP
  • Speed: 100ms–1s per test
  • Setup: Real database, in-memory test database, or testcontainers
  • Framework: JUnit + Testcontainers (Java); pytest fixtures (Python); Jest + mock HTTP (JavaScript)

Example: Order Service Integration Test

@SpringBootTest
@Testcontainers
class OrderServiceIntegrationTest {

    @Autowired
    private OrderService orderService;

    @Autowired
    private OrderRepository orderRepository;

    @Container
    @ServiceConnection  // wires the datasource to the container (Spring Boot 3.1+)
    static PostgreSQLContainer<?> postgres =
        new PostgreSQLContainer<>("postgres:15");

    @Test
    void createOrder_validData_persistsToDatabase() throws Exception {
        // Arrange
        OrderRequest request = new OrderRequest("PROD-123", 5);

        // Act
        Order order = orderService.createOrder(request);

        // Assert
        assertNotNull(order.getId());
        assertEquals("PROD-123", order.getProductId());

        // Verify persisted
        Order fetched = orderRepository.findById(order.getId()).orElseThrow();
        assertEquals(5, fetched.getQuantity());
    }

    @Test
    void createOrder_insufficientInventory_throwsException() {
        OrderRequest request = new OrderRequest("OUT-OF-STOCK", 1000);

        assertThrows(InsufficientInventoryException.class,
            () -> orderService.createOrder(request));
    }
}

Testcontainers Benefits

  • Real dependencies: Postgres, MySQL, Redis, Kafka running in Docker
  • Isolation: each test class (or test) can get a fresh container
  • No shared test DB: safer; no data pollution across teams
  • Auto-cleanup: containers are stopped and removed after the run

Contract Testing (Consumer-Driven)

Definition: Test the API contract between services before integration.

Characteristics

  • Scope: Service A expects Service B to return specific JSON shape
  • Speed: < 1s per contract
  • Tool: Pact (JVM, JavaScript, Python, Go, .NET)
  • Key insight: Consumer defines the contract; provider proves compliance

Problem Solved

Without contract tests:

Timeline:
  Day 1: Service A deployed; calls Payment API
  Day 2: Payment team refactors response: {price} → {amount}
  Day 3: A's calls to Payment start failing in prod 🔥

With contract tests:

Timeline:
  Day 1: A and Payment define contract (pact)
  Day 2: Payment refactors; Pact tests fail in CI ✓ Caught!
  Day 2b: Payment team updates contract; A's PR updated
  Day 3: Both deployed; no surprises

Example: Order Service ↔ Payment Service Contract

Consumer (Order Service) defines:

@ExtendWith(PactConsumerTestExt.class)
@PactTestFor(providerName = "PaymentService", port = "8888")
class PaymentServicePactTest {

    @Pact(consumer = "OrderService")
    public V4Pact createPact(PactDslWithProvider builder) {
        return builder
            .uponReceiving("a payment request")
            .path("/api/v1/payments")
            .method("POST")
            .body(new PactDslJsonBody()
                .stringValue("orderId", "ORD-123")
                .numberValue("amount", 99.99))
            .willRespondWith()
            .status(200)
            .body(new PactDslJsonBody()
                .stringValue("transactionId", "TXN-abc")
                .stringValue("status", "SUCCESS"))
            .toPact(V4Pact.class);
    }

    @Test
    void orderService_callsPayment_succeeds(MockServer mockServer) {
        PaymentClient client = new PaymentClient(mockServer.getUrl());
        PaymentResponse resp = client.charge("ORD-123", 99.99);

        assertEquals("SUCCESS", resp.getStatus());
    }
}

Provider (Payment Service) verifies:

# Run in Payment Service CI
mvn pact:verify
# Loads pact from consumer; verifies Payment API satisfies it

When to Use Contract Tests

  • Multiple teams; service A and B evolve independently
  • Preventing silent API breaking changes
  • Building confidence in service boundaries

End-to-End (E2E) Testing

Definition: Test the full user journey through the entire system (UI → all backends → database).

Characteristics

  • Scope: Browser + all services + real databases
  • Speed: 1s–10s per test (slow!)
  • Tool: Selenium, Cypress, Playwright (browser automation)
  • Cost: Expensive to maintain; brittle (UI changes break tests)

Example: E2E Checkout Flow

// Playwright test runner (URL and selectors are illustrative)
const { test, expect } = require('@playwright/test');

test('user completes purchase end-to-end', async ({ page }) => {
    // Navigate to store
    await page.goto('https://store.example.com');

    // Search for product
    await page.fill('[data-testid="search"]', 'laptop');
    await page.click('button:has-text("Search")');

    // Add to cart
    await page.click('button:has-text("Add to Cart")');

    // Checkout
    await page.click('button:has-text("Proceed to Checkout")');

    // Enter payment
    await page.fill('[data-testid="card-number"]', '4111-1111-1111-1111');
    await page.fill('[data-testid="expiry"]', '12/26');
    await page.click('button:has-text("Pay Now")');

    // Verify order confirmation
    await expect(page).toHaveURL(/\/order-confirmation\/\d+/);
    await expect(page.locator('text=Order Confirmed')).toBeVisible();
});

Best Practices

  • Minimize E2E tests: Only happy path + critical flows
  • Use test data: Pre-seed database; don't rely on manual setup
  • Parallel execution: Spin up multiple browser instances
  • Retry flaky tests: Network hiccups happen; retry 2–3 times
  • Screenshots on failure: Capture UI state for debugging

Anti-Patterns

  • E2E tests for every code path (too slow; use unit tests)
  • Tests dependent on UI HTML structure (use data-testid attributes)
  • Tests that rely on exact timing (add waits for element visibility)

Load Testing

Definition: Verify system behaves correctly under high load (high RPS, high concurrency).

Types

Type        | Load                      | Purpose                                          | Tool
Load Test   | Realistic traffic volume  | Baseline perf; identify bottlenecks              | JMeter, Locust, k6
Stress Test | Push until breaking point | Find failure threshold                           | Same tools
Spike Test  | Sudden load increase      | Test auto-scaling response                       | Same tools
Soak Test   | Sustained load for hours  | Detect memory leaks, connection pool exhaustion  | Same tools

Example: JMeter Load Test

Test Plan:
  - Ramp-up: 0–100 concurrent users over 5 min
  - Sustained: 100 users for 10 min
  - Ramp-down: 100–0 over 2 min

Assertions:
  - P95 latency < 500ms
  - Error rate < 0.1%
  - Throughput > 500 RPS

Endpoints:
  - GET /api/products (40%)
  - GET /api/products/{id} (40%)
  - POST /api/orders (20%)
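The P95 assertion in the plan above is just a percentile over raw latency samples. A small sketch of the math, using the common nearest-rank definition (the sample values are made up):

```java
import java.util.Arrays;

public class LatencyReport {

    // Nearest-rank percentile: smallest value with at least p% of samples at or below it
    static long percentile(long[] samplesMs, double p) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] samples = {120, 95, 430, 210, 180, 760, 150, 90, 310, 240,
                          130, 170, 520, 110, 200, 160, 140, 100, 490, 220};
        long p95 = percentile(samples, 95.0);
        System.out.println("P95 = " + p95 + "ms");
        System.out.println(p95 < 500 ? "PASS: within 500ms target"
                                     : "FAIL: exceeds 500ms target");
    }
}
```

With these made-up samples the nearest-rank P95 is 520 ms, which would fail the 500 ms gate — the same shape of failure as in the observations that follow. Load tools compute this for you; the point is that P95 is a single order statistic, so one slow outlier barely moves it while a systemic slowdown does.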

Observations

Results:
  Throughput: 400 RPS
  P95 latency: 800ms ← exceeds 500ms target

Bottleneck: Database connection pool (max 20, depleted at ~300 RPS)

Action: Increase pool size from 20 → 50; re-test

Post-fix:
  Throughput: 600 RPS ✓
  P95 latency: 300ms ✓
  Error rate: 0.05% ✓
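The jump from "pool of 20 depleted at ~300 RPS" to "increase to 50" follows from Little's Law: connections in use ≈ request rate × time each request holds a connection. A back-of-the-envelope sketch (the 65 ms hold time is an assumed figure chosen to make the observed numbers consistent):

```java
public class PoolSizing {

    // Little's Law: concurrent connections in use = RPS * seconds per request
    static double connectionsNeeded(double rps, double holdTimeMs) {
        return rps * (holdTimeMs / 1000.0);
    }

    public static void main(String[] args) {
        double holdMs = 65.0; // assumed average time a request holds a DB connection

        // At ~300 RPS a 20-connection pool sits right at saturation,
        // which matches the observed depletion point
        System.out.printf("300 RPS -> %.1f connections needed (pool was 20)%n",
                connectionsNeeded(300, holdMs));

        // The 600 RPS target fits comfortably inside the new pool of 50
        System.out.printf("600 RPS -> %.1f connections needed (pool now 50)%n",
                connectionsNeeded(600, holdMs));
    }
}
```

The same arithmetic warns when pool growth stops helping: if hold time itself rises under load (slow queries, lock contention), adding connections only moves the queue into the database.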

Tools Comparison

Tool      | Best For                         | Learning Curve
JMeter    | Enterprise; complex scenarios    | High
Locust    | Pythonic; large-scale            | Medium
k6        | Developer-friendly; cloud-native | Low
Artillery | Quick baseline tests             | Low

Chaos Engineering

Definition: Intentionally inject failures to verify system handles them gracefully.

Principles

  1. Steady state: Define normal system behavior (latency, error rate)
  2. Hypothesis: Assume system remains in steady state despite failure
  3. Experiment: Inject failure (kill pod, add network latency)
  4. Observe: Measure if steady state is maintained
  5. Learn: If hypothesis breaks, fix it

Example: Chaos Experiment — Pod Crash

Hypothesis: If one pod crashes, traffic failover to remaining replicas; error rate stays < 0.1%.

Experiment:

# Tools: Gremlin, Chaos Monkey, Chaos Mesh

1. Baseline: Order service has 3 replicas, P95 latency 100ms, error rate 0.001%
2. Kill 1 pod
3. Monitor: Does error rate stay < 0.1%? Does latency spike > 200ms?
4. Observation: Error rate jumps to 5% for 10 seconds ✗ Hypothesis broken
5. Root cause: No graceful shutdown (SIGTERM handler); requests drop mid-flight
6. Fix: Add SIGTERM handler; drain in-flight requests before exit
7. Retest: Error rate stays 0.001% ✓ Hypothesis confirmed
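The fix in step 6 — reject new work on SIGTERM and drain what's in flight — can be sketched with a JVM shutdown hook. The names, counters, and 10-second drain budget here are illustrative, not the service's actual code:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class GracefulShutdown {

    static final AtomicInteger inFlight = new AtomicInteger(0);
    static final AtomicBoolean draining = new AtomicBoolean(false);

    // Called by the HTTP layer at the start of every request
    static boolean tryBeginRequest() {
        if (draining.get()) return false;  // reject new work; LB retries elsewhere
        inFlight.incrementAndGet();
        return true;
    }

    // Called by the HTTP layer when a request finishes
    static void endRequest() {
        inFlight.decrementAndGet();
    }

    public static void main(String[] args) {
        // The JVM runs shutdown hooks on SIGTERM (e.g. when Kubernetes stops the pod)
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            draining.set(true);  // stop accepting new requests
            long deadline = System.currentTimeMillis() + 10_000; // keep below terminationGracePeriod
            while (inFlight.get() > 0 && System.currentTimeMillis() < deadline) {
                try { Thread.sleep(50); } catch (InterruptedException e) { break; }
            }
            System.out.println("Drained; exiting with " + inFlight.get() + " requests in flight");
        }));

        // Simulate one request completing normally
        if (tryBeginRequest()) {
            endRequest();
        }
        System.out.println("in-flight after request: " + inFlight.get());
    }
}
```

Frameworks provide this out of the box (e.g. Spring Boot's graceful shutdown setting); the sketch only shows the mechanism the chaos experiment exposed as missing.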

Chaos Experiments to Run (Order of Importance)

Experiment                     | What Breaks                         | Fix
Kill random pod                | Graceful shutdown, PDB settings     | Add SIGTERM handler; update PodDisruptionBudget
Latency injection (500ms)      | Timeouts, cascading failures        | Increase timeouts; add bulkhead/circuit breaker
Packet loss (10%)              | Network retries, connection pooling | Tune retry backoff; increase pool size
Database slowness (5s queries) | Query queues, thread starvation     | Add query timeouts; optimize slow queries
Dependency unavailable         | Fallback logic                      | Implement fallbacks; test them regularly
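For the packet-loss row, "tune retry backoff" usually means exponential backoff with jitter, so retries from many clients don't synchronize into retry storms. A minimal sketch of the full-jitter variant (the base and cap values are illustrative):

```java
import java.util.concurrent.ThreadLocalRandom;

public class RetryBackoff {

    // Full-jitter exponential backoff: random delay in [0, min(cap, base * 2^attempt)]
    static long backoffMs(int attempt, long baseMs, long capMs) {
        long exp = Math.min(capMs, baseMs * (1L << Math.min(attempt, 20)));
        return ThreadLocalRandom.current().nextLong(exp + 1);
    }

    public static void main(String[] args) {
        // Delays grow exponentially but are capped and randomized
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.printf("attempt %d -> chose %d ms%n",
                    attempt, backoffMs(attempt, 100, 2000));
        }
    }
}
```

The randomization is the important part: a fixed exponential schedule still lets every client that failed at the same instant retry at the same instant.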

Tools

Tool         | Scope                                        | Ease
Gremlin      | Cloud-agnostic; SaaS dashboard               | Easy; paid
Chaos Monkey | AWS-native; random instance termination      | Easy; basic
Chaos Mesh   | Kubernetes-native; CRD-based experiments     | Medium; free
Locust       | Load testing with scriptable failure paths   | Medium; flexible
Pumba        | Docker; random container killing             | Easy; local

Test Coverage Metrics

Code coverage (% of lines executed by tests) is a useful signal but not a goal:

Coverage | Interpretation
< 30%    | Tests are an afterthought; likely missing critical paths
30–60%   | Decent; focus on critical business-logic paths
60–80%   | Good; add E2E for critical user flows; don't obsess over edge cases
> 90%    | Diminishing returns; focus on mutation testing instead

Mutation Testing

Instead of "did we run this line?", ask "does changing this line break a test?"

// Original
public int calculateDiscount(int years) {
    return years > 5 ? 10 : 0;
}

// Mutation 1
public int calculateDiscount(int years) {
    return years > 4 ? 10 : 0;  // Changed 5 → 4
}

// Mutation 2
public int calculateDiscount(int years) {
    return years >= 5 ? 10 : 0;  // Changed > to >=
}

If tests don't catch these mutations, coverage is false confidence.
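A pair of tests pinned exactly at the boundary kills both mutations. A standalone sketch, reusing the calculateDiscount shown above:

```java
public class MutationBoundaryDemo {

    // Original implementation from the example above
    static int calculateDiscount(int years) {
        return years > 5 ? 10 : 0;
    }

    public static void main(String[] args) {
        // years = 5 kills both mutations: with "> 4" or ">= 5"
        // this input would wrongly return 10
        if (calculateDiscount(5) != 0)
            throw new AssertionError("exactly 5 years should get no discount");

        // years = 6 pins the positive side of the boundary
        if (calculateDiscount(6) != 10)
            throw new AssertionError("6 years should get the 10% discount");

        System.out.println("Boundary tests pass; these inputs kill both mutations");
    }
}
```

A test that only checks years = 10 and years = 0 would execute every line (100% line coverage) yet let both mutants survive, which is exactly the false confidence the section warns about.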

Tool: PIT (Java), Stryker (JavaScript).


TDD vs BDD

Test-Driven Development (TDD)

Red-Green-Refactor cycle:

  1. Red: Write a failing test for the feature you want to build
  2. Green: Write minimal code to make the test pass
  3. Refactor: Improve code quality without changing behavior

Example: TDD for Discount Calculator

// Step 1: Red — Write test first (test fails)
@Test
void applyDiscount_goldCustomer_returns15Percent() {
    DiscountCalculator calc = new DiscountCalculator();
    Customer customer = new Customer("gold", 5);

    double result = calc.applyDiscount(100.0, customer);

    assertEquals(85.0, result);  // Test fails: method doesn't exist yet
}

// Step 2: Green — Write minimal code to pass
public class DiscountCalculator {
    public double applyDiscount(double price, Customer customer) {
        if ("gold".equals(customer.getTier())) {
            return price * 0.85;  // Minimal code: hardcoded for gold
        }
        return price;
    }
}

// Step 3: Refactor — add tests for the other tiers, then generalize
@Test
void applyDiscount_silverCustomer_returns10Percent() {
    DiscountCalculator calc = new DiscountCalculator();
    assertEquals(90.0, calc.applyDiscount(100.0, new Customer("silver", 2)));
}

// Refactor: extract the tier-to-rate mapping
private double getDiscount(String tier) {
    return switch (tier) {
        case "gold" -> 0.15;
        case "silver" -> 0.10;
        default -> 0.0;
    };
}

Benefits of TDD:

  • Forces you to think about API design before implementation
  • High coverage by construction: code exists only to make a test pass
  • Refactoring confidence: tests catch regressions
  • Documents expected behavior via tests

Behavior-Driven Development (BDD)

BDD extends TDD by writing tests in business language, not code language.

Key principle: Tests describe behavior (what the system should do), not implementation (how it works).

Format: Given-When-Then (Gherkin syntax)

Feature: Order checkout discount
  Scenario: Gold customer gets 15% discount
    Given a gold-tier customer with 5 years tenure
    When they checkout with $100 order
    Then they pay $85

  Scenario: New customer gets no discount
    Given a new customer with no history
    When they checkout with $100 order
    Then they pay $100

Automated with Cucumber/Behave/Gherkin:

public class OrderSteps {

    private Customer customer;
    private double totalPrice;
    private double discountedPrice;

    @Given("a gold-tier customer with {int} years tenure")
    public void createGoldCustomer(int years) {
        customer = new Customer("gold", years);
    }

    @When("they checkout with ${double} order")
    public void checkout(double price) {
        totalPrice = price;
        DiscountCalculator calc = new DiscountCalculator();
        discountedPrice = calc.applyDiscount(price, customer);
    }

    @Then("they pay ${double}")
    public void verifyPrice(double expected) {
        assertEquals(expected, discountedPrice);
    }
}

BDD Benefits:

  • Business-readable: Non-technical stakeholders understand tests
  • Living documentation: Tests describe current behavior
  • Collaboration: Developers, QA, and product managers write scenarios together
  • Catches misunderstandings: Ambiguities surface in Given-When-Then discussions

TDD vs BDD:

Aspect      | TDD                     | BDD
Focus       | How to implement        | What behavior matters
Language    | Code (assertions)       | Business English (Gherkin)
Test level  | Unit/Integration        | Scenario-based (varies)
Audience    | Developers              | Developers + QA + Business
Granularity | Fine (single function)  | Coarse (user story)

Fundamental Testing Types (By Scope)

The testing pyramid shows types (unit, integration, E2E). But there are other dimensions:

Component Testing

Definition: Test a single component (service, module, class cluster) in isolation.

Scope: Larger than unit; smaller than integration.

Example: Payment Service Component Test

@SpringBootTest
class PaymentServiceComponentTest {

    @Autowired
    private PaymentService paymentService;

    @MockBean
    private PaymentGatewayClient gatewayClient;  // Mock external API

    @Autowired
    private PaymentRepository repository;

    @Test
    void processPayment_validCard_recordsTransaction() {
        // Test the entire Payment Service; external calls mocked
        when(gatewayClient.charge(any())).thenReturn(ChargeResponse.success());

        Payment payment = paymentService.process(
            new PaymentRequest("4111-1111-1111-1111", 99.99)
        );

        assertNotNull(payment.getId());
        assertTrue(repository.findById(payment.getId()).isPresent());  // verify persisted
    }
}

What's tested:

  • Service logic: request validation + payment flow
  • Database: transactions are persisted
  • Mocked: the external payment gateway

What's NOT tested:

  • How payment gateway actually works (mocked)
  • How other services integrate (mocked)

Functional Testing

Definition: Verify that a feature works as specified; any test that checks functionality qualifies.

Scope: Can be unit, component, integration, or E2E.

Example: Functional Test for Order Discount

@Test
void orderCheckout_appliesLoyaltyDiscount_correctly() {
    // Functional: Does the discount feature work end-to-end?

    // Setup
    Customer customer = new Customer("gold", 5);
    Order order = new Order(100.0);

    // Execute
    OrderProcessor processor = new OrderProcessor();
    double finalPrice = processor.checkout(customer, order);

    // Verify
    assertEquals(85.0, finalPrice);  // 15% discount applied
}

Key point: Functional testing doesn't care about implementation details; only that the feature works.


Regression Testing

Definition: Verify that changes don't break existing features.

Scope: Usually re-run existing test suite after code changes.

Example:

You add a new feature: "Silver customers get 10% discount"

Regression tests:
  ✓ Gold customers still get 15% (didn't break)
  ✓ New customers get 0% (didn't break)
  ✓ Silver customers get 10% (new feature)

When to run: After every code change, before release.

Tools: Automated test suites (unit, integration, E2E).


Smoke Testing

Definition: Quick sanity check; verify the system is basically working after deployment.

Scope: Small subset of critical paths; fast to run.

Example: Smoke Tests for E-commerce

# Smoke test suite (runs in < 2 minutes)
POST /api/login        → 200 OK
GET  /api/products     → 200 OK (not empty)
POST /api/orders       → 201 Created (order created)
GET  /api/orders/{id}  → 200 OK (order retrieved)

When to run: After deploying to staging/prod; before running full test suite.

Typical time: < 5 minutes for entire smoke test suite.

Example in code:

@SpringBootTest
class SmokeTests {

    @Autowired
    private WebTestClient webClient;

    @Test
    void systemIsUp_returnsHealthOK() {
        webClient.get()
            .uri("/actuator/health")
            .exchange()
            .expectStatus().isOk();
    }

    @Test
    void canCreateOrder() {
        webClient.post()
            .uri("/api/orders")
            .bodyValue(new CreateOrderRequest("PROD-123", 5))
            .exchange()
            .expectStatus().isCreated();
    }
}

Manual Testing

Definition: Human testers interact with the system; not automated.

When used:

  • Exploratory testing (finding unexpected issues)
  • UX testing (does the UI feel right?)
  • Complex scenarios that are hard to automate
  • Ad-hoc testing after major changes

Differs from automation:

Aspect        | Manual                      | Automated
Speed         | Slow; limited coverage      | Fast; broad coverage
Cost          | High (tester time)          | High upfront; amortized over runs
Maintenance   | Low (scenarios change)      | High (tests break with UI changes)
Repeatability | Variable (human error)      | Consistent (exact same steps)
Insight       | Creative; finds edge cases  | Deterministic; tests the spec

Best for:

  • New products (scenarios unknown)
  • UI/UX validation (feels good?)
  • Complex edge cases (hard to automate)
  • Accessibility testing (screen readers, keyboard nav)

Test Scope Examples by Layer

Here's a concrete example showing different test types for an Order Service:

┌─────────────────────────────────────────────────────┐
│  E2E Test: Full checkout flow (UI → API → DB)       │
│  "Customer adds item, checks out, sees confirmation"│
│  Tool: Selenium, Cypress                            │
│  Speed: 5–10 seconds                                │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│  Contract Test: Order Service ↔ Payment Service     │
│  "Order expects Payment to return {txnId, status}"  │
│  Tool: Pact                                         │
│  Speed: < 1 second                                  │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│  Component Test: Order Service (Payment mocked)     │
│  "Given order with gold customer, discount applied" │
│  Tool: Spring Boot Test                             │
│  Speed: 100ms–1s                                    │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│  Unit Test: Discount calculation                    │
│  "Gold tier + $100 = $85"                           │
│  Tool: JUnit + Mockito                              │
│  Speed: < 100ms                                     │
└─────────────────────────────────────────────────────┘

Each layer tests a different scope:
- Unit: Single function
- Component: Service + internal logic
- Contract: Service boundaries
- E2E: Entire user flow

Test Strategy by Architecture Style

Monolith

  • High % unit tests (70–80%)
  • Integration tests for critical flows (15–20%)
  • E2E for user journeys (5–10%)
  • No contract tests needed (single codebase)

Microservices

  • Unit tests: 60–70% (smaller services)
  • Integration tests: 15–20% (database + local dependencies)
  • Contract tests: 5–10% (enforce API boundaries)
  • E2E tests: 3–5% (expensive; only critical paths)
  • Load tests: Baseline for each service
  • Smoke tests: Before releases (quick sanity check)

What's the difference between TDD and BDD?

TDD writes code-level tests first (red-green-refactor). BDD extends TDD by writing business-readable scenarios (Given-When-Then) that non-technical people understand. TDD is developer-focused; BDD is team-focused (devs + QA + business).

When should I use BDD vs regular unit tests?

Use BDD for complex business logic and user-facing features where clarity matters. Use unit tests for utility functions and low-level logic. Often use both: BDD at scenario level; unit tests for implementation details.

What's the difference between component and integration tests?

Component tests mock external dependencies (payment gateway, other services). Integration tests use real dependencies (real database). Component tests run faster; integration tests catch more issues. Use component tests first; integration tests for critical paths.

How much test coverage should we aim for?

60–80% of critical business logic. Avoid chasing 100%; diminishing returns after 80%. Use mutation testing to validate quality over quantity.

Who writes tests: developers or QA?

Developers write unit, integration, and contract tests. QA writes E2E, load, and exploratory tests. With good unit/integration tests, QA focus shifts from regression to edge cases and UX.

Should we run all tests in CI or just fast ones?

Run unit + integration (< 5 min) on every commit. Run E2E + load on release branches or nightly. Parallel execution helps; aim for total CI time < 10 min. Always run smoke tests after deployment.
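One way to wire that split into a pipeline, shown here in GitHub Actions syntax — the job names, commands, and branch conventions are illustrative assumptions, not a prescribed setup:

```yaml
# Fast feedback on every commit; slow suites gated to nightly/release
on:
  push:
  schedule:
    - cron: "0 2 * * *"   # nightly

jobs:
  fast-tests:              # unit + integration, target < 5 min
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: mvn test      # unit + integration (Testcontainers)

  slow-tests:              # E2E + load, nightly or release branches only
    if: github.event_name == 'schedule' || startsWith(github.ref, 'refs/heads/release/')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx playwright test        # E2E suite
      - run: k6 run load/baseline.js    # load baseline
```

The same shape works in any CI system: one always-on fast job guarding merges, one gated slow job guarding releases, plus smoke tests triggered by the deployment itself.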