Load Testing Methodology

The Systematic Approach

Performance testing isn't random. It's a disciplined process:

1. PLAN: Define what you're testing and why
2. DESIGN: Create realistic scenarios
3. IMPLEMENT: Code simulations in Gatling
4. EXECUTE: Run tests systematically
5. ANALYZE: Understand the results
6. OPTIMIZE: Fix bottlenecks
7. RE-TEST: Verify improvements
8. DEPLOY: With confidence

Phase 1: PLAN (Define Objectives)

Step 1a: Document Success Criteria (SLAs)

Before you write any code, define what "success" looks like:

Question: What must be true for the test to PASS?

Examples:

1. E-commerce checkout
   ├─ p95 latency: <500ms
   ├─ p99 latency: <1000ms
   ├─ Success rate: >99.5% (max 0.5% errors)
   └─ Achieve: 2,000 concurrent users

2. Mobile API
   ├─ p95 latency: <200ms
   ├─ p99 latency: <500ms
   ├─ Success rate: >99.9%
   └─ Achieve: 5,000 RPS

3. Real-time analytics service
   ├─ p95 latency: <50ms
   ├─ p99 latency: <100ms
   ├─ Success rate: >99.99%
   └─ Achieve: 10,000 RPS

Pro tip: Involve stakeholders in defining SLAs! - Product team: What do users tolerate? - Ops team: What can infrastructure support? - Engineering: What's reasonable to achieve?

Step 1b: Identify Critical User Paths

List the most important scenarios to test:

E-commerce example:

Rank 1: Browse & Purchase (most revenue impact)
├─ Search products
├─ View product details
├─ Add to cart
├─ Checkout
└─ Payment

Rank 2: Account Management
├─ Login
├─ View order history
├─ Update profile
└─ Change password

Rank 3: Customer Support
├─ View FAQ
├─ Submit ticket
└─ Chat

→ Focus load testing on Rank 1 paths first!

Step 1c: Estimate Expected Load

Question: How many concurrent users will we have?

Methods:

1. Historical data
   └─ "We had 10,000 concurrent users on Black Friday"

2. Capacity planning
   └─ "We want to grow to 50,000 users/day = 500 concurrent"

3. Industry benchmarks
   └─ "Industry average is 10% simultaneous"

4. Peak calculation
   └─ "100,000 daily active users × 5% = 5,000 peak concurrent"

→ TEST TO 2-3x EXPECTED LOAD (safety margin)

Phase 2: DESIGN (Create Scenarios)

Step 2a: Define Realistic User Behavior

Bad: 1,000 users all doing the exact same thing at the exact same time - Unrealistic - Tests a specific bottleneck, not general behavior - Doesn't reveal cascade failures

Good: Users follow realistic patterns with variety

Realistic user behavior:
├─ Not all users do the same thing
│  └─ 70% browse
│  └─ 20% purchase
│  └─ 10% use support
├─ Users have think-time (don't hammer continuously)
│  └─ Read product description: 10-30 seconds
│  └─ Check reviews: 5-15 seconds
│  └─ Decide to purchase or not: 30-60 seconds
├─ Request patterns vary
│  └─ Sometimes cache hit (fast)
│  └─ Sometimes cache miss (slow)
│  └─ Occasional errors (network, external service)
└─ Data varies
   └─ Different users, products, regions
   └─ Different request payloads

Implementation in Gatling: Use feeders + pauses

scenario("Realistic User")
    .feed(userFeeder)  // Different user each iteration
    .exec(http("Search").get("/search?q=#{query}"))
    .pause(10, 30)     // 10-30 sec think-time
    .exec(http("View Details").get("/product/#{productId}"))
    .pause(5, 15)
    // ... more realistic actions ...

Step 2b: Design Multiple Scenarios

Scenario 1: Load Test (Baseline)
├─ Constant load: 1,000 users for 15 minutes
├─ Objective: "Does it meet SLAs under normal load?"
├─ Success: p95 <500ms, success rate >99%
└─ Duration: ~20 minutes

Scenario 2: Ramp Test (Find Limits)
├─ Start: 100 users, ramp to 5,000 users over 30 minutes
├─ Objective: "At what load do SLAs break?"
├─ Watch for: Where does p95 spike? Where do errors start?
└─ Duration: ~40 minutes

Scenario 3: Step Test (Threshold Analysis)
├─ Step 1: 1,000 users for 5 min
├─ Step 2: 2,000 users for 5 min
├─ Step 3: 3,000 users for 5 min
├─ Step 4: 4,000 users for 5 min
├─ Objective: "At what level does each metric degrade?"
├─ Watch for: Cache warmup, JVM behavior at each level
└─ Duration: ~25 minutes

Scenario 4: Spike Test (Recovery)
├─ Normal: 500 users for 5 min
├─ Spike: Jump to 5,000 users for 5 min
├─ Recovery: Back to 500 users for 10 min
├─ Objective: "Can the system recover from sudden load?"
└─ Duration: ~25 minutes

Scenario 5: Soak Test (Long-term Stability)
├─ Constant: 1,000 users for 24 hours
├─ Objective: "Does the system stay stable overnight?"
├─ Watch for: Memory leaks, connection leaks, degradation
└─ Duration: 24 hours (run overnight)

Step 2c: Data Strategy

Question: What data will you use in the test?

Options:

1. Random data (auto-generated)
   └─ No setup needed
   └─ Sometimes unrealistic (fake emails, nonsense values)
   └─ Good for: Basic load testing

2. CSV file with real data
   └─ Realistic values (real user IDs, emails, products)
   └─ Can simulate real data patterns
   └─ Good for: Realistic testing

3. Custom generator (Java code)
   └─ Full control over data generation
   └─ Can create complex, business-logic-aware data
   └─ Good for: Advanced testing

Example: Testing purchase flow

Option 1 (Bad): Random product ID
├─ productId: "xyz123random456"
└─ API returns: "Product not found" error
   └─ Test is invalid!

Option 2 (Good): CSV file
├─ CSV has 1,000 valid product IDs
├─ Test uses: "P123", "P456", "P789", etc.
└─ API returns: Valid product data

Option 3 (Best): Custom generator
├─ Java code knows: "Products are P0001 to P9999"
├─ Java code also knows: "Premium users can buy any product"
├─ Java code also knows: "Normal users can only buy available items"
└─ Test simulates: Real business logic constraints

Phase 3: IMPLEMENT (Code Simulations)

This is where you write Gatling code. (See Labs 1-8 for detailed examples.)

Quick template:

public class MyLoadTest extends Simulation {

  // 1. Protocol (where to send requests)
  HttpProtocolBuilder httpProtocol = http
      .baseUrl("https://api.example.com")
      .header("Content-Type", "application/json");

  // 2. Scenario (what users do)
  ScenarioBuilder scenario = scenario("User Journey")
      .feed(csvFeeder)  // Inject data
      .exec(http("GET products").get("/products"))
      .pause(5)  // Think time
      .exec(http("POST purchase").post("/purchase")
          .body(StringBody("{...}")));

  // 3. Setup (how many users, how fast)
  {
    setUp(
        scenario.injectOpen(constantUsersPerSec(100).during(600))
    )
    .protocols(httpProtocol)
    .assertions(
        global().responseTime().p95().lt(500),  // SLA
        global().successfulRequests().percent().gt(99.0)
    );
  }
}

Phase 4: EXECUTE (Run Tests)

Step 4a: Pre-Test Checklist

Before you hit "run":

Infrastructure:
  ☐ Staging environment is clean (no other traffic)
  ☐ Databases are reset to known state
  ☐ Caches are cleared (or warmed up, depending on scenario)
  ☐ Monitoring is enabled (Datadog, APM, etc.)
  ☐ Log level is set appropriately (ERROR level to avoid log spam)

Gatling:
  ☐ Simulation compiles with no errors
  ☐ Smoke test passes (1 user, single iteration)
  ☐ Test data is available (CSV files, feeders)
  ☐ Assertions are set correctly (SLAs)
  ☐ Duration is reasonable (don't run 24-hour soak on first try)

Team:
  ☐ Stakeholders are informed (don't surprise ops team)
  ☐ On-call engineer is available if needed
  ☐ Slack/email channel is open for communication
  ☐ Plan to cancel test if something goes wrong

Step 4b: Execution Pattern

1. Smoke Test (warm-up)
   └─ Run with 1 user for 1-2 minutes
   └─ Verifies: Code compiles, API responds, no crashes
   └─ If fails: Fix and retry before real test

2. Wait 10 minutes
   └─ Let system settle
   └─ Let logs clear
   └─ Let monitoring reset

3. Run Load Test
   └─ Your actual test (constant, ramp, step, spike)
   └─ Record all metrics and logs

4. Wait 10 minutes
   └─ Let system cool down
   └─ Monitor for memory leaks or slow recovery

5. If all good: Run next scenario
   └─ E.g., after load test, run ramp test
   └─ Or, schedule for next day

Don't run back-to-back tests without cool-down!

Step 4c: During Test: Monitor Everything

Gatling console (live):
├─ Active users (ramping up?)
├─ Throughput (RPS increasing or plateau?)
├─ Response times (p95, p99 trending?)
├─ Errors (any appearing?)
└─ Success rate (dropping?)

Server monitoring (Datadog/APM):
├─ CPU usage (climbing?)
├─ Memory (stable or growing?)
├─ Database connections (free or exhausted?)
├─ Disk I/O (at ceiling?)
├─ Network (saturated?)
└─ Any errors in logs?

Be ready to abort if:
  ❌ Server crashes
  ❌ Error rate suddenly spikes >10%
  ❌ System doesn't recover (GC pauses endless)
  ❌ Cascading failures detected

Phase 5: ANALYZE (Review Results)

Step 5a: Read Gatling Report

After test completes, open:
  target/gatling/[simulation-name-timestamp]/index.html

Key sections:

1. Global Stats (top-level summary)
   ├─ Total requests sent
   ├─ Success/failure counts
   ├─ Min, mean, p50, p75, p95, p99 latencies
   └─ Requests per second

2. Request Detail (by endpoint)
   ├─ GET /products: p95=150ms
   ├─ POST /purchase: p95=500ms ← Slower!
   └─ GET /order: p95=200ms

3. Scenario (load ramp-up timeline)
   ├─ Shows: How many users active at each second
   └─ Validates: Did load ramp as expected?

4. Response Time Distribution (histogram)
   ├─ Visual: Most requests are fast
   └─ Tail: Few requests are very slow

5. Errors (if any)
   ├─ Error type: 500 Internal Server Error
   ├─ Count: 50 times
   └─ Timeline: Appeared after 10 minutes

Step 5b: Compare Against SLAs

Example: E-commerce load test

SLAs defined:
├─ p95 latency < 500ms
├─ p99 latency < 1000ms
├─ Success rate > 99%
└─ Support 2,000 concurrent users

Results:
├─ p95 latency: 480ms ✓ PASS
├─ p99 latency: 950ms ✓ PASS
├─ Success rate: 99.2% ✓ PASS
└─ Handled 2,000 concurrent ✓ PASS

Final result: ✅ TEST PASSED

Step 5c: Find Bottlenecks

If test failed, diagnose why:

Symptom 1: Latency increases linearly with load
├─ Likely cause: Resource exhaustion (CPU, memory, disk)
├─ Evidence: Check Datadog CPU/memory graphs
└─ Fix: Optimize code, add caching, scale infrastructure

Symptom 2: Latency increases exponentially
├─ Likely cause: Queue buildup (request backlog)
├─ Evidence: Response times spike, then don't recover
└─ Fix: Increase capacity or reduce incoming load

Symptom 3: Error rate increases suddenly
├─ Likely cause: Something overflowing (connection pool, memory)
├─ Evidence: Check error types in Gatling report
└─ Fix: Increase pool size, fix memory leak, scale service

Symptom 4: Certain endpoints slow, others fast
├─ Likely cause: Specific bottleneck (slow DB query)
├─ Evidence: Datadog traces show slow query in one endpoint
└─ Fix: Optimize that specific query

Symptom 5: p95 OK but p99 bad
├─ Likely cause: GC pauses, occasional queuing
├─ Evidence: p99 spikes, but p95 stable
└─ Fix: JVM tuning, improve code efficiency

Step 5d: Use Datadog to Drill Down

When Gatling says "latency is bad", Datadog tells you WHY:

Query in Datadog:
  trace.web.request.duration{service:my-api}
  by resource_name

Results:
├─ GET /api/products: p95=150ms ✓ Fast
├─ POST /api/purchase: p95=800ms ⚠ Slow
│  └─ Drill down into traces
│     └─ Find: Call to external payment service (600ms)
│        └─ Root cause: Payment API is slow
│           └─ Fix: Add timeout, use async, add circuit breaker
└─ GET /api/user/profile: p95=200ms ✓ Fast

Phase 6: OPTIMIZE

Based on analysis, fix bottlenecks:

Examples:

1. Slow database query
   ├─ Evidence: Datadog shows 400ms in SELECT query
   ├─ Fixes:
   │  ├─ Add index on WHERE column
   │  ├─ Fetch only needed columns (not *)
   │  ├─ Add database caching
   │  └─ Use read replica for high-traffic endpoints
   └─ Re-test: Verify improvement

2. CPU-bound processing
   ├─ Evidence: CPU 100%, latency high
   ├─ Fixes:
   │  ├─ Profile with JFR (Java Flight Recorder)
   │  ├─ Find hot methods
   │  ├─ Optimize algorithm or use caching
   │  └─ Consider moving to async processing
   └─ Re-test: Verify improvement

3. Exhausted connection pool
   ├─ Evidence: Errors "Too many connections"
   ├─ Fixes:
   │  ├─ Increase pool size
   │  ├─ Fix connection leaks (ensure close() called)
   │  ├─ Use connection pooling library (HikariCP)
   │  └─ Reduce query time (so connections released faster)
   └─ Re-test: Verify improvement

4. Memory leak
   ├─ Evidence: Heap grows 1GB → 8GB over test
   ├─ Fixes:
   │  ├─ Use heap dump analyzer (JProfiler, YourKit)
   │  ├─ Find object holding references
   │  ├─ Fix leak (remove listener, close resource)
   │  └─ Verify leak is gone
   └─ Re-test: Verify improvement

Phase 7: RE-TEST

After optimization:

1. Run same test scenario again
   └─ Compare metrics to baseline
   └─ Verify improvement (e.g., p95 dropped from 500ms → 300ms)

2. Run to new load level
   └─ If you optimized, try higher load
   └─ If p95 was 500ms at 1000 users, test 2000 users now

3. Run soak test
   └─ Ensure optimization didn't introduce memory leak
   └─ Run 2-8 hours at normal load

Success criteria:
✅ Metrics improved
✅ New SLAs are met
✅ No new issues introduced

Phase 8: DEPLOY

With confidence:

Before production deployment:
├─ Load test passed ✓
├─ Soak test passed ✓
├─ Code reviewed ✓
├─ Rollback plan ready ✓
└─ Team prepared ✓

Deployment strategy:
├─ Canary: 10% traffic to new version
├─ Monitor: Watch metrics, error rate, latency
├─ Expand: 50% traffic if metrics good
├─ Expand: 100% traffic if still good
└─ Rollback plan: If issues detected, revert instantly

Best Practices Summary

✅ DO:

  • Define SLAs before testing
  • Test realistic user behavior (think-time, variety)
  • Test multiple scenarios (load, ramp, soak, spike)
  • Monitor system resources during test
  • Use Datadog/APM to find bottlenecks
  • Re-test after optimization
  • Test in staging, not production
  • Document findings

❌ DON'T:

  • Run test without clear objective
  • Test with unrealistic data or behavior
  • Assume one test tells the whole story
  • Ignore spike in error rate during test
  • Fire all load from one machine (becomes bottleneck)
  • Deploy without testing (or with minimal testing)
  • Test at same load level each time (no learning)

Next Steps

Read next: Open Load Patterns - Specific load pattern techniques