Distributed Testing
Overview
When a single machine can't generate enough load, distribute testing across multiple machines.
Single Machine Limitation
1 machine: 10,000 concurrent users max
Need: 50,000 concurrent users
Solution: Distribute across 5 machines
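The sizing above is a simple ceiling division: machines needed = ⌈target users ÷ users per machine⌉. A minimal sketch (the 10,000-per-machine ceiling is the figure used in this section; yours depends on hardware and scenario):

```java
public class AgentCount {
    // Ceiling division: agents = ceil(targetUsers / usersPerAgent)
    static int agentsNeeded(int targetUsers, int usersPerAgent) {
        return (targetUsers + usersPerAgent - 1) / usersPerAgent;
    }

    public static void main(String[] args) {
        // 50,000 target users at 10,000 users per machine -> 5 machines
        System.out.println(agentsNeeded(50_000, 10_000)); // prints 5
    }
}
```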
Architecture
┌───────────────────────────────────────────┐
│      Gatling Enterprise Controller        │
│  (Coordinates and aggregates results)     │
└───────────────────────────────────────────┘
        │             │             │
   ┌─────────┐   ┌─────────┐   ┌─────────┐
   │ Agent 1 │   │ Agent 2 │   │ Agent 3 │
   │ 10,000  │   │ 10,000  │   │ 10,000  │
   │ users   │   │ users   │   │ users   │
   │ 10Gbps  │   │ 10Gbps  │   │ 10Gbps  │
   └─────────┘   └─────────┘   └─────────┘
        │             │             │
   ┌─────────────────────────────────┐
   │   Target System Under Test     │
   │    (Receives 30,000 users)     │
   └─────────────────────────────────┘
Setup Options
Option 1: Gatling Enterprise (Recommended)
✅ Managed platform
✅ Automatic agent coordination
✅ Built-in result aggregation
✅ Visual reporting
✅ Compliance features
❌ Requires subscription
Option 2: Open Source Distributed Setup
✅ Free
✅ Full control
❌ Manual coordination
❌ Manual result aggregation
❌ More operational overhead
Open Source Setup
Step 1: Prepare Agents
On each agent machine:
# Machine 1, 2, 3 (Ubuntu servers)
curl -X PUT -u admin:admin http://localhost:8080/gatling/data/simulation \
  -H "Content-Type: application/json" \
  -d @simulation.json
Step 2: Configure Simulation
// Reference simulation, deployed to every agent.
// Each agent runs the same simulation with a different user offset.
import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import io.gatling.javaapi.core.*;

public class Sim_DistributedLoad extends Simulation {

    // Agent ID passed as a system property (0, 1, 2, ...)
    String agentId = System.getProperty("gatling.agentId", "0");

    // Distribute users across agents:
    //   Agent 0: users 0-9999
    //   Agent 1: users 10000-19999
    //   Agent 2: users 20000-29999
    int userOffset = Integer.parseInt(agentId) * 10_000;

    ScenarioBuilder scn = scenario("Distributed Load")
        .feed(userFeeder.offset(userOffset)) // userFeeder: defined elsewhere
        .exec(http("Request").get("/api"));

    {
        setUp(scn.injectOpen(rampUsers(10_000).during(60))); // 10,000 users per agent
    }
}
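The offset scheme can be checked independently of Gatling: agent i owns the half-open range [i·N, (i+1)·N). A small illustration (the 10,000-per-agent figure comes from the comments above):

```java
public class UserPartition {
    static final int USERS_PER_AGENT = 10_000;

    // First user index owned by the given agent (agent IDs start at 0).
    static int offset(int agentId) {
        return agentId * USERS_PER_AGENT;
    }

    public static void main(String[] args) {
        for (int agentId = 0; agentId < 3; agentId++) {
            int start = offset(agentId);
            int end = start + USERS_PER_AGENT - 1;
            System.out.println("Agent " + agentId + ": users " + start + "-" + end);
        }
        // Agent 0: users 0-9999
        // Agent 1: users 10000-19999
        // Agent 2: users 20000-29999
    }
}
```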
Step 3: Run on Each Agent
# Agent 1
mvn gatling:test \
-Dgatling.simulationClass=Sim_DistributedLoad \
-Dgatling.agentId=0
# Agent 2
mvn gatling:test \
-Dgatling.simulationClass=Sim_DistributedLoad \
-Dgatling.agentId=1
# Agent 3
mvn gatling:test \
-Dgatling.simulationClass=Sim_DistributedLoad \
-Dgatling.agentId=2
Step 4: Aggregate Results
# Collect results from each agent
# Manually merge CSV files:
cat agent1/results.csv agent2/results.csv agent3/results.csv > combined.csv
# Calculate aggregated metrics
# Total requests = sum of all agents
# P95 latency = p95 of combined data
Synchronization Challenges
Problem 1: Agents Start at Different Times
Agent 1: starts at 10:00:00
Agent 2: starts at 10:00:05 ← 5 second delay
Agent 3: starts at 10:00:10 ← 10 second delay
Result: Load is staggered, not simultaneous
Solution: Synchronized Start
// Use a barrier so every agent starts at the same moment.
// Within one JVM: java.util.concurrent.CyclicBarrier.
// Across machines: a networked equivalent is needed.
CyclicBarrier barrier = new CyclicBarrier(3); // 3 agents
// All agents wait at the barrier
barrier.await(); // Blocks until all 3 reach this point
// Then start simultaneously
setUp(scenario.injectOpen(...))
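The barrier idea above can be demonstrated as a runnable single-JVM sketch using the JDK's `CyclicBarrier`: three threads stand in for three agents, each blocks at the barrier, and none proceeds until all have arrived. Note that `CyclicBarrier` only synchronizes threads inside one process; a real multi-machine setup needs a networked equivalent (for example, a coordination service).

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

public class SyncStart {

    // Runs n "agents"; returns how many passed the barrier together.
    static int runAgents(int n) throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(n);
        AtomicInteger started = new AtomicInteger();
        Thread[] agents = new Thread[n];
        for (int i = 0; i < n; i++) {
            agents[i] = new Thread(() -> {
                try {
                    barrier.await();           // blocks until all n arrive
                    started.incrementAndGet(); // "start injecting load"
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            agents[i].start();
        }
        for (Thread t : agents) t.join();
        return started.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAgents(3)); // prints 3
    }
}
```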
Data Collection & Aggregation
Per-Agent Results
Agent 1: 10,000 requests, p95=450ms, errors=2
Agent 2: 10,000 requests, p95=480ms, errors=3
Agent 3: 10,000 requests, p95=520ms, errors=1
Aggregated Results
Total: 30,000 requests
P95: ((450*10000 + 480*10000 + 520*10000) / 30000) ≈ 483ms (weighted average; an approximation only, since the exact p95 requires the merged raw response times)
Errors: 2 + 3 + 1 = 6 (0.02% error rate)
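A caveat on the p95 estimate above: averaging per-agent percentiles is only an approximation. The exact value must be computed over the merged raw response times. A minimal nearest-rank sketch (one common percentile convention; Gatling's own percentile algorithm may differ):

```java
import java.util.Arrays;

public class Percentile {
    // Nearest-rank p95 over merged latency samples (milliseconds).
    static long p95(long[] latencies) {
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(0.95 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Merge raw samples from all agents, then take the percentile.
        long[] agent1 = {100, 200, 450};
        long[] agent2 = {120, 220, 480};
        long[] merged = new long[agent1.length + agent2.length];
        System.arraycopy(agent1, 0, merged, 0, agent1.length);
        System.arraycopy(agent2, 0, merged, agent1.length, agent2.length);
        System.out.println(p95(merged)); // prints 480
    }
}
```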
Network Bandwidth Considerations
Bandwidth Required
Per user: ~1 Mbit/s (~125 KB/s)
Per machine (1,000 concurrent users): ~1Gbps
Per machine (10,000 concurrent users): ~10Gbps
3 machines with 10,000 users each:
├─ Each machine: 10Gbps
├─ Total to target: 30Gbps
└─ Network: Must have ≥30Gbps capacity
Network Planning
Datacenter network: Typically 10Gbps per server
3 servers: 30Gbps total available
3 servers hitting target: 30Gbps required
Result: Perfect fit (but no headroom)
Better: Use 5 machines with 6,000 users each
├─ Per machine: 6Gbps
├─ Total: 30Gbps (same)
└─ Headroom: Yes, less contention
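The planning arithmetic above fits in a small helper. This sketch assumes the ~1 Mbit/s-per-user figure used in this section; real per-user throughput depends on payload sizes and request rates, so measure before sizing.

```java
public class BandwidthPlan {
    // Assumed per-user throughput (Mbit/s) -- adjust to measured values.
    static final double MBIT_PER_USER = 1.0;

    // Required bandwidth per load-generator machine, in Gbps.
    static double gbpsPerMachine(int usersPerMachine) {
        return usersPerMachine * MBIT_PER_USER / 1000.0;
    }

    public static void main(String[] args) {
        // 3 machines x 10,000 users: 10 Gbps each, 30 Gbps total
        System.out.println(gbpsPerMachine(10_000));     // prints 10.0
        System.out.println(3 * gbpsPerMachine(10_000)); // prints 30.0
        // 5 machines x 6,000 users: 6 Gbps each, same total, more headroom
        System.out.println(gbpsPerMachine(6_000));      // prints 6.0
    }
}
```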
Best Practices
1. Network Isolation
Agents and target on the same network
└─ Minimizes latency
Avoid routing through the internet
└─ Variable latency ruins the test
2. Time Synchronization
# All machines must have synchronized clocks
ntpdate -u ntp.ubuntu.com # Sync to NTP
# Verify
timedatectl # Check clock is synchronized
3. Resource Sizing
Per agent machine:
├─ CPU: 16 cores (for 10,000 users)
├─ RAM: 32GB (for 10,000 users)
├─ Network: 10Gbps+ NIC
└─ Storage: Fast SSD for logging
4. Monitoring Agents
Monitor each agent during test:
├─ CPU: Should not exceed 80%
├─ Memory: Should not exceed 80%
└─ Network: Should not exceed 90%
If exceeded: Add more agents, reduce per-agent users
Troubleshooting
Issue: Uneven Load Distribution
Agent 1: 10,000 requests
Agent 2: 8,000 requests
Agent 3: 9,000 requests
Problem: Agents started at different times
Solution: Add synchronization barrier
Issue: Agent Runs Out of Memory
Problem: JVM heap too small for the number of virtual users
Solution: Raise the heap (e.g. -Xmx8g) or reduce users per agent
Issue: Network Bandwidth Maxed
Problem: Per-agent traffic saturates the NIC
Solution: Add more agents, or reduce payload size and users per agent
Gatling Enterprise Alternative
For production-grade distributed testing:
Pros:
✅ Automatic scaling (0-100,000+ users)
✅ Managed cloud infrastructure
✅ Built-in reporting
✅ Real-time dashboards
✅ Compliance features
Cons:
❌ Cost ($$$)
❌ Less control
❌ Vendor lock-in
Use when: Load >50,000 users, team size >5, budget available
Key Takeaways
- Distributed testing = Multiple machines generating load
- Coordination = Synchronize start, aggregate results
- Network bandwidth = Plan for 10-30Gbps
- Agent sizing = 10,000 users per 16-core machine
- Monitoring = Watch CPU, memory, network on each agent
- Gatling Enterprise = Simplified alternative for large tests
Navigation
← Previous: Optimization Tips
→ Next: [Quick Reference](01-quick-reference.md)
↑ Up: Documentation Index