ehhuang
|
2c06b24c77
|
test: benchmark scripts (#3160)
# What does this PR do?
1. Add our own benchmark script to replace locust, which doesn't
measure streaming latency well
2. Simplify k8s deployment
3. Add a simple profile script for a locally running server
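
As a rough illustration of the streaming-latency point in item 1, the sketch below times one streamed response and records time to first token (TTFT), total latency, and chunk count. It is not the script added in this PR: `measure_stream` and `fake_stream` are hypothetical names, and the fake stream stands in for a real streaming HTTP response so the example runs without a server.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def measure_stream(
    chunks: AsyncIterator[str],
) -> Tuple[Optional[float], float, int]:
    """Return (ttft_seconds, total_seconds, chunk_count) for one streamed response.

    Hypothetical helper: ttft is None if the stream yields no chunks.
    """
    start = time.perf_counter()
    ttft: Optional[float] = None
    count = 0
    async for _ in chunks:
        if ttft is None:
            # The first chunk has arrived: this is the time to first token.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return ttft, total, count


async def fake_stream(n: int, first_delay: float, gap: float) -> AsyncIterator[str]:
    """Stand-in for a streaming response (assumption, for illustration only)."""
    await asyncio.sleep(first_delay)  # simulated wait before the first token
    for i in range(n):
        if i:
            await asyncio.sleep(gap)  # simulated delay between later chunks
        yield f"chunk-{i}"


if __name__ == "__main__":
    ttft, total, count = asyncio.run(measure_stream(fake_stream(5, 0.05, 0.01)))
    print(f"TTFT: {ttft:.3f}s  total: {total:.3f}s  chunks: {count}")
```

Timing the first chunk separately from the full response is what a request-level load tool tends to miss, since it only sees the moment the whole response completes.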
## Test Plan
❮ ./run-benchmark.sh --target stack --duration 180 --concurrent 10
============================================================
BENCHMARK RESULTS
============================================================
Total time: 180.00s
Concurrent users: 10
Total requests: 1636
Successful requests: 1636
Failed requests: 0
Success rate: 100.0%
Requests per second: 9.09
Response Time Statistics:
Mean: 1.095s
Median: 1.721s
Min: 0.136s
Max: 3.218s
Std Dev: 0.762s
Percentiles:
P50: 1.721s
P90: 1.751s
P95: 1.756s
P99: 1.796s
Time to First Token (TTFT) Statistics:
Mean: 0.037s
Median: 0.037s
Min: 0.023s
Max: 0.211s
Std Dev: 0.011s
TTFT Percentiles:
P50: 0.037s
P90: 0.040s
P95: 0.044s
P99: 0.055s
Streaming Statistics:
Mean chunks per response: 64.0
Total chunks received: 104775
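
For readers who want to reproduce a summary like the one above from raw per-request samples, here is a minimal sketch. It assumes a nearest-rank percentile and the sample standard deviation; the PR does not state which definitions the benchmark script uses, so `percentile` and `summarize` are hypothetical helpers, not the script's own code.

```python
import math
import statistics
from typing import Dict, List


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile of a sorted, non-empty list, with 0 < p <= 100."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(samples: List[float]) -> Dict[str, float]:
    """Summarize latency samples (seconds) into the fields shown in the report."""
    ordered = sorted(samples)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "p50": percentile(ordered, 50),
        "p90": percentile(ordered, 90),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
    }


if __name__ == "__main__":
    # Synthetic samples, not data from the benchmark run above.
    for name, value in summarize([0.2, 0.4, 0.4, 1.7, 1.8]).items():
        print(f"{name}: {value:.3f}s")
```

The throughput figures in the report follow from simple ratios: 1636 requests over 180.00s is about 9.09 requests per second, and 104775 chunks over 1636 responses is about 64.0 chunks per response.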
|
2025-08-15 11:24:29 -07:00 |
|