test: benchmark scripts (#3160)

# What does this PR do?
1. Add our own benchmark script to replace locust, which doesn't
measure streaming latency well
2. Simplify k8s deployment
3. Add a simple profile script for a locally running server
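
To illustrate the streaming-latency point in item 1, here is a minimal sketch of how time to first token (TTFT) could be measured for a streamed response. It is not the script added in this PR: `measure_stream` and `fake_stream` are hypothetical names, and the stream is any Python iterable rather than a real HTTP response.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Consume a stream and return (ttft, total_time, chunk_count).

    Hypothetical helper: ttft is the delay until the first chunk arrives,
    or None if the stream yields nothing.
    """
    start = clock()
    ttft: Optional[float] = None
    count = 0
    for _ in chunks:
        if ttft is None:
            # First chunk seen: record time to first token.
            ttft = clock() - start
        count += 1
    total = clock() - start
    return ttft, total, count


def fake_stream(n: int, delay: float) -> Iterator[str]:
    """Stand-in for a streaming response: yields n chunks, sleeping before each."""
    for i in range(n):
        time.sleep(delay)
        yield f"chunk-{i}"


if __name__ == "__main__":
    ttft, total, count = measure_stream(fake_stream(5, 0.01))
    print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s, chunks: {count}")
```

Passing the clock in as a parameter keeps the timing logic testable without real delays.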

## Test Plan
❮ ./run-benchmark.sh --target stack --duration 180 --concurrent 10

============================================================
BENCHMARK RESULTS
============================================================
Total time: 180.00s
Concurrent users: 10
Total requests: 1636
Successful requests: 1636
Failed requests: 0
Success rate: 100.0%
Requests per second: 9.09

Response Time Statistics:
  Mean: 1.095s
  Median: 1.721s
  Min: 0.136s
  Max: 3.218s
  Std Dev: 0.762s

Percentiles:
  P50: 1.721s
  P90: 1.751s
  P95: 1.756s
  P99: 1.796s

Time to First Token (TTFT) Statistics:
  Mean: 0.037s
  Median: 0.037s
  Min: 0.023s
  Max: 0.211s
  Std Dev: 0.011s

TTFT Percentiles:
  P50: 0.037s
  P90: 0.040s
  P95: 0.044s
  P99: 0.055s

Streaming Statistics:
  Mean chunks per response: 64.0
  Total chunks received: 104775
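
For readers who want to reproduce this kind of summary from raw per-request timings, the sketch below computes the same fields (mean, median, min, max, standard deviation, P50/P90/P95/P99). It assumes linear-interpolation percentiles, which may differ from the method the benchmark script uses; `percentile` and `summarize` are hypothetical names.

```python
import statistics
from typing import Dict, Sequence


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Linear-interpolation percentile over pre-sorted values (pct in 0..100)."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = (len(sorted_values) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Summarize latency samples (in seconds) into the fields shown in the report."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    summary = {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        # Sample standard deviation needs at least two points.
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }
    for pct in (50, 90, 95, 99):
        summary[f"p{pct}"] = percentile(ordered, pct)
    return summary


if __name__ == "__main__":
    for name, value in summarize([0.136, 0.9, 1.7, 1.72, 1.75, 3.2]).items():
        print(f"{name}: {value:.3f}s")
```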
ehhuang 2025-08-15 11:24:29 -07:00 committed by GitHub
parent 2114214fe3
commit 2c06b24c77
GPG key ID: B5690EEEBB952194
13 changed files with 633 additions and 328 deletions

@@ -23,6 +23,11 @@ new_vector_database
```{include} ../../../tests/README.md
```
## Benchmarking
```{include} ../../../docs/source/distributions/k8s-benchmark/README.md
```
### Advanced Topics
For developers who need deeper understanding of the testing system internals: