Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-08-21 17:33:12 +00:00
test: benchmark scripts (#3160)
# What does this PR do?

1. Add our own benchmark script instead of Locust, which doesn't support measuring streaming latency well. A rough sketch of the measurement approach follows the test plan below.
2. Simplify the k8s deployment.
3. Add a simple profiling script for a locally running server.

## Test Plan

    ❮ ./run-benchmark.sh --target stack --duration 180 --concurrent 10

    ============================================================
    BENCHMARK RESULTS
    ============================================================
    Total time: 180.00s
    Concurrent users: 10
    Total requests: 1636
    Successful requests: 1636
    Failed requests: 0
    Success rate: 100.0%
    Requests per second: 9.09

    Response Time Statistics:
      Mean: 1.095s
      Median: 1.721s
      Min: 0.136s
      Max: 3.218s
      Std Dev: 0.762s

    Percentiles:
      P50: 1.721s
      P90: 1.751s
      P95: 1.756s
      P99: 1.796s

    Time to First Token (TTFT) Statistics:
      Mean: 0.037s
      Median: 0.037s
      Min: 0.023s
      Max: 0.211s
      Std Dev: 0.011s

    TTFT Percentiles:
      P50: 0.037s
      P90: 0.040s
      P95: 0.044s
      P99: 0.055s

    Streaming Statistics:
      Mean chunks per response: 64.0
      Total chunks received: 104775
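For reference, the streaming measurement that Locust handles poorly can be approximated as follows: record the time until the first chunk arrives (TTFT), count chunks per response, and summarize request latencies with percentiles. This is a minimal sketch under stated assumptions, not the actual `run-benchmark.sh` or the script added in this PR; the endpoint URL, payload shape, and helper names are illustrative assumptions.

```python
"""Minimal sketch of a streaming benchmark (TTFT, chunk count, percentiles).

Assumptions: an OpenAI-style streaming chat endpoint and the flag names
shown in the test plan (--duration, --concurrent). Adjust URL and payload
to the real server.
"""
import argparse
import asyncio
import statistics
import time

import aiohttp


async def one_request(session: aiohttp.ClientSession, url: str, payload: dict) -> dict:
    """Issue one streaming request; record total latency, TTFT, and chunk count."""
    start = time.perf_counter()
    ttft = None
    chunks = 0
    async with session.post(url, json=payload) as resp:
        async for _line in resp.content:  # chunked/SSE body arrives line by line
            if ttft is None:
                ttft = time.perf_counter() - start
            chunks += 1
    return {"latency": time.perf_counter() - start, "ttft": ttft, "chunks": chunks}


async def worker(session, url, payload, deadline, results):
    """One concurrent user: fire requests back-to-back until the deadline."""
    while time.perf_counter() < deadline:
        try:
            results.append(await one_request(session, url, payload))
        except aiohttp.ClientError:
            results.append(None)  # counted as a failed request


def percentile(values, p):
    """Nearest-rank percentile over a sorted copy of values."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(p / 100 * len(ordered)))]


async def main():
    parser = argparse.ArgumentParser()
    # Assumed endpoint; point this at the server's actual chat completions route.
    parser.add_argument("--url", default="http://localhost:8321/v1/chat/completions")
    parser.add_argument("--duration", type=int, default=60)
    parser.add_argument("--concurrent", type=int, default=10)
    args = parser.parse_args()

    payload = {
        "model": "my-model",  # placeholder model id
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    }
    deadline = time.perf_counter() + args.duration
    results: list = []
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[worker(session, args.url, payload, deadline, results) for _ in range(args.concurrent)]
        )

    ok = [r for r in results if r is not None]
    latencies = [r["latency"] for r in ok]
    ttfts = [r["ttft"] for r in ok if r["ttft"] is not None]
    print(f"Requests: {len(results)}  success rate: {len(ok) / len(results):.1%}")
    print(f"Mean latency: {statistics.mean(latencies):.3f}s  P99: {percentile(latencies, 99):.3f}s")
    print(f"Mean TTFT: {statistics.mean(ttfts):.3f}s  P99: {percentile(ttfts, 99):.3f}s")
    print(f"Mean chunks per response: {statistics.mean(r['chunks'] for r in ok):.1f}")


if __name__ == "__main__":
    asyncio.run(main())
```

Under these assumptions it would be invoked roughly as `python benchmark_sketch.py --duration 180 --concurrent 10`, mirroring the flags shown in the test plan above.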
Parent: 2114214fe3
Commit: 2c06b24c77
13 changed files with 633 additions and 328 deletions
@@ -23,6 +23,11 @@ new_vector_database
 ```{include} ../../../tests/README.md
 ```
 
+## Benchmarking
+
+```{include} ../../../docs/source/distributions/k8s-benchmark/README.md
+```
+
 ### Advanced Topics
 
 For developers who need deeper understanding of the testing system internals: