llama-stack-mirror/docs/source/contributing/index.md at c66ebae9b62ca17702177ad5345063b07b77c42a

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-04 20:14:13 +00:00

# What does this PR do?
1. Add our own benchmark script to replace Locust, which doesn't measure streaming latency well
2. Simplify the k8s deployment
3. Add a simple profiling script for a locally running server
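Streaming latency is about more than total request time. The useful number is time to first token (TTFT), measured per request while many users hit the server at once. Below is a minimal sketch of that measurement pattern. It is not the actual benchmark script. It assumes an asyncio client and a closed-loop model, where each of N concurrent users sends its next request as soon as the previous one finishes. `fake_stream`, `measure_stream`, `worker`, and `run` are hypothetical names, and `fake_stream` stands in for a real streaming HTTP response from the server.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class StreamResult:
    """Timings for one streaming request, in seconds."""
    ttft: float   # request start -> first chunk (time to first token)
    total: float  # request start -> stream exhausted
    chunks: int   # number of chunks received


async def fake_stream(n_chunks: int = 8, delay: float = 0.001):
    """Hypothetical stand-in for a server's streamed completion."""
    for i in range(n_chunks):
        await asyncio.sleep(delay)
        yield f"chunk-{i}"


async def measure_stream(stream) -> StreamResult:
    """Consume an async iterator of chunks, timestamping the first and the last."""
    start = time.perf_counter()
    ttft = None
    chunks = 0
    async for _chunk in stream:
        if ttft is None:
            # The first chunk marks the time to first token.
            ttft = time.perf_counter() - start
        chunks += 1
    total = time.perf_counter() - start
    # An empty stream has no first token; fall back to the total time.
    return StreamResult(ttft=total if ttft is None else ttft, total=total, chunks=chunks)


async def worker(deadline: float, results: list[StreamResult]) -> None:
    """One simulated user: issue requests back to back until the deadline."""
    while time.perf_counter() < deadline:
        results.append(await measure_stream(fake_stream()))


async def run(concurrent: int, duration: float) -> list[StreamResult]:
    """Run `concurrent` users for `duration` seconds and collect all results."""
    deadline = time.perf_counter() + duration
    results: list[StreamResult] = []
    await asyncio.gather(*(worker(deadline, results) for _ in range(concurrent)))
    return results


if __name__ == "__main__":
    results = asyncio.run(run(concurrent=10, duration=0.5))
    print(f"requests: {len(results)}")
    print(f"mean TTFT: {sum(r.ttft for r in results) / len(results):.4f}s")
```

Calling `run(concurrent=10, duration=180)` corresponds to the `--concurrent 10 --duration 180` invocation in the Test Plan. Under the closed-loop assumption, throughput is roughly users divided by mean latency. With the Test Plan figures, that gives 10 / 1.095 s ≈ 9.1 requests per second, close to the reported 9.09.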

## Test Plan
❮ ./run-benchmark.sh --target stack --duration 180 --concurrent 10

============================================================
BENCHMARK RESULTS
============================================================
Total time: 180.00s
Concurrent users: 10
Total requests: 1636
Successful requests: 1636
Failed requests: 0
Success rate: 100.0%
Requests per second: 9.09

Response Time Statistics:
  Mean: 1.095s
  Median: 1.721s
  Min: 0.136s
  Max: 3.218s
  Std Dev: 0.762s

Percentiles:
  P50: 1.721s
  P90: 1.751s
  P95: 1.756s
  P99: 1.796s

Time to First Token (TTFT) Statistics:
  Mean: 0.037s
  Median: 0.037s
  Min: 0.023s
  Max: 0.211s
  Std Dev: 0.011s

TTFT Percentiles:
  P50: 0.037s
  P90: 0.040s
  P95: 0.044s
  P99: 0.055s

Streaming Statistics:
  Mean chunks per response: 64.0
  Total chunks received: 104775
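The report above reduces raw per-request timings to a few summary statistics. Below is a sketch of that reduction. It assumes nearest-rank percentiles, a sample standard deviation, and requests per second computed over all requests. The real script may make different choices, such as interpolated percentiles or a population standard deviation. `summarize` and `percentile` are hypothetical helpers.

```python
import math
import statistics


def percentile(sorted_values: list[float], p: float) -> float:
    """Nearest-rank percentile (an assumed method; the real script may interpolate)."""
    if not sorted_values:
        raise ValueError("no data")
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies: list[float], failed: int, total_time: float) -> dict[str, float]:
    """Summarize successful-request latencies (seconds) over a run of `total_time` seconds."""
    ok = sorted(latencies)
    total_requests = len(ok) + failed
    return {
        "total_requests": total_requests,
        "success_rate": 100.0 * len(ok) / total_requests,
        "requests_per_second": total_requests / total_time,
        "mean": statistics.mean(ok),
        "median": statistics.median(ok),
        "min": ok[0],
        "max": ok[-1],
        # Sample standard deviation (assumed); statistics.pstdev is the population variant.
        "std_dev": statistics.stdev(ok) if len(ok) > 1 else 0.0,
        "p50": percentile(ok, 50),
        "p90": percentile(ok, 90),
        "p95": percentile(ok, 95),
        "p99": percentile(ok, 99),
    }
```

Applying the same function to the per-request TTFT values yields the TTFT block. The streaming line follows directly from the counts: 104775 chunks / 1636 successful requests ≈ 64.0 chunks per response. Likewise, 1636 requests / 180 s ≈ 9.09 requests per second.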

2025-08-15 11:24:29 -07:00


## Adding a New Provider

See:

- Adding a New API Provider Page, which describes how to add new API providers to the Stack.
- Vector Database Page, which describes how to add a new vector database to Llama Stack.
- External Provider Page, which describes how to add external providers to the Stack.

```{toctree}
:maxdepth: 1
:hidden:

new_api_provider
new_vector_database
```

## Testing

## Benchmarking

## Advanced Topics

For developers who need a deeper understanding of the testing system's internals:

```{toctree}
:maxdepth: 1

testing/record-replay
```
