llama-stack-mirror/benchmarking/k8s-benchmark/results/guidellm-benchmark-stack-s1-sw2-v1-20250922-104457.txt
ehhuang 48a551ecbc
Some checks failed
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m9s
chore(perf): run guidellm benchmarks (#3421)
# What does this PR do?
- Mostly AI-generated scripts to run guidellm
(https://github.com/vllm-project/guidellm) benchmarks on k8s setup
- Stack is using image built from main on 9/11


## Test Plan
See updated README.md
2025-09-24 10:18:33 -07:00

171 lines
13 KiB
Text

Collecting uv
Downloading uv-0.8.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Downloading uv-0.8.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.9/20.9 MB 149.3 MB/s eta 0:00:00
Installing collected packages: uv
Successfully installed uv-0.8.19
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 24.0 -> 25.2
[notice] To update, run: pip install --upgrade pip
Using Python 3.11.13 environment at: /usr/local
Resolved 61 packages in 494ms
Downloading pandas (11.8MiB)
Downloading tokenizers (3.1MiB)
Downloading pygments (1.2MiB)
Downloading aiohttp (1.7MiB)
Downloading transformers (11.1MiB)
Downloading numpy (16.2MiB)
Downloading pillow (6.3MiB)
Downloading pydantic-core (1.9MiB)
Downloading hf-xet (3.0MiB)
Downloading pyarrow (40.8MiB)
Downloading pydantic-core
Downloading aiohttp
Downloading tokenizers
Downloading hf-xet
Downloading pillow
Downloading pygments
Downloading numpy
Downloading pandas
Downloading pyarrow
Downloading transformers
Prepared 61 packages in 1.24s
Installed 61 packages in 126ms
+ aiohappyeyeballs==2.6.1
+ aiohttp==3.12.15
+ aiosignal==1.4.0
+ annotated-types==0.7.0
+ anyio==4.10.0
+ attrs==25.3.0
+ certifi==2025.8.3
+ charset-normalizer==3.4.3
+ click==8.1.8
+ datasets==4.1.1
+ dill==0.4.0
+ filelock==3.19.1
+ frozenlist==1.7.0
+ fsspec==2025.9.0
+ ftfy==6.3.1
+ guidellm==0.3.0
+ h11==0.16.0
+ h2==4.3.0
+ hf-xet==1.1.10
+ hpack==4.1.0
+ httpcore==1.0.9
+ httpx==0.28.1
+ huggingface-hub==0.35.0
+ hyperframe==6.1.0
+ idna==3.10
+ loguru==0.7.3
+ markdown-it-py==4.0.0
+ mdurl==0.1.2
+ multidict==6.6.4
+ multiprocess==0.70.16
+ numpy==2.3.3
+ packaging==25.0
+ pandas==2.3.2
+ pillow==11.3.0
+ propcache==0.3.2
+ protobuf==6.32.1
+ pyarrow==21.0.0
+ pydantic==2.11.9
+ pydantic-core==2.33.2
+ pydantic-settings==2.10.1
+ pygments==2.19.2
+ python-dateutil==2.9.0.post0
+ python-dotenv==1.1.1
+ pytz==2025.2
+ pyyaml==6.0.2
+ regex==2025.9.18
+ requests==2.32.5
+ rich==14.1.0
+ safetensors==0.6.2
+ six==1.17.0
+ sniffio==1.3.1
+ tokenizers==0.22.1
+ tqdm==4.67.1
+ transformers==4.56.2
+ typing-extensions==4.15.0
+ typing-inspection==0.4.1
+ tzdata==2025.2
+ urllib3==2.5.0
+ wcwidth==0.2.14
+ xxhash==3.5.0
+ yarl==1.20.1
Using Python 3.11.13 environment at: /usr/local
Audited 1 package in 3ms
Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.
Creating backend...
Backend openai_http connected to http://llama-stack-benchmark-service:8323/v1/openai for model meta-llama/Llama-3.2-3B-Instruct.
Creating request loader...
Created loader with 1000 unique requests from prompt_tokens=512,output_tokens=256.
╭─ Benchmarks ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [17:45:18] ⠋ 100% concurrent@1 (complete) Req: 0.3 req/s, 3.42s Lat, 1.0 Conc, 17 Comp, 1 Inc, 0 Err │
│ Tok: 73.9 gen/s, 233.7 tot/s, 50.2ms TTFT, 13.4ms ITL, 547 Prompt, 253 Gen │
│ [17:46:23] ⠋ 100% concurrent@2 (complete) Req: 0.6 req/s, 3.42s Lat, 2.0 Conc, 34 Comp, 2 Inc, 0 Err │
│ Tok: 134.7 gen/s, 447.4 tot/s, 50.8ms TTFT, 14.3ms ITL, 546 Prompt, 235 Gen │
│ [17:47:28] ⠋ 100% concurrent@4 (complete) Req: 1.1 req/s, 3.55s Lat, 3.9 Conc, 66 Comp, 4 Inc, 0 Err │
│ Tok: 268.7 gen/s, 873.1 tot/s, 54.9ms TTFT, 14.4ms ITL, 547 Prompt, 243 Gen │
│ [17:48:33] ⠋ 100% concurrent@8 (complete) Req: 2.2 req/s, 3.56s Lat, 7.8 Conc, 130 Comp, 8 Inc, 0 Err │
│ Tok: 526.1 gen/s, 1728.4 tot/s, 60.6ms TTFT, 14.7ms ITL, 547 Prompt, 239 Gen │
│ [17:49:38] ⠋ 100% concurrent@16 (complete) Req: 4.1 req/s, 3.79s Lat, 15.7 Conc, 246 Comp, 16 Inc, 0 Err │
│ Tok: 1006.9 gen/s, 3268.6 tot/s, 74.8ms TTFT, 15.3ms ITL, 547 Prompt, 243 Gen │
│ [17:50:44] ⠋ 100% concurrent@32 (complete) Req: 7.8 req/s, 3.95s Lat, 30.9 Conc, 467 Comp, 32 Inc, 0 Err │
│ Tok: 1912.0 gen/s, 6191.6 tot/s, 119.1ms TTFT, 15.7ms ITL, 547 Prompt, 244 Gen │
│ [17:51:50] ⠋ 100% concurrent@64 (complete) Req: 13.0 req/s, 4.75s Lat, 61.8 Conc, 776 Comp, 64 Inc, 0 Err │
│ Tok: 3154.3 gen/s, 10273.3 tot/s, 339.1ms TTFT, 18.3ms ITL, 547 Prompt, 242 Gen │
│ [17:52:58] ⠋ 100% concurrent@128 (complete) Req: 15.1 req/s, 7.82s Lat, 117.7 Conc, 898 Comp, 127 Inc, 0 Err │
│ Tok: 3617.4 gen/s, 11843.9 tot/s, 1393.8ms TTFT, 26.8ms ITL, 547 Prompt, 240 Gen │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Generating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ (8/8) [ 0:08:41 < 0:00:00 ]
Benchmarks Metadata:
Run id:f73d408e-256a-4c32-aa40-05e8d7098b66
Duration:529.2 seconds
Profile:type=concurrent, strategies=['concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent'], streams=[1, 2, 4, 8, 16, 32, 64, 128]
Args:max_number=None, max_duration=60.0, warmup_number=None, warmup_duration=3.0, cooldown_number=None, cooldown_duration=None
Worker:type_='generative_requests_worker' backend_type='openai_http' backend_target='http://llama-stack-benchmark-service:8323/v1/openai' backend_model='meta-llama/Llama-3.2-3B-Instruct'
backend_info={'max_output_tokens': 16384, 'timeout': 300, 'http2': True, 'follow_redirects': True, 'headers': {}, 'text_completions_path': '/v1/completions', 'chat_completions_path':
'/v1/chat/completions'}
Request Loader:type_='generative_request_loader' data='prompt_tokens=512,output_tokens=256' data_args=None processor='meta-llama/Llama-3.2-3B-Instruct' processor_args=None
Extras:None
Benchmarks Info:
=====================================================================================================================================================
Metadata |||| Requests Made ||| Prompt Tok/Req ||| Output Tok/Req ||| Prompt Tok Total||| Output Tok Total ||
Benchmark| Start Time| End Time| Duration (s)| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err
--------------|-----------|---------|-------------|------|-----|-----|------|------|----|------|------|----|-------|------|----|--------|------|-----
concurrent@1| 17:45:23| 17:46:23| 60.0| 17| 1| 0| 546.6| 512.0| 0.0| 252.8| 136.0| 0.0| 9292| 512| 0| 4298| 136| 0
concurrent@2| 17:46:28| 17:47:28| 60.0| 34| 2| 0| 546.4| 512.0| 0.0| 235.4| 130.0| 0.0| 18577| 1024| 0| 8003| 260| 0
concurrent@4| 17:47:33| 17:48:33| 60.0| 66| 4| 0| 546.5| 512.0| 0.0| 243.0| 97.5| 0.0| 36072| 2048| 0| 16035| 390| 0
concurrent@8| 17:48:38| 17:49:38| 60.0| 130| 8| 0| 546.6| 512.0| 0.0| 239.2| 146.0| 0.0| 71052| 4096| 0| 31090| 1168| 0
concurrent@16| 17:49:43| 17:50:43| 60.0| 246| 16| 0| 546.6| 512.0| 0.0| 243.3| 112.3| 0.0| 134456| 8192| 0| 59862| 1797| 0
concurrent@32| 17:50:49| 17:51:49| 60.0| 467| 32| 0| 546.6| 512.0| 0.0| 244.2| 147.3| 0.0| 255242| 16384| 0| 114038| 4714| 0
concurrent@64| 17:51:55| 17:52:55| 60.0| 776| 64| 0| 546.5| 512.0| 0.0| 242.2| 106.1| 0.0| 424115| 32768| 0| 187916| 6788| 0
concurrent@128| 17:53:03| 17:54:03| 60.0| 898| 127| 0| 546.5| 512.0| 0.0| 240.3| 69.8| 0.0| 490789| 65024| 0| 215810| 8864| 0
=====================================================================================================================================================
Benchmarks Stats:
======================================================================================================================================================
Metadata | Request Stats || Out Tok/sec| Tot Tok/sec| Req Latency (sec)||| TTFT (ms) ||| ITL (ms) ||| TPOT (ms) ||
Benchmark| Per Second| Concurrency| mean| mean| mean| median| p99| mean| median| p99| mean| median| p99| mean| median| p99
--------------|-----------|------------|------------|------------|-----|-------|------|-------|-------|-------|-----|-------|-----|-----|-------|-----
concurrent@1| 0.29| 1.00| 73.9| 233.7| 3.42| 3.45| 3.50| 50.2| 50.9| 62.5| 13.4| 13.4| 13.5| 13.3| 13.3| 13.5
concurrent@2| 0.57| 1.96| 134.7| 447.4| 3.42| 3.67| 4.12| 50.8| 49.2| 79.8| 14.3| 14.2| 15.9| 14.3| 14.2| 15.9
concurrent@4| 1.11| 3.92| 268.7| 873.1| 3.55| 3.72| 3.80| 54.9| 51.7| 101.3| 14.4| 14.4| 14.5| 14.4| 14.4| 14.5
concurrent@8| 2.20| 7.82| 526.1| 1728.4| 3.56| 3.78| 3.93| 60.6| 49.8| 189.5| 14.7| 14.7| 14.8| 14.6| 14.6| 14.8
concurrent@16| 4.14| 15.66| 1006.9| 3268.6| 3.79| 3.94| 4.25| 74.8| 54.3| 328.4| 15.3| 15.3| 16.1| 15.2| 15.2| 16.0
concurrent@32| 7.83| 30.91| 1912.0| 6191.6| 3.95| 4.07| 4.53| 119.1| 80.5| 674.0| 15.7| 15.6| 17.4| 15.7| 15.6| 17.3
concurrent@64| 13.03| 61.85| 3154.3| 10273.3| 4.75| 4.93| 5.43| 339.1| 321.1| 1146.6| 18.3| 18.4| 19.3| 18.2| 18.3| 19.2
concurrent@128| 15.05| 117.71| 3617.4| 11843.9| 7.82| 8.58| 13.35| 1393.8| 1453.0| 5232.2| 26.8| 26.7| 36.0| 26.7| 26.6| 35.9
======================================================================================================================================================
Saving benchmarks report...
Benchmarks report saved to /benchmarks.json
Benchmarking complete.