llama-stack-mirror/benchmarking/k8s-benchmark/results/guidellm-benchmark-stack-s1-sw1-v1-20250922-103408.txt
Eric Huang 4dfb379a42 guidellm, runs, charts
# What does this PR do?


## Test Plan
# What does this PR do?


## Test Plan
# What does this PR do?


## Test Plan
2025-09-23 16:10:25 -07:00

171 lines
13 KiB
Text

Collecting uv
Downloading uv-0.8.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Downloading uv-0.8.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.9/20.9 MB 144.3 MB/s eta 0:00:00
Installing collected packages: uv
Successfully installed uv-0.8.19
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 24.0 -> 25.2
[notice] To update, run: pip install --upgrade pip
Using Python 3.11.13 environment at: /usr/local
Resolved 61 packages in 551ms
Downloading pillow (6.3MiB)
Downloading hf-xet (3.0MiB)
Downloading tokenizers (3.1MiB)
Downloading pygments (1.2MiB)
Downloading pandas (11.8MiB)
Downloading aiohttp (1.7MiB)
Downloading pydantic-core (1.9MiB)
Downloading numpy (16.2MiB)
Downloading transformers (11.1MiB)
Downloading pyarrow (40.8MiB)
Downloading pydantic-core
Downloading aiohttp
Downloading tokenizers
Downloading hf-xet
Downloading pygments
Downloading pillow
Downloading numpy
Downloading pandas
Downloading transformers
Downloading pyarrow
Prepared 61 packages in 1.23s
Installed 61 packages in 114ms
+ aiohappyeyeballs==2.6.1
+ aiohttp==3.12.15
+ aiosignal==1.4.0
+ annotated-types==0.7.0
+ anyio==4.10.0
+ attrs==25.3.0
+ certifi==2025.8.3
+ charset-normalizer==3.4.3
+ click==8.1.8
+ datasets==4.1.1
+ dill==0.4.0
+ filelock==3.19.1
+ frozenlist==1.7.0
+ fsspec==2025.9.0
+ ftfy==6.3.1
+ guidellm==0.3.0
+ h11==0.16.0
+ h2==4.3.0
+ hf-xet==1.1.10
+ hpack==4.1.0
+ httpcore==1.0.9
+ httpx==0.28.1
+ huggingface-hub==0.35.0
+ hyperframe==6.1.0
+ idna==3.10
+ loguru==0.7.3
+ markdown-it-py==4.0.0
+ mdurl==0.1.2
+ multidict==6.6.4
+ multiprocess==0.70.16
+ numpy==2.3.3
+ packaging==25.0
+ pandas==2.3.2
+ pillow==11.3.0
+ propcache==0.3.2
+ protobuf==6.32.1
+ pyarrow==21.0.0
+ pydantic==2.11.9
+ pydantic-core==2.33.2
+ pydantic-settings==2.10.1
+ pygments==2.19.2
+ python-dateutil==2.9.0.post0
+ python-dotenv==1.1.1
+ pytz==2025.2
+ pyyaml==6.0.2
+ regex==2025.9.18
+ requests==2.32.5
+ rich==14.1.0
+ safetensors==0.6.2
+ six==1.17.0
+ sniffio==1.3.1
+ tokenizers==0.22.1
+ tqdm==4.67.1
+ transformers==4.56.2
+ typing-extensions==4.15.0
+ typing-inspection==0.4.1
+ tzdata==2025.2
+ urllib3==2.5.0
+ wcwidth==0.2.14
+ xxhash==3.5.0
+ yarl==1.20.1
Using Python 3.11.13 environment at: /usr/local
Audited 1 package in 3ms
Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.
Creating backend...
Backend openai_http connected to http://llama-stack-benchmark-service:8323/v1/openai for model meta-llama/Llama-3.2-3B-Instruct.
Creating request loader...
Created loader with 1000 unique requests from prompt_tokens=512,output_tokens=256.
╭─ Benchmarks ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [17:34:30] ⠋ 100% concurrent@1 (complete) Req: 0.3 req/s, 3.32s Lat, 1.0 Conc, 18 Comp, 1 Inc, 0 Err │
│ Tok: 74.0 gen/s, 238.6 tot/s, 40.2ms TTFT, 13.4ms ITL, 546 Prompt, 246 Gen │
│ [17:35:35] ⠋ 100% concurrent@2 (complete) Req: 0.6 req/s, 3.46s Lat, 2.0 Conc, 34 Comp, 2 Inc, 0 Err │
│ Tok: 139.6 gen/s, 454.0 tot/s, 48.0ms TTFT, 14.1ms ITL, 546 Prompt, 243 Gen │
│ [17:36:40] ⠋ 100% concurrent@4 (complete) Req: 1.1 req/s, 3.44s Lat, 3.9 Conc, 68 Comp, 4 Inc, 0 Err │
│ Tok: 273.2 gen/s, 900.4 tot/s, 50.7ms TTFT, 14.3ms ITL, 546 Prompt, 238 Gen │
│ [17:37:45] ⠋ 100% concurrent@8 (complete) Req: 2.2 req/s, 3.55s Lat, 7.7 Conc, 129 Comp, 8 Inc, 0 Err │
│ Tok: 519.1 gen/s, 1699.8 tot/s, 66.0ms TTFT, 14.6ms ITL, 547 Prompt, 240 Gen │
│ [17:38:50] ⠋ 100% concurrent@16 (complete) Req: 4.1 req/s, 3.76s Lat, 15.5 Conc, 247 Comp, 16 Inc, 0 Err │
│ Tok: 1005.5 gen/s, 3256.7 tot/s, 101.0ms TTFT, 15.0ms ITL, 547 Prompt, 244 Gen │
│ [17:39:56] ⠋ 100% concurrent@32 (complete) Req: 8.1 req/s, 3.84s Lat, 30.9 Conc, 483 Comp, 32 Inc, 0 Err │
│ Tok: 1926.3 gen/s, 6327.2 tot/s, 295.7ms TTFT, 14.8ms ITL, 547 Prompt, 239 Gen │
│ [17:41:03] ⠋ 100% concurrent@64 (complete) Req: 9.9 req/s, 6.05s Lat, 59.7 Conc, 576 Comp, 58 Inc, 0 Err │
│ Tok: 2381.0 gen/s, 7774.5 tot/s, 1196.2ms TTFT, 20.2ms ITL, 547 Prompt, 241 Gen │
│ [17:42:10] ⠋ 100% concurrent@128 (complete) Req: 9.2 req/s, 11.59s Lat, 107.2 Conc, 514 Comp, 117 Inc, 0 Err │
│ Tok: 2233.4 gen/s, 7286.3 tot/s, 2403.9ms TTFT, 38.2ms ITL, 547 Prompt, 242 Gen │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Generating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ (8/8) [ 0:08:41 < 0:00:00 ]
Benchmarks Metadata:
Run id:511a14fd-ba11-4ffa-92ef-7cc23db4dd38
Duration:528.5 seconds
Profile:type=concurrent, strategies=['concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent'], streams=[1, 2, 4, 8, 16, 32, 64, 128]
Args:max_number=None, max_duration=60.0, warmup_number=None, warmup_duration=3.0, cooldown_number=None, cooldown_duration=None
Worker:type_='generative_requests_worker' backend_type='openai_http' backend_target='http://llama-stack-benchmark-service:8323/v1/openai' backend_model='meta-llama/Llama-3.2-3B-Instruct'
backend_info={'max_output_tokens': 16384, 'timeout': 300, 'http2': True, 'follow_redirects': True, 'headers': {}, 'text_completions_path': '/v1/completions', 'chat_completions_path':
'/v1/chat/completions'}
Request Loader:type_='generative_request_loader' data='prompt_tokens=512,output_tokens=256' data_args=None processor='meta-llama/Llama-3.2-3B-Instruct' processor_args=None
Extras:None
Benchmarks Info:
===================================================================================================================================================
Metadata |||| Requests Made ||| Prompt Tok/Req ||| Output Tok/Req ||| Prompt Tok Total||| Output Tok Total||
Benchmark| Start Time| End Time| Duration (s)| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err
--------------|-----------|---------|-------------|------|-----|-----|------|------|----|------|------|----|-------|------|----|-------|------|----
concurrent@1| 17:34:35| 17:35:35| 60.0| 18| 1| 0| 546.4| 512.0| 0.0| 246.0| 14.0| 0.0| 9835| 512| 0| 4428| 14| 0
concurrent@2| 17:35:40| 17:36:40| 60.0| 34| 2| 0| 546.4| 512.0| 0.0| 242.7| 80.0| 0.0| 18577| 1024| 0| 8253| 160| 0
concurrent@4| 17:36:45| 17:37:45| 60.0| 68| 4| 0| 546.4| 512.0| 0.0| 238.1| 103.2| 0.0| 37156| 2048| 0| 16188| 413| 0
concurrent@8| 17:37:50| 17:38:50| 60.0| 129| 8| 0| 546.7| 512.0| 0.0| 240.3| 180.0| 0.0| 70518| 4096| 0| 31001| 1440| 0
concurrent@16| 17:38:55| 17:39:55| 60.0| 247| 16| 0| 546.6| 512.0| 0.0| 244.1| 142.6| 0.0| 135002| 8192| 0| 60300| 2281| 0
concurrent@32| 17:40:01| 17:41:01| 60.0| 483| 32| 0| 546.5| 512.0| 0.0| 239.2| 123.2| 0.0| 263972| 16384| 0| 115540| 3944| 0
concurrent@64| 17:41:08| 17:42:08| 60.0| 576| 58| 0| 546.6| 512.0| 0.0| 241.3| 13.9| 0.0| 314817| 29696| 0| 138976| 807| 0
concurrent@128| 17:42:15| 17:43:15| 60.0| 514| 117| 0| 546.5| 512.0| 0.0| 241.6| 143.9| 0.0| 280911| 59904| 0| 124160| 16832| 0
===================================================================================================================================================
Benchmarks Stats:
=======================================================================================================================================================
Metadata | Request Stats || Out Tok/sec| Tot Tok/sec| Req Latency (sec) ||| TTFT (ms) ||| ITL (ms) ||| TPOT (ms) ||
Benchmark| Per Second| Concurrency| mean| mean| mean| median| p99| mean| median| p99| mean| median| p99| mean| median| p99
--------------|-----------|------------|------------|------------|------|-------|------|-------|-------|-------|-----|-------|-----|-----|-------|-----
concurrent@1| 0.30| 1.00| 74.0| 238.6| 3.32| 3.43| 3.61| 40.2| 39.3| 51.2| 13.4| 13.3| 14.0| 13.3| 13.2| 13.9
concurrent@2| 0.58| 1.99| 139.6| 454.0| 3.46| 3.64| 3.74| 48.0| 45.8| 72.0| 14.1| 14.1| 14.5| 14.0| 14.0| 14.4
concurrent@4| 1.15| 3.95| 273.2| 900.4| 3.44| 3.69| 3.74| 50.7| 47.2| 118.6| 14.3| 14.3| 14.4| 14.2| 14.2| 14.4
concurrent@8| 2.16| 7.67| 519.1| 1699.8| 3.55| 3.76| 3.87| 66.0| 48.8| 208.2| 14.6| 14.5| 14.8| 14.5| 14.5| 14.8
concurrent@16| 4.12| 15.48| 1005.5| 3256.7| 3.76| 3.90| 4.18| 101.0| 65.6| 396.7| 15.0| 15.0| 15.9| 15.0| 15.0| 15.9
concurrent@32| 8.05| 30.89| 1926.3| 6327.2| 3.84| 4.04| 4.39| 295.7| 265.6| 720.4| 14.8| 14.9| 15.5| 14.8| 14.8| 15.3
concurrent@64| 9.87| 59.74| 2381.0| 7774.5| 6.05| 6.18| 9.94| 1196.2| 1122.5| 4295.3| 20.2| 20.0| 25.8| 20.1| 19.9| 25.8
concurrent@128| 9.25| 107.16| 2233.4| 7286.3| 11.59| 12.04| 14.46| 2403.9| 2322.3| 4001.5| 38.2| 38.5| 53.0| 38.0| 38.3| 52.7
=======================================================================================================================================================
Saving benchmarks report...
Benchmarks report saved to /benchmarks.json
Benchmarking complete.