(docs) Add docs on load testing benchmarks (#7499)

* docs benchmarks

* docs benchmarks
Ishaan Jaff 2025-01-01 18:33:20 -08:00 committed by GitHub
parent 38bfefa6ef
commit e1fcd3ee43
5 changed files with 47 additions and 11 deletions

@@ -1,21 +1,51 @@
+import Image from '@theme/IdealImage';
 # Benchmarks
-Benchmarks for LiteLLM Gateway (Proxy Server)
+Benchmarks for LiteLLM Gateway (Proxy Server) tested against a fake OpenAI endpoint.
-Locust Settings:
-- 2500 Users
-- 100 user Ramp Up
+## 1 Instance LiteLLM Proxy
-## Basic Benchmarks
+| Metric | Litellm Proxy (1 Instance) |
+|--------|------------------------|
+| Median Latency (ms) | 110 |
+| RPS | 68.2 |
-Overhead when using a Deployed Proxy vs Direct to LLM
-- Latency overhead added by LiteLLM Proxy: 107ms
+<Image img={require('../img/1_instance_proxy.png')} />
-| Metric | Direct to Fake Endpoint | Basic Litellm Proxy |
-|--------|------------------------|---------------------|
-| RPS | 1196 | 1133.2 |
-| Median Latency (ms) | 33 | 140 |
+## **Horizontal Scaling**
+<Image img={require('../img/instances_vs_rps.png')} />
+#### Key Findings
+- Single instance: 68.2 RPS @ 100ms latency
+- 10 instances: 4.3% efficiency loss (653 RPS vs expected 682 RPS), latency stable at `100ms`
+- For 10,000 RPS: Need ~154 instances @ 95.7% efficiency, `100ms latency`
+### 2 Instances
+**Adding 1 instance, will double the RPS and maintain the `100ms-110ms` median latency.**
+| Metric | Litellm Proxy (2 Instances) |
+|--------|------------------------|
+| Median Latency (ms) | 100 |
+| RPS | 142 |
+<Image img={require('../img/2_instance_proxy.png')} />
+### 10 Instances
+| Metric | Litellm Proxy (10 Instances) |
+|--------|------------------------|
+| Median Latency (ms) | 110 |
+| RPS | 653 |
+<Image img={require('../img/10_instance_proxy.png')} />
 ## Logging Callbacks
@@ -39,3 +69,9 @@ Using LangSmith has **no impact on latency, RPS compared to Basic Litellm Proxy*
 | RPS | 1133.2 | 1135 |
 | Median Latency (ms) | 140 | 132 |
+## Locust Settings
+- 2500 Users
+- 100 user Ramp Up
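
The "Key Findings" figures added in this diff can be re-derived from the two measured RPS values in the tables. The following is a minimal sketch of that arithmetic, not code from the commit. The function names are hypothetical, and the 10,000 RPS estimate assumes the efficiency measured at 10 instances stays constant at larger instance counts, which the benchmark does not verify.

```python
import math

# Measured values taken from the tables in the diff above.
SINGLE_INSTANCE_RPS = 68.2   # "1 Instance LiteLLM Proxy" table
TEN_INSTANCE_RPS = 653.0     # "10 Instances" table


def scaling_efficiency(measured_rps, instances, single_rps=SINGLE_INSTANCE_RPS):
    """Measured throughput as a fraction of perfect linear scaling."""
    return measured_rps / (instances * single_rps)


def instances_needed(target_rps, efficiency, single_rps=SINGLE_INSTANCE_RPS):
    """Instances required for target_rps, assuming efficiency stays constant."""
    return math.ceil(target_rps / (single_rps * efficiency))


efficiency = scaling_efficiency(TEN_INSTANCE_RPS, 10)   # 653 / 682
print(f"efficiency at 10 instances: {efficiency:.1%}")
print(f"efficiency loss: {1 - efficiency:.1%}")
print(f"instances for 10,000 RPS: {instances_needed(10_000, efficiency)}")
```

Applying `scaling_efficiency(142, 2)` to the 2-instance table gives a value slightly above 1.0, which is consistent with the statement that a second instance roughly doubles RPS.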

4 binary files not shown (158 KiB, 156 KiB, 158 KiB, 150 KiB).