
import Image from '@theme/IdealImage';

# Benchmarks

Benchmarks for LiteLLM Gateway (Proxy Server) tested against a fake OpenAI endpoint.

Use this config for testing:

**Note:** We're currently migrating to aiohttp, which has 10x higher throughput. We recommend using the `aiohttp_openai/` provider for load testing.

```yaml
model_list:
  - model_name: "fake-openai-endpoint"
    litellm_params:
      model: aiohttp_openai/any
      api_base: https://your-fake-openai-endpoint.com/chat/completions
      api_key: "test"
```

## 1 Instance LiteLLM Proxy

In these tests, the median latency of calling the `fake-openai-endpoint` directly is 60ms.

| Metric | LiteLLM Proxy (1 Instance) |
|--------|----------------------------|
| RPS | 475 |
| Median Latency (ms) | 100 |
| Latency overhead added by LiteLLM Proxy | 40ms |

## Key Findings

- Single instance: 475 RPS @ 100ms latency
- 2 LiteLLM instances: 950 RPS @ 100ms latency
- 4 LiteLLM instances: 1900 RPS @ 100ms latency

## 2 Instances

Adding a second instance doubles the RPS while maintaining a 100ms-110ms median latency.

| Metric | LiteLLM Proxy (2 Instances) |
|--------|-----------------------------|
| Median Latency (ms) | 100 |
| RPS | 950 |

## Machine Spec used for testing

Each machine deploying LiteLLM had the following specs:

- 2 CPU
- 4GB RAM

## Logging Callbacks

### GCS Bucket Logging

Using GCS Bucket logging has no impact on latency or RPS compared to the basic LiteLLM Proxy.

| Metric | Basic LiteLLM Proxy | LiteLLM Proxy with GCS Bucket Logging |
|--------|---------------------|---------------------------------------|
| RPS | 1133.2 | 1137.3 |
| Median Latency (ms) | 140 | 138 |

### LangSmith Logging

Using LangSmith has no impact on latency or RPS compared to the basic LiteLLM Proxy.

| Metric | Basic LiteLLM Proxy | LiteLLM Proxy with LangSmith |
|--------|---------------------|------------------------------|
| RPS | 1133.2 | 1135 |
| Median Latency (ms) | 140 | 132 |
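
On the proxy, callbacks like these are enabled through `litellm_settings` in the config file. As a rough illustration, here is a sketch of the SDK-level equivalent, assuming the callback name `langsmith` from LiteLLM's logging docs:

```python
import litellm

# Sketch: enable a logging callback at the SDK level (the proxy enables
# the same callbacks via `litellm_settings` in its config file).
# The callback name is an assumption based on LiteLLM's logging docs.
litellm.success_callback = ["langsmith"]
```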

## Locust Settings

- 2500 users
- 100 user ramp up
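
For reference, a minimal locustfile sketch matching these settings (the request payload, endpoint path, and placeholder API key are assumptions; the proxy is assumed to be reachable at the Locust host URL):

```python
from locust import HttpUser, task, between


class ProxyUser(HttpUser):
    # Short wait between requests for each simulated user.
    wait_time = between(0.5, 1)

    @task
    def chat_completion(self):
        # Targets the fake-openai-endpoint model from the config above.
        # The Authorization header is a placeholder key.
        self.client.post(
            "/chat/completions",
            json={
                "model": "fake-openai-endpoint",
                "messages": [{"role": "user", "content": "hello"}],
            },
            headers={"Authorization": "Bearer sk-1234"},
        )
```

Running Locust with `locust -u 2500 -r 100 --host http://0.0.0.0:4000` reproduces the user count and ramp-up rate above (assuming the proxy's default port).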