(docs) load balancing on proxy

ishaan-jaff 2024-03-01 12:28:49 -08:00
parent 64754dad52
commit edab94c0a6


@@ -1,4 +1,4 @@
# Multiple Instances of 1 model
# Load Balancing - Config Setup
Load balance multiple instances of the same model
The proxy will handle routing requests (using LiteLLM's Router). **Set `rpm` in the config if you want to maximize throughput**
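For illustration, here is a minimal config sketch for this setup, with two deployments sharing one `model_name` (the deployment names, endpoints, keys, and `rpm` values below are placeholders):

```yaml
model_list:
  - model_name: gpt-3.5-turbo            # clients request this name
    litellm_params:
      model: azure/<your-deployment-name-1>
      api_base: <your-azure-endpoint-1>
      api_key: <your-azure-api-key-1>
      rpm: 6                             # this deployment's requests-per-minute budget
  - model_name: gpt-3.5-turbo            # same model_name -> the Router load balances across both
    litellm_params:
      model: azure/<your-deployment-name-2>
      api_base: <your-azure-endpoint-2>
      api_key: <your-azure-api-key-2>
      rpm: 6
```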
@@ -79,6 +79,32 @@ curl --location 'http://0.0.0.0:8000/chat/completions' \
'
```
## Load Balancing using multiple litellm instances (Kubernetes, Auto Scaling)
LiteLLM Proxy supports sharing rpm/tpm across multiple litellm instances. Pass `redis_host`, `redis_password` and `redis_port` to enable this (LiteLLM will use Redis to track rpm/tpm usage).
Example config:
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/<your-deployment-name>
      api_base: <your-azure-endpoint>
      api_key: <your-azure-api-key>
      rpm: 6      # Rate limit for this deployment: in requests per minute (rpm)
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key: <your-azure-api-key>
      rpm: 6
router_settings:
  redis_host: <your redis host>
  redis_password: <your redis password>
  redis_port: 1992
```
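With a shared Redis, every litellm instance reads and writes the same usage counters, so the `rpm`/`tpm` budgets above are enforced across the whole fleet rather than per instance.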
## Router settings on config - routing_strategy, model_group_alias
litellm.Router() settings can be set under `router_settings`. You can set `model_group_alias`, `routing_strategy`, `num_retries`, `timeout`. See all supported Router params [here](https://github.com/BerriAI/litellm/blob/1b942568897a48f014fa44618ec3ce54d7570a46/litellm/router.py#L64)
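As a rough sketch of how these settings might look in the config (the `routing_strategy` value and alias mapping below are illustrative assumptions; see the linked Router code for the full list of supported values):

```yaml
router_settings:
  routing_strategy: least-busy                    # assumed strategy name: route to the deployment with the fewest in-flight requests
  model_group_alias: {"gpt-4": "gpt-3.5-turbo"}   # requests for "gpt-4" get served by the "gpt-3.5-turbo" model group
  num_retries: 2                                  # retry failed requests up to 2 times
  timeout: 30                                     # per-request timeout, in seconds
```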