(docs) simple proxy

ishaan-jaff 2023-11-29 16:44:39 -08:00
parent c2f642dbec
commit 69eca78000


@@ -460,12 +460,10 @@ curl --location 'http://0.0.0.0:8000/chat/completions' \
 ```
 ### Load Balancing - Multiple Instances of 1 model
-Use this config to load balance between multiple instances of the same model.
-
-The proxy will handle routing requests (using LiteLLM's Router).
-
-#### Example config
-
-requests with `model=gpt-3.5-turbo` will be routed across multiple instances of `azure/gpt-3.5-turbo`
+**LiteLLM Proxy can handle 1k+ requests/second**. Use this config to load balance between multiple instances of the same model. The proxy will handle routing requests (using LiteLLM's Router). **Set `rpm` in the config if you want to maximize throughput.**
+
+In the config below, requests with `model=gpt-3.5-turbo` will be routed across multiple instances of `azure/gpt-3.5-turbo`.
 ```yaml
 model_list:
   - model_name: gpt-3.5-turbo
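
The hunk ends mid-config, so the full `model_list` isn't shown here. For reference, a complete load-balancing config in this style might look like the sketch below; the Azure deployment names, `api_base` URLs, and `rpm` values are illustrative placeholders, not values from this commit:

```yaml
model_list:
  # Both entries share model_name gpt-3.5-turbo, so the Router
  # load balances requests for that model across the two deployments.
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-35-turbo-eu                        # hypothetical deployment name
      api_base: https://my-endpoint-eu.openai.azure.com/  # placeholder endpoint
      api_key: <your-azure-api-key>
      rpm: 6                                              # illustrative requests-per-minute limit
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-35-turbo-ca                        # hypothetical second deployment
      api_base: https://my-endpoint-ca.openai.azure.com/  # placeholder endpoint
      api_key: <your-azure-api-key>
      rpm: 6
```

The proxy can then be started against this file with `litellm --config /path/to/config.yaml`; when `rpm` is set, the Router can use it to weight traffic across the deployments.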