diff --git a/docs/my-website/docs/proxy/load_balancing.md b/docs/my-website/docs/proxy/load_balancing.md index e2e3a7ee6..ee861f94d 100644 --- a/docs/my-website/docs/proxy/load_balancing.md +++ b/docs/my-website/docs/proxy/load_balancing.md @@ -72,6 +72,31 @@ curl --location 'http://0.0.0.0:8000/chat/completions' \ ' ``` +## Router settings on config.yaml - routing_strategy, model_group_alias + +litellm.Router() settings can be set under `router_settings`. You can set `model_group_alias`, `routing_strategy`, `num_retries`,`timeout` . See all Router supported params [here](https://github.com/BerriAI/litellm/blob/1b942568897a48f014fa44618ec3ce54d7570a46/litellm/router.py#L64) + +Example config with `router_settings` +```yaml +model_list: + - model_name: gpt-3.5-turbo + litellm_params: + model: azure/ + api_base: + api_key: + rpm: 6 # Rate limit for this deployment: in requests per minute (rpm) + - model_name: gpt-3.5-turbo + litellm_params: + model: azure/gpt-turbo-small-ca + api_base: https://my-endpoint-canada-berri992.openai.azure.com/ + api_key: + rpm: 6 +router_settings: + model_group_alias: {"gpt-4": "gpt-3.5-turbo"} # all requests with `gpt-4` will be routed to models with `gpt-3.5-turbo` + routing_strategy: least-busy # Literal["simple-shuffle", "least-busy", "usage-based-routing", "latency-based-routing"] + num_retries: 2 + timeout: 30 # 30 seconds +``` ## Fallbacks + Cooldowns + Retries + Timeouts