forked from phoenix/litellm-mirror
docs(configs.md): added docs on how to configure routing strategy on proxy
This commit is contained in:
parent
6ef0e8485e
commit
149c1b3557
1 changed file with 25 additions and 1 deletion
````diff
@@ -8,7 +8,8 @@ Set model list, `api_base`, `api_key`, `temperature` & proxy server settings (`m
 
 | Param Name | Description |
 |----------------------|---------------------------------------------------------------|
 | `model_list` | List of supported models on the server, with model-specific configs |
-| `litellm_settings` | litellm Module settings, example `litellm.drop_params=True`, `litellm.set_verbose=True`, `litellm.api_base`, `litellm.cache` |
+| `router_settings` | litellm Router settings, example `routing_strategy="least-busy"` [**see all**](https://github.com/BerriAI/litellm/blob/6ef0e8485e0e720c0efa6f3075ce8119f2f62eea/litellm/router.py#L64)|
+| `litellm_settings` | litellm Module settings, example `litellm.drop_params=True`, `litellm.set_verbose=True`, `litellm.api_base`, `litellm.cache` [**see all**](https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py)|
 | `general_settings` | Server settings, example setting `master_key: sk-my_special_key` |
 | `environment_variables` | Environment Variables example, `REDIS_HOST`, `REDIS_PORT` |
````
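Putting the table's params together, a complete `config.yaml` might look like the sketch below. It is assembled only from values mentioned in the table above; the specific model, key, and Redis values are illustrative placeholders, not defaults.

```yaml
model_list:                          # supported models, with per-model configs
  - model_name: ollama-models
    litellm_params:
      model: "ollama/mistral"
      api_base: "http://127.0.0.1:8001"

litellm_settings:                    # litellm module settings
  drop_params: True
  set_verbose: True

router_settings:                     # litellm Router settings
  routing_strategy: "least-busy"

general_settings:                    # server settings
  master_key: sk-my_special_key

environment_variables:               # example values, not defaults
  REDIS_HOST: "localhost"
  REDIS_PORT: "6379"
```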
````diff
@@ -306,6 +307,29 @@ model_list:
 $ litellm --config /path/to/config.yaml
 ```
 
+## Router Settings
+
+Use this to configure things like routing strategy.
+
+```yaml
+router_settings:
+  routing_strategy: "least-busy"
+
+model_list: # will route requests to the least busy ollama model
+  - model_name: ollama-models
+    litellm_params:
+      model: "ollama/mistral"
+      api_base: "http://127.0.0.1:8001"
+  - model_name: ollama-models
+    litellm_params:
+      model: "ollama/codellama"
+      api_base: "http://127.0.0.1:8002"
+  - model_name: ollama-models
+    litellm_params:
+      model: "ollama/llama2"
+      api_base: "http://127.0.0.1:8003"
+```
+
 ## Max Parallel Requests
 
 To rate limit a user based on the number of parallel requests, e.g.:
````
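The idea behind `routing_strategy: "least-busy"` in the added config can be sketched as follows. This is not litellm's implementation, just a minimal illustration of the concept: keep an in-flight request counter per deployment and dispatch each new request to the deployment with the fewest open requests. The `LeastBusyRouter` class and its method names are hypothetical.

```python
from collections import defaultdict

class LeastBusyRouter:
    """Hypothetical sketch: picks the api_base with the fewest in-flight requests."""

    def __init__(self, deployments):
        self.deployments = deployments      # list of api_base strings
        self.in_flight = defaultdict(int)   # api_base -> open request count

    def pick(self):
        # min() keyed on the in-flight counter implements "least busy";
        # ties go to the earliest deployment in the list
        return min(self.deployments, key=lambda d: self.in_flight[d])

    def start(self, deployment):
        self.in_flight[deployment] += 1

    def finish(self, deployment):
        self.in_flight[deployment] -= 1

# The three ollama deployments from the config diff above
router = LeastBusyRouter([
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
    "http://127.0.0.1:8003",
])

router.start("http://127.0.0.1:8001")  # :8001 now has 1 request in flight
router.start("http://127.0.0.1:8002")  # :8002 now has 1 request in flight
print(router.pick())                   # -> http://127.0.0.1:8003
```

Because all three deployments share the same `model_name` (`ollama-models`), the proxy treats them as one group and this selection happens per request.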