docs(routing.md): add max parallel requests to router docs

Krrish Dholakia 2024-04-29 15:37:48 -07:00
parent a978f2d881
commit cef2d95bb4


@@ -443,6 +443,35 @@ asyncio.run(router_acompletion())
## Basic Reliability
### Max Parallel Requests (ASYNC)
Used as a semaphore for async requests on the router. Limits the max concurrent calls made to a deployment. Useful in high-traffic scenarios.
If tpm/rpm is set and no max parallel request limit is given, we use the RPM or calculated RPM (tpm / 1000 / 6) as the max parallel request limit.
```python
from litellm import Router

model_list = [{
    "model_name": "gpt-4",
    "litellm_params": {
        "model": "azure/gpt-4",
        ...
        "max_parallel_requests": 10 # 👈 SET PER DEPLOYMENT
    }
}]

### OR ###

router = Router(model_list=model_list, default_max_parallel_requests=20) # 👈 SET DEFAULT MAX PARALLEL REQUESTS

# deployment max parallel requests > default max parallel requests
```
[**See Code**](https://github.com/BerriAI/litellm/blob/a978f2d8813c04dad34802cb95e0a0e35a3324bc/litellm/utils.py#L5605)
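
A minimal sketch of the fallback order described above (explicit `max_parallel_requests` > `rpm` > calculated RPM from `tpm` > router default). The helper name and structure here are illustrative, not litellm's internal API:

```python
from typing import Optional


def resolve_max_parallel_requests(
    max_parallel_requests: Optional[int],
    rpm: Optional[int],
    tpm: Optional[int],
    default_max_parallel_requests: Optional[int],
) -> Optional[int]:
    """Illustrative fallback: explicit limit > rpm > tpm/1000/6 > router default."""
    if max_parallel_requests is not None:
        return max_parallel_requests
    if rpm is not None:
        return rpm
    if tpm is not None:
        return int(tpm / 1000 / 6)  # calculated RPM used as the semaphore size
    return default_max_parallel_requests


# e.g. a deployment with tpm=120_000 and no explicit limit or rpm
print(resolve_max_parallel_requests(None, None, 120_000, None))  # 20
```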
### Timeouts
The timeout set in router is for the entire length of the call, and is passed down to the completion() call level as well.
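
A minimal sketch of setting a router-wide timeout; the deployment credentials are placeholders:

```python
from litellm import Router

model_list = [{
    "model_name": "gpt-4",
    "litellm_params": {
        "model": "azure/gpt-4",
        "api_key": "sk-...",        # placeholder
        "api_base": "https://...",  # placeholder
    },
}]

# 30s budget for the entire call; the same timeout is passed down to completion()
router = Router(model_list=model_list, timeout=30)
```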