diff --git a/docs/my-website/docs/routing.md b/docs/my-website/docs/routing.md
index 76d3acb7b..028b40b6f 100644
--- a/docs/my-website/docs/routing.md
+++ b/docs/my-website/docs/routing.md
@@ -443,6 +443,35 @@ asyncio.run(router_acompletion())
 
 ## Basic Reliability
 
+### Max Parallel Requests (ASYNC)
+
+Used as a semaphore for async requests on the router. Limits the maximum number of concurrent calls made to a deployment. Useful in high-traffic scenarios.
+
+If tpm/rpm is set and no max parallel request limit is given, the RPM (or an RPM calculated from TPM as tpm/1000/6) is used as the max parallel request limit.
+
+
+```python
+from litellm import Router
+
+model_list = [{
+    "model_name": "gpt-4",
+    "litellm_params": {
+        "model": "azure/gpt-4",
+        ...
+        "max_parallel_requests": 10 # 👈 SET PER DEPLOYMENT
+    }
+}]
+
+### OR ###
+
+router = Router(model_list=model_list, default_max_parallel_requests=20) # 👈 SET DEFAULT MAX PARALLEL REQUESTS
+
+
+# precedence: deployment max_parallel_requests > default_max_parallel_requests
+```
+
+[**See Code**](https://github.com/BerriAI/litellm/blob/a978f2d8813c04dad34802cb95e0a0e35a3324bc/litellm/utils.py#L5605)
+
 ### Timeouts
 
 The timeout set in router is for the entire length of the call, and is passed down to the completion() call level as well.
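
For reference, a minimal runtime sketch of the setting added in this patch. This is not part of the docs change itself: the Azure credential placeholders, the `tpm: 60000` figure, and the 50-request loop are assumptions chosen so the fallback formula described above (`60000/1000/6 = 10`) yields a concrete concurrency cap of 10.

```python
# Sketch: exercising the per-deployment concurrency limit (assumed placeholder credentials).
import asyncio
from litellm import Router

model_list = [{
    "model_name": "gpt-4",
    "litellm_params": {
        "model": "azure/gpt-4",
        "api_key": "<your-azure-api-key>",    # placeholder
        "api_base": "<your-azure-endpoint>",  # placeholder
        # No max_parallel_requests set, so the limit falls back to
        # tpm/1000/6 = 60000/1000/6 = 10 concurrent requests.
        "tpm": 60000,
    }
}]

router = Router(model_list=model_list)

async def main():
    # Fire 50 requests at once; the router's per-deployment semaphore
    # should keep roughly 10 of them in flight at any one time.
    tasks = [
        router.acompletion(
            model="gpt-4",
            messages=[{"role": "user", "content": f"Say hello #{i}"}],
        )
        for i in range(50)
    ]
    responses = await asyncio.gather(*tasks)
    print(len(responses), "responses received")

asyncio.run(main())
```

Setting `max_parallel_requests` explicitly on the deployment (or `default_max_parallel_requests` on the router) would override this calculated fallback, as noted in the patch.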