forked from phoenix/litellm-mirror
docs(routing.md): add max parallel requests to router docs
This commit is contained in:
parent
a978f2d881
commit
cef2d95bb4
1 changed files with 29 additions and 0 deletions
@@ -443,6 +443,35 @@ asyncio.run(router_acompletion())
## Basic Reliability
### Max Parallel Requests (ASYNC)
Used in a semaphore for async requests on the router. Limits the max concurrent calls made to a deployment. Useful in high-traffic scenarios.
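
The semaphore pattern can be sketched with plain `asyncio` (illustrative only, not litellm's internal code — names like `call_deployment` are made up here):

```python
import asyncio

async def call_deployment(sem: asyncio.Semaphore, i: int) -> int:
    # The semaphore caps how many calls run concurrently against one deployment.
    async with sem:
        await asyncio.sleep(0.01)  # stand-in for the actual LLM request
        return i

async def main() -> list:
    sem = asyncio.Semaphore(10)  # e.g. max_parallel_requests = 10
    # 25 tasks are submitted, but at most 10 execute at once.
    return await asyncio.gather(*[call_deployment(sem, i) for i in range(25)])

results = asyncio.run(main())
```

`asyncio.gather` preserves submission order, so `results` comes back as `[0, 1, ..., 24]` even though execution is throttled.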
If tpm/rpm is set and no max parallel request limit is given, the RPM (or an RPM derived from TPM as tpm/1000/6) is used as the max parallel request limit.
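
That fallback order can be sketched as follows (hypothetical helper name; the real logic lives in `litellm/utils.py`):

```python
def derive_max_parallel_requests(max_parallel_requests=None, rpm=None, tpm=None):
    # An explicit limit wins; otherwise fall back to RPM,
    # then to an RPM estimated from TPM as tpm / 1000 / 6.
    if max_parallel_requests is not None:
        return max_parallel_requests
    if rpm is not None:
        return rpm
    if tpm is not None:
        return int(tpm / 1000 / 6)
    return None
```

For example, a deployment with `tpm=60000` and no other limits would get a max parallel request limit of `60000 / 1000 / 6 = 10`.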
```python
from litellm import Router

model_list = [{
    "model_name": "gpt-4",
    "litellm_params": {
        "model": "azure/gpt-4",
        ...
        "max_parallel_requests": 10 # 👈 SET PER DEPLOYMENT
    }
}]

### OR ###

router = Router(model_list=model_list, default_max_parallel_requests=20) # 👈 SET DEFAULT MAX PARALLEL REQUESTS

# deployment max parallel requests > default max parallel requests
```
[**See Code**](https://github.com/BerriAI/litellm/blob/a978f2d8813c04dad34802cb95e0a0e35a3324bc/litellm/utils.py#L5605)
### Timeouts
The timeout set on the router applies to the entire length of the call, and is passed down to the completion() call level as well.
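
A whole-call timeout can be sketched conceptually with `asyncio.wait_for` (illustrative only, not litellm's implementation):

```python
import asyncio

async def slow_call() -> str:
    await asyncio.sleep(2)  # stand-in for a slow completion() call
    return "done"

async def call_with_timeout(timeout: float) -> str:
    # The timeout bounds the whole call end to end, not individual steps.
    try:
        return await asyncio.wait_for(slow_call(), timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"

result = asyncio.run(call_with_timeout(0.1))
```

Here a 0.1s budget is shorter than the 2s call, so the call is cancelled and `result` is `"timed out"`.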