add batch completion rate limits docs

ishaan-jaff 2023-10-04 14:38:00 -07:00
parent 69a0a775f8
commit c4a595d352

@@ -1,9 +1,36 @@
# Batching Completion() Calls
# Batching Completion(), Handling Rate Limits
LiteLLM allows you to:
* Send many completion calls to 1 model
* Send many completion calls to 1 model [while handling rate limits]
* Send 1 completion call to many models: Return Fastest Response
* Send 1 completion call to many models: Return All Responses
## Handling Rate Limits with batch completion
## Batch Completion with only 1 model
### Usage
```python
import asyncio
import os  # needed for os.environ below

from litellm import batch_completion_rate_limits

# Each job is a dict of kwargs passed to litellm.completion.
# The prompt strings are repeated (*500, *800, ...) to produce long inputs.
jobs = [
    {"model": "gpt-4", "messages": [{"content": "Please provide a summary of the latest scientific discoveries."*500, "role": "user"}]},
    {"model": "gpt-4", "messages": [{"content": "Please provide a summary of the latest scientific discoveries."*800, "role": "user"}]},
    {"model": "gpt-4", "messages": [{"content": "Please provide a summary of the latest scientific discoveries."*900, "role": "user"}]},
    {"model": "gpt-4", "messages": [{"content": "Please provide a summary of the latest scientific discoveries."*900, "role": "user"}]},
    {"model": "gpt-4", "messages": [{"content": "Please provide a summary of the latest scientific discoveries."*900, "role": "user"}]}
]

asyncio.run(
    batch_completion_rate_limits(
        jobs=jobs,
        api_key=os.environ['OPENAI_API_KEY'],  # pass your api key for your selected model
        max_requests_per_minute=60,
        max_tokens_per_minute=40000
    )
)
```
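To give a feel for what the `max_requests_per_minute` and `max_tokens_per_minute` limits mean in practice, here is a minimal sketch of a client-side limiter that refills both budgets continuously. It is an illustration only, not LiteLLM's implementation: the `RateLimiter` class, its `try_acquire` method, and the injectable `clock` argument are all hypothetical names introduced for this example.

```python
import time


class RateLimiter:
    """Hypothetical limiter that tracks request and token budgets per minute."""

    def __init__(self, max_requests_per_minute, max_tokens_per_minute,
                 clock=time.monotonic):
        self.max_requests = float(max_requests_per_minute)
        self.max_tokens = float(max_tokens_per_minute)
        # Both budgets start full.
        self.requests_left = self.max_requests
        self.tokens_left = self.max_tokens
        self.clock = clock
        self.last_refill = clock()

    def _refill(self):
        # Restore capacity in proportion to the time elapsed, capped at the maximum.
        now = self.clock()
        elapsed = now - self.last_refill
        self.requests_left = min(
            self.max_requests, self.requests_left + elapsed * self.max_requests / 60.0
        )
        self.tokens_left = min(
            self.max_tokens, self.tokens_left + elapsed * self.max_tokens / 60.0
        )
        self.last_refill = now

    def try_acquire(self, estimated_tokens):
        """Return True and deduct from both budgets if the job fits right now."""
        self._refill()
        if self.requests_left >= 1 and self.tokens_left >= estimated_tokens:
            self.requests_left -= 1
            self.tokens_left -= estimated_tokens
            return True
        # Nothing is deducted on failure; the caller would wait and retry.
        return False
```

In this sketch, a caller would estimate each job's token count, call `try_acquire`, and sleep briefly before retrying whenever it returns `False`. A job whose estimate exceeds `max_tokens_per_minute` can never be admitted, so a real implementation would need to reject or split such jobs.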
## Send multiple completion calls to 1 model
With the `batch_completion` method, you provide a list of `messages` lists. Each sub-list is passed to its own `litellm.completion()` call, so you can process multiple prompts with a single `batch_completion` call.
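The nested `messages` structure described above can be sketched as follows. The helper names `build_batch_messages` and `run_batch`, and the default model `gpt-3.5-turbo`, are assumptions made for illustration; only `litellm.batch_completion` itself comes from the library.

```python
def build_batch_messages(prompts):
    """Wrap each prompt in its own single-message conversation (hypothetical helper)."""
    return [[{"role": "user", "content": prompt}] for prompt in prompts]


def run_batch(prompts, model="gpt-3.5-turbo"):
    """Send every prompt to one model; needs litellm installed and an API key set."""
    # Imported here so build_batch_messages works without litellm installed.
    from litellm import batch_completion

    # Each inner list becomes one completion call; the result is a list of responses.
    return batch_completion(model=model, messages=build_batch_messages(prompts))
```

Calling `run_batch(["good morning?", "what's the time?"])` would then return one response per prompt.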