forked from phoenix/litellm-mirror
add batch completion rate limits docs
This commit is contained in:
parent 69a0a775f8
commit c4a595d352

1 changed file with 29 additions and 2 deletions
@@ -1,9 +1,36 @@
-# Batching Completion() Calls
+# Batching Completion(), Handling Rate Limits
 
 LiteLLM allows you to:
-* Send many completion calls to 1 model
+* Send many completion calls to 1 model [while handling rate limits]
 * Send 1 completion call to many models: Return Fastest Response
 * Send 1 completion call to many models: Return All Responses
 
-## Batch Completion with only 1 model
+## Handling Rate Limits with batch completion
+
+### Usage
+```python
+import os
+import asyncio
+from litellm import batch_completion_rate_limits
+
+# each job is a dict of kwargs passed to litellm.completion
+jobs = [
+    {"model": "gpt-4", "messages": [{"content": "Please provide a summary of the latest scientific discoveries."*500, "role": "user"}]},
+    {"model": "gpt-4", "messages": [{"content": "Please provide a summary of the latest scientific discoveries."*800, "role": "user"}]},
+    {"model": "gpt-4", "messages": [{"content": "Please provide a summary of the latest scientific discoveries."*900, "role": "user"}]},
+    {"model": "gpt-4", "messages": [{"content": "Please provide a summary of the latest scientific discoveries."*900, "role": "user"}]},
+    {"model": "gpt-4", "messages": [{"content": "Please provide a summary of the latest scientific discoveries."*900, "role": "user"}]}
+]
+
+asyncio.run(
+    batch_completion_rate_limits(
+        jobs=jobs,
+        api_key=os.environ["OPENAI_API_KEY"],  # pass your api key for your selected model
+        max_requests_per_minute=60,     # throttle to stay under the provider's request limit
+        max_tokens_per_minute=40000     # throttle to stay under the provider's token limit
+    )
+)
+```
+
 ## Send multiple completion calls to 1 model
 
 In the `batch_completion` method, you provide a list of `messages`, where each sub-list of messages is passed to `litellm.completion()`, allowing you to process multiple prompts efficiently in a single call.
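
For reference, a minimal sketch of what calling `batch_completion` looks like. The model name and prompts are illustrative, and the exact response shape may vary across litellm versions:

```python
import litellm

# one sub-list of messages per prompt; each sub-list is dispatched
# to litellm.completion() as its own request
responses = litellm.batch_completion(
    model="gpt-3.5-turbo",
    messages=[
        [{"role": "user", "content": "Good morning?"}],
        [{"role": "user", "content": "What's the time?"}],
    ],
)

# responses come back in the same order as the message lists
for response in responses:
    print(response["choices"][0]["message"]["content"])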
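The bullets at the top of the doc also promise sending 1 completion call to many models. A minimal sketch, assuming the `batch_completion_models` (Return Fastest Response) and `batch_completion_models_all_responses` (Return All Responses) helpers exported by litellm; the model names are illustrative:

```python
from litellm import batch_completion_models, batch_completion_models_all_responses

messages = [{"role": "user", "content": "Hey, how's it going?"}]

# Return Fastest Response: fan the same prompt out to several models
# and return whichever model answers first
fastest = batch_completion_models(
    models=["gpt-3.5-turbo", "claude-instant-1", "command-nightly"],
    messages=messages,
)

# Return All Responses: wait for every model and collect every response
all_responses = batch_completion_models_all_responses(
    models=["gpt-3.5-turbo", "claude-instant-1", "command-nightly"],
    messages=messages,
)
```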