docs(rate limit aware acompletion calls): docs

2025-04-27 03:34:10 +00:00 · 2023-10-06 20:48:51 -07:00 · 2023-10-06 20:48:51 -07:00 · d42cd8fa07
commit d42cd8fa07
parent 498f9aece6
1 changed files with 55 additions and 1 deletions
--- a/docs/my-website/docs/rate_limit_manager.md
+++ b/docs/my-website/docs/rate_limit_manager.md
@@ -4,7 +4,61 @@ import TabItem from '@theme/TabItem';
 # Rate Limit Manager
 `RateLimitManager` lets you maximize throughput while staying under rate limits. You can use it to make rate-limit-aware `acompletion()` calls or to submit a batch of completion jobs for execution.
-## Quick start 
+## Rate Limit Aware - acompletion()
 ### Usage
 ```python
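 from litellm import RateLimitManager
 # cap throughput at 60 requests and 20,000 tokens per minute
 handler = RateLimitManager(
    max_requests_per_minute = 60,
    max_tokens_per_minute = 20000
 )
 # `await` needs an async context (a notebook cell or an `async def` function)
 response = await handler.acompletion(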
    model="gpt-3.5-turbo", 
    messages=[{
        "content": "Please provide a summary of the latest scientific discoveries."*10, 
        "role": "user"
    }]
 )
 ```
 ### Using Rate Limit Aware Completion to make 5 async calls
 ```python
 import asyncio
 from litellm import RateLimitManager
 ## init RateLimitManager
 handler = RateLimitManager(
    max_requests_per_minute = 60,
    max_tokens_per_minute = 200
 )
 # helper 
 async def send_request():
    response =  await handler.acompletion(
        model="gpt-3.5-turbo", 
        messages=[{
            "content": "Please provide a summary of the latest scientific discoveries."*10, 
            "role": "user"
        }]
    )
    print("got a response", response)
    return response
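 # create 5 coroutines and run them concurrently
 # (top-level `await` works in a notebook; in a script, wrap this part in an `async def` and run it with `asyncio.run`)
 tasks = []
 for _ in range(5):
    tasks.append(send_request())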
 responses = await asyncio.gather(*tasks)
 for response in responses:
    print(response)
 ```
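To build intuition for what a rate-limit-aware call has to do, here is a minimal sketch of a requests-per-window limiter built on `asyncio`. It is not litellm's implementation and does not call any litellm API: `SlidingWindowLimiter`, `fake_completion`, and `run_demo` are hypothetical names, the window is shortened to 0.2 seconds so the demo finishes quickly, and token limits are ignored.

```python
import asyncio
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical helper: admit at most max_requests per window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._starts = deque()  # monotonic timestamps of admitted requests
        self._lock = asyncio.Lock()  # waiters are admitted one at a time

    async def acquire(self) -> None:
        async with self._lock:
            while True:
                now = time.monotonic()
                # Forget requests that have already left the window.
                while self._starts and now - self._starts[0] >= self.window_seconds:
                    self._starts.popleft()
                if len(self._starts) < self.max_requests:
                    self._starts.append(now)
                    return
                # Window is full: sleep until the oldest request leaves it.
                await asyncio.sleep(self.window_seconds - (now - self._starts[0]))


async def fake_completion(limiter: SlidingWindowLimiter, i: int):
    """Stand-in for a model call (assumption: no network access)."""
    await limiter.acquire()
    return i, time.monotonic()


async def run_demo(n_calls: int = 5, max_requests: int = 2, window_seconds: float = 0.2):
    # Create the limiter inside the running event loop.
    limiter = SlidingWindowLimiter(max_requests, window_seconds)
    calls = (fake_completion(limiter, i) for i in range(n_calls))
    return await asyncio.gather(*calls)


results = asyncio.run(run_demo())
for index, admitted_at in results:
    print(f"call {index} admitted {admitted_at - results[0][1]:.2f}s after the first")
```

All five calls are started at once, but only two are admitted per window; the rest wait instead of failing. A production limiter would also have to account for token usage and provider-side errors.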
 ## Batch Completions
 ### Usage
 ```python 
 import asyncio