🚅 LiteLLM
Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.]
OpenAI Proxy Server
LiteLLM manages:
- Translating inputs to the provider's completion and embedding endpoints
- Guarantees consistent output, text responses will always be available at ['choices'][0]['message']['content']
- Exception mapping - common exceptions across providers are mapped to the OpenAI exception types.
- Load-balance across multiple deployments (e.g. Azure/OpenAI) - Router, 1k+ requests/second
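As a minimal sketch of the consistent output shape and the exception mapping listed above: the helper below reads the text from ['choices'][0]['message']['content'], and a wrapper catches openai.OpenAIError, the base class of the OpenAI exception types in openai>=1.0.0. The names extract_text and safe_completion are hypothetical, introduced only for illustration, and the imports are deferred so the helper can be tried on a plain dictionary without API keys.

```python
def extract_text(response):
    """Return the assistant text from an OpenAI-format response."""
    return response["choices"][0]["message"]["content"]


def safe_completion(model, messages):
    """Call the provider and return the text, or None on a mapped error."""
    # Deferred imports: only needed when a real call is made.
    import openai
    from litellm import completion

    try:
        return extract_text(completion(model=model, messages=messages))
    except openai.OpenAIError as err:
        # Assumption: provider errors surface as OpenAI exception types.
        print(f"request failed: {type(err).__name__}: {err}")
        return None


# Offline check with a hand-built response in the same shape.
fake = {"choices": [{"message": {"role": "assistant", "content": "Hi there"}}]}
print(extract_text(fake))
```

Catching the base class keeps the handler provider-agnostic; a caller that needs to retry only on rate limits could catch a narrower OpenAI exception type instead.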
Usage (Docs)
Important
LiteLLM v1.0.0 now requires openai>=1.0.0. Migration guide here
pip install litellm
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)
# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
Async (Docs)
from litellm import acompletion
import asyncio
async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="gpt-3.5-turbo", messages=messages)
    return response
response = asyncio.run(test_get_response())
print(response)
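Because acompletion is a coroutine, several requests can be awaited concurrently. The following is a sketch under stated assumptions, not part of the library: ask_many and fake_acompletion are hypothetical names, and the stub stands in for acompletion so the example runs without network access or API keys.

```python
import asyncio


async def ask_many(prompts, model="gpt-3.5-turbo", acompletion_fn=None):
    """Send one request per prompt concurrently and return the replies in order."""
    if acompletion_fn is None:
        # Deferred import so the sketch can be exercised with a stub.
        from litellm import acompletion as acompletion_fn

    tasks = [
        acompletion_fn(model=model, messages=[{"content": p, "role": "user"}])
        for p in prompts
    ]
    # gather preserves the order of the awaitables it is given.
    return await asyncio.gather(*tasks)


async def fake_acompletion(model, messages):
    """Hypothetical stand-in that echoes the prompt without network access."""
    await asyncio.sleep(0)
    return {"choices": [{"message": {"content": "echo: " + messages[0]["content"]}}]}


replies = asyncio.run(ask_many(["a", "b"], acompletion_fn=fake_acompletion))
print([r["choices"][0]["message"]["content"] for r in replies])
```

Omitting acompletion_fn makes the helper call the real acompletion, in which case the provider API key must be set in the environment first.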
Streaming (Docs)
LiteLLM supports streaming the model response back. Pass stream=True to get a streaming iterator in the response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)
from litellm import completion
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")
# claude 2
response = completion('claude-2', messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")
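When the full reply is needed as one string rather than printed piece by piece, the deltas can be joined as they arrive. This is a small sketch: collect_stream and fake_chunk are hypothetical helpers, and the fake chunks only mimic the part.choices[0].delta.content attribute path used above so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(parts):
    """Concatenate the delta text of every streamed chunk into one string."""
    pieces = []
    for part in parts:
        # A chunk's delta content may be None, hence the `or ""`.
        pieces.append(part.choices[0].delta.content or "")
    return "".join(pieces)


def fake_chunk(text):
    """Hypothetical chunk with the same attribute path as a streamed part."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


full_text = collect_stream([fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)])
print(full_text)
```

With a real call, the iterator returned by completion(..., stream=True) would be passed to collect_stream in place of the fake list.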
OpenAI Proxy - (Docs)
LiteLLM Proxy manages:
- Calling 100+ LLMs Huggingface/Bedrock/TogetherAI/etc. in the OpenAI ChatCompletions & Completions format
- Load balancing between multiple models and deployments of the same model. The LiteLLM proxy can handle 1k+ requests/second during load tests.
- Authentication & Spend Tracking - Virtual Keys
Step 1: Start litellm proxy
$ litellm --model huggingface/bigcode/starcoder
#INFO: Proxy running on http://0.0.0.0:8000
Step 2: Replace openai base
import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:8000") # set proxy to base_url
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])
print(response)
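For clients that do not use the openai package, the same request can be expressed as a plain HTTP POST. The sketch below only builds the request and does not send it. It assumes the proxy serves the chat endpoint at /chat/completions under the base URL, which is the path the OpenAI client appends to base_url; build_chat_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, api_key="anything"):
    """Build (but do not send) a POST request for the proxy's chat endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",  # assumed endpoint path
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request(
    "http://0.0.0.0:8000",
    "gpt-3.5-turbo",
    [{"role": "user", "content": "this is a test request, write a short poem"}],
)
print(req.get_method(), req.full_url)
# To send it against a running proxy: urllib.request.urlopen(req)
```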
Logging Observability (Docs)
LiteLLM exposes predefined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, Slack.
import os
import litellm
from litellm import completion
## set env variables for logging tools
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"
os.environ["OPENAI_API_KEY"] = "your-openai-key"
# set callbacks
litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor
# openai call
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
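One way to avoid registering a callback whose credentials are missing is to derive the callback list from the environment. This is a sketch, not a LiteLLM feature: enabled_callbacks and REQUIRED_ENV are hypothetical names, and the variable names are the ones set in the example above.

```python
import os

# Environment variables each callback needs, taken from the example above.
REQUIRED_ENV = {
    "langfuse": ["LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"],
    "llmonitor": ["LLMONITOR_APP_ID"],
}


def enabled_callbacks(env=None):
    """Return the callback names whose required variables are all non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, keys in REQUIRED_ENV.items()
        if all(env.get(key) for key in keys)
    ]


sample_env = {
    "LANGFUSE_PUBLIC_KEY": "pk",
    "LANGFUSE_SECRET_KEY": "",  # empty, so langfuse is skipped
    "LLMONITOR_APP_ID": "app",
}
print(enabled_callbacks(sample_env))
# With litellm imported: litellm.success_callback = enabled_callbacks()
```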
Supported Providers (Docs)
Provider | Completion | Streaming | Async Completion | Async Streaming |
---|---|---|---|---|
openai | ✅ | ✅ | ✅ | ✅ |
azure | ✅ | ✅ | ✅ | ✅ |
aws - sagemaker | ✅ | ✅ | ✅ | ✅ |
aws - bedrock | ✅ | ✅ | ✅ | ✅ |
cohere | ✅ | ✅ | ✅ | ✅ |
anthropic | ✅ | ✅ | ✅ | ✅ |
huggingface | ✅ | ✅ | ✅ | ✅ |
replicate | ✅ | ✅ | ✅ | ✅ |
together_ai | ✅ | ✅ | ✅ | ✅ |
openrouter | ✅ | ✅ | ✅ | ✅ |
google - vertex_ai | ✅ | ✅ | ✅ | ✅ |
google - palm | ✅ | ✅ | ✅ | ✅ |
mistral ai api | ✅ | ✅ | ✅ | ✅ |
ai21 | ✅ | ✅ | ✅ | ✅ |
baseten | ✅ | ✅ | ✅ | ✅ |
vllm | ✅ | ✅ | ✅ | ✅ |
nlp_cloud | ✅ | ✅ | ✅ | ✅ |
aleph alpha | ✅ | ✅ | ✅ | ✅ |
petals | ✅ | ✅ | ✅ | ✅ |
ollama | ✅ | ✅ | ✅ | ✅ |
deepinfra | ✅ | ✅ | ✅ | ✅ |
perplexity-ai | ✅ | ✅ | ✅ | ✅ |
anyscale | ✅ | ✅ | ✅ | ✅ |
Contributing
To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change.
Here's how to modify the repo locally:
Step 1: Clone the repo
git clone https://github.com/BerriAI/litellm.git
Step 2: Navigate into the project, and install dependencies:
cd litellm
poetry install
Step 3: Test your change:
cd litellm/tests # pwd: Documents/litellm/litellm/tests
pytest .
Step 4: Submit a PR with your changes! 🚀
- push your fork to your GitHub repo
- submit a PR from there
Support / talk with founders
- Schedule Demo 👋
- Community Discord 💭
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai
Why did we build this
- Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere.