LiteLLM fork

Joel Eriksson e214e6ab47 Fix bug when iterating over lines in ollama response. async for line in resp.content.iter_any() will return incomplete lines when the lines are long, and that results in an exception being thrown by json.loads() when it tries to parse the incomplete JSON. The default behavior of the stream reader for aiohttp response objects is to iterate over lines, so just removing .iter_any() fixes the bug.		2023-12-17 20:23:26 +02:00
.circleci	(ci/cd) yml	2023-12-12 12:33:03 -08:00
.github	Update ghcr_deploy.yml	2023-12-02 18:33:51 -08:00
cookbook	(docs) add embedding() profile	2023-11-30 19:04:51 -08:00
dist	fix(utils.py): improved togetherai exception mapping	2023-12-14 15:28:11 -08:00
docs/my-website	Merge pull request #1162 from nirga/patch-2	2023-12-16 21:31:56 -08:00
litellm	Fix bug when iterating over lines in ollama response	2023-12-17 20:23:26 +02:00
.env.example	feat: added support for OPENAI_API_BASE	2023-08-28 14:57:34 +02:00
.flake8	refactor(all-files): removing all print statements; adding pre-commit + flake8 to prevent future regressions	2023-11-04 12:50:15 -07:00
.gitattributes	ignore ipynbs	2023-08-31 16:58:54 -07:00
.gitignore	build(Dockerfile): fixing build requirements	2023-12-16 17:52:30 -08:00
.pre-commit-config.yaml	fix(proxy_server.py): fix pydantic version errors	2023-12-09 12:09:49 -08:00
docker-compose.example.yml	(docs) update docker compose docs	2023-12-06 10:37:45 +05:30
Dockerfile	build(Dockerfile): fixing build requirements	2023-12-16 17:52:30 -08:00
LICENSE	Initial commit	2023-07-26 17:09:52 -07:00
model_prices_and_context_window.json	(feat) add gemini pro vision	2023-12-16 18:35:28 +05:30
poetry.lock	changes	2023-12-12 12:02:29 +01:00
proxy_server_config.yaml	Update proxy_server_config.yaml	2023-11-30 09:30:41 -08:00
pyproject.toml	bump: version 1.15.0 → 1.15.1	2023-12-16 19:23:03 +05:30
README.md	Update README.md	2023-12-15 07:48:54 +05:30
requirements.txt	build(Dockerfile): fixing build requirements	2023-12-16 17:52:30 -08:00
template.yaml	Use -function for naming.	2023-11-23 02:09:09 -05:00
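
The failure mode described in the latest commit message can be reproduced without aiohttp. The sketch below is not the fork's code: `iter_json_lines` and the sample chunks are hypothetical, chosen only to show why parsing arbitrary chunks of newline-delimited JSON fails and how buffering up to each newline avoids it. The commit itself takes the simpler route of letting the aiohttp stream reader iterate line by line.

```python
import json

def iter_json_lines(chunks):
    """Yield one parsed JSON object per complete line from arbitrary byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Parse only complete lines; keep any trailing partial line for the next chunk.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    # The final line may arrive without a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)

# The first JSON object is split across two chunks, as a long line might be.
chunks = [b'{"response": "Hel', b'lo"}\n{"response": " world"}\n', b'{"done": true}']
objects = list(iter_json_lines(chunks))
print(objects)
```

Calling json.loads() on the first chunk alone would raise an exception, which matches the bug the commit message describes.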

🚅 LiteLLM

Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.]

OpenAI Proxy Server

LiteLLM manages:

Translating inputs to the provider's completion and embedding endpoints
Consistent output: text responses are always available at ['choices'][0]['message']['content']
Exception mapping: common exceptions across providers are mapped to the OpenAI exception types
Load balancing across multiple deployments (e.g. Azure/OpenAI) via the Router (1k+ requests/second)
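
One practical consequence of the consistent output path and the shared exception types is that fallback logic can be written once for every provider. The sketch below is only an illustration: `response_text`, `complete_with_fallback`, and `fake_completion` are hypothetical names, and `fake_completion` stands in for `litellm.completion` so the example runs without API keys.

```python
def response_text(response):
    """Return the assistant text from an OpenAI-format chat response."""
    return response["choices"][0]["message"]["content"]

def complete_with_fallback(call, models, messages):
    """Try each model in order and return the text of the first call that succeeds."""
    last_error = None
    for model in models:
        try:
            return response_text(call(model=model, messages=messages))
        except Exception as err:  # with LiteLLM these would be OpenAI-style exception types
            last_error = err
    raise RuntimeError("all models failed") from last_error

def fake_completion(model, messages):
    """Stand-in for litellm.completion: the first model fails, the second answers."""
    if model == "gpt-3.5-turbo":
        raise ConnectionError("simulated provider outage")
    return {"choices": [{"message": {"role": "assistant", "content": f"hello from {model}"}}]}

messages = [{"content": "Hello, how are you?", "role": "user"}]
text = complete_with_fallback(fake_completion, ["gpt-3.5-turbo", "command-nightly"], messages)
print(text)
```

Passing `litellm.completion` as `call` would be the intended real-world use, assuming valid keys for each provider.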

Usage (Docs)

Important

LiteLLM v1.0.0 now requires openai>=1.0.0. Migration guide here

pip install litellm

from litellm import completion
import os

## set ENV variables 
os.environ["OPENAI_API_KEY"] = "your-openai-key" 
os.environ["COHERE_API_KEY"] = "your-cohere-key" 

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)

Async (Docs)

from litellm import acompletion
import asyncio

async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="gpt-3.5-turbo", messages=messages)
    return response

response = asyncio.run(test_get_response())
print(response)
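
Because acompletion is a coroutine, several requests can be awaited concurrently with asyncio.gather. The following is a minimal sketch under that assumption: `fake_acompletion` and `ask_all` are hypothetical, and the stand-in coroutine replaces `litellm.acompletion` so nothing is sent over the network.

```python
import asyncio

async def fake_acompletion(model, messages):
    """Stand-in for litellm.acompletion so the sketch runs without API keys."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"choices": [{"message": {"role": "assistant", "content": f"reply from {model}"}}]}

async def ask_all(models, messages, call=fake_acompletion):
    # gather() runs the coroutines concurrently and returns results in input order.
    responses = await asyncio.gather(*(call(model=m, messages=messages) for m in models))
    return [r["choices"][0]["message"]["content"] for r in responses]

messages = [{"content": "Hello, how are you?", "role": "user"}]
replies = asyncio.run(ask_all(["gpt-3.5-turbo", "command-nightly"], messages))
print(replies)
```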

Streaming (Docs)

LiteLLM supports streaming the model response back. Pass stream=True to get a streaming iterator in response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.).

from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call: each part carries the next piece of text in delta.content
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")

# claude 2
response = completion(model="claude-2", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")
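
When the full text is needed rather than incremental printing, the deltas can be joined as they arrive. This is a sketch only: `fake_stream` and `collect_stream` are hypothetical helpers, and `fake_stream` builds objects with the same `part.choices[0].delta.content` shape used above so the example runs offline.

```python
from types import SimpleNamespace

def fake_stream(pieces):
    """Yield objects shaped like streaming chunks: part.choices[0].delta.content."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def collect_stream(stream):
    """Join the text deltas of a stream, treating a None delta as empty."""
    return "".join(part.choices[0].delta.content or "" for part in stream)

# The trailing None mimics a final chunk that carries no text.
full_text = collect_stream(fake_stream(["Hello", ", ", "world", None]))
print(full_text)
```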

OpenAI Proxy - (Docs)

LiteLLM Proxy manages:

Calling 100+ LLMs (Huggingface, Bedrock, TogetherAI, etc.) in the OpenAI ChatCompletions & Completions format
Load balancing between multiple models and multiple deployments of the same model; the LiteLLM proxy can handle 1k+ requests/second during load tests
Authentication & spend tracking with virtual keys

Step 1: Start litellm proxy

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:8000

Step 2: Replace openai base

import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:8000") # set proxy to base_url
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)
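
Since the proxy speaks the OpenAI format, any HTTP client can talk to it. The sketch below builds, but does not send, the kind of request the OpenAI client would issue. `build_chat_request` is a hypothetical helper, and the `/chat/completions` path is an assumption based on what the OpenAI client appends to `base_url`.

```python
import json
import urllib.request

def build_chat_request(base_url, model, messages, api_key="anything"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

request = build_chat_request(
    "http://0.0.0.0:8000",
    "gpt-3.5-turbo",
    [{"role": "user", "content": "this is a test request, write a short poem"}],
)
print(request.get_method(), request.full_url)
# To send it against a running proxy: urllib.request.urlopen(request)
```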

Logging Observability (Docs)

LiteLLM exposes predefined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, and Slack.

import os

import litellm
from litellm import completion

## set env variables for logging tools
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# set callbacks
litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor

# openai call
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
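
Besides the named integrations, a plain function can serve as a callback. The sketch below assumes the four-argument signature (kwargs, completion_response, start_time, end_time) described in LiteLLM's custom callback documentation; `track_latency` and `logged` are hypothetical names, and the call at the end simulates one invocation with made-up values instead of contacting a provider.

```python
from datetime import datetime, timedelta

logged = []

def track_latency(kwargs, completion_response, start_time, end_time):
    """Record the model name and latency of one successful call."""
    logged.append({
        "model": kwargs.get("model"),
        "seconds": (end_time - start_time).total_seconds(),
    })

# Wiring it up (requires litellm; left as a comment so the sketch stays standalone):
# litellm.success_callback = [track_latency]

# Simulate one invocation with made-up values.
start = datetime(2023, 12, 17, 12, 0, 0)
track_latency({"model": "gpt-3.5-turbo"}, {"choices": []}, start, start + timedelta(seconds=1.5))
print(logged)
```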

Supported Providers (Docs)

Provider	Completion	Streaming	Async Completion	Async Streaming
openai	✅	✅	✅	✅
azure	✅	✅	✅	✅
aws - sagemaker	✅	✅	✅	✅
aws - bedrock	✅	✅	✅	✅
cohere	✅	✅	✅	✅
anthropic	✅	✅	✅	✅
huggingface	✅	✅	✅	✅
replicate	✅	✅	✅	✅
together_ai	✅	✅	✅	✅
openrouter	✅	✅	✅	✅
google - vertex_ai	✅	✅	✅	✅
google - palm	✅	✅	✅	✅
mistral ai api	✅	✅	✅	✅
ai21	✅	✅	✅	✅
baseten	✅	✅	✅	✅
vllm	✅	✅	✅	✅
nlp_cloud	✅	✅	✅	✅
aleph alpha	✅	✅	✅	✅
petals	✅	✅	✅	✅
ollama	✅	✅	✅	✅
deepinfra	✅	✅	✅	✅
perplexity-ai	✅	✅	✅	✅
anyscale	✅	✅	✅	✅

Read the Docs

Contributing

To contribute: clone the repo locally -> make a change -> submit a PR with the change.

Here's how to modify the repo locally:

Step 1: Clone the repo:

git clone https://github.com/BerriAI/litellm.git

Step 2: Navigate into the project, and install dependencies:

cd litellm
poetry install

Step 3: Test your change:

cd litellm/tests # pwd: Documents/litellm/litellm/tests
pytest .

Step 4: Submit a PR with your changes! 🚀

Push your fork to your GitHub repo
Submit a PR from there

Support / talk with founders

Schedule Demo 👋
Community Discord 💭
Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
Our emails ✉️ ishaan@berri.ai / krrish@berri.ai

Why did we build this

Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere.
