LiteLLM fork
Latest commit (e214e6ab47, Joel Eriksson, 2023-12-17): Fix bug when iterating over lines in ollama response

async for line in resp.content.iter_any() returns incomplete lines when the lines are long, which results in an exception being thrown by json.loads() when it tries to parse the incomplete JSON. The default behavior of the stream reader for aiohttp response objects is to iterate over lines, so simply removing .iter_any() fixes the bug.
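
For context, a minimal sketch of the difference the fix relies on, assuming resp is an aiohttp ClientResponse streaming newline-delimited JSON (the helper names are illustrative only):

import json

async def parse_stream_buggy(resp):
    # iter_any() yields raw chunks as they arrive, so a long line can be
    # split across chunks and json.loads() sees incomplete JSON
    async for chunk in resp.content.iter_any():
        yield json.loads(chunk)  # may raise on a partial line

async def parse_stream_fixed(resp):
    # iterating the StreamReader directly yields complete lines,
    # so each non-empty item is a full JSON document
    async for line in resp.content:
        if line.strip():
            yield json.loads(line)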

🚅 LiteLLM

Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.]

OpenAI Proxy Server


LiteLLM manages

  • Translating inputs to the provider's completion and embedding endpoints
  • Consistent output - text responses are always available at ['choices'][0]['message']['content']
  • Exception mapping - common exceptions across providers are mapped to the OpenAI exception types.
  • Load balancing across multiple deployments (e.g. Azure/OpenAI) - the Router handles 1k+ requests/second (sketched below)
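
As a rough illustration of the Router, here is a minimal sketch that load-balances two deployments behind one model name; the deployment names, keys, and endpoints below are placeholders:

from litellm import Router

# two deployments that serve the same public model name; the Router
# routes each request to one of them
model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/your-azure-deployment",   # placeholder deployment
            "api_key": "your-azure-key",
            "api_base": "https://your-endpoint.openai.azure.com/",
            "api_version": "2023-07-01-preview",
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "gpt-3.5-turbo",                 # plain OpenAI deployment
            "api_key": "your-openai-key",
        },
    },
]

router = Router(model_list=model_list)
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response)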

Usage (Docs)

Important

LiteLLM v1.0.0 now requires openai>=1.0.0. Migration guide here

Open In Colab
pip install litellm
from litellm import completion
import os

## set ENV variables 
os.environ["OPENAI_API_KEY"] = "your-openai-key" 
os.environ["COHERE_API_KEY"] = "your-cohere-key" 

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)

Async (Docs)

from litellm import acompletion
import asyncio

async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="gpt-3.5-turbo", messages=messages)
    return response

response = asyncio.run(test_get_response())
print(response)

Streaming (Docs)

LiteLLM supports streaming the model response back; pass stream=True to get a streaming iterator in the response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.).

from litellm import completion
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")

# claude 2
response = completion('claude-2', messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")
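
Async and streaming can also be combined; a minimal sketch, reusing the message format from the examples above:

import asyncio
from litellm import acompletion

async def stream_response():
    messages = [{"content": "Hello, how are you?", "role": "user"}]
    # stream=True makes acompletion return an async iterator of chunks
    response = await acompletion(model="gpt-3.5-turbo", messages=messages, stream=True)
    async for part in response:
        print(part.choices[0].delta.content or "", end="")

asyncio.run(stream_response())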

OpenAI Proxy - (Docs)

LiteLLM Proxy manages:

  • Calling 100+ LLMs (Huggingface/Bedrock/TogetherAI/etc.) in the OpenAI ChatCompletions & Completions format
  • Load balancing - between multiple models + deployments of the same model; the LiteLLM proxy can handle 1k+ requests/second during load tests
  • Authentication & spend tracking via Virtual Keys

Step 1: Start litellm proxy

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:8000

Step 2: Replace openai base

import openai  # openai v1.0.0+
client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")  # set proxy to base_url
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)

Logging Observability (Docs)

LiteLLM exposes pre-defined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, and Slack.

import os
import litellm
from litellm import completion

## set env variables for logging tools
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# set callbacks
litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor, supabase

#openai call
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])

Supported Providers (Docs)

Provider Completion Streaming Async Completion Async Streaming
openai
azure
aws - sagemaker
aws - bedrock
cohere
anthropic
huggingface
replicate
together_ai
openrouter
google - vertex_ai
google - palm
mistral ai api
ai21
baseten
vllm
nlp_cloud
aleph alpha
petals
ollama
deepinfra
perplexity-ai
anyscale
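
Providers are selected with a model-name prefix (as in the huggingface/... example above); a minimal sketch for a locally running Ollama server, assuming a llama2 model has already been pulled:

from litellm import completion

# route the request to a local Ollama server via the "ollama/" prefix
response = completion(
    model="ollama/llama2",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    api_base="http://localhost:11434",
)
print(response)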

Read the Docs

Contributing

To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change.

Here's how to modify the repo locally: Step 1: Clone the repo

git clone https://github.com/BerriAI/litellm.git

Step 2: Navigate into the project, and install dependencies:

cd litellm
poetry install

Step 3: Test your change:

cd litellm/tests # pwd: Documents/litellm/litellm/tests
pytest .

Step 4: Submit a PR with your changes! 🚀

  • Push your fork to your GitHub repo
  • Submit a PR from there

Support / talk with founders

Why did we build this

  • Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere.

Contributors