# 🚅 LiteLLM

Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.]

[OpenAI Proxy Server](https://docs.litellm.ai/docs/simple_proxy)


LiteLLM manages:

- Translating inputs to the provider's `completion` and `embedding` endpoints
- [Consistent output](https://docs.litellm.ai/docs/completion/output) - text responses are always available at `['choices'][0]['message']['content']`
- Load-balancing across multiple deployments (e.g. Azure/OpenAI) - the `Router` handles **1k+ requests/second** (see the `Router` sketch below the provider table)

[**Jump to OpenAI Proxy Docs**](https://github.com/BerriAI/litellm?tab=readme-ov-file#openai-proxy---docs)

# Installation 🚀

> [!IMPORTANT]
> LiteLLM v1.0.0 now requires `openai>=1.0.0`. Migration guide [here](https://docs.litellm.ai/docs/migration)

```
pip install litellm
```

# Usage ([**Docs**](https://docs.litellm.ai/docs/))

```python
from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
```

## Async ([Docs](https://docs.litellm.ai/docs/completion/stream#async-completion))

```python
from litellm import acompletion
import asyncio

async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="gpt-3.5-turbo", messages=messages)
    return response

response = asyncio.run(test_get_response())
print(response)
```

## Streaming ([Docs](https://docs.litellm.ai/docs/completion/stream))

LiteLLM supports streaming the model response back; pass `stream=True` to get a streaming iterator in the response. Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)

```python
from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")

# claude 2
response = completion(model="claude-2", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")
```
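## Async Streaming ([Docs](https://docs.litellm.ai/docs/completion/stream#async-streaming))

Async and streaming compose: with `stream=True`, `acompletion` returns an async iterator of chunks. A minimal sketch (model and message are placeholders):

```python
from litellm import acompletion
import asyncio

async def stream_response():
    messages = [{"content": "Hello, how are you?", "role": "user"}]
    response = await acompletion(model="gpt-3.5-turbo", messages=messages, stream=True)
    # iterate chunks as they arrive
    async for part in response:
        print(part.choices[0].delta.content or "", end="")

asyncio.run(stream_response())
```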
## Logging Observability ([Docs](https://docs.litellm.ai/docs/observability/callbacks))

LiteLLM exposes pre-defined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, and Slack.

```python
import os
import litellm
from litellm import completion

## set env variables for logging tools
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"
os.environ["OPENAI_API_KEY"] = "your-openai-key"

# set callbacks
litellm.success_callback = ["langfuse", "llmonitor"]  # log input/output to langfuse, llmonitor

# openai call
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
```

# OpenAI Proxy ([Docs](https://docs.litellm.ai/docs/simple_proxy))

Track spend across multiple projects/people.

### Step 1: Start litellm proxy

```shell
$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:8000
```

### Step 2: Replace openai base

```python
import openai  # openai v1.0.0+

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")  # set proxy to base_url

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=[
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)
```

## Supported Providers ([Docs](https://docs.litellm.ai/docs/providers))

| Provider | [Completion](https://docs.litellm.ai/docs/#basic-usage) | [Streaming](https://docs.litellm.ai/docs/completion/stream#streaming-responses) | [Async Completion](https://docs.litellm.ai/docs/completion/stream#async-completion) | [Async Streaming](https://docs.litellm.ai/docs/completion/stream#async-streaming) |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| [openai](https://docs.litellm.ai/docs/providers/openai) | ✅ | ✅ | ✅ | ✅ |
| [azure](https://docs.litellm.ai/docs/providers/azure) | ✅ | ✅ | ✅ | ✅ |
| [aws - sagemaker](https://docs.litellm.ai/docs/providers/aws_sagemaker) | ✅ | ✅ | ✅ | ✅ |
| [aws - bedrock](https://docs.litellm.ai/docs/providers/bedrock) | ✅ | ✅ | ✅ | ✅ |
| [cohere](https://docs.litellm.ai/docs/providers/cohere) | ✅ | ✅ | ✅ | ✅ |
| [anthropic](https://docs.litellm.ai/docs/providers/anthropic) | ✅ | ✅ | ✅ | ✅ |
| [huggingface](https://docs.litellm.ai/docs/providers/huggingface) | ✅ | ✅ | ✅ | ✅ |
| [replicate](https://docs.litellm.ai/docs/providers/replicate) | ✅ | ✅ | ✅ | ✅ |
| [together_ai](https://docs.litellm.ai/docs/providers/togetherai) | ✅ | ✅ | ✅ | ✅ |
| [openrouter](https://docs.litellm.ai/docs/providers/openrouter) | ✅ | ✅ | ✅ | ✅ |
| [google - vertex_ai](https://docs.litellm.ai/docs/providers/vertex) | ✅ | ✅ | ✅ | ✅ |
| [google - palm](https://docs.litellm.ai/docs/providers/palm) | ✅ | ✅ | ✅ | ✅ |
| [mistral ai api](https://docs.litellm.ai/docs/providers/mistral) | ✅ | ✅ | ✅ | ✅ |
| [ai21](https://docs.litellm.ai/docs/providers/ai21) | ✅ | ✅ | ✅ | ✅ |
| [baseten](https://docs.litellm.ai/docs/providers/baseten) | ✅ | ✅ | ✅ | ✅ |
| [vllm](https://docs.litellm.ai/docs/providers/vllm) | ✅ | ✅ | ✅ | ✅ |
| [nlp_cloud](https://docs.litellm.ai/docs/providers/nlp_cloud) | ✅ | ✅ | ✅ | ✅ |
| [aleph alpha](https://docs.litellm.ai/docs/providers/aleph_alpha) | ✅ | ✅ | ✅ | ✅ |
| [petals](https://docs.litellm.ai/docs/providers/petals) | ✅ | ✅ | ✅ | ✅ |
| [ollama](https://docs.litellm.ai/docs/providers/ollama) | ✅ | ✅ | ✅ | ✅ |
| [deepinfra](https://docs.litellm.ai/docs/providers/deepinfra) | ✅ | ✅ | ✅ | ✅ |
| [perplexity-ai](https://docs.litellm.ai/docs/providers/perplexity) | ✅ | ✅ | ✅ | ✅ |
| [anyscale](https://docs.litellm.ai/docs/providers/anyscale) | ✅ | ✅ | ✅ | ✅ |

[**Read the Docs**](https://docs.litellm.ai/docs/)
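## Router - Load Balancing

The load-balancing `Router` mentioned at the top routes one logical model name across multiple deployments. A minimal sketch of the config shape: two deployments share the alias `gpt-3.5-turbo`; the Azure deployment name and the env vars are placeholders.

```python
import os
from litellm import Router

# two deployments that serve the same logical model
model_list = [
    {
        "model_name": "gpt-3.5-turbo",  # alias used by callers
        "litellm_params": {  # params passed through to litellm.completion()
            "model": "azure/<your-deployment-name>",
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_base": os.getenv("AZURE_API_BASE"),
            "api_version": os.getenv("AZURE_API_VERSION"),
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "gpt-3.5-turbo",
            "api_key": os.getenv("OPENAI_API_KEY"),
        },
    },
]

router = Router(model_list=model_list)

# the Router picks one of the matching deployments for each call
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
print(response)
```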
## Contributing

To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change.

Here's how to modify the repo locally:

Step 1: Clone the repo

```
git clone https://github.com/BerriAI/litellm.git
```

Step 2: Navigate into the project, and install dependencies:

```
cd litellm
poetry install
```

Step 3: Test your change:

```
cd litellm/tests # pwd: Documents/litellm/litellm/tests
poetry run flake8
poetry run pytest .
```

Step 4: Submit a PR with your changes! 🚀

- push your fork to your GitHub repo
- submit a PR from there

# Support / talk with founders

- [Schedule Demo 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
- [Community Discord 💭](https://discord.gg/wuPM9dRgDw)
- Our numbers 📞 +1 (770) 878-3106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai

# Why did we build this

- **Need for simplicity**: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI, and Cohere.

# Contributors