import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# LiteLLM - Getting Started
https://github.com/BerriAI/litellm
## **Call 100+ LLMs using the OpenAI Input/Output Format**
- Translate inputs to the provider's `completion`, `embedding`, and `image_generation` endpoints
- [Consistent output](https://docs.litellm.ai/docs/completion/output): text responses are always available at `['choices'][0]['message']['content']`
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - [Router](https://docs.litellm.ai/docs/routing)
- Track spend & set budgets per project - [LiteLLM Proxy Server](https://docs.litellm.ai/docs/simple_proxy)
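For example, the reply text can be read from the same place no matter which provider served the request (a minimal sketch; the model, key, and prompt are placeholders):

```python
from litellm import completion
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
  model="gpt-3.5-turbo",
  messages=[{"role": "user", "content": "Hello, how are you?"}]
)

# text responses are always available at ['choices'][0]['message']['content']
print(response['choices'][0]['message']['content'])
```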
## How to use LiteLLM
You can use litellm through either:
1. [LiteLLM Proxy Server](#litellm-proxy-server-llm-gateway) - Server (LLM Gateway) to call 100+ LLMs, load balance, cost tracking across projects
2. [LiteLLM python SDK](#basic-usage) - Python Client to call 100+ LLMs, load balance, cost tracking
### **When to use LiteLLM Proxy Server (LLM Gateway)**
:::tip
Use LiteLLM Proxy Server if you want a **central service (LLM Gateway) to access multiple LLMs**
Typically used by Gen AI Enablement / ML Platform Teams
:::
- LiteLLM Proxy gives you a unified interface to access multiple LLMs (100+ LLMs)
- Track LLM usage and set up guardrails
- Customize Logging, Guardrails, Caching per project
### **When to use LiteLLM Python SDK**
:::tip
Use LiteLLM Python SDK if you want to use LiteLLM in your **python code**
Typically used by developers building LLM projects
:::
- LiteLLM SDK gives you a unified interface to access multiple LLMs (100+ LLMs)
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - [Router](https://docs.litellm.ai/docs/routing)
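For example, a minimal Router sketch (illustrative only, following the Router docs linked above; the deployment names, keys, and endpoints are placeholders):

```python
from litellm import Router

# two deployments published under the same model name; all credentials are placeholders
model_list = [
    {
        "model_name": "gpt-3.5-turbo",  # the alias your application calls
        "litellm_params": {
            "model": "azure/<your-azure-deployment>",
            "api_key": "<azure-api-key>",
            "api_base": "<azure-api-base>",
            "api_version": "2023-07-01-preview",
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "gpt-3.5-turbo",
            "api_key": "<openai-api-key>",
        },
    },
]

router = Router(model_list=model_list)

# the router picks a deployment and retries / falls back on failures
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
```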
## **LiteLLM Python SDK**
### Basic usage
```shell
pip install litellm
```
<Tabs>
<TabItem value="openai" label="OpenAI">

```python
from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
  model="gpt-3.5-turbo",
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)
```

</TabItem>
<TabItem value="anthropic" label="Anthropic">

```python
from litellm import completion
import os

## set ENV variables
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

response = completion(
  model="claude-2",
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)
```

</TabItem>
<TabItem value="vertex" label="VertexAI">

```python
from litellm import completion
import os

# auth: run 'gcloud auth application-default login'
os.environ["VERTEX_PROJECT"] = "hardy-device-386718"
os.environ["VERTEX_LOCATION"] = "us-central1"

response = completion(
  model="chat-bison",
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)
```

</TabItem>
<TabItem value="hugging" label="HuggingFace">

```python
from litellm import completion
import os

os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key"

# e.g. Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints
response = completion(
  model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
  messages=[{ "content": "Hello, how are you?","role": "user"}],
  api_base="https://my-endpoint.huggingface.cloud"
)

print(response)
```

</TabItem>
<TabItem value="azure" label="Azure OpenAI">

```python
from litellm import completion
import os

## set ENV variables
os.environ["AZURE_API_KEY"] = ""
os.environ["AZURE_API_BASE"] = ""
os.environ["AZURE_API_VERSION"] = ""

# azure call
response = completion(
  model = "azure/<your_deployment_name>",  # your Azure deployment name
  messages = [{ "content": "Hello, how are you?","role": "user"}]
)
```

</TabItem>
<TabItem value="ollama" label="Ollama">

```python
from litellm import completion

response = completion(
  model="ollama/llama2",
  messages = [{ "content": "Hello, how are you?","role": "user"}],
  api_base="http://localhost:11434"
)
```

</TabItem>
<TabItem value="openrouter" label="Openrouter">

```python
from litellm import completion
import os

## set ENV variables
os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key"

response = completion(
  model="openrouter/google/palm-2-chat-bison",
  messages = [{ "content": "Hello, how are you?","role": "user"}],
)
```

</TabItem>
</Tabs>
### Streaming
Set `stream=True` in the `completion` args.
<Tabs>
<TabItem value="openai" label="OpenAI">

```python
from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
  model="gpt-3.5-turbo",
  messages=[{ "content": "Hello, how are you?","role": "user"}],
  stream=True,
)
```

</TabItem>
<TabItem value="anthropic" label="Anthropic">

```python
from litellm import completion
import os

## set ENV variables
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

response = completion(
  model="claude-2",
  messages=[{ "content": "Hello, how are you?","role": "user"}],
  stream=True,
)
```

</TabItem>
<TabItem value="vertex" label="VertexAI">

```python
from litellm import completion
import os

# auth: run 'gcloud auth application-default login'
os.environ["VERTEX_PROJECT"] = "hardy-device-386718"
os.environ["VERTEX_LOCATION"] = "us-central1"

response = completion(
  model="chat-bison",
  messages=[{ "content": "Hello, how are you?","role": "user"}],
  stream=True,
)
```

</TabItem>
<TabItem value="hugging" label="HuggingFace">

```python
from litellm import completion
import os

os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key"

# e.g. Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints
response = completion(
  model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
  messages=[{ "content": "Hello, how are you?","role": "user"}],
  api_base="https://my-endpoint.huggingface.cloud",
  stream=True,
)

print(response)
```

</TabItem>
<TabItem value="azure" label="Azure OpenAI">

```python
from litellm import completion
import os

## set ENV variables
os.environ["AZURE_API_KEY"] = ""
os.environ["AZURE_API_BASE"] = ""
os.environ["AZURE_API_VERSION"] = ""

# azure call
response = completion(
  model = "azure/<your_deployment_name>",  # your Azure deployment name
  messages = [{ "content": "Hello, how are you?","role": "user"}],
  stream=True,
)
```

</TabItem>
<TabItem value="ollama" label="Ollama">

```python
from litellm import completion

response = completion(
  model="ollama/llama2",
  messages = [{ "content": "Hello, how are you?","role": "user"}],
  api_base="http://localhost:11434",
  stream=True,
)
```

</TabItem>
<TabItem value="openrouter" label="Openrouter">

```python
from litellm import completion
import os

## set ENV variables
os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key"

response = completion(
  model="openrouter/google/palm-2-chat-bison",
  messages = [{ "content": "Hello, how are you?","role": "user"}],
  stream=True,
)
```

</TabItem>
</Tabs>
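With `stream=True`, `completion` returns an iterator of chunks in the OpenAI delta format. A minimal sketch of consuming it (assumes `response` was created with `stream=True` as in the examples above):

```python
for chunk in response:
    # each chunk carries an incremental delta; content can be None (e.g. on the final chunk)
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="")
```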
### Exception handling
LiteLLM maps exceptions across all supported providers to the OpenAI exceptions. All our exceptions inherit from OpenAI's exception types, so any error handling you already have for OpenAI should work out of the box with LiteLLM.
```python
from openai import OpenAIError
from litellm import completion
import os

os.environ["ANTHROPIC_API_KEY"] = "bad-key"

try:
    # some code
    completion(model="claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}])
except OpenAIError as e:
    print(e)
```
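Because the mapped errors subclass OpenAI's exception types, you can also catch specific classes for finer-grained handling. A minimal sketch (assumes the openai v1.x package; the bad key is intentional):

```python
import os
from openai import AuthenticationError, RateLimitError
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = "bad-key"

try:
    completion(model="claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}])
except AuthenticationError as e:
    print("bad or missing credentials:", e)
except RateLimitError as e:
    print("rate limited, retry later:", e)
```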
### Logging Observability - Log LLM Input/Output ([Docs](https://docs.litellm.ai/docs/observability/callbacks))
LiteLLM exposes predefined callbacks to send data to Lunary, Langfuse, Helicone, Promptlayer, Traceloop, and Slack.
```python
import litellm
import os
from litellm import completion

## set env variables for logging tools
os.environ["HELICONE_API_KEY"] = "your-helicone-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["OPENAI_API_KEY"] = "your-openai-key"

# set callbacks
litellm.success_callback = ["lunary", "langfuse", "helicone"] # log input/output to lunary, langfuse, helicone

# openai call
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
```
### Track Costs, Usage, Latency for streaming
Use a callback function for this - more info on custom callbacks: https://docs.litellm.ai/docs/observability/custom_callback
```python
import litellm
from litellm import completion

# track_cost_callback
def track_cost_callback(
    kwargs,              # kwargs to completion
    completion_response, # response from completion
    start_time, end_time # start/end time
):
    try:
        response_cost = kwargs.get("response_cost", 0)
        print("streaming response_cost", response_cost)
    except Exception:
        pass

# set callback
litellm.success_callback = [track_cost_callback] # set custom callback function

# litellm.completion() call
response = completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "Hi 👋 - i'm openai"
        }
    ],
    stream=True
)
```
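Note that with `stream=True` the callback only has the final cost once the stream has been consumed; a short sketch of draining the stream (continuing directly from the block above):

```python
# iterate the stream so the request completes and the success callback fires
for chunk in response:
    pass
```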
## **LiteLLM Proxy Server (LLM Gateway)**
Track spend across multiple projects/people

The proxy provides:
1. [Hooks for auth](https://docs.litellm.ai/docs/proxy/virtual_keys#custom-auth)
2. [Hooks for logging](https://docs.litellm.ai/docs/proxy/logging#step-1---create-your-custom-litellm-callback-class)
3. [Cost tracking](https://docs.litellm.ai/docs/proxy/virtual_keys#tracking-spend)
4. [Rate Limiting](https://docs.litellm.ai/docs/proxy/users#set-rate-limits)
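For a sense of what cost tracking and rate limiting look like in practice, you can mint a virtual key with a budget and a request limit. This is a sketch only; it assumes a proxy started with a master key, and `sk-1234` plus the field values are placeholders (see the virtual keys docs linked above):

```shell
curl -X POST 'http://0.0.0.0:4000/key/generate' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "models": ["gpt-3.5-turbo"],
    "max_budget": 10,
    "rpm_limit": 100
  }'
```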
### 📖 Proxy Endpoints - [Swagger Docs](https://litellm-api.up.railway.app/)
For a complete tutorial with keys + rate limits, go [**here**](./proxy/docker_quick_start.md).
### Quick Start Proxy - CLI
```shell
pip install 'litellm[proxy]'
```
#### Step 1: Start litellm proxy
<Tabs>
<TabItem value="basic" label="pip package">

```shell
$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:4000
```

</TabItem>
<TabItem value="docker" label="Docker container">

Step 1. CREATE config.yaml
Example `litellm_config.yaml`
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/<your_azure_deployment_name> # your Azure deployment name
      api_base: os.environ/AZURE_API_BASE # runs os.getenv("AZURE_API_BASE")
      api_key: os.environ/AZURE_API_KEY # runs os.getenv("AZURE_API_KEY")
      api_version: "2023-07-01-preview"
```
Step 2. RUN Docker Image
```shell
docker run \
    -v $(pwd)/litellm_config.yaml:/app/config.yaml \
    -e AZURE_API_KEY=d6*********** \
    -e AZURE_API_BASE=https://openai-***********/ \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:main-latest \
    --config /app/config.yaml --detailed_debug
```

</TabItem>
</Tabs>
#### Step 2: Make ChatCompletions Request to Proxy
```python
import openai # openai v1.0.0+

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000") # set proxy to base_url

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)
```
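The proxy speaks the OpenAI REST format directly, so the same request also works with plain `curl` (assuming the proxy is running locally on port 4000 as above):

```shell
curl http://0.0.0.0:4000/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "this is a test request, write a short poem"}
    ]
  }'
```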
## More details
- [exception mapping](./exception_mapping.md)
- [retries + model fallbacks for completion()](./completion/reliable_completions.md)
- [proxy virtual keys & spend management](./proxy/virtual_keys.md)
- [E2E Tutorial for LiteLLM Proxy Server](./proxy/docker_quick_start.md)