litellm-server

A simple, fast, and lightweight OpenAI-compatible server to call 100+ LLM APIs.

LiteLLM Server supports:

  • LLM API Calls in the OpenAI ChatCompletions format
  • Caching + Logging capabilities (Redis and Langfuse, respectively)
  • Setting API keys in the request headers or in the .env (example below)
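
For example, a per-request key can be sent as a standard bearer token; this is a sketch assuming the server forwards the Authorization header value to the underlying provider as the api_key (check main.py for the exact behavior):

curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
     "model": "gpt-3.5-turbo",
     "messages": [{"role": "user", "content": "Say this is a test!"}]
   }'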

Usage

docker run -e PORT=8000 -e OPENAI_API_KEY=<your-openai-key> -p 8000:8000 ghcr.io/berriai/litellm:latest

OpenAI Proxy running on http://0.0.0.0:8000

curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "gpt-3.5-turbo",
     "messages": [{"role": "user", "content": "Say this is a test!"}],
     "temperature": 0.7
   }'

See how to call Huggingface, Bedrock, TogetherAI, Anthropic, etc.
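
For example, a sketch of calling Anthropic through the same endpoint, assuming ANTHROPIC_API_KEY is passed to the container and the model name follows LiteLLM's naming:

docker run -e PORT=8000 -e ANTHROPIC_API_KEY=<your-anthropic-key> -p 8000:8000 ghcr.io/berriai/litellm:latest

curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "claude-2",
     "messages": [{"role": "user", "content": "Say this is a test!"}]
   }'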

Endpoints:

  • /chat/completions - chat completions endpoint to call 100+ LLMs
  • /models - available models on server
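
For example, listing the models the server currently exposes (assuming the server from the Usage section is running):

curl http://0.0.0.0:8000/models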

Save Model-specific params (API Base, API Keys, Temperature, etc.)

Use the router_config_template.yaml to save model-specific information like api_base, api_key, temperature, max_tokens, etc.

  1. Create a config.yaml file
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params: # params for litellm.completion() - https://docs.litellm.ai/docs/completion/input#input---request-body
      model: azure/chatgpt-v-2 # azure/<your-deployment-name>
      api_key: your_azure_api_key
      api_version: your_azure_api_version
      api_base: your_azure_api_base
  - model_name: mistral-7b
    litellm_params:
      model: ollama/mistral
      api_base: your_ollama_api_base
  2. Start the server
docker run -e PORT=8000 -p 8000:8000 -v $(pwd)/config.yaml:/app/config.yaml ghcr.io/berriai/litellm:latest
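
Requests can then reference the model_name values from config.yaml; for example, a sketch that routes to the Ollama deployment defined above:

curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "mistral-7b",
     "messages": [{"role": "user", "content": "Say this is a test!"}]
   }'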

Caching

Add Redis Caching to your server via environment variables

### REDIS
REDIS_HOST = "" 
REDIS_PORT = "" 
REDIS_PASSWORD = "" 

Docker command:

docker run -e REDIS_HOST=<your-redis-host> -e REDIS_PORT=<your-redis-port> -e REDIS_PASSWORD=<your-redis-password> -e PORT=8000 -p 8000:8000 ghcr.io/berriai/litellm:latest
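
A rough way to check that caching is active is to send the same request twice and compare latency (a sketch, assuming identical requests are served from the Redis cache):

time curl -s http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}' > /dev/null
# Re-run the same command; a cache hit should come back noticeably faster.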

Logging

  1. Debug Logs: print the input/output params by setting SET_VERBOSE = "True".

Docker command:

docker run -e SET_VERBOSE="True" -e PORT=8000 -p 8000:8000 ghcr.io/berriai/litellm:latest

  2. Add Langfuse Logging to your server via environment variables

### LANGFUSE
LANGFUSE_PUBLIC_KEY = ""
LANGFUSE_SECRET_KEY = ""
LANGFUSE_HOST = "" # optional, defaults to https://cloud.langfuse.com

Docker command:

docker run -e LANGFUSE_PUBLIC_KEY=<your-public-key> -e LANGFUSE_SECRET_KEY=<your-secret-key> -e LANGFUSE_HOST=<your-langfuse-host> -e PORT=8000 -p 8000:8000 ghcr.io/berriai/litellm:latest

Running Locally

$ git clone https://github.com/BerriAI/litellm.git
$ cd ./litellm/litellm_server
$ pip install -r requirements.txt
$ uvicorn main:app --host 0.0.0.0 --port 8000
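
Provider keys for a local run come from your shell environment or a .env file in this directory (see .env.template for the supported variables); for example, set the key before starting uvicorn:

$ export OPENAI_API_KEY=<your-openai-key>   # or copy .env.template to .env and fill it in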

Custom Config

  1. Create + Modify router_config.yaml (save your azure/openai/etc. deployment info)
cp ./router_config_template.yaml ./router_config.yaml
  2. Build Docker Image
docker build -t litellm_server . --build-arg CONFIG_FILE=./router_config.yaml 
  3. Run Docker Image
docker run --name litellm_server -e PORT=8000 -p 8000:8000 litellm_server
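
To confirm the container picked up the custom config, the /models endpoint should list the model names from router_config.yaml (a sketch):

curl http://0.0.0.0:8000/models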