litellm-proxy

A local, fast, and lightweight OpenAI-compatible server to call 100+ LLM APIs.

usage

$ pip install litellm
$ litellm --model ollama/codellama

#INFO: Ollama running on http://0.0.0.0:8000
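Before swapping the base URL in your code, you can confirm the server is up with a plain HTTP request. A minimal sanity check, assuming the default port 8000 and the requests library (the model name is just a placeholder, as in the snippet below):

import requests

# Hit the proxy's OpenAI-compatible /chat/completions route directly.
response = requests.post(
    "http://0.0.0.0:8000/chat/completions",
    json={"model": "test", "messages": [{"role": "user", "content": "Hey!"}]},
)
print(response.json())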

replace openai base

import openai

# Point the openai client at the local proxy instead of api.openai.com.
openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "anything"  # the client requires a key to be set; the local proxy doesn't validate it

print(openai.ChatCompletion.create(model="test", messages=[{"role": "user", "content": "Hey!"}]))

See how to call Huggingface, Bedrock, TogetherAI, Anthropic, etc.
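Switching providers is just a different --model argument when starting the proxy, e.g. (model names below are illustrative):

$ litellm --model huggingface/bigcode/starcoder
$ litellm --model claude-instant-1
$ litellm --model together_ai/togethercomputer/llama-2-70b-chat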

configure proxy

To save API keys, change the model prompt, etc., you'll need to create a local instance of the proxy:

$ litellm --create-proxy

This will create a local project called litellm-proxy in your current directory, which contains:

  • proxy_cli.py: Runs the proxy
  • proxy_server.py: Contains the API calling logic (the routes below are exercised in the sketch at the end of this section)
    • /chat/completions: receives openai.ChatCompletion.create calls
    • /completions: receives openai.Completion.create calls
    • /models: receives openai.Model.list() calls
  • secrets.toml: Stores your API keys, model configs, etc.

Run it by doing:

$ cd litellm-proxy
$ python proxy_cli.py --model ollama/llama # replace with your model name
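Once the local proxy is running, the three routes above can be exercised with the same openai client as before. A minimal sketch, assuming the proxy is on the default port 8000 and treating the model name as a placeholder:

import openai

openai.api_base = "http://0.0.0.0:8000"  # point the client at the local proxy
openai.api_key = "anything"              # placeholder; the local proxy doesn't validate it

# /chat/completions
print(openai.ChatCompletion.create(model="test", messages=[{"role": "user", "content": "Hey!"}]))

# /completions
print(openai.Completion.create(model="test", prompt="Hey!"))

# /models
print(openai.Model.list())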