diff --git a/docs/my-website/docs/proxy_server.md b/docs/my-website/docs/proxy_server.md
index ab976514e..13fe5243c 100644
--- a/docs/my-website/docs/proxy_server.md
+++ b/docs/my-website/docs/proxy_server.md
@@ -6,46 +6,23 @@ import TabItem from '@theme/TabItem';
 CLI Tool to create a LLM Proxy Server to translate openai api calls to any non-openai model (e.g. Huggingface, TogetherAI, Ollama, etc.) 100+ models [Provider List](https://docs.litellm.ai/docs/providers).
 ## Quick start
-Call Huggingface models through your OpenAI proxy.
+Call Ollama models through your OpenAI proxy.
 ### Start Proxy
 ```shell
 $ pip install litellm
 ```
 ```shell
-$ litellm --model huggingface/bigcode/starcoder
+$ litellm --model ollama/llama2
 #INFO: Uvicorn running on http://0.0.0.0:8000
 ```
 This will host a local proxy api at: **http://0.0.0.0:8000**
-### Test Proxy
-Make a test ChatCompletion Request to your proxy
-
-
-
-```shell
-litellm --test http://0.0.0.0:8000
-```
-
-
-
-
-```python
-import openai
-
-openai.api_base = "http://0.0.0.0:8000"
-
-print(openai.ChatCompletion.create(model="test", messages=[{"role":"user", "content":"Hey!"}]))
-```
-
-
-
-
-
-```curl
-curl --location 'http://0.0.0.0:8000/chat/completions' \
+Let's see if it works:
+```shell
+$ curl --location 'http://0.0.0.0:8000/chat/completions' \
 --header 'Content-Type: application/json' \
 --data '{
     "messages": [
@@ -56,20 +33,32 @@ curl --location 'http://0.0.0.0:8000/chat/completions' \
     ],
 }'
 ```
-
-
+
+### Replace OpenAI Base
+
+```python
+import openai
+
+openai.api_base = "http://0.0.0.0:8000"
+
+print(openai.ChatCompletion.create(model="test", messages=[{"role":"user", "content":"Hey!"}]))
+```
 #### Other supported models:
-
+
+Assuming you're running vllm locally:
 ```shell
-$ export ANTHROPIC_API_KEY=my-api-key
-$ litellm --model claude-instant-1
+$ litellm --model vllm/facebook/opt-125m
 ```
-
+
+```shell
+$ litellm --model openai/<model-name> --api_base <your-api-base>
+```
+
 ```shell
@@ -77,6 +66,14 @@ $ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
 $ litellm --model claude-instant-1
 ```
+
+
+
+```shell
+$ export ANTHROPIC_API_KEY=my-api-key
+$ litellm --model claude-instant-1
+```
+
@@ -120,9 +117,8 @@ $ litellm --model palm/chat-bison
 ```shell
 $ export AZURE_API_KEY=my-api-key
 $ export AZURE_API_BASE=my-api-base
-$ export AZURE_API_VERSION=my-api-version
-$ litellm --model azure/my-deployment-id
+$ litellm --model azure/my-deployment-name
 ```
@@ -149,8 +145,23 @@ $ litellm --model command-nightly
 [**Jump to Code**](https://github.com/BerriAI/litellm/blob/fef4146396d5d87006259e00095a62e3900d6bb4/litellm/proxy.py#L36)
+## Configure Model
-### Deploy Proxy
+To save api keys and/or customize the model prompt, run:
+```shell
+$ litellm --config
+```
+This will open a .env file that will store these values locally.
+
+To set the api base, temperature, and max tokens, add them to your cli command:
+```shell
+litellm --model ollama/llama2 \
+  --api_base http://localhost:11434 \
+  --max_tokens 250 \
+  --temperature 0.5
+```
+
+## Deploy Proxy
@@ -193,141 +204,9 @@ $ litellm --model claude-instant-1 --deploy
 ```
 This will host a ChatCompletions API at: https://api.litellm.ai/44508ad4
-#### Other supported models:
-
-
-
-```shell
-$ export ANTHROPIC_API_KEY=my-api-key
-$ litellm --model claude-instant-1 --deploy
-```
-
-
-
-
-```shell
-$ export TOGETHERAI_API_KEY=my-api-key
-$ litellm --model together_ai/lmsys/vicuna-13b-v1.5-16k --deploy
-```
-
-
-
-
-```shell
-$ export REPLICATE_API_KEY=my-api-key
-$ litellm \
-  --model replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3
-  --deploy
-```
-
-
-
-
-```shell
-$ litellm --model petals/meta-llama/Llama-2-70b-chat-hf --deploy
-```
-
-
-
-
-```shell
-$ export PALM_API_KEY=my-palm-key
-$ litellm --model palm/chat-bison --deploy
-```
-
-
-
-
-```shell
-$ export AZURE_API_KEY=my-api-key
-$ export AZURE_API_BASE=my-api-base
-$ export AZURE_API_VERSION=my-api-version
-
-$ litellm --model azure/my-deployment-id --deploy
-```
-
-
-
-
-```shell
-$ export AI21_API_KEY=my-api-key
-$ litellm --model j2-light --deploy
-```
-
-
-
-
-```shell
-$ export COHERE_API_KEY=my-api-key
-$ litellm --model command-nightly --deploy
-```
-
-
-
-
-### Test Deployed Proxy
-Make a test ChatCompletion Request to your proxy
-
-
-
-```shell
-litellm --test https://api.litellm.ai/44508ad4
-```
-
-
-
-```python
-import openai
-
-openai.api_base = "https://api.litellm.ai/44508ad4"
-
-print(openai.ChatCompletion.create(model="test", messages=[{"role":"user", "content":"Hey!"}]))
-```
-
-
-
-
-
-```curl
-curl --location 'https://api.litellm.ai/44508ad4/chat/completions' \
---header 'Content-Type: application/json' \
---data '{
-    "messages": [
-      {
-        "role": "user",
-        "content": "what do you know?"
-      }
-    ],
-}'
-```
-
-
-## Setting api base, temperature, max tokens
-
-```shell
-litellm --model huggingface/bigcode/starcoder \
-  --api_base https://my-endpoint.huggingface.cloud \
-  --max_tokens 250 \
-  --temperature 0.5
-```
-
-**Ollama example**
-
-```shell
-$ litellm --model ollama/llama2 --api_base http://localhost:11434
-```
 ## Tutorial - using HuggingFace LLMs with aider
 [Aider](https://github.com/paul-gauthier/aider) is an AI pair programming in your terminal.
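
Putting the added Quick start and Configure Model sections together, a minimal end-to-end sketch looks like this: start the proxy against Ollama, then send it an OpenAI-format request. The `temperature` and `max_tokens` fields in the request body are standard OpenAI chat-completion parameters added here for illustration; the diff only sets them as CLI defaults, so treating them as per-request overrides is an assumption.

```shell
# Start the local proxy (as in the Quick start above); it listens on http://0.0.0.0:8000
litellm --model ollama/llama2

# In another terminal, send an OpenAI-format request to the proxy.
# temperature/max_tokens are assumed per-request overrides, not confirmed by the diff.
curl --location 'http://0.0.0.0:8000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [
      {"role": "user", "content": "what do you know?"}
    ],
    "temperature": 0.5,
    "max_tokens": 250
  }'
```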
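The new Configure Model section says `litellm --config` opens a .env file that stores these values locally. As a rough sketch only, such a file would hold provider keys and settings like the ones exported elsewhere on this page; the exact variable names and layout written by `litellm --config` are not shown in the diff, so this is an assumption:

```shell
# Hypothetical .env contents saved by `litellm --config` (names mirror the export
# commands in this doc; the real file layout is not specified by the diff).
ANTHROPIC_API_KEY=my-api-key
HUGGINGFACE_API_KEY=my-api-key
AZURE_API_KEY=my-api-key
AZURE_API_BASE=my-api-base
```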