diff --git a/docs/my-website/docs/proxy_server.md b/docs/my-website/docs/proxy_server.md
index ab976514e..13fe5243c 100644
--- a/docs/my-website/docs/proxy_server.md
+++ b/docs/my-website/docs/proxy_server.md
@@ -6,46 +6,23 @@ import TabItem from '@theme/TabItem';
CLI tool to create an LLM Proxy Server that translates OpenAI API calls to any non-OpenAI model (e.g. Huggingface, TogetherAI, Ollama, etc.). 100+ models supported: [Provider List](https://docs.litellm.ai/docs/providers).
## Quick start
-Call Huggingface models through your OpenAI proxy.
+Call Ollama models through your OpenAI proxy.
### Start Proxy
```shell
$ pip install litellm
```
```shell
-$ litellm --model huggingface/bigcode/starcoder
+$ litellm --model ollama/llama2
#INFO: Uvicorn running on http://0.0.0.0:8000
```
This will host a local proxy API at: **http://0.0.0.0:8000**
-### Test Proxy
-Make a test ChatCompletion Request to your proxy
-
-
-
-```shell
-litellm --test http://0.0.0.0:8000
-```
-
-
-
-
-```python
-import openai
-
-openai.api_base = "http://0.0.0.0:8000"
-
-print(openai.ChatCompletion.create(model="test", messages=[{"role":"user", "content":"Hey!"}]))
-```
-
-
-
-
-
-```curl
-curl --location 'http://0.0.0.0:8000/chat/completions' \
+Let's see if it works by sending a test request:
+```shell
+$ curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"messages": [
@@ -56,20 +33,32 @@ curl --location 'http://0.0.0.0:8000/chat/completions' \
]
}'
```
-
-
+
+### Replace OpenAI Base
+Point your OpenAI client at the proxy and call it as usual:
+
+```python
+import openai
+
+openai.api_base = "http://0.0.0.0:8000"
+
+print(openai.ChatCompletion.create(model="test", messages=[{"role":"user", "content":"Hey!"}]))
+```
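+
+If the backing model supports it, streaming should also work through the proxy. A minimal sketch reusing the same openai SDK pattern as above; `stream=True` is a standard ChatCompletion argument, and the proxy is assumed to pass it through:
+```python
+import openai
+
+openai.api_base = "http://0.0.0.0:8000"
+openai.api_key = "anything"  # placeholder; the proxy holds the real provider credentials
+
+# request a streamed response and print each chunk as it arrives
+response = openai.ChatCompletion.create(
+    model="test",
+    messages=[{"role": "user", "content": "Hey!"}],
+    stream=True,
+)
+for chunk in response:
+    print(chunk)
+```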
#### Other supported models:
-
+
+Assuming you're running vLLM locally:
```shell
-$ export ANTHROPIC_API_KEY=my-api-key
-$ litellm --model claude-instant-1
+$ litellm --model vllm/facebook/opt-125m
```
-
+
+```shell
+$ litellm --model openai/<your-model-name> --api_base <your-api-base>
+```
+
```shell
@@ -77,6 +66,14 @@ $ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
$ litellm --model huggingface/bigcode/starcoder
```
+
+
+
+```shell
+$ export ANTHROPIC_API_KEY=my-api-key
+$ litellm --model claude-instant-1
+```
+
@@ -120,9 +117,8 @@ $ litellm --model palm/chat-bison
```shell
$ export AZURE_API_KEY=my-api-key
$ export AZURE_API_BASE=my-api-base
-$ export AZURE_API_VERSION=my-api-version
-$ litellm --model azure/my-deployment-id
+$ litellm --model azure/my-deployment-name
```
@@ -149,8 +145,23 @@ $ litellm --model command-nightly
[**Jump to Code**](https://github.com/BerriAI/litellm/blob/fef4146396d5d87006259e00095a62e3900d6bb4/litellm/proxy.py#L36)
+## Configure Model
-### Deploy Proxy
+To save API keys and/or customize the model prompt, run:
+```shell
+$ litellm --config
+```
+This will open a .env file that will store these values locally.
+
+To set the API base, temperature, and max tokens, add them to your CLI command:
+```shell
+litellm --model ollama/llama2 \
+ --api_base http://localhost:11434 \
+ --max_tokens 250 \
+ --temperature 0.5
+```
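+
+These generation settings can usually also be sent per request from the client, since the proxy exposes an OpenAI-style ChatCompletions endpoint and forwards standard parameters to the underlying model. A sketch reusing the openai SDK setup from the Quick start (parameter values are just illustrative):
+```python
+import openai
+
+openai.api_base = "http://0.0.0.0:8000"
+openai.api_key = "anything"  # placeholder; the real provider keys live with the proxy
+
+# temperature and max_tokens are passed through to the model behind the proxy
+print(openai.ChatCompletion.create(
+    model="test",
+    messages=[{"role": "user", "content": "Say hi in one sentence."}],
+    temperature=0.5,
+    max_tokens=250,
+))
+```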
+
+## Deploy Proxy
@@ -193,141 +204,9 @@ $ litellm --model claude-instant-1 --deploy
```
This will host a ChatCompletions API at: https://api.litellm.ai/44508ad4
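+
+To sanity-check the deployed endpoint, you can reuse the same openai client snippet as in the Quick start, pointed at the URL printed above (the URL below is just the example from this run; yours will differ):
+```python
+import openai
+
+# example deployed URL from above; replace with the one printed for your deployment
+openai.api_base = "https://api.litellm.ai/44508ad4"
+openai.api_key = "anything"  # placeholder; the deployed proxy holds the provider keys
+
+print(openai.ChatCompletion.create(model="test", messages=[{"role": "user", "content": "Hey!"}]))
+```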
-#### Other supported models:
-
-
-
-```shell
-$ export ANTHROPIC_API_KEY=my-api-key
-$ litellm --model claude-instant-1 --deploy
-```
-
-
-
-
-
-```shell
-$ export TOGETHERAI_API_KEY=my-api-key
-$ litellm --model together_ai/lmsys/vicuna-13b-v1.5-16k --deploy
-```
-
-
-
-
-
-```shell
-$ export REPLICATE_API_KEY=my-api-key
-$ litellm \
- --model replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3
- --deploy
-```
-
-
-
-
-
-```shell
-$ litellm --model petals/meta-llama/Llama-2-70b-chat-hf --deploy
-```
-
-
-
-
-
-```shell
-$ export PALM_API_KEY=my-palm-key
-$ litellm --model palm/chat-bison --deploy
-```
-
-
-
-
-
-```shell
-$ export AZURE_API_KEY=my-api-key
-$ export AZURE_API_BASE=my-api-base
-$ export AZURE_API_VERSION=my-api-version
-
-$ litellm --model azure/my-deployment-id --deploy
-```
-
-
-
-
-
-```shell
-$ export AI21_API_KEY=my-api-key
-$ litellm --model j2-light --deploy
-```
-
-
-
-
-
-```shell
-$ export COHERE_API_KEY=my-api-key
-$ litellm --model command-nightly --deploy
-```
-
-
-
-
-
-### Test Deployed Proxy
-Make a test ChatCompletion Request to your proxy
-
-
-
-```shell
-litellm --test https://api.litellm.ai/44508ad4
-```
-
-
-
-
-```python
-import openai
-
-openai.api_base = "https://api.litellm.ai/44508ad4"
-
-print(openai.ChatCompletion.create(model="test", messages=[{"role":"user", "content":"Hey!"}]))
-```
-
-
-
-
-
-```curl
-curl --location 'https://api.litellm.ai/44508ad4/chat/completions' \
---header 'Content-Type: application/json' \
---data '{
- "messages": [
- {
- "role": "user",
- "content": "what do you know?"
- }
- ],
-}'
-```
-
-
-## Setting api base, temperature, max tokens
-
-```shell
-litellm --model huggingface/bigcode/starcoder \
- --api_base https://my-endpoint.huggingface.cloud \
- --max_tokens 250 \
- --temperature 0.5
-```
-
-**Ollama example**
-
-```shell
-$ litellm --model ollama/llama2 --api_base http://localhost:11434
-```
## Tutorial - using HuggingFace LLMs with aider
[Aider](https://github.com/paul-gauthier/aider) is an AI pair programming tool that runs in your terminal.