(docs) proxy server

This commit is contained in:
parent a16707fc1a
commit 3ef75029cb

1 changed file with 31 additions and 80 deletions

@@ -2,11 +2,11 @@ import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# 💥 OpenAI Proxy Server

LiteLLM Server manages:

* Calling 100+ LLMs [Huggingface/Bedrock/TogetherAI/etc.](#other-supported-models) in the OpenAI `ChatCompletions` & `Completions` format
* Setting custom prompt templates + model-specific configs (`temperature`, `max_tokens`, etc.)

## Quick Start

@@ -28,8 +28,32 @@ curl http://0.0.0.0:8000/v1/chat/completions \

This will now automatically route any requests for `gpt-3.5-turbo` to `bigcode/starcoder`, hosted on Huggingface Inference Endpoints.

#### Output

```json
{
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "content": ", and create a new test page.\n\n### Test data\n\n- A user named",
        "role": "assistant"
      }
    }
  ],
  "id": "chatcmpl-56634359-d4ce-4dbc-972c-86a640e3a5d8",
  "created": 1699308314.054251,
  "model": "huggingface/bigcode/starcoder",
  "usage": {
    "completion_tokens": 16,
    "prompt_tokens": 10,
    "total_tokens": 26
  }
}
```
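
If you'd rather call the proxy from Python than curl, here is a minimal sketch using the OpenAI v1 SDK (an assumption on our part; any OpenAI-compatible client should work). The `api_key` value is a placeholder, since the proxy holds the actual provider credentials:

```python
import openai

# Point the OpenAI client at the LiteLLM proxy instead of api.openai.com.
client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")

# The proxy transparently routes this "gpt-3.5-turbo" request to the
# Huggingface-hosted model configured above.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "write a short test message"}],
)
print(response.choices[0].message.content)
```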

### Supported LLMs
<Tabs>
<TabItem value="bedrock" label="Bedrock">

@@ -297,81 +321,8 @@ model_list:
    api_key: your_huggingface_api_key # [OPTIONAL] if deployed on huggingface inference endpoints
    api_base: your_api_base # url where model is deployed
```

## Caching

Add Redis caching to your server via environment variables:

```env
### REDIS
REDIS_HOST = ""
REDIS_PORT = ""
REDIS_PASSWORD = ""
```

Docker command:

```shell
docker run -e REDIS_HOST=<your-redis-host> -e REDIS_PORT=<your-redis-port> -e REDIS_PASSWORD=<your-redis-password> -e PORT=8000 -p 8000:8000 ghcr.io/berriai/litellm:latest
```
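
To confirm the Redis credentials are valid before wiring them into the server, here is a quick connectivity check from Python (a sketch; it assumes the `redis` package is installed, which is not part of this doc's setup):

```python
import os
import redis

# Build a client from the same environment variables the server reads.
r = redis.Redis(
    host=os.environ["REDIS_HOST"],
    port=int(os.environ["REDIS_PORT"]),
    password=os.environ["REDIS_PASSWORD"],
)
print(r.ping())  # True means the credentials and network path are good
```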

## Logging

1. Debug Logs
Print the input/output params by setting `SET_VERBOSE = "True"`.

Docker command:

```shell
docker run -e SET_VERBOSE="True" -e PORT=8000 -p 8000:8000 ghcr.io/berriai/litellm:latest
```

2. Add Langfuse logging to your server via environment variables:

```env
### LANGFUSE
LANGFUSE_PUBLIC_KEY = ""
LANGFUSE_SECRET_KEY = ""
# Optional, defaults to https://cloud.langfuse.com
LANGFUSE_HOST = ""
```

Docker command:

```shell
docker run -e LANGFUSE_PUBLIC_KEY=<your-public-key> -e LANGFUSE_SECRET_KEY=<your-secret-key> -e LANGFUSE_HOST=<your-langfuse-host> -e PORT=8000 -p 8000:8000 ghcr.io/berriai/litellm:latest
```

## Advanced
### Caching - Completion() and Embedding() Responses

Enable caching by adding the following credentials to your server environment:

```env
REDIS_HOST = "" # REDIS_HOST='redis-18841.c274.us-east-1-3.ec2.cloud.redislabs.com'
REDIS_PORT = "" # REDIS_PORT='18841'
REDIS_PASSWORD = "" # REDIS_PASSWORD='liteLlmIsAmazing'
```

#### Test Caching
Send the same request twice:
```shell
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "gpt-3.5-turbo",
     "messages": [{"role": "user", "content": "write a poem about litellm!"}],
     "temperature": 0.7
   }'

curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "gpt-3.5-turbo",
     "messages": [{"role": "user", "content": "write a poem about litellm!"}],
     "temperature": 0.7
   }'
```
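
The same check from Python, timing both calls (a sketch; it assumes the `requests` package and a cache-enabled proxy running locally):

```python
import time
import requests

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "write a poem about litellm!"}],
    "temperature": 0.7,
}

# Identical requests: the second should be served from the Redis cache,
# so it should return noticeably faster.
for attempt in (1, 2):
    start = time.time()
    resp = requests.post("http://0.0.0.0:8000/v1/chat/completions", json=payload)
    print(f"attempt {attempt}: {time.time() - start:.2f}s, id={resp.json()['id']}")
```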

### Caching
#### Control caching per completion request
Caching can be switched on/off per `/chat/completions` request.
- Caching on for completion - pass `caching=True`, as sketched below:

@@ -401,7 +352,7 @@ Caching can be switched on/off per /chat/completions request

<!--
## Tutorials (Chat-UI, NeMO-Guardrails, PromptTools, Phoenix ArizeAI, Langchain, ragas, LlamaIndex, etc.)

**Start server:**

@@ -617,5 +568,5 @@ response = OpenAI(model="claude-2", api_key="your-anthropic-key",api_base="http:
print(response)
```
</TabItem>
</Tabs>
</Tabs> -->