(docs) proxy server
commit 3ef75029cb (parent a16707fc1a)
1 changed file with 31 additions and 80 deletions

@@ -2,11 +2,11 @@ import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# 💥 OpenAI Proxy Server

LiteLLM Server manages:

* Calling 100+ LLMs [Huggingface/Bedrock/TogetherAI/etc.](#other-supported-models) in the OpenAI `ChatCompletions` & `Completions` format
* Setting custom prompt templates + model-specific configs (`temperature`, `max_tokens`, etc.)

## Quick Start
|
@@ -28,8 +28,32 @@ curl http://0.0.0.0:8000/v1/chat/completions \

This will now automatically route any request for `gpt-3.5-turbo` to `bigcode/starcoder`, hosted on Hugging Face Inference Endpoints.

#### Output
|
```json
{
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "content": ", and create a new test page.\n\n### Test data\n\n- A user named",
        "role": "assistant"
      }
    }
  ],
  "id": "chatcmpl-56634359-d4ce-4dbc-972c-86a640e3a5d8",
  "created": 1699308314.054251,
  "model": "huggingface/bigcode/starcoder",
  "usage": {
    "completion_tokens": 16,
    "prompt_tokens": 10,
    "total_tokens": 26
  }
}
```
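
Since the proxy mirrors the OpenAI API, any OpenAI client can consume it by swapping the API base. A minimal sketch, assuming the proxy is running on `0.0.0.0:8000` with no auth configured and the pre-v1 (0.x) `openai` Python SDK:

```python
import openai

openai.api_base = "http://0.0.0.0:8000"  # point the SDK at the LiteLLM proxy
openai.api_key = "anything"              # placeholder; assumes no auth is configured

# The proxy resolves the gpt-3.5-turbo alias to huggingface/bigcode/starcoder
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "this is a test request, write a short poem"}],
)
print(response)
```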

### Supported LLMs
<Tabs>
<TabItem value="bedrock" label="Bedrock">

|
@@ -297,81 +321,8 @@ model_list:
      api_key: your_huggingface_api_key # [OPTIONAL] if deployed on huggingface inference endpoints
      api_base: your_api_base # url where model is deployed
```

## Caching

Add Redis caching to your server via environment variables:

```env
### REDIS
REDIS_HOST = ""
REDIS_PORT = ""
REDIS_PASSWORD = ""
```

Docker command:

```shell
docker run -e REDIS_HOST=<your-redis-host> -e REDIS_PORT=<your-redis-port> -e REDIS_PASSWORD=<your-redis-password> -e PORT=8000 -p 8000:8000 ghcr.io/berriai/litellm:latest
```

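To sanity-check the Redis credentials before wiring them into the container, a quick sketch with the `redis` Python package (the host/port values below are the placeholder examples used later in this doc):

```python
import redis  # pip install redis

# Placeholder credentials; substitute the values you pass to `docker run`
r = redis.Redis(
    host="redis-18841.c274.us-east-1-3.ec2.cloud.redislabs.com",
    port=18841,
    password="<your-redis-password>",
)
print(r.ping())  # True means the proxy container should reach Redis as well
```
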
## Logging

1. Debug logs

Print the input/output params by setting `SET_VERBOSE = "True"`.

Docker command:

```shell
docker run -e SET_VERBOSE="True" -e PORT=8000 -p 8000:8000 ghcr.io/berriai/litellm:latest
```

2. Add Langfuse logging to your server via environment variables:

```env
### LANGFUSE
LANGFUSE_PUBLIC_KEY = ""
LANGFUSE_SECRET_KEY = ""
LANGFUSE_HOST = "" # optional, defaults to https://cloud.langfuse.com
```

Docker command:

```shell
docker run -e LANGFUSE_PUBLIC_KEY=<your-public-key> -e LANGFUSE_SECRET_KEY=<your-secret-key> -e LANGFUSE_HOST=<your-langfuse-host> -e PORT=8000 -p 8000:8000 ghcr.io/berriai/litellm:latest
```

## Advanced
### Caching - Completion() and Embedding() Responses

Enable caching by adding the following credentials to your server environment:

```env
REDIS_HOST = ""     # e.g. REDIS_HOST='redis-18841.c274.us-east-1-3.ec2.cloud.redislabs.com'
REDIS_PORT = ""     # e.g. REDIS_PORT='18841'
REDIS_PASSWORD = "" # e.g. REDIS_PASSWORD='liteLlmIsAmazing'
```

#### Test Caching
Send the same request twice:
```shell
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "gpt-3.5-turbo",
     "messages": [{"role": "user", "content": "write a poem about litellm!"}],
     "temperature": 0.7
   }'

curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "gpt-3.5-turbo",
     "messages": [{"role": "user", "content": "write a poem about litellm!"}],
     "temperature": 0.7
   }'
```

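The same check as a short Python sketch (assumes the proxy on `0.0.0.0:8000` and the `requests` package); with caching enabled, the second call should be served from Redis and return identical content, typically much faster:

```python
import requests

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "write a poem about litellm!"}],
    "temperature": 0.7,
}

# Send the identical request twice and compare latency and content
for attempt in range(2):
    r = requests.post("http://0.0.0.0:8000/v1/chat/completions", json=payload)
    content = r.json()["choices"][0]["message"]["content"]
    print(f"call {attempt + 1}: {r.elapsed.total_seconds():.2f}s, {content[:40]!r}")
```
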
#### Control caching per completion request
Caching can be switched on/off per `/chat/completions` request:
- Caching on for completion - pass `caching=True` (see the sketch below):
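
A sketch of such a request (the `caching` field in the JSON body is an assumption inferred from the `caching=True` note above, not a confirmed wire format):

```python
import requests

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "write a poem about litellm!"}],
    "caching": True,  # set False to bypass the cache for this one request
}
print(requests.post("http://0.0.0.0:8000/v1/chat/completions", json=payload).json())
```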
|
@@ -401,7 +352,7 @@ Caching can be switched on/off per /chat/completions request

<!--
## Tutorials (Chat-UI, NeMO-Guardrails, PromptTools, Phoenix ArizeAI, Langchain, ragas, LlamaIndex, etc.)

**Start server:**
|
@@ -617,5 +568,5 @@ response = OpenAI(model="claude-2", api_key="your-anthropic-key",api_base="http:
print(response)
```
</TabItem>
</Tabs> -->