# liteLLM Proxy Server: 50+ LLM Models, Error Handling, Caching
### Azure, Llama2, OpenAI, Claude, Hugging Face, Replicate Models
[PyPI](https://pypi.org/project/litellm/)
[PyPI 0.1.1](https://pypi.org/project/litellm/0.1.1/)
[GitHub](https://github.com/BerriAI/litellm)

[Deploy on Railway](https://railway.app/template/DYqQAW?referralCode=t3ukrU)
## What does the liteLLM proxy do?

- Makes `/chat/completions` requests for 50+ LLM models across **Azure, OpenAI, Replicate, Anthropic, Hugging Face**

Example: for `model`, use `claude-2`, `gpt-3.5`, `gpt-4`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
```json
{
  "model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]
}
```
- **Consistent Input/Output Format**
  - Call all models using the OpenAI format - `completion(model, messages)` (see the sketch after this list)
  - Text responses will always be available at `['choices'][0]['message']['content']`
- **Error Handling** using model fallbacks (if `GPT-4` fails, try `llama2`)
- **Logging** - Log requests, responses and errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `Helicone` (any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/)

  **Example: Logs sent to Supabase**
  <img width="1015" alt="Screenshot 2023-08-11 at 4 02 46 PM" src="https://github.com/ishaan-jaff/proxy-server/assets/29436595/237557b8-ba09-4917-982c-8f3e1b2c8d08">

- **Token Usage & Spend** - Track input + completion tokens used and spend per model
- **Caching** - Implementation of semantic caching
- **Streaming & Async Support** - Return generators to stream text responses
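
For reference, here is a minimal sketch of that consistent calling convention using the `litellm` Python package directly (assuming `pip install litellm` and that the relevant provider API key is set in your environment):

```python
from litellm import completion

# Same OpenAI-style call for any supported model; swap the model string
# (e.g. "claude-2", "command-nightly") without changing the rest of the code.
messages = [{"role": "user", "content": "Hello, whats the weather in San Francisco??"}]
response = completion(model="gpt-3.5-turbo", messages=messages)

# The text response is always available at the same path, regardless of provider.
print(response['choices'][0]['message']['content'])
```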
## API Endpoints
### `/chat/completions` (POST)

This endpoint is used to generate chat completions for 50+ supported LLM API models. Use llama2, GPT-4, Claude 2, etc.
#### Input
This API endpoint accepts all inputs as raw JSON and expects the following fields:

- `model` (string, required): ID of the model to use for chat completions. See all supported models [here](https://litellm.readthedocs.io/en/latest/supported/),
  e.g. `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
- `messages` (array, required): A list of messages representing the conversation context. Each message should have a `role` (system, user, assistant, or function), `content` (message text), and `name` (required only for the function role).
- Additional optional parameters: `temperature`, `functions`, `function_call`, `top_p`, `n`, `stream`. See the full list of supported inputs here: https://litellm.readthedocs.io/en/latest/input/
#### Example JSON body

For `claude-2`:
```json
{
  "model": "claude-2",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]
}
```
### Making an API request to the Proxy Server

```python
import requests
import json

# TODO: use your URL
url = "http://localhost:5000/chat/completions"

payload = json.dumps({
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]
})
headers = {
  'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, data=payload)
print(response.text)
```
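
The optional parameters listed under Input can be added to the same payload. A small sketch, reusing `url` and `headers` from the snippet above (the parameter values are illustrative only):

```python
# Add optional parameters alongside model/messages; the values here are examples only.
payload = json.dumps({
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "user", "content": "Hello, whats the weather in San Francisco??"}
    ],
    "temperature": 0.7,  # sampling temperature
    "top_p": 1,          # nucleus sampling
    "n": 1               # number of completions to return
})

response = requests.post(url, headers=headers, data=payload)
print(response.text)
```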
### Output [Response Format]

All responses from the server are returned in the following format (for all LLM models). More info on output here: https://litellm.readthedocs.io/en/latest/output/
```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
        "role": "assistant"
      }
    }
  ],
  "created": 1691790381,
  "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 41,
    "prompt_tokens": 16,
    "total_tokens": 57
  }
}
```
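
A short sketch of pulling the fields above out of the Python example's response (assuming the earlier request succeeded):

```python
data = response.json()

# The text is always at the same path for every model.
text = data['choices'][0]['message']['content']

# Token usage can be used to track spend per model.
usage = data['usage']
print(text)
print(f"prompt tokens: {usage['prompt_tokens']}, completion tokens: {usage['completion_tokens']}")
```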
## Installation & Usage

### Running Locally

1. Clone the liteLLM repository to your local machine:
```
git clone https://github.com/BerriAI/liteLLM-proxy
```
2. Install the required dependencies using pip:
```
pip install -r requirements.txt
```
3. Set your LLM API keys:
```
os.environ['OPENAI_API_KEY'] = "YOUR_API_KEY"
```
or set `OPENAI_API_KEY` in your `.env` file
4. Run the server (see the quick test sketch after this list):
```
python main.py
```
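
Once the server is running, a quick way to confirm it is reachable is to send a test request to the local URL used in the example above (port 5000 is an assumption; adjust it if your `main.py` binds to a different port):

```python
import requests

# Minimal smoke test against the locally running proxy.
resp = requests.post(
    "http://localhost:5000/chat/completions",
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "ping"}]
    },
    timeout=30,
)
print(resp.status_code)
print(resp.json())
```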
## Deploying

1. Quick Start: Deploy on Railway

   [Deploy on Railway](https://railway.app/template/DYqQAW?referralCode=t3ukrU)

2. `GCP`, `AWS`, `Azure`

   This project includes a `Dockerfile`, allowing you to build and deploy a Docker image to your provider of choice.

# Support / Talk with founders

- [Our calendar 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
- [Community Discord 💭](https://discord.gg/wuPM9dRgDw)
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai

## Roadmap

- [ ] Support hosted db (e.g. Supabase)
- [ ] Easily send data to places like Posthog and Sentry
- [ ] Add a hot-cache for project spend logs - enables fast checks for user + project limits
- [ ] Implement user-based rate-limiting
- [ ] Spending controls per project - expose key creation endpoint
- [ ] Need to store a keys db -> mapping created keys to their alias (i.e. project name)
- [ ] Easily add new models as backups / as the entry-point (add this to the available model list)