docs(docker_quick_start.md): add new quick start doc for litellm proxy

Krrish Dholakia 2024-08-29 15:34:28 -07:00
parent 4b7ceade64
commit 601945d114
5 changed files with 366 additions and 9 deletions


@@ -399,6 +399,8 @@ The proxy provides:
### 📖 Proxy Endpoints - [Swagger Docs](https://litellm-api.up.railway.app/)
Go here for a complete tutorial with keys + rate limits - [**here**](./proxy/docker_quick_start.md)
### Quick Start Proxy - CLI
```shell
@@ -433,4 +435,5 @@ print(response)
- [exception mapping](./exception_mapping.md)
- [retries + model fallbacks for completion()](./completion/reliable_completions.md)
- [proxy virtual keys & spend management](./tutorials/fallbacks.md)
- [proxy virtual keys & spend management](./proxy/virtual_keys.md)
- [E2E Tutorial for LiteLLM Proxy Server](./proxy/docker_quick_start.md)


@@ -0,0 +1,355 @@
# Getting Started - E2E Tutorial
End-to-End tutorial for LiteLLM Proxy to:
- Add an Azure OpenAI model
- Make a successful /chat/completion call
- Generate a virtual key
- Set RPM limit on virtual key
## Pre-Requisites
- Install LiteLLM Docker Image
```bash
docker pull ghcr.io/berriai/litellm:main-latest
```
[**See all docker images**](https://github.com/orgs/BerriAI/packages)
## 1. Add a model
Control LiteLLM Proxy with a config.yaml file.
Set up your config.yaml with your Azure model.
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/my_azure_deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2024-07-01-preview" # [OPTIONAL] litellm uses the latest azure api_version by default
```
---
### Model List Specification
- **`model_name`** (`str`) - The model name clients send in requests to the proxy; incoming requests are routed by matching their `model` param against this value.
- **`litellm_params`** (`dict`) - The params passed through to the LiteLLM SDK for this deployment (see the sketch after this list). [See All LiteLLM Params](https://github.com/BerriAI/litellm/blob/559a6ad826b5daef41565f54f06c739c8c068b28/litellm/types/router.py#L222)
    - **`model`** (`str`) - Specifies the model name to be sent to `litellm.acompletion` / `litellm.aembedding`, etc. This is the identifier used by LiteLLM to route to the correct model + provider logic on the backend.
    - **`api_key`** (`str`) - The API key required for authentication. It can be retrieved from an environment variable using `os.environ/`.
    - **`api_base`** (`str`) - The API base for your Azure deployment.
    - **`api_version`** (`str`) - The API version to use when calling Azure's OpenAI API. Get the latest Inference API version [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/api-version-deprecation?source=recommendations#latest-preview-api-releases).
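Under the hood, the proxy passes these `litellm_params` to the LiteLLM Python SDK. Here's a minimal sketch of the roughly equivalent direct SDK call (illustrative only; assumes `AZURE_API_BASE` and `AZURE_API_KEY` are exported in your environment):
```python
import os

import litellm

# Roughly what the proxy does with the litellm_params above:
# `os.environ/...` values are resolved from the environment,
# then passed to the SDK along with the request.
response = litellm.completion(
    model="azure/my_azure_deployment",
    api_base=os.environ["AZURE_API_BASE"],
    api_key=os.environ["AZURE_API_KEY"],
    api_version="2024-07-01-preview",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```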
### Useful Links
- [**All Supported LLM API Providers (OpenAI/Bedrock/Vertex/etc.)**](../providers/)
- [**Full Config.Yaml Spec**](./configs.md)
- [**Pass provider-specific params**](../completion/provider_specific_params.md#proxy-usage)
## 2. Make a successful /chat/completion call
LiteLLM Proxy is 100% OpenAI-compatible. Test your Azure model via the `/chat/completions` route.
### 2.1 Start Proxy
Save your config.yaml from step 1 as `litellm_config.yaml`.
```bash
docker run \
-v $(pwd)/litellm_config.yaml:/app/config.yaml \
-e AZURE_API_KEY=d6*********** \
-e AZURE_API_BASE=https://openai-***********/ \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-latest \
--config /app/config.yaml --detailed_debug
# RUNNING on http://0.0.0.0:4000
```
Confirm your config.yaml was mounted correctly; the startup logs should show:
```bash
Loaded config YAML (api_key and environment_variables are not shown):
{
  "model_list": [
    {
      "model_name ...
```
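Optionally, confirm the proxy is serving requests before moving on. A quick sketch using Python's `requests` (assumes `pip install requests`; `/health/liveliness` is the proxy's liveness probe endpoint):
```python
import requests

# Liveness check against the running proxy
resp = requests.get("http://0.0.0.0:4000/health/liveliness")
print(resp.status_code)  # expect 200 once the proxy is up
```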
### 2.2 Make Call
```bash
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "You are a helpful math tutor. Guide the user through the solution step by step."
},
{
"role": "user",
"content": "how can I solve 8x + 7 = -23"
}
]
}'
```
**Expected Response**
```json
{
  "id": "chatcmpl-2076f062-3095-4052-a520-7c321c115c68",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I am gpt-3.5-turbo",
        "role": "assistant",
        "tool_calls": null,
        "function_call": null
      }
    }
  ],
  "created": 1724962831,
  "model": "gpt-3.5-turbo",
  "object": "chat.completion",
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 20,
    "prompt_tokens": 10,
    "total_tokens": 30
  }
}
```
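Since the proxy is OpenAI-compatible, the same call works from Python with the OpenAI SDK pointed at the proxy. A minimal sketch (assumes `pip install openai`):
```python
from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy instead of api.openai.com
client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # the `model_name` from config.yaml
    messages=[
        {
            "role": "system",
            "content": "You are a helpful math tutor. Guide the user through the solution step by step.",
        },
        {"role": "user", "content": "how can I solve 8x + 7 = -23"},
    ],
)
print(response.choices[0].message.content)
```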
### Useful Links
- [All Supported LLM API Providers (OpenAI/Bedrock/Vertex/etc.)](../providers/)
- [Call LiteLLM Proxy via OpenAI SDK, Langchain, etc.](./user_keys.md#request-format)
- [All API Endpoints Swagger](https://litellm-api.up.railway.app/#/chat%2Fcompletions)
- [Other/Non-Chat Completion Endpoints](../embedding/supported_embedding.md)
- [Pass-through for VertexAI, Bedrock, etc.](../pass_through/vertex_ai.md)
## 3. Generate a virtual key
Track spend and control model access via virtual keys for the proxy.
### 3.1 Set up a Database
**Requirements**
- A Postgres database (e.g. [Supabase](https://supabase.com/), [Neon](https://neon.tech/), etc.)
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/my_azure_deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2024-07-01-preview" # [OPTIONAL] litellm uses the latest azure api_version by default
general_settings:
  master_key: sk-1234
  database_url: "postgresql://<user>:<password>@<host>:<port>/<dbname>" # 👈 KEY CHANGE
```
Save config.yaml as `litellm_config.yaml` (used in 3.2).
---
**What is `general_settings`?**
These are settings for the LiteLLM Proxy Server.
See all general settings [here](https://docs.litellm.ai/docs/proxy/configs#all-settings).
1. **`master_key`** (`str`)
    - **Description**:
        - Set a `master key`; this is your Proxy Admin key. You can use it to create other keys (🚨 must start with `sk-`).
    - **Usage**:
        - **Set on config.yaml**: set your master key under `general_settings:master_key`, example -
        `master_key: sk-1234`
        - **Set env variable**: set `LITELLM_MASTER_KEY`
2. **`database_url`** (`str`)
    - **Description**:
        - Set a `database_url`; this is the connection to your Postgres DB, which is used by litellm for generating keys, users, and teams.
    - **Usage**:
        - **Set on config.yaml**: set your database_url under `general_settings:database_url`, example -
        `database_url: "postgresql://..."`
        - **Set env variable**: set `DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname>` in your env
### 3.2 Start Proxy
```bash
docker run \
-v $(pwd)/litellm_config.yaml:/app/config.yaml \
-e AZURE_API_KEY=d6*********** \
-e AZURE_API_BASE=https://openai-***********/ \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-latest \
--config /app/config.yaml --detailed_debug
```
### 3.3 Create Key w/ RPM Limit
Create a key with `rpm_limit: 1`. This allows only 1 request per minute to the proxy with this key.
```bash
curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"rpm_limit": 1
}'
```
[**See full API Spec**](https://litellm-api.up.railway.app/#/key%20management/generate_key_fn_key_generate_post)
**Expected Response**
```json
{
  "key": "sk-12..."
}
```
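You can also generate the key from Python. A sketch using `requests` (the master key from `general_settings` authorizes the call):
```python
import requests

# Create a virtual key limited to 1 request per minute
resp = requests.post(
    "http://0.0.0.0:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},  # master key
    json={"rpm_limit": 1},
)
virtual_key = resp.json()["key"]  # e.g. "sk-12..."
```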
### 3.4 Test it!
**Use your virtual key from step 3.3**
1st call - Expect to work!
```bash
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-12...' \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "You are a helpful math tutor. Guide the user through the solution step by step."
},
{
"role": "user",
"content": "how can I solve 8x + 7 = -23"
}
]
}'
```
**Expected Response**
```json
{
  "id": "chatcmpl-2076f062-3095-4052-a520-7c321c115c68",
  "choices": [
    ...
  ]
}
```
2nd call - Expect to fail!
**Why does this call fail?**
We set the virtual key's requests-per-minute (RPM) limit to 1. This second call within the same minute exceeds it.
```bash
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-12...' \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "You are a helpful math tutor. Guide the user through the solution step by step."
},
{
"role": "user",
"content": "how can I solve 8x + 7 = -23"
}
]
}'
```
**Expected Response**
```json
{
  "error": {
    "message": "Max parallel request limit reached. Hit limit for api_key: daa1b272072a4c6841470a488c5dad0f298ff506e1cc935f4a181eed90c182ad. tpm_limit: 100, current_tpm: 29, rpm_limit: 1, current_rpm: 2.",
    "type": "None",
    "param": "None",
    "code": "429"
  }
}
```
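If you're calling through the OpenAI SDK, the 429 surfaces as a `RateLimitError`. A sketch of handling it (assumes the virtual key from step 3.3):
```python
import openai
from openai import OpenAI

client = OpenAI(api_key="sk-12...", base_url="http://0.0.0.0:4000")

try:
    client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "how can I solve 8x + 7 = -23"}],
    )
except openai.RateLimitError as e:
    # Raised once the key's rpm_limit is exceeded within the minute
    print("Rate limited:", e)
```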
### Useful Links
- [Creating Virtual Keys](./virtual_keys.md)
- [Key Management API Endpoints Swagger](https://litellm-api.up.railway.app/#/key%20management)
- [Set Budgets / Rate Limits per key/user/teams](./users.md)
- [Dynamic TPM/RPM Limits for keys](./team_budgets.md#dynamic-tpmrpm-allocation)
## Troubleshooting
### Non-root docker image?
If you need to run the docker image as a non-root user, use [this](https://github.com/BerriAI/litellm/pkgs/container/litellm-non_root).
### SSL Verification Issue
If you see this error:
```bash
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1006)
```
You can disable SSL verification with:
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/my_azure_deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2024-07-01-preview"
litellm_settings:
  ssl_verify: false # 👈 KEY CHANGE
```
**What is `litellm_settings`?**
LiteLLM Proxy uses the [LiteLLM Python SDK](https://docs.litellm.ai/docs/routing) for handling LLM API calls.
`litellm_settings` are module-level params for the LiteLLM Python SDK (equivalent to doing `litellm.<some_param>` on the SDK). You can see all params [here](https://github.com/BerriAI/litellm/blob/208fe6cb90937f73e0def5c97ccb2359bf8a467b/litellm/__init__.py#L114).
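For example, `ssl_verify: false` above is the config-file equivalent of setting the module-level param directly in the SDK (a sketch):
```python
import litellm

# Same effect as `litellm_settings: ssl_verify: false` in config.yaml
litellm.ssl_verify = False
```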
## Support & Talk with founders
- [Schedule Demo 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
- [Community Discord 💭](https://discord.gg/wuPM9dRgDw)
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai
[![Chat on WhatsApp](https://img.shields.io/static/v1?label=Chat%20on&message=WhatsApp&color=success&logo=WhatsApp&style=flat-square)](https://wa.link/huol9n) [![Chat on Discord](https://img.shields.io/static/v1?label=Chat%20on&message=Discord&color=blue&logo=Discord&style=flat-square)](https://discord.gg/wuPM9dRgDw)


@@ -283,8 +283,6 @@ litellm_settings:
**Covers all errors (429, 500, etc.)**
[**See Code**]()
**Set via config**
```yaml
model_list:


@@ -29,6 +29,7 @@ const sidebars = {
},
items: [
"proxy/quick_start",
"proxy/docker_quick_start",
"proxy/deploy",
"proxy/prod",
{


@@ -1,9 +1,9 @@
 model_list:
-  - model_name: fake-openai-endpoint
+  - model_name: my-fake-openai-endpoint
     litellm_params:
-      model: openai/my-fake-model
-      api_key: my-fake-key
-      api_base: https://exampleopenaiendpoint-production.up.railway.app/
+      model: gpt-3.5-turbo
+      api_key: "my-fake-key"
+      mock_response: "hello-world"
 litellm_settings:
   ssl_verify: false