import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Overview

Set model list, `api_base`, `api_key`, `temperature` & proxy server settings (`master_key`) on the config.yaml.

| Param Name | Description |
|----------------------|---------------------------------------------------------------|
| `model_list` | List of supported models on the server, with model-specific configs |
| `router_settings` | litellm Router settings, example `routing_strategy="least-busy"` [**see all**](./config_settings.md#router_settings---reference)|
| `litellm_settings` | litellm Module settings, example `litellm.drop_params=True`, `litellm.set_verbose=True`, `litellm.api_base`, `litellm.cache` [**see all**](./config_settings.md#litellm_settings---reference)|
| `general_settings` | Server settings, example setting `master_key: sk-my_special_key` |
| `environment_variables` | Environment Variables example, `REDIS_HOST`, `REDIS_PORT` |

**Complete List:** Check the Swagger UI docs on `<your-proxy-url>/#/config.yaml` (e.g. http://0.0.0.0:4000/#/config.yaml) for everything you can pass in the config.yaml.
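
Putting those sections together, a config.yaml skeleton looks roughly like the following sketch (the values are illustrative placeholders, not defaults):

```yaml
model_list:                     # your deployments
  - model_name: gpt-3.5-turbo   # user-facing model name
    litellm_params:
      model: azure/<your-deployment-name>
      api_key: os.environ/AZURE_API_KEY

litellm_settings:               # module-level litellm settings
  drop_params: True

general_settings:               # proxy server settings
  master_key: sk-my_special_key

router_settings:                # litellm Router settings
  routing_strategy: least-busy

environment_variables:
  REDIS_HOST: <your-redis-host>
  REDIS_PORT: "6379"
```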

## Quick Start

Set a model alias for your deployments.

In the `config.yaml` the model_name parameter is the user-facing name to use for your deployment.

In the config below:

- `model_name`: the name to pass TO litellm from the external client
- `litellm_params.model`: the model string passed to the litellm.completion() function

E.g.:

- `model=vllm-models` will route to `openai/facebook/opt-125m`.
- `model=gpt-3.5-turbo` will load balance between `azure/gpt-turbo-small-eu` and `azure/gpt-turbo-small-ca`

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-eu
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key: "os.environ/AZURE_API_KEY_EU" # does os.getenv("AZURE_API_KEY_EU")
      rpm: 6   # [OPTIONAL] Rate limit for this deployment: in requests per minute (rpm)
  - model_name: bedrock-claude-v1
    litellm_params:
      model: bedrock/anthropic.claude-instant-v1
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_key: "os.environ/AZURE_API_KEY_CA"
      rpm: 6
  - model_name: anthropic-claude
    litellm_params:
      model: bedrock/anthropic.claude-instant-v1
      ### [OPTIONAL] SET AWS REGION ###
      aws_region_name: us-east-1
  - model_name: vllm-models
    litellm_params:
      model: openai/facebook/opt-125m
      api_base: http://0.0.0.0:4000/v1
      api_key: none
      rpm: 1440
    model_info:
      version: 2

  # Use this if you want to make requests to `claude-3-haiku-20240307`,`claude-3-opus-20240229`,`claude-2.1` without defining them on the config.yaml
  # Default models
  # Works for ALL Providers and needs the default provider credentials in .env
  - model_name: "*"
    litellm_params:
      model: "*"

litellm_settings: # module level litellm settings
  drop_params: True
  success_callback: ["langfuse"] # OPTIONAL - if you want to start sending LLM Logs to Langfuse. Make sure to set `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in your env

general_settings:
  master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)
  alerting: ["slack"] # [OPTIONAL] If you want Slack Alerts for Hanging LLM requests, Slow llm responses, Budget Alerts. Make sure to set `SLACK_WEBHOOK_URL` in your env
```

:::tip

Run with `--detailed_debug` if you need detailed debug logs

```shell
$ litellm --config /path/to/config.yaml --detailed_debug
```

:::

#### Step 3: Test it

Sends request to model where `model_name=gpt-3.5-turbo` on config.yaml.

If multiple deployments share `model_name=gpt-3.5-turbo`, the proxy does [Load Balancing](https://docs.litellm.ai/docs/proxy/load_balancing) across them.
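
For example, assuming the proxy is running on `http://0.0.0.0:4000` with the `master_key: sk-1234` from the config above, a test request could look like:

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "what llm are you"}
    ]
  }'
```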

## LLM configs `model_list`

### Model-specific params (API Base, Keys, Temperature, Max Tokens, Organization, Headers etc.)

You can use the config to save model-specific information like api_base, api_key, temperature, max_tokens, etc.

[**All input params**](https://docs.litellm.ai/docs/completion/input#input-params-1)
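
For example, model-specific values sit under `litellm_params` (the deployment name, endpoint, and values below are placeholders):

```yaml
model_list:
  - model_name: gpt-4-team1
    litellm_params:
      model: azure/<your-deployment-name>
      api_base: https://<your-endpoint>.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY
      temperature: 0.2
      max_tokens: 20
```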

**Sagemaker, Bedrock Embeddings**

Here's how to route between GPT-J embedding (sagemaker endpoint), Amazon Titan embedding (Bedrock) and Azure OpenAI embedding on the proxy server:

```yaml
model_list:
  - model_name: sagemaker-embeddings
    litellm_params:
      model: "sagemaker/berri-benchmarking-gpt-j-6b-fp16"
  - model_name: amazon-embeddings
    litellm_params:
      model: "bedrock/amazon.titan-embed-text-v1"
  - model_name: azure-embeddings
    litellm_params:
      model: "azure/azure-embedding-model"
      api_base: "os.environ/AZURE_API_BASE" # os.getenv("AZURE_API_BASE")
      api_key: "os.environ/AZURE_API_KEY" # os.getenv("AZURE_API_KEY")
```

LiteLLM Proxy supports all Hugging Face feature-extraction embedding models:

```yaml
model_list:
  - model_name: deployed-codebert-base
    litellm_params:
      # send request to deployed hugging face inference endpoint
      model: huggingface/microsoft/codebert-base # add huggingface prefix so it routes to hugging face
      api_key: hf_LdS # api key for hugging face inference endpoint
      api_base: https://uysneno1wv2wd4lw.us-east-1.aws.endpoints.huggingface.cloud # your hf inference endpoint
  - model_name: codebert-base
    litellm_params:
      # no api_base set, sends request to hugging face free inference api https://api-inference.huggingface.co/models/
      model: huggingface/microsoft/codebert-base # add huggingface prefix so it routes to hugging face
      api_key: hf_LdS # api key for hugging face
```

You can also define the same embedding model multiple times, under one model group, with different API keys:

```yaml
model_list:
  - model_name: text-embedding-ada-002 # model group
    litellm_params:
      model: text-embedding-ada-002 # model name for litellm.embedding(model=text-embedding-ada-002)
      api_key: your-api-key-1
  - model_name: text-embedding-ada-002
    litellm_params:
      model: text-embedding-ada-002
      api_key: your-api-key-2
```
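
A request to the `text-embedding-ada-002` model group then goes through the proxy's OpenAI-compatible `/embeddings` route; a minimal sketch, assuming the proxy runs on port 4000 with `master_key: sk-1234`, is:

```bash
curl --location 'http://0.0.0.0:4000/embeddings' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "text-embedding-ada-002",
    "input": ["Hello world"]
  }'
```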

Xinference embeddings (see https://docs.litellm.ai/docs/providers/xinference):

```yaml
model_list:
  - model_name: embedding-model # model group
    litellm_params:
      model: xinference/bge-base-en # model name for litellm.embedding(model=xinference/bge-base-en)
      api_base: http://0.0.0.0:9997/v1
```

To call any OpenAI-compatible embedding endpoint, use the `openai/` prefix with your own api_base:

```yaml
model_list:
  - model_name: text-embedding-ada-002 # model group
    litellm_params:
      model: openai/<your-model-name> # model name for litellm.embedding(model=text-embedding-ada-002)
      api_base: <model-api-base>
```

### Multiple OpenAI Organizations

Add all openai models across all OpenAI organizations with just 1 model definition:

```yaml
- model_name: "*"
  litellm_params:
    model: openai/*
    api_key: os.environ/OPENAI_API_KEY
    organization:
      - org-1
      - org-2
      - org-3
```

LiteLLM will automatically create separate deployments for each org.

Confirm this via:

```bash
curl --location 'http://0.0.0.0:4000/v1/model/info' \
--header 'Authorization: Bearer sk-1234' \
--data ''
```

### Load Balancing

:::info
For more on this, go to [this page](https://docs.litellm.ai/docs/proxy/load_balancing)
:::

```yaml
model_list:
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8001
      rpm: 60      # Optional[int]: When rpm/tpm set - litellm uses weighted pick for load balancing. rpm = Rate limit for this deployment: in requests per minute (rpm).
      tpm: 1000   # Optional[int]: tpm = Tokens Per Minute
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8002
      rpm: 600
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8003
      rpm: 60000
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <my-openai-key>
      rpm: 200
  - model_name: gpt-3.5-turbo-16k
    litellm_params:
      model: gpt-3.5-turbo-16k
      api_key: <my-openai-key>
      rpm: 100

litellm_settings:
  num_retries: 3 # retry call 3 times on each model_name (e.g. zephyr-beta)
  request_timeout: 10 # raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout
  fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}] # fallback to gpt-3.5-turbo if call fails num_retries
  context_window_fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo-16k"]}, {"gpt-3.5-turbo": ["gpt-3.5-turbo-16k"]}] # fallback to gpt-3.5-turbo-16k if context window error
  allowed_fails: 3 # cooldown model if it fails > 1 call in a minute.

router_settings: # router_settings are optional
  routing_strategy: simple-shuffle # Literal["simple-shuffle", "least-busy", "usage-based-routing","latency-based-routing"], default="simple-shuffle"
```

You can view your cost once you set up [Virtual keys](https://docs.litellm.ai/docs/proxy/virtual_keys) or [custom_callbacks](https://docs.litellm.ai/docs/proxy/logging)

### Load API Keys / config values from Environment

If you have secrets saved in your environment, and don't want to expose them in the config.yaml, here's how to load model-specific keys from the environment. **This works for ANY value on the config.yaml**

Use the `os.environ/` prefix anywhere in the config:

```
os.environ/<YOUR-ENV-VAR> # runs os.getenv("YOUR-ENV-VAR")
```

```yaml
model_list:
  - model_name: gpt-4-team1
    litellm_params: # params for litellm.completion() - https://docs.litellm.ai/docs/completion/input#input---request-body
      # ...
      api_key: os.environ/AZURE_API_KEY
```

[**See Code**](https://github.com/BerriAI/litellm/blob/c12d6c3fe80e1b5e704d9846b246c059defadce7/litellm/utils.py#L2366)

s/o to [@David Manouchehri](https://www.linkedin.com/in/davidmanouchehri/) for helping with this.

### Centralized Credential Management

### Set Custom Prompt Templates

LiteLLM by default checks if a model has a [prompt template and applies it](../completion/prompt_formatting.md) (e.g. if a huggingface model has a saved chat template in its tokenizer_config.json). However, you can also set a custom prompt template on your proxy in the `config.yaml`:

**Step 1**: Save your prompt template in a `config.yaml`

```yaml
model_list:
  - model_name: mistral-7b # model alias
    litellm_params: # actual params for litellm.completion()
      model: "huggingface/mistralai/Mistral-7B-Instruct-v0.1"
      api_base: "<your-api-base>"
      api_key: "<your-api-key>" # [OPTIONAL] for hf inference endpoints
      initial_prompt_value: "\n"
      # ...
```

**Step 2**: Start server with config

```shell
$ litellm --config /path/to/config.yaml
```

### Set custom tokenizer

If you're using the [`/utils/token_counter` endpoint](https://litellm-api.up.railway.app/#/llm%20utils/token_counter_utils_token_counter_post), and want to set a custom huggingface tokenizer for a model, you can do so in the `config.yaml`

```yaml
model_list:
  - model_name: <your-model-name>
    litellm_params:
      # ...
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      access_groups: ["restricted-models"]
      custom_tokenizer:
        identifier: deepseek-ai/DeepSeek-V3-Base
        revision: main
        auth_token: os.environ/HUGGINGFACE_API_KEY
```

**Spec**

```
custom_tokenizer:
  identifier: str # huggingface model identifier
  revision: str # huggingface model revision (usually 'main')
  auth_token: Optional[str] # huggingface auth token
```

## General Settings `general_settings` (DB Connection, etc)

### Configure DB Pool Limits + Connection Timeouts

```yaml
general_settings:
  database_connection_pool_limit: 100 # sets connection pool for prisma client to postgres db at 100
  database_connection_timeout: 60 # sets a 60s timeout for any connection call to the db
```

## Extras

### Disable Swagger UI

To disable the Swagger docs from the base url, set

```env
NO_DOCS="True"
```

in your environment, and restart the proxy.

### Use CONFIG_FILE_PATH for proxy (Easier Azure container deployment)

1. Setup config.yaml

```yaml
model_list:
  - model_name: <your-model-name>
    litellm_params:
      # ...
      api_key: os.environ/OPENAI_API_KEY
```

2. Store filepath as env var

```bash
CONFIG_FILE_PATH="/path/to/config.yaml"
```

3. Start Proxy

```bash
$ litellm

# RUNNING on http://0.0.0.0:4000
```

Use this if you cannot mount a config file on your deployment service (example - AWS Fargate, Railway etc)

LiteLLM Proxy will read your config.yaml from an s3 Bucket or GCS Bucket

<Tabs>
<TabItem value="gcs" label="GCS Bucket">

Set the following .env vars

```shell
LITELLM_CONFIG_BUCKET_TYPE = "gcs" # set this to "gcs"
LITELLM_CONFIG_BUCKET_NAME = "litellm-proxy" # your bucket name on GCS
LITELLM_CONFIG_BUCKET_OBJECT_KEY = "proxy_config.yaml" # object key on GCS
```

Start litellm proxy with these env vars - litellm will read your config from GCS

```shell
docker run --name litellm-proxy \
```

</TabItem>

<TabItem value="s3" label="s3">

Set the following .env vars

```shell
LITELLM_CONFIG_BUCKET_NAME = "litellm-proxy" # your bucket name on s3
LITELLM_CONFIG_BUCKET_OBJECT_KEY = "litellm_proxy_config.yaml" # object key on s3
```

Start litellm proxy with these env vars - litellm will read your config from s3

```shell
docker run --name litellm-proxy \
```

</TabItem>
</Tabs>

# Getting Started - E2E Tutorial

End-to-End tutorial for LiteLLM Proxy to:

- Add an Azure OpenAI model
- Make a successful /chat/completion call
- Generate a virtual key
- Set RPM limit on virtual key

## Pre-Requisites

- Install LiteLLM Docker Image **OR** LiteLLM CLI (pip package)
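
For the CLI route, install the pip package with the proxy extras; for Docker, pull the image (the image tag below is the commonly used one and may need adjusting):

```shell
# CLI (pip package)
$ pip install 'litellm[proxy]'

# OR pull the Docker image
$ docker pull ghcr.io/berriai/litellm:main-latest
```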

## 1. Add a model

Control LiteLLM Proxy with a config.yaml file.
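
Set up your config.yaml with your Azure OpenAI deployment. A minimal sketch — the model alias, deployment name, and endpoint are placeholders — looks like the following; the fields are explained below:

```yaml
model_list:
  - model_name: gpt-4o                     # user-facing model alias
    litellm_params:                        # params passed to litellm.completion()
      model: azure/<your-deployment-name>  # actual Azure deployment, with the azure/ prefix
      api_base: https://<your-resource>.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-07-01-preview"
```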

- **`model_name`** (`str`) - This field should contain the name of the model as received.
- **`litellm_params`** (`dict`) [See All LiteLLM Params](https://github.com/BerriAI/litellm/blob/559a6ad826b5daef41565f54f06c739c8c068b28/litellm/types/router.py#L222)
  - **`model`** (`str`) - Specifies the model name to be sent to `litellm.acompletion` / `litellm.aembedding`, etc. This is the identifier used by LiteLLM to route to the correct model + provider logic on the backend.
  - **`api_key`** (`str`) - The API key required for authentication. It can be retrieved from an environment variable using `os.environ/`.
  - **`api_base`** (`str`) - The API base for your azure deployment.
  - **`api_version`** (`str`) - The API Version to use when calling Azure's OpenAI API. Get the latest Inference API version [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/api-version-deprecation?source=recommendations#latest-preview-api-releases).

- [**Pass provider-specific params**](../completion/provider_specific_params.md#proxy-usage)

## 2. Make a successful /chat/completion call

LiteLLM Proxy is 100% OpenAI-compatible. Test your azure model via the `/chat/completions` route.

### 2.1 Start Proxy

Save your config.yaml from step 1. as `litellm_config.yaml`.
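
A typical way to start the proxy with that file — assuming the standard `ghcr.io/berriai/litellm` image and port 4000 — is:

```bash
docker run \
    -v $(pwd)/litellm_config.yaml:/app/config.yaml \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:main-latest \
    --config /app/config.yaml --detailed_debug
```

With `--detailed_debug` enabled you should see the loaded config echoed in the proxy logs: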

```
Loaded config YAML (api_key and environment_variables are not shown):
"model_name ...
```

### 2.2 Make Call
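
A minimal request against the proxy's OpenAI-compatible endpoint — assuming the `gpt-4o` model alias from the config sketch above and no master key configured yet — looks like:

```bash
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "what llm are you"}
    ]
  }'
```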

## 3. Generate a virtual key

Track Spend, and control model access via virtual keys for the proxy

### 3.1 Set up a Database

**Requirements**

- Need a postgres database (e.g. [Supabase](https://supabase.com/), [Neon](https://neon.tech/), etc)

Add `database_url` to your `general_settings`:

```yaml
model_list:
  # ...
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2024-07-01-preview" # [OPTIONAL] litellm uses the latest azure api_version by default

general_settings:
  master_key: sk-1234
  database_url: "postgresql://<user>:<password>@<host>:<port>/<dbname>" # 👈 KEY CHANGE
```

Save config.yaml as `litellm_config.yaml` (used in 3.2).

**What is `general_settings`?**

These are settings for the LiteLLM Proxy Server.

See All General Settings [here](http://localhost:3000/docs/proxy/config_settings).

1. **`master_key`** (`str`)
   - **Description**:
     - Set a `master key`, this is your Proxy Admin key - you can use this to create other keys (🚨 must start with `sk-`).
   - **Usage**:
     - **Set on config.yaml**: set your master key under `general_settings:master_key`, example - `master_key: sk-1234`
     - **Set env variable**: set `LITELLM_MASTER_KEY`

2. **`database_url`** (`str`)
   - **Description**:
     - Set a `database_url`, this is the connection to your Postgres DB, which is used by litellm for generating keys, users, teams.
   - **Usage**:
     - **Set on config.yaml**: set your `database_url` under `general_settings:database_url`, example - `database_url: "postgresql://..."`
     - **Set env variable**: set `DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname>` in your env

### 3.2 Start Proxy
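
Start the proxy with the config and your Postgres connection string. A sketch of the command — the image tag and the `DATABASE_URL` value are assumptions you should replace — is:

```bash
docker run \
    -v $(pwd)/litellm_config.yaml:/app/config.yaml \
    -e DATABASE_URL="postgresql://<user>:<password>@<host>:<port>/<dbname>" \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:main-latest \
    --config /app/config.yaml --detailed_debug
```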

### 3.3 Create Key

Create a key with `rpm_limit: 1`. This will only allow 1 request per minute for calls to proxy with this key.

```bash
curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{ "rpm_limit": 1 }'
```

The response includes your new virtual key (`"key": "sk-..."`).

### 3.4 Test it!

**Use your virtual key from step 3.3**

1st call - Expect to work!
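
Make the same `/chat/completions` call as in step 2.2, but authenticate with the generated virtual key instead of the master key (the key value and model alias below are placeholders):

```bash
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-<your-virtual-key>' \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "what llm are you"}
    ]
  }'
```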

This first request should return a normal chat completion response.

2nd call - Expect to fail!

**Why did this call fail?**

The key was created with `rpm_limit: 1` in step 3.3, so a second request within the same minute exceeds the key's rate limit and the proxy rejects it.

### Useful Links

- [Creating Virtual Keys](./virtual_keys.md)
- [Key Management API Endpoints Swagger](https://litellm-api.up.railway.app/#/key%20management)
- [Dynamic TPM/RPM Limits for keys](./team_budgets.md#dynamic-tpmrpm-allocation)

## Troubleshooting

### Non-root docker image?

If you need to run the docker image as a non-root user, use the non-root variant of the Dockerfile provided in the LiteLLM repo.

### SSL Verification Issue / Connection Error.

If you see

```bash
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1006)

OR

Connection Error.
```

You can disable ssl verification with:

```yaml
model_list:
  # ...

litellm_settings:
  ssl_verify: false # 👈 KEY CHANGE
```

### (DB) All connection attempts failed

If you see:

```
httpx.ConnectError: All connection attempts failed

ERROR: Application startup failed. Exiting.
3:21:43 - LiteLLM Proxy:ERROR: utils.py:2207 - Error getting LiteLLM_SpendLogs row count: All connection attempts failed
```

This might be a DB permission issue.

1. Validate DB user permissions

Try creating a new database.

```bash
STATEMENT: CREATE DATABASE "litellm"
```

If you get:

```
ERROR: permission denied to create
```

This indicates you have a permission issue.

2. Grant permissions to your DB user

Give your DB user the needed privileges, e.g. `GRANT ALL PRIVILEGES ON DATABASE litellm TO your_username;`

**What is `litellm_settings`?**

LiteLLM Proxy uses the [LiteLLM Python SDK](https://docs.litellm.ai/docs/routing) for handling LLM API calls.

`litellm_settings` are module-level params for the LiteLLM Python SDK (equivalent to doing `litellm.<some_param>` on the SDK). You can see all params [here](https://github.com/BerriAI/litellm/blob/208fe6cb90937f73e0def5c97ccb2359bf8a467b/litellm/__init__.py#L114)

- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai

[WhatsApp](https://wa.link/huol9n) [Discord](https://discord.gg/wuPM9dRgDw)