diff --git a/docs/my-website/docs/proxy/configs.md b/docs/my-website/docs/proxy/configs.md
index db737f75af..e58c55182c 100644
--- a/docs/my-website/docs/proxy/configs.md
+++ b/docs/my-website/docs/proxy/configs.md
@@ -3,31 +3,31 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 
 # Overview
-Set model list, `api_base`, `api_key`, `temperature` & proxy server settings (`master-key`) on the config.yaml. 
+Set model list, `api_base`, `api_key`, `temperature` & proxy server settings (`master-key`) on the config.yaml.
 
 | Param Name | Description |
 |----------------------|---------------------------------------------------------------|
 | `model_list` | List of supported models on the server, with model-specific configs |
-| `router_settings` | litellm Router settings, example `routing_strategy="least-busy"` [**see all**](#router-settings)|
-| `litellm_settings` | litellm Module settings, example `litellm.drop_params=True`, `litellm.set_verbose=True`, `litellm.api_base`, `litellm.cache` [**see all**](#all-settings)|
+| `router_settings` | litellm Router settings, example `routing_strategy="least-busy"` [**see all**](./config_settings.md#router_settings---reference)|
+| `litellm_settings` | litellm Module settings, example `litellm.drop_params=True`, `litellm.set_verbose=True`, `litellm.api_base`, `litellm.cache` [**see all**](./config_settings.md#litellm_settings---reference)|
 | `general_settings` | Server settings, example setting `master_key: sk-my_special_key` |
 | `environment_variables` | Environment Variables example, `REDIS_HOST`, `REDIS_PORT` |
 
 **Complete List:** Check the Swagger UI docs on `/#/config.yaml` (e.g. http://0.0.0.0:4000/#/config.yaml), for everything you can pass in the config.yaml.
 
-## Quick Start 
+## Quick Start
 
-Set a model alias for your deployments. 
+Set a model alias for your deployments.
 
-In the `config.yaml` the model_name parameter is the user-facing name to use for your deployment. 
+In the `config.yaml` the model_name parameter is the user-facing name to use for your deployment.
 
 In the config below:
-- `model_name`: the name to pass TO litellm from the external client 
+- `model_name`: the name to pass TO litellm from the external client
 - `litellm_params.model`: the model string passed to the litellm.completion() function
-E.g.: 
-- `model=vllm-models` will route to `openai/facebook/opt-125m`. 
+E.g.:
+- `model=vllm-models` will route to `openai/facebook/opt-125m`.
 - `model=gpt-3.5-turbo` will load balance between `azure/gpt-turbo-small-eu` and `azure/gpt-turbo-small-ca`
 
 ```yaml
@@ -38,7 +38,7 @@ model_list:
       api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
       api_key: "os.environ/AZURE_API_KEY_EU" # does os.getenv("AZURE_API_KEY_EU")
       rpm: 6 # [OPTIONAL] Rate limit for this deployment: in requests per minute (rpm)
-  - model_name: bedrock-claude-v1 
+  - model_name: bedrock-claude-v1
     litellm_params:
       model: bedrock/anthropic.claude-instant-v1
   - model_name: gpt-3.5-turbo
@@ -48,7 +48,7 @@ model_list:
       api_key: "os.environ/AZURE_API_KEY_CA"
       rpm: 6
   - model_name: anthropic-claude
-    litellm_params: 
+    litellm_params:
       model: bedrock/anthropic.claude-instant-v1
       ### [OPTIONAL] SET AWS REGION ###
       aws_region_name: us-east-1
@@ -58,13 +58,13 @@ model_list:
       api_base: http://0.0.0.0:4000/v1
       api_key: none
       rpm: 1440
-    model_info: 
+    model_info:
       version: 2
- 
+
   # Use this if you want to make requests to `claude-3-haiku-20240307`,`claude-3-opus-20240229`,`claude-2.1` without defining them on the config.yaml
   # Default models
   # Works for ALL Providers and needs the default provider credentials in .env
-  - model_name: "*" 
+  - model_name: "*"
     litellm_params:
       model: "*"
@@ -72,7 +72,7 @@ litellm_settings: # module level litellm settings - https://github.com/BerriAI/l
   drop_params: True
   success_callback: ["langfuse"] # OPTIONAL - if you want to start sending LLM Logs to Langfuse. Make sure to set `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in your env
 
-general_settings: 
+general_settings:
   master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)
   alerting: ["slack"] # [OPTIONAL] If you want Slack Alerts for Hanging LLM requests, Slow llm responses, Budget Alerts. Make sure to set `SLACK_WEBHOOK_URL` in your env
 ```
@@ -90,7 +90,7 @@ $ litellm --config /path/to/config.yaml
 
 :::tip
 
-Run with `--detailed_debug` if you need detailed debug logs 
+Run with `--detailed_debug` if you need detailed debug logs
 
 ```shell
 $ litellm --config /path/to/config.yaml --detailed_debug
@@ -100,7 +100,7 @@ $ litellm --config /path/to/config.yaml --detailed_debug
 
 #### Step 3: Test it
 
-Sends request to model where `model_name=gpt-3.5-turbo` on config.yaml. 
+Sends request to model where `model_name=gpt-3.5-turbo` on config.yaml.
 
 If multiple deployments share `model_name=gpt-3.5-turbo`, the proxy does [Load Balancing](https://docs.litellm.ai/docs/proxy/load_balancing) between them
@@ -124,7 +124,7 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \
 
 ## LLM configs `model_list`
 
 ### Model-specific params (API Base, Keys, Temperature, Max Tokens, Organization, Headers etc.)
-You can use the config to save model-specific information like api_base, api_key, temperature, max_tokens, etc. 
+You can use the config to save model-specific information like api_base, api_key, temperature, max_tokens, etc.
 
 [**All input params**](https://docs.litellm.ai/docs/completion/input#input-params-1)
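A minimal sketch of the kind of model-specific entry this section describes (the `gpt-4-team1` alias and `model_info` block are taken from examples elsewhere on this page; the deployment name, `temperature`, and `max_tokens` values are illustrative placeholders, not defaults):

```yaml
model_list:
  - model_name: gpt-4-team1                 # user-facing alias
    litellm_params:                         # forwarded to litellm.completion()
      model: azure/chatgpt-v-2              # hypothetical Azure deployment name
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key: "os.environ/AZURE_API_KEY"   # reads os.getenv("AZURE_API_KEY")
      temperature: 0.2
      max_tokens: 2048
    model_info:
      version: 2                            # optional metadata, as in the Quick Start example above
```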
@@ -200,18 +200,18 @@ model_list:
 
-Here's how to route between GPT-J embedding (sagemaker endpoint), Amazon Titan embedding (Bedrock) and Azure OpenAI embedding on the proxy server: 
+Here's how to route between GPT-J embedding (sagemaker endpoint), Amazon Titan embedding (Bedrock) and Azure OpenAI embedding on the proxy server:
 
 ```yaml
 model_list:
   - model_name: sagemaker-embeddings
-    litellm_params: 
+    litellm_params:
       model: "sagemaker/berri-benchmarking-gpt-j-6b-fp16"
   - model_name: amazon-embeddings
     litellm_params:
       model: "bedrock/amazon.titan-embed-text-v1"
   - model_name: azure-embeddings
-    litellm_params: 
+    litellm_params:
       model: "azure/azure-embedding-model"
       api_base: "os.environ/AZURE_API_BASE" # os.getenv("AZURE_API_BASE")
       api_key: "os.environ/AZURE_API_KEY" # os.getenv("AZURE_API_KEY")
@@ -229,16 +229,16 @@ LiteLLM Proxy supports all
-  allowed_fails: 3 # cooldown model if it fails > 1 call in a minute. 
+  allowed_fails: 3 # cooldown model if it fails > 1 call in a minute.
 
 router_settings: # router_settings are optional
   routing_strategy: simple-shuffle # Literal["simple-shuffle", "least-busy", "usage-based-routing","latency-based-routing"], default="simple-shuffle"
@@ -426,7 +426,7 @@ router_settings: # router_settings are optional
 
 You can view your cost once you set up [Virtual keys](https://docs.litellm.ai/docs/proxy/virtual_keys) or [custom_callbacks](https://docs.litellm.ai/docs/proxy/logging)
 
-### Load API Keys / config values from Environment 
+### Load API Keys / config values from Environment
 
 If you have secrets saved in your environment, and don't want to expose them in the config.yaml, here's how to load model-specific keys from the environment. **This works for ANY value on the config.yaml**
@@ -434,7 +434,7 @@ If you have secrets saved in your environment, and don't want to expose them in
 os.environ/ # runs os.getenv("YOUR-ENV-VAR")
 ```
 
-```yaml 
+```yaml
 model_list:
   - model_name: gpt-4-team1
     litellm_params: # params for litellm.completion() - https://docs.litellm.ai/docs/completion/input#input---request-body
@@ -446,7 +446,7 @@ model_list:
 
 [**See Code**](https://github.com/BerriAI/litellm/blob/c12d6c3fe80e1b5e704d9846b246c059defadce7/litellm/utils.py#L2366)
 
-s/o to [@David Manouchehri](https://www.linkedin.com/in/davidmanouchehri/) for helping with this. 
+s/o to [@David Manouchehri](https://www.linkedin.com/in/davidmanouchehri/) for helping with this.
 
 ### Centralized Credential Management
 
@@ -519,7 +519,7 @@ model_list:
 
 ### Set Custom Prompt Templates
 
-LiteLLM by default checks if a model has a [prompt template and applies it](../completion/prompt_formatting.md) (e.g. if a huggingface model has a saved chat template in it's tokenizer_config.json). However, you can also set a custom prompt template on your proxy in the `config.yaml`: 
+LiteLLM by default checks if a model has a [prompt template and applies it](../completion/prompt_formatting.md) (e.g. if a huggingface model has a saved chat template in its tokenizer_config.json). However, you can also set a custom prompt template on your proxy in the `config.yaml`:
 
 **Step 1**: Save your prompt template in a `config.yaml`
 ```yaml
@@ -527,7 +527,7 @@ LiteLLM by default checks if a model has a [prompt template and applies it](../c
 # Model-specific parameters
 model_list:
   - model_name: mistral-7b # model alias
     litellm_params: # actual params for litellm.completion()
-      model: "huggingface/mistralai/Mistral-7B-Instruct-v0.1" 
+      model: "huggingface/mistralai/Mistral-7B-Instruct-v0.1"
       api_base: ""
       api_key: "" # [OPTIONAL] for hf inference endpoints
       initial_prompt_value: "\n"
@@ -542,9 +542,9 @@ model_list:
 
 ```shell
 $ litellm --config /path/to/config.yaml
-``` 
+```
 
-### Set custom tokenizer 
+### Set custom tokenizer
 
 If you're using the [`/utils/token_counter` endpoint](https://litellm-api.up.railway.app/#/llm%20utils/token_counter_utils_token_counter_post), and want to set a custom huggingface tokenizer for a model, you can do so in the `config.yaml`
@@ -556,7 +556,7 @@ model_list:
       api_key: os.environ/OPENAI_API_KEY
     model_info:
      access_groups: ["restricted-models"]
-      custom_tokenizer: 
+      custom_tokenizer:
        identifier: deepseek-ai/DeepSeek-V3-Base
        revision: main
        auth_token: os.environ/HUGGINGFACE_API_KEY
@@ -564,34 +564,34 @@ model_list:
 
 **Spec**
 
 ```
-custom_tokenizer: 
+custom_tokenizer:
   identifier: str # huggingface model identifier
   revision: str # huggingface model revision (usually 'main')
-  auth_token: Optional[str] # huggingface auth token 
+  auth_token: Optional[str] # huggingface auth token
 ```
 
 ## General Settings `general_settings` (DB Connection, etc)
 
-### Configure DB Pool Limits + Connection Timeouts 
+### Configure DB Pool Limits + Connection Timeouts
 
 ```yaml
-general_settings: 
+general_settings:
   database_connection_pool_limit: 100 # sets connection pool for prisma client to postgres db at 100
-  database_connection_timeout: 60 # sets a 60s timeout for any connection call to the db 
+  database_connection_timeout: 60 # sets a 60s timeout for any connection call to the db
 ```
 
 ## Extras
 
-### Disable Swagger UI 
+### Disable Swagger UI
 
-To disable the Swagger docs from the base url, set 
+To disable the Swagger docs from the base url, set
 
 ```env
 NO_DOCS="True"
 ```
 
-in your environment, and restart the proxy. 
+in your environment, and restart the proxy.
 
 ### Use CONFIG_FILE_PATH for proxy (Easier Azure container deployment)
 
@@ -605,7 +605,7 @@ model_list:
       api_key: os.environ/OPENAI_API_KEY
 ```
 
-2. Store filepath as env var 
+2. Store filepath as env var
 
 ```bash
 CONFIG_FILE_PATH="/path/to/config.yaml"
@@ -614,7 +614,7 @@ CONFIG_FILE_PATH="/path/to/config.yaml"
 
 3. Start Proxy
 
 ```bash
-$ litellm 
+$ litellm
 
 # RUNNING on http://0.0.0.0:4000
 ```
@@ -624,19 +624,19 @@ $ litellm
 
 Use this if you cannot mount a config file on your deployment service (example - AWS Fargate, Railway etc)
 
-LiteLLM Proxy will read your config.yaml from an s3 Bucket or GCS Bucket 
+LiteLLM Proxy will read your config.yaml from an s3 Bucket or GCS Bucket
 
-Set the following .env vars 
+Set the following .env vars
 ```shell
-LITELLM_CONFIG_BUCKET_TYPE = "gcs" # set this to "gcs" 
+LITELLM_CONFIG_BUCKET_TYPE = "gcs" # set this to "gcs"
 LITELLM_CONFIG_BUCKET_NAME = "litellm-proxy" # your bucket name on GCS
 LITELLM_CONFIG_BUCKET_OBJECT_KEY = "proxy_config.yaml" # object key on GCS
 ```
 
-Start litellm proxy with these env vars - litellm will read your config from GCS 
+Start litellm proxy with these env vars - litellm will read your config from GCS
 
 ```shell
 docker run --name litellm-proxy \
@@ -652,13 +652,13 @@ docker run --name litellm-proxy \
 
-Set the following .env vars 
+Set the following .env vars
 ```shell
-LITELLM_CONFIG_BUCKET_NAME = "litellm-proxy" # your bucket name on s3 
+LITELLM_CONFIG_BUCKET_NAME = "litellm-proxy" # your bucket name on s3
 LITELLM_CONFIG_BUCKET_OBJECT_KEY = "litellm_proxy_config.yaml" # object key on s3
 ```
 
-Start litellm proxy with these env vars - litellm will read your config from s3 
+Start litellm proxy with these env vars - litellm will read your config from s3
 
 ```shell
 docker run --name litellm-proxy \

diff --git a/docs/my-website/docs/proxy/docker_quick_start.md b/docs/my-website/docs/proxy/docker_quick_start.md
index c5f28effa4..fb008531f9 100644
--- a/docs/my-website/docs/proxy/docker_quick_start.md
+++ b/docs/my-website/docs/proxy/docker_quick_start.md
@@ -5,13 +5,13 @@ import TabItem from '@theme/TabItem';
 
 # Getting Started - E2E Tutorial
 
 End-to-End tutorial for LiteLLM Proxy to:
-- Add an Azure OpenAI model 
-- Make a successful /chat/completion call 
-- Generate a virtual key 
-- Set RPM limit on virtual key 
+- Add an Azure OpenAI model
+- Make a successful /chat/completion call
+- Generate a virtual key
+- Set RPM limit on virtual key
 
-## Pre-Requisites 
+## Pre-Requisites
 
 - Install LiteLLM Docker Image ** OR ** LiteLLM CLI (pip package)
 
@@ -37,7 +37,7 @@ $ pip install 'litellm[proxy]'
 
-## 1. Add a model 
+## 1. Add a model
 
 Control LiteLLM Proxy with a config.yaml file.
 
@@ -58,7 +58,7 @@ model_list:
 
 - **`model_name`** (`str`) - This field should contain the name of the model as received.
 - **`litellm_params`** (`dict`) [See All LiteLLM Params](https://github.com/BerriAI/litellm/blob/559a6ad826b5daef41565f54f06c739c8c068b28/litellm/types/router.py#L222)
-    - **`model`** (`str`) - Specifies the model name to be sent to `litellm.acompletion` / `litellm.aembedding`, etc. This is the identifier used by LiteLLM to route to the correct model + provider logic on the backend. 
+    - **`model`** (`str`) - Specifies the model name to be sent to `litellm.acompletion` / `litellm.aembedding`, etc. This is the identifier used by LiteLLM to route to the correct model + provider logic on the backend.
     - **`api_key`** (`str`) - The API key required for authentication. It can be retrieved from an environment variable using `os.environ/`.
     - **`api_base`** (`str`) - The API base for your azure deployment.
     - **`api_version`** (`str`) - The API Version to use when calling Azure's OpenAI API. Get the latest Inference API version [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/api-version-deprecation?source=recommendations#latest-preview-api-releases).
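Putting the fields above together, a single Azure entry in `config.yaml` might look like the sketch below (the `gpt-4o` alias and `my_azure_deployment` name are placeholders for your own values; the `api_version` date is just the one used elsewhere in this tutorial):

```yaml
model_list:
  - model_name: gpt-4o                      # the name clients send in `model`
    litellm_params:
      model: azure/my_azure_deployment      # azure/<your-deployment-name>
      api_base: os.environ/AZURE_API_BASE   # reads os.getenv("AZURE_API_BASE")
      api_key: "os.environ/AZURE_API_KEY"   # reads os.getenv("AZURE_API_KEY")
      api_version: "2024-07-01-preview"     # [OPTIONAL] litellm uses the latest azure api_version by default
```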
@@ -70,11 +70,11 @@ model_list:
 
 - [**Pass provider-specific params**](../completion/provider_specific_params.md#proxy-usage)
 
-## 2. Make a successful /chat/completion call 
+## 2. Make a successful /chat/completion call
 
 LiteLLM Proxy is 100% OpenAI-compatible. Test your azure model via the `/chat/completions` route.
 
-### 2.1 Start Proxy 
+### 2.1 Start Proxy
 
 Save your config.yaml from step 1. as `litellm_config.yaml`.
 
@@ -119,7 +119,7 @@ Loaded config YAML (api_key and environment_variables are not shown):
 "model_name ...
 ```
 
-### 2.2 Make Call 
+### 2.2 Make Call
 
 ```bash
@@ -183,7 +183,7 @@ curl -X POST 'http://0.0.0.0:4000/chat/completions' \
 
 Track Spend, and control model access via virtual keys for the proxy
 
-### 3.1 Set up a Database 
+### 3.1 Set up a Database
 
 **Requirements**
 - Need a postgres database (e.g. [Supabase](https://supabase.com/), [Neon](https://neon.tech/), etc)
 
@@ -198,8 +198,8 @@ model_list:
       api_key: "os.environ/AZURE_API_KEY"
       api_version: "2024-07-01-preview" # [OPTIONAL] litellm uses the latest azure api_version by default
 
-general_settings: 
-  master_key: sk-1234 
+general_settings:
+  master_key: sk-1234
   database_url: "postgresql://<user>:<password>@<host>:<port>/<dbname>" # 👈 KEY CHANGE
 ```
 
@@ -209,27 +209,27 @@ Save config.yaml as `litellm_config.yaml` (used in 3.2).
 
 **What is `general_settings`?**
 
-These are settings for the LiteLLM Proxy Server. 
+These are settings for the LiteLLM Proxy Server.
 
-See All General Settings [here](http://localhost:3000/docs/proxy/configs#all-settings).
+See All General Settings [here](./config_settings.md).
 
 1. **`master_key`** (`str`)
     - **Description**:
         - Set a `master key`, this is your Proxy Admin key - you can use this to create other keys (🚨 must start with `sk-`).
-    - **Usage**: 
-        - ** Set on config.yaml** set your master key under `general_settings:master_key`, example - 
+    - **Usage**:
+        - ** Set on config.yaml** set your master key under `general_settings:master_key`, example -
         `master_key: sk-1234`
         - ** Set env variable** set `LITELLM_MASTER_KEY`
 
 2. **`database_url`** (str)
     - **Description**:
         - Set a `database_url`, this is the connection to your Postgres DB, which is used by litellm for generating keys, users, teams.
-    - **Usage**: 
-        - ** Set on config.yaml** set your master key under `general_settings:database_url`, example - 
+    - **Usage**:
+        - ** Set on config.yaml** set your database_url under `general_settings:database_url`, example -
         `database_url: "postgresql://..."`
-        - Set `DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname>` in your env 
+        - Set `DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname>` in your env
 
-### 3.2 Start Proxy 
+### 3.2 Start Proxy
 
 ```bash
 docker run \
@@ -246,7 +246,7 @@ docker run \
 
 Create a key with `rpm_limit: 1`. This will only allow 1 request per minute for calls to proxy with this key.
 
-```bash 
+```bash
 curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
 -H 'Authorization: Bearer sk-1234' \
 -H 'Content-Type: application/json' \
@@ -265,11 +265,11 @@ curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
 }
 ```
 
-### 3.4 Test it! 
+### 3.4 Test it!
 
 **Use your virtual key from step 3.3**
 
-1st call - Expect to work! 
+1st call - Expect to work!
 
 ```bash
 curl -X POST 'http://0.0.0.0:4000/chat/completions' \
@@ -300,7 +300,7 @@ curl -X POST 'http://0.0.0.0:4000/chat/completions' \
 }
 ```
 
-2nd call - Expect to fail! 
+2nd call - Expect to fail!
 
 **Why did this call fail?**
 
@@ -339,7 +339,7 @@ curl -X POST 'http://0.0.0.0:4000/chat/completions' \
 }
 ```
 
-### Useful Links 
+### Useful Links
 
 - [Creating Virtual Keys](./virtual_keys.md)
 - [Key Management API Endpoints Swagger](https://litellm-api.up.railway.app/#/key%20management)
@@ -347,7 +347,7 @@ curl -X POST 'http://0.0.0.0:4000/chat/completions' \
 - [Dynamic TPM/RPM Limits for keys](./team_budgets.md#dynamic-tpmrpm-allocation)
 
-## Troubleshooting 
+## Troubleshooting
 
 ### Non-root docker image?
 
@@ -355,7 +355,7 @@ If you need to run the docker image as a non-root user, use [this](https://githu
 
 ### SSL Verification Issue / Connection Error.
 
-If you see 
+If you see
 
 ```bash
 ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1006)
@@ -367,7 +367,7 @@ OR
 Connection Error.
 ```
 
-You can disable ssl verification with: 
+You can disable ssl verification with:
 
 ```yaml
 model_list:
@@ -383,23 +383,23 @@ litellm_settings:
 ```
 
-### (DB) All connection attempts failed 
+### (DB) All connection attempts failed
 
 If you see:
 
 ```
-httpx.ConnectError: All connection attempts failed 
- 
-ERROR: Application startup failed. Exiting. 
-3:21:43 - LiteLLM Proxy:ERROR: utils.py:2207 - Error getting LiteLLM_SpendLogs row count: All connection attempts failed 
+httpx.ConnectError: All connection attempts failed
+
+ERROR: Application startup failed. Exiting.
+3:21:43 - LiteLLM Proxy:ERROR: utils.py:2207 - Error getting LiteLLM_SpendLogs row count: All connection attempts failed
 ```
 
-This might be a DB permission issue. 
+This might be a DB permission issue.
 
-1. Validate db user permission issue 
+1. Validate DB user permissions
 
-Try creating a new database. 
+Try creating a new database.
 
 ```bash
 STATEMENT: CREATE DATABASE "litellm"
 ```
 
 If you get:
 
 ```
-ERROR: permission denied to create 
+ERROR: permission denied to create
 ```
 
-This indicates you have a permission issue. 
+This indicates you have a permission issue.
 
 2. Grant permissions to your DB user
 
@@ -434,7 +434,7 @@ GRANT ALL PRIVILEGES ON DATABASE litellm TO your_username;
 
 **What is `litellm_settings`?**
 
-LiteLLM Proxy uses the [LiteLLM Python SDK](https://docs.litellm.ai/docs/routing) for handling LLM API calls. 
+LiteLLM Proxy uses the [LiteLLM Python SDK](https://docs.litellm.ai/docs/routing) for handling LLM API calls.
 
 `litellm_settings` are module-level params for the LiteLLM Python SDK (equivalent to doing `litellm.<some_param>` on the SDK). You can see all params [here](https://github.com/BerriAI/litellm/blob/208fe6cb90937f73e0def5c97ccb2359bf8a467b/litellm/__init__.py#L114)
 
@@ -446,7 +446,7 @@ LiteLLM Proxy uses the [LiteLLM Python SDK](https://docs.litellm.ai/docs/routing
 
 - Our emails ✉️ ishaan@berri.ai / krrish@berri.ai
 
-[![Chat on WhatsApp](https://img.shields.io/static/v1?label=Chat%20on&message=WhatsApp&color=success&logo=WhatsApp&style=flat-square)](https://wa.link/huol9n) [![Chat on Discord](https://img.shields.io/static/v1?label=Chat%20on&message=Discord&color=blue&logo=Discord&style=flat-square)](https://discord.gg/wuPM9dRgDw) 
+[![Chat on WhatsApp](https://img.shields.io/static/v1?label=Chat%20on&message=WhatsApp&color=success&logo=WhatsApp&style=flat-square)](https://wa.link/huol9n) [![Chat on Discord](https://img.shields.io/static/v1?label=Chat%20on&message=Discord&color=blue&logo=Discord&style=flat-square)](https://discord.gg/wuPM9dRgDw)
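A closing illustration for the `litellm_settings` note in the troubleshooting section above: each key under `litellm_settings` sets the module-level attribute of the same name on the LiteLLM Python SDK. A minimal sketch using the two params already mentioned in these docs:

```yaml
litellm_settings:
  drop_params: True    # same effect as `litellm.drop_params = True` in the SDK
  set_verbose: True    # same effect as `litellm.set_verbose = True` in the SDK
```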