fix(model_management.md): add docs on model management on proxy

Krrish Dholakia 2023-12-04 09:36:37 -08:00
parent 0d44f5e441
commit 31d9762b50
4 changed files with 131 additions and 89 deletions


@@ -8,71 +8,46 @@ Set model list, `api_base`, `api_key`, `temperature` & proxy server settings (`m
| `general_settings` | Server settings, example setting `master_key: sk-my_special_key` |
| `environment_variables` | Environment Variables example, `REDIS_HOST`, `REDIS_PORT` |

## Quick Start
Set a model alias for your deployments.
In the `config.yaml` the `model_name` parameter is the user-facing name for your deployment.
In the config below, requests with:
- `model=vllm-models` will route to `openai/facebook/opt-125m`.
- `model=gpt-3.5-turbo` will load balance between `azure/gpt-turbo-small-eu` and `azure/gpt-turbo-small-ca`.
```yaml
model_list:
  - model_name: gpt-3.5-turbo # user-facing model alias
    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
      model: azure/gpt-turbo-small-eu
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key: "os.environ/AZURE_API_KEY_EU" # does os.getenv("AZURE_API_KEY_EU")
      rpm: 6   # Rate limit for this deployment: in requests per minute (rpm)
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key: "os.environ/AZURE_API_KEY_CA"
      rpm: 6
  - model_name: vllm-models
    litellm_params:
      model: openai/facebook/opt-125m # the `openai/` prefix tells litellm it's openai compatible
      api_base: http://0.0.0.0:8000
      rpm: 1440

litellm_settings: # module level litellm settings - https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py
  drop_params: True
  set_verbose: True

general_settings:
  master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)
```
#### Step 2: Start Proxy with config
```shell
@@ -96,32 +71,11 @@ curl --location 'http://0.0.0.0:8000/chat/completions' \
'
```
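
To sanity-check the aliases from Python instead of cURL, here's a minimal sketch using the OpenAI Python SDK (assumes SDK v1+ and the proxy running locally on port 8000). Requests sent with the shared `gpt-3.5-turbo` alias are load balanced across the two Azure deployments defined above.

```python
# Minimal sketch (assumptions: proxy on port 8000, OpenAI Python SDK v1+ installed)
import openai

client = openai.OpenAI(
    api_key="sk-1234",               # the proxy master_key if one is set, else any placeholder string
    base_url="http://0.0.0.0:8000",  # point the SDK at the LiteLLM proxy instead of api.openai.com
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # user-facing alias from config.yaml - load balanced across both Azure deployments
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)
```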

## Save Model-specific params (API Base, API Keys, Temperature, Headers etc.)
You can use the config to save model-specific information like api_base, api_key, temperature, max_tokens, etc.

[**All input params**](https://docs.litellm.ai/docs/completion/input#input-params-1)

**Step 1**: Create a `config.yaml` file
```yaml
model_list:
@@ -152,9 +106,11 @@ model_list:
$ litellm --config /path/to/config.yaml
```
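
Once the proxy is running, you can quickly confirm it loaded your aliases — a small sketch with Python's `requests`, assuming the default port 8000 and an OpenAI-style `/models` response:

```python
# Sketch: list the model aliases the proxy loaded from config.yaml
# (assumes proxy on port 8000 and an OpenAI-compatible /models response shape)
import requests

response = requests.get("http://0.0.0.0:8000/models")
response.raise_for_status()
for model in response.json()["data"]:
    print(model["id"])
```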

## Load API Keys

### Load API Keys from Environment

If you have secrets saved in your environment and don't want to expose them in the config.yaml, here's how to load model-specific keys from the environment.
```python
import os

os.environ["AZURE_NORTH_AMERICA_API_KEY"] = "your-azure-api-key"
@@ -174,30 +130,42 @@ model_list:
s/o to [@David Manouchehri](https://www.linkedin.com/in/davidmanouchehri/) for helping with this.
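
Any string of the form `os.environ/<VAR_NAME>` in the config is swapped for that environment variable's value when the proxy loads the model list. A rough sketch of the idea (illustrative only, not LiteLLM's actual implementation):

```python
# Illustrative sketch only - shows how an "os.environ/<VAR>" config value could be resolved
import os

def resolve_secret(value: str) -> str:
    prefix = "os.environ/"
    if value.startswith(prefix):
        # looks up e.g. AZURE_NORTH_AMERICA_API_KEY in the environment
        return os.environ[value[len(prefix):]]
    return value

print(resolve_secret("os.environ/AZURE_NORTH_AMERICA_API_KEY"))
```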
### Load API Keys from Azure Vault

1. Install Proxy dependencies
```bash
$ pip install litellm[proxy] litellm[extra_proxy]
```

2. Save Azure details in your environment
```bash
export AZURE_CLIENT_ID="your-azure-app-client-id"
export AZURE_CLIENT_SECRET="your-azure-app-client-secret"
export AZURE_TENANT_ID="your-azure-tenant-id"
export AZURE_KEY_VAULT_URI="your-azure-key-vault-uri"
```

3. Add to proxy config.yaml
```yaml
model_list:
  - model_name: "my-azure-models" # model alias
    litellm_params:
      model: "azure/<your-deployment-name>"
      api_key: "os.environ/AZURE-API-KEY" # reads from key vault - get_secret("AZURE_API_KEY")
      api_base: "os.environ/AZURE-API-BASE" # reads from key vault - get_secret("AZURE_API_BASE")

general_settings:
  use_azure_key_vault: True
```

You can now test this by starting your proxy:
```bash
litellm --config /path/to/config.yaml
```
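
To verify the key-vault-backed deployment end to end, send a test request against the `my-azure-models` alias — a sketch with Python's `requests`, assuming the proxy is on port 8000:

```python
# Sketch: test chat completion against the key-vault-backed deployment
# (assumes the proxy is running locally on port 8000 with the config above)
import requests

response = requests.post(
    "http://0.0.0.0:8000/chat/completions",
    json={
        "model": "my-azure-models",  # alias defined in config.yaml
        "messages": [{"role": "user", "content": "ping"}],
    },
)
print(response.json())
```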

### Set Custom Prompt Templates

LiteLLM by default checks if a model has a [prompt template and applies it](../completion/prompt_formatting.md) (e.g. if a huggingface model has a saved chat template in its tokenizer_config.json). However, you can also set a custom prompt template on your proxy in the `config.yaml`:

**Step 1**: Save your prompt template in a `config.yaml`
```yaml


@@ -0,0 +1,74 @@
# Model Management
Add new models + Get model info without restarting proxy.
## Get Model Information
Retrieve detailed information about each model listed in the `/models` endpoint, including descriptions from the `config.yaml` file and additional model info (e.g. max tokens, cost per input token) pulled from the `model_info` you set and the litellm model cost map. Sensitive details like API keys are excluded for security purposes.
<Tabs
defaultValue="curl"
values={[
{ label: 'cURL', value: 'curl', },
]}>
<TabItem value="curl">
```bash
curl -X GET "http://0.0.0.0:8000/model/info" \
-H "accept: application/json" \
```
</TabItem>
</Tabs>
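
The same call from Python, for reference — a short `requests` sketch assuming the proxy's default host and port:

```python
# Sketch: fetch model info from the proxy (assumes default host/port)
import requests

response = requests.get(
    "http://0.0.0.0:8000/model/info",
    headers={"accept": "application/json"},
)
print(response.json())
```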
## Add a New Model
Add a new model to the list in the `config.yaml` by providing the model parameters. This allows you to update the model list without restarting the proxy.
<Tabs
defaultValue="curl"
values={[
{ label: 'cURL', value: 'curl', },
]}>
<TabItem value="curl">
```bash
curl -X POST "http://0.0.0.0:8000/model/new" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{ "model_name": "azure-gpt-turbo", "litellm_params": {"model": "azure/gpt-3.5-turbo", "api_key": "os.environ/AZURE_API_KEY", "api_base": "my-azure-api-base"} }'
```
</TabItem>
</Tabs>
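
The equivalent call from Python — a `requests` sketch that mirrors the cURL payload above:

```python
# Sketch: register a new model with the running proxy (mirrors the cURL example above)
import requests

payload = {
    "model_name": "azure-gpt-turbo",
    "litellm_params": {
        "model": "azure/gpt-3.5-turbo",
        "api_key": "os.environ/AZURE_API_KEY",
        "api_base": "my-azure-api-base",
    },
}
response = requests.post(
    "http://0.0.0.0:8000/model/new",
    headers={"accept": "application/json"},
    json=payload,
)
print(response.status_code, response.json())
```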
### Model Parameters Structure
When adding a new model, your JSON payload should conform to the following structure:
- `model_name`: The name of the new model (required).
- `litellm_params`: A dictionary containing parameters specific to the Litellm setup (required).
- `model_info`: An optional dictionary to provide additional information about the model.
Here's an example of how to structure your `ModelParams`:
```json
{
"model_name": "my_awesome_model",
"litellm_params": {
"some_parameter": "some_value",
"another_parameter": "another_value"
},
"model_info": {
"author": "Your Name",
"version": "1.0",
"description": "A brief description of the model."
}
}
```
---
Keep in mind that as both endpoints are in [BETA], you may need to visit the associated GitHub issues linked in the API descriptions to check for updates or provide feedback:
- Get Model Information: [Issue #933](https://github.com/BerriAI/litellm/issues/933)
- Add a New Model: [Issue #964](https://github.com/BerriAI/litellm/issues/964)
Feedback on the beta endpoints is valuable and helps improve the API for all users.


@@ -1,5 +1,4 @@
# Cost Tracking & Virtual Keys
Track Spend and create virtual keys for the proxy

Grant others temporary access to your proxy, with keys that expire after a set duration.


@@ -99,6 +99,7 @@ const sidebars = {
"proxy/configs", "proxy/configs",
"proxy/load_balancing", "proxy/load_balancing",
"proxy/virtual_keys", "proxy/virtual_keys",
"proxy/model_management",
"proxy/caching", "proxy/caching",
"proxy/logging", "proxy/logging",
"proxy/cli", "proxy/cli",