Mirror of https://github.com/BerriAI/litellm.git (synced 2025-04-26 19:24:27 +00:00)

Commit e20c322f1e — fix(model_management.md): add docs on model management on proxy
Parent: 92df0418e7
4 changed files with 131 additions and 89 deletions
@@ -8,71 +8,46 @@ Set model list, `api_base`, `api_key`, `temperature` & proxy server settings (`m

| `general_settings` | Server settings, example setting `master_key: sk-my_special_key` |
| `environment_variables` | Environment Variables example, `REDIS_HOST`, `REDIS_PORT` |

-#### Example Config
+## Quick Start
+
+Set a model alias for your deployments.
+
+In the `config.yaml`, the `model_name` parameter is the user-facing name to use for your deployment.
+
+In the config below, requests with:
+- `model=vllm-models` will route to `openai/facebook/opt-125m`.
+- `model=gpt-3.5-turbo` will load balance between `azure/gpt-turbo-small-eu` and `azure/gpt-turbo-small-ca`.

```yaml
model_list:
-  - model_name: gpt-3.5-turbo
+  - model_name: gpt-3.5-turbo # user-facing model alias
-    litellm_params:
+    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
      model: azure/gpt-turbo-small-eu
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
-      api_key:
+      api_key: "os.environ/AZURE_API_KEY_EU" # does os.getenv("AZURE_API_KEY_EU")
      rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
-      api_key:
+      api_key: "os.environ/AZURE_API_KEY_CA"
      rpm: 6
-  - model_name: gpt-3.5-turbo
+  - model_name: vllm-models
    litellm_params:
-      model: azure/gpt-turbo-large
+      model: openai/facebook/opt-125m # the `openai/` prefix tells litellm it's openai compatible
-      api_base: https://openai-france-1234.openai.azure.com/
+      api_base: http://0.0.0.0:8000
-      api_key:
      rpm: 1440
+    model_info:
+      version: 2

-litellm_settings:
+litellm_settings: # module level litellm settings - https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py
  drop_params: True
  set_verbose: True

general_settings:
  master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)

-environment_variables:
-  OPENAI_API_KEY: sk-123
-  REPLICATE_API_KEY: sk-cohere-is-okay
-  REDIS_HOST: redis-16337.c322.us-east-1-2.ec2.cloud.redislabs.com
-  REDIS_PORT: "16337"
-  REDIS_PASSWORD:
```
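
For illustration only (this snippet is not part of the diff): a minimal sketch of how a client could call a proxy started with the config above, assuming the proxy is listening on `http://0.0.0.0:8000` and the `openai` Python SDK (v1+) is installed. Requests for the `gpt-3.5-turbo` alias would be load balanced across the two Azure deployments.

```python
# Minimal, hedged sketch of a client call against the proxy configured above.
# Assumes the proxy is running on http://0.0.0.0:8000 and openai>=1.0 is installed.
import openai

client = openai.OpenAI(
    api_key="sk-1234",               # the proxy master_key, if general_settings.master_key is set
    base_url="http://0.0.0.0:8000",  # point the SDK at the LiteLLM proxy instead of api.openai.com
)

# "gpt-3.5-turbo" is the user-facing alias; the proxy picks one of the two Azure deployments.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)
```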

-### Config for Multiple Models - GPT-4, Claude-2
-
-Here's how you can use multiple llms with one proxy `config.yaml`.
-
-#### Step 1: Setup Config
-```yaml
-model_list:
-  - model_name: zephyr-alpha # the 1st model is the default on the proxy
-    litellm_params: # params for litellm.completion() - https://docs.litellm.ai/docs/completion/input#input---request-body
-      model: huggingface/HuggingFaceH4/zephyr-7b-alpha
-      api_base: http://0.0.0.0:8001
-  - model_name: gpt-4
-    litellm_params:
-      model: gpt-4
-      api_key: sk-1233
-  - model_name: claude-2
-    litellm_params:
-      model: claude-2
-      api_key: sk-claude
-```
-
-:::info
-
-The proxy uses the first model in the config as the default model - in this config the default model is `zephyr-alpha`
-
-:::

#### Step 2: Start Proxy with config

```shell

@@ -96,32 +71,11 @@ curl --location 'http://0.0.0.0:8000/chat/completions' \
'
```
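
The body of the example request above is elided by this hunk. As a hedged illustration (not part of the diff), an equivalent call could be made from Python with `requests`, assuming the proxy is running locally on port 8000; the model name and message below are placeholders.

```python
# Illustrative raw-HTTP equivalent of the docs' curl example, using the requests library.
# Assumes the proxy is running on http://0.0.0.0:8000; model name and message are placeholders.
import requests

resp = requests.post(
    "http://0.0.0.0:8000/chat/completions",
    headers={"Content-Type": "application/json"},
    json={
        "model": "gpt-3.5-turbo",  # any model_name / alias defined in config.yaml
        "messages": [{"role": "user", "content": "what llm are you"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```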

-### Config for Embedding Models - xorbitsai/inference
+## Save Model-specific params (API Base, API Keys, Temperature, Headers etc.)

-Here's how you can use multiple llms with one proxy `config.yaml`.
-Here is how [LiteLLM calls OpenAI Compatible Embedding models](https://docs.litellm.ai/docs/embedding/supported_embedding#openai-compatible-embedding-models)
-
-#### Config
-```yaml
-model_list:
-  - model_name: custom_embedding_model
-    litellm_params:
-      model: openai/custom_embedding # the `openai/` prefix tells litellm it's openai compatible
-      api_base: http://0.0.0.0:8000/
-  - model_name: custom_embedding_model
-    litellm_params:
-      model: openai/custom_embedding # the `openai/` prefix tells litellm it's openai compatible
-      api_base: http://0.0.0.0:8001/
-```
-
-Run the proxy using this config
-```shell
-$ litellm --config /path/to/config.yaml
-```
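
Although this section is removed by the commit, a hedged sketch of how the embedding deployment above could be called through the proxy may still be useful; it assumes the proxy is reachable at the placeholder `PROXY_BASE_URL`, the `openai` v1 SDK is installed, and the input text is arbitrary.

```python
# Illustrative sketch: calling the embedding deployment defined above through the LiteLLM proxy
# with the openai v1 SDK. PROXY_BASE_URL and the input text are placeholders.
import openai

PROXY_BASE_URL = "http://0.0.0.0:8000"  # wherever `litellm --config` is serving the proxy

client = openai.OpenAI(api_key="anything", base_url=PROXY_BASE_URL)  # key only matters if a master_key is set

response = client.embeddings.create(
    model="custom_embedding_model",  # the model_name from config.yaml
    input=["good morning from litellm"],
)
print(len(response.data[0].embedding))
```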

-### Save Model-specific params (API Base, API Keys, Temperature, Headers etc.)
You can use the config to save model-specific information like api_base, api_key, temperature, max_tokens, etc.

+[**All input params**](https://docs.litellm.ai/docs/completion/input#input-params-1)

**Step 1**: Create a `config.yaml` file
```yaml
model_list:

@@ -152,9 +106,11 @@ model_list:
$ litellm --config /path/to/config.yaml
```

-### Load API Keys from Vault
+## Load API Keys

-If you have secrets saved in Azure Vault, etc. and don't want to expose them in the config.yaml, here's how to load model-specific keys from the environment.
+### Load API Keys from Environment
+
+If you have secrets saved in your environment, and don't want to expose them in the config.yaml, here's how to load model-specific keys from the environment.

```python
os.environ["AZURE_NORTH_AMERICA_API_KEY"] = "your-azure-api-key"

@@ -174,30 +130,42 @@ model_list:

s/o to [@David Manouchehri](https://www.linkedin.com/in/davidmanouchehri/) for helping with this.

-### Config for setting Model Aliases
+### Load API Keys from Azure Vault

-Set a model alias for your deployments.
-
-In the `config.yaml` the model_name parameter is the user-facing name to use for your deployment.
-
-In the config below requests with `model=gpt-4` will route to `ollama/llama2`
+1. Install Proxy dependencies
+```bash
+$ pip install litellm[proxy] litellm[extra_proxy]
+```
+
+2. Save Azure details in your environment
+```bash
+export AZURE_CLIENT_ID="your-azure-app-client-id"
+export AZURE_CLIENT_SECRET="your-azure-app-client-secret"
+export AZURE_TENANT_ID="your-azure-tenant-id"
+export AZURE_KEY_VAULT_URI="your-azure-key-vault-uri"
+```
+
+3. Add to proxy config.yaml

```yaml
model_list:
-  - model_name: text-davinci-003
-    litellm_params:
-      model: ollama/zephyr
-  - model_name: gpt-4
-    litellm_params:
-      model: ollama/llama2
-  - model_name: gpt-3.5-turbo
-    litellm_params:
-      model: ollama/llama2
+  - model_name: "my-azure-models" # model alias
+    litellm_params:
+      model: "azure/<your-deployment-name>"
+      api_key: "os.environ/AZURE-API-KEY" # reads from key vault - get_secret("AZURE_API_KEY")
+      api_base: "os.environ/AZURE-API-BASE" # reads from key vault - get_secret("AZURE_API_BASE")
+
+general_settings:
+  use_azure_key_vault: True
+```
+
+You can now test this by starting your proxy:
+```bash
+litellm --config /path/to/config.yaml
```

### Set Custom Prompt Templates

-LiteLLM by default checks if a model has a [prompt template and applies it](./completion/prompt_formatting.md) (e.g. if a huggingface model has a saved chat template in it's tokenizer_config.json). However, you can also set a custom prompt template on your proxy in the `config.yaml`:
+LiteLLM by default checks if a model has a [prompt template and applies it](../completion/prompt_formatting.md) (e.g. if a huggingface model has a saved chat template in its tokenizer_config.json). However, you can also set a custom prompt template on your proxy in the `config.yaml`:

**Step 1**: Save your prompt template in a `config.yaml`
```yaml

docs/my-website/docs/proxy/model_management.md (new file, 74 lines)
@@ -0,0 +1,74 @@

# Model Management
Add new models + Get model info without restarting proxy.

## Get Model Information

Retrieve detailed information about each model listed in the `/models` endpoint, including descriptions from the `config.yaml` file, and additional model info (e.g. max tokens, cost per input token, etc.) pulled from the model_info you set and the litellm model cost map. Sensitive details like API keys are excluded for security purposes.

<Tabs
  defaultValue="curl"
  values={[
    { label: 'cURL', value: 'curl', },
  ]}>
<TabItem value="curl">

```bash
curl -X GET "http://0.0.0.0:8000/model/info" \
     -H "accept: application/json"
```
</TabItem>
</Tabs>
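
For illustration (this snippet is not part of the committed file): the same call issued from Python with `requests`, assuming the proxy is running on `http://0.0.0.0:8000`. The `"data"` key used below is an assumption about the response shape.

```python
# Illustrative Python equivalent of the cURL tab above.
# Assumes the proxy is running on http://0.0.0.0:8000; the "data" key is an assumed response field.
import requests

resp = requests.get(
    "http://0.0.0.0:8000/model/info",
    headers={"accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model)
```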

## Add a New Model

Add a new model to the list in the `config.yaml` by providing the model parameters. This allows you to update the model list without restarting the proxy.

<Tabs
  defaultValue="curl"
  values={[
    { label: 'cURL', value: 'curl', },
  ]}>
<TabItem value="curl">

```bash
curl -X POST "http://0.0.0.0:8000/model/new" \
     -H "accept: application/json" \
     -H "Content-Type: application/json" \
     -d '{ "model_name": "azure-gpt-turbo", "litellm_params": {"model": "azure/gpt-3.5-turbo", "api_key": "os.environ/AZURE_API_KEY", "api_base": "my-azure-api-base"} }'
```
</TabItem>
</Tabs>
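
For illustration (not part of the committed file): the same request sent from Python with `requests`, mirroring the values in the cURL example and assuming the proxy is running on `http://0.0.0.0:8000`.

```python
# Illustrative Python equivalent of the cURL tab above; values mirror the curl example.
# Assumes the proxy is running on http://0.0.0.0:8000.
import requests

new_model = {
    "model_name": "azure-gpt-turbo",
    "litellm_params": {
        "model": "azure/gpt-3.5-turbo",
        "api_key": "os.environ/AZURE_API_KEY",  # resolved from the proxy's environment
        "api_base": "my-azure-api-base",
    },
}

resp = requests.post(
    "http://0.0.0.0:8000/model/new",
    headers={"accept": "application/json", "Content-Type": "application/json"},
    json=new_model,
    timeout=30,
)
resp.raise_for_status()
print(resp.status_code, resp.text)
```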

### Model Parameters Structure

When adding a new model, your JSON payload should conform to the following structure:

- `model_name`: The name of the new model (required).
- `litellm_params`: A dictionary containing parameters specific to the Litellm setup (required).
- `model_info`: An optional dictionary to provide additional information about the model.

Here's an example of how to structure your `ModelParams`:

```json
{
  "model_name": "my_awesome_model",
  "litellm_params": {
    "some_parameter": "some_value",
    "another_parameter": "another_value"
  },
  "model_info": {
    "author": "Your Name",
    "version": "1.0",
    "description": "A brief description of the model."
  }
}
```
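
As a rough, hedged sketch (the class name here is illustrative, not necessarily the proxy's internal type), the payload shape described above could be modeled in Python like this:

```python
# Hedged sketch only: a lightweight structure mirroring the documented ModelParams payload.
import json
from dataclasses import dataclass, asdict
from typing import Any, Dict, Optional


@dataclass
class ModelParams:
    model_name: str                              # required: user-facing name of the new model
    litellm_params: Dict[str, Any]               # required: params passed through to the LiteLLM call
    model_info: Optional[Dict[str, Any]] = None  # optional: extra metadata (author, version, ...)


payload = ModelParams(
    model_name="my_awesome_model",
    litellm_params={"some_parameter": "some_value"},
    model_info={"author": "Your Name", "version": "1.0"},
)
print(json.dumps(asdict(payload)))  # JSON body suitable for POST /model/new
```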

---

Keep in mind that as both endpoints are in [BETA], you may need to visit the associated GitHub issues linked in the API descriptions to check for updates or provide feedback:

- Get Model Information: [Issue #933](https://github.com/BerriAI/litellm/issues/933)
- Add a New Model: [Issue #964](https://github.com/BerriAI/litellm/issues/964)

Feedback on the beta endpoints is valuable and helps improve the API for all users.
@@ -1,5 +1,4 @@
-# Cost Tracking & Virtual Keys
+# Key Management

Track Spend and create virtual keys for the proxy

Grant others temporary access to your proxy, with keys that expire after a set duration.
@@ -99,6 +99,7 @@ const sidebars = {
        "proxy/configs",
        "proxy/load_balancing",
        "proxy/virtual_keys",
+       "proxy/model_management",
        "proxy/caching",
        "proxy/logging",
        "proxy/cli",