fix(model_management.md): add docs on model management on proxy
parent 92df0418e7
commit e20c322f1e
4 changed files with 131 additions and 89 deletions

@@ -8,71 +8,46 @@ Set model list, `api_base`, `api_key`, `temperature` & proxy server settings (`m
| `general_settings` | Server settings, example setting `master_key: sk-my_special_key` |
| `environment_variables` | Environment Variables example, `REDIS_HOST`, `REDIS_PORT` |

#### Example Config
## Quick Start

Set a model alias for your deployments.

In the `config.yaml` the `model_name` parameter is the user-facing name to use for your deployment.

In the config below, requests with:
- `model=vllm-models` will route to `openai/facebook/opt-125m`.
- `model=gpt-3.5-turbo` will load balance between `azure/gpt-turbo-small-eu` and `azure/gpt-turbo-small-ca`

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
  - model_name: gpt-3.5-turbo # user-facing model alias
    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
      model: azure/gpt-turbo-small-eu
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key:
      api_key: "os.environ/AZURE_API_KEY_EU" # does os.getenv("AZURE_API_KEY_EU")
      rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key:
      api_key: "os.environ/AZURE_API_KEY_CA"
      rpm: 6
  - model_name: gpt-3.5-turbo
  - model_name: vllm-models
    litellm_params:
      model: azure/gpt-turbo-large
      api_base: https://openai-france-1234.openai.azure.com/
      api_key:
      model: openai/facebook/opt-125m # the `openai/` prefix tells litellm it's openai compatible
      api_base: http://0.0.0.0:8000
      rpm: 1440
    model_info:
      version: 2

litellm_settings:
litellm_settings: # module level litellm settings - https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py
  drop_params: True
  set_verbose: True

general_settings:
  master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)

environment_variables:
  OPENAI_API_KEY: sk-123
  REPLICATE_API_KEY: sk-cohere-is-okay
  REDIS_HOST: redis-16337.c322.us-east-1-2.ec2.cloud.redislabs.com
  REDIS_PORT: "16337"
  REDIS_PASSWORD:
```
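
Once the proxy is started with this `config.yaml`, requests select a deployment via the user-facing `model` name. A minimal sketch, assuming the proxy is running locally on port 8000 and the request payload is illustrative:

```bash
# start the proxy with the config above
litellm --config /path/to/config.yaml

# model=gpt-3.5-turbo is load balanced across the two Azure deployments
curl --location 'http://0.0.0.0:8000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "what llm are you"}]
  }'
```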

### Config for Multiple Models - GPT-4, Claude-2

Here's how you can use multiple llms with one proxy `config.yaml`.

#### Step 1: Setup Config
```yaml
model_list:
  - model_name: zephyr-alpha # the 1st model is the default on the proxy
    litellm_params: # params for litellm.completion() - https://docs.litellm.ai/docs/completion/input#input---request-body
      model: huggingface/HuggingFaceH4/zephyr-7b-alpha
      api_base: http://0.0.0.0:8001
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: sk-1233
  - model_name: claude-2
    litellm_params:
      model: claude-2
      api_key: sk-claude
```

:::info

The proxy uses the first model in the config as the default model - in this config the default model is `zephyr-alpha`
:::

#### Step 2: Start Proxy with config

```shell

@@ -96,32 +71,11 @@ curl --location 'http://0.0.0.0:8000/chat/completions' \
'
```

### Config for Embedding Models - xorbitsai/inference

Here's how you can use multiple llms with one proxy `config.yaml`.
Here is how [LiteLLM calls OpenAI Compatible Embedding models](https://docs.litellm.ai/docs/embedding/supported_embedding#openai-compatible-embedding-models)

#### Config
```yaml
model_list:
  - model_name: custom_embedding_model
    litellm_params:
      model: openai/custom_embedding # the `openai/` prefix tells litellm it's openai compatible
      api_base: http://0.0.0.0:8000/
  - model_name: custom_embedding_model
    litellm_params:
      model: openai/custom_embedding # the `openai/` prefix tells litellm it's openai compatible
      api_base: http://0.0.0.0:8001/
```

Run the proxy using this config
```shell
$ litellm --config /path/to/config.yaml
```
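
Once the proxy is running, an embedding request targets the shared alias and is load balanced across the two endpoints. A minimal sketch, assuming the proxy exposes the OpenAI-compatible `/embeddings` route and was started on a port that doesn't collide with the backends above (e.g. `litellm --config /path/to/config.yaml --port 4000`); the input text is illustrative:

```bash
curl --location 'http://0.0.0.0:4000/embeddings' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "custom_embedding_model",
    "input": ["write a litellm poem"]
  }'
```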

### Save Model-specific params (API Base, API Keys, Temperature, Headers etc.)
## Save Model-specific params (API Base, API Keys, Temperature, Headers etc.)
You can use the config to save model-specific information like api_base, api_key, temperature, max_tokens, etc.

[**All input params**](https://docs.litellm.ai/docs/completion/input#input-params-1)

**Step 1**: Create a `config.yaml` file
```yaml
model_list:

@@ -152,9 +106,11 @@ model_list:
$ litellm --config /path/to/config.yaml
```

### Load API Keys from Vault
## Load API Keys

If you have secrets saved in Azure Vault, etc. and don't want to expose them in the config.yaml, here's how to load model-specific keys from the environment.
### Load API Keys from Environment

If you have secrets saved in your environment and don't want to expose them in the config.yaml, here's how to load model-specific keys from the environment.

```python
os.environ["AZURE_NORTH_AMERICA_API_KEY"] = "your-azure-api-key"

@@ -174,30 +130,42 @@ model_list:

s/o to [@David Manouchehri](https://www.linkedin.com/in/davidmanouchehri/) for helping with this.

### Config for setting Model Aliases
### Load API Keys from Azure Vault

Set a model alias for your deployments.
1. Install Proxy dependencies
```bash
$ pip install litellm[proxy] litellm[extra_proxy]
```

In the `config.yaml` the model_name parameter is the user-facing name to use for your deployment.

In the config below requests with `model=gpt-4` will route to `ollama/llama2`
2. Save Azure details in your environment
```bash
export AZURE_CLIENT_ID="your-azure-app-client-id"
export AZURE_CLIENT_SECRET="your-azure-app-client-secret"
export AZURE_TENANT_ID="your-azure-tenant-id"
export AZURE_KEY_VAULT_URI="your-azure-key-vault-uri"
```

3. Add to proxy config.yaml
```yaml
model_list:
  - model_name: text-davinci-003
    litellm_params:
      model: ollama/zephyr
  - model_name: gpt-4
    litellm_params:
      model: ollama/llama2
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: ollama/llama2
model_list:
  - model_name: "my-azure-models" # model alias
    litellm_params:
      model: "azure/<your-deployment-name>"
      api_key: "os.environ/AZURE-API-KEY" # reads from key vault - get_secret("AZURE_API_KEY")
      api_base: "os.environ/AZURE-API-BASE" # reads from key vault - get_secret("AZURE_API_BASE")

general_settings:
  use_azure_key_vault: True
```

You can now test this by starting your proxy:
```bash
litellm --config /path/to/config.yaml
```
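
To sanity check the key-vault-backed credentials, send a request to the `my-azure-models` alias. A minimal sketch, assuming the proxy is listening on the default `http://0.0.0.0:8000` and the prompt is illustrative:

```bash
curl --location 'http://0.0.0.0:8000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "my-azure-models",
    "messages": [{"role": "user", "content": "hello"}]
  }'
```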

### Set Custom Prompt Templates

LiteLLM by default checks if a model has a [prompt template and applies it](./completion/prompt_formatting.md) (e.g. if a huggingface model has a saved chat template in its tokenizer_config.json). However, you can also set a custom prompt template on your proxy in the `config.yaml`:
LiteLLM by default checks if a model has a [prompt template and applies it](../completion/prompt_formatting.md) (e.g. if a huggingface model has a saved chat template in its tokenizer_config.json). However, you can also set a custom prompt template on your proxy in the `config.yaml`:

**Step 1**: Save your prompt template in a `config.yaml`
```yaml

docs/my-website/docs/proxy/model_management.md (new file, 74 lines)
@@ -0,0 +1,74 @@

# Model Management
Add new models + Get model info without restarting proxy.

## Get Model Information

Retrieve detailed information about each model listed in the `/models` endpoint, including descriptions from the `config.yaml` file, and additional model info (e.g. max tokens, cost per input token, etc.) pulled from the `model_info` you set and the litellm model cost map. Sensitive details like API keys are excluded for security purposes.

<Tabs
  defaultValue="curl"
  values={[
    { label: 'cURL', value: 'curl', },
  ]}>
<TabItem value="curl">

```bash
curl -X GET "http://0.0.0.0:8000/model/info" \
  -H "accept: application/json"
```
</TabItem>
</Tabs>
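
If the proxy was started with a `master_key` in `general_settings`, the same call needs to pass it as a bearer token. A minimal sketch, assuming `master_key: sk-1234` from the example config:

```bash
curl -X GET "http://0.0.0.0:8000/model/info" \
  -H "accept: application/json" \
  -H "Authorization: Bearer sk-1234"
```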

## Add a New Model

Add a new model to the list in the `config.yaml` by providing the model parameters. This allows you to update the model list without restarting the proxy.

<Tabs
  defaultValue="curl"
  values={[
    { label: 'cURL', value: 'curl', },
  ]}>
<TabItem value="curl">

```bash
curl -X POST "http://0.0.0.0:8000/model/new" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{ "model_name": "azure-gpt-turbo", "litellm_params": {"model": "azure/gpt-3.5-turbo", "api_key": "os.environ/AZURE_API_KEY", "api_base": "my-azure-api-base"} }'
```
</TabItem>
</Tabs>

### Model Parameters Structure

When adding a new model, your JSON payload should conform to the following structure:

- `model_name`: The name of the new model (required).
- `litellm_params`: A dictionary containing parameters specific to the Litellm setup (required).
- `model_info`: An optional dictionary to provide additional information about the model.

Here's an example of how to structure your `ModelParams`:

```json
{
  "model_name": "my_awesome_model",
  "litellm_params": {
    "some_parameter": "some_value",
    "another_parameter": "another_value"
  },
  "model_info": {
    "author": "Your Name",
    "version": "1.0",
    "description": "A brief description of the model."
  }
}
```
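
As a usage note, this structure is exactly what gets POSTed as the body of the `/model/new` call shown above. A minimal sketch, assuming the same local proxy and an illustrative Azure-backed payload:

```bash
curl -X POST "http://0.0.0.0:8000/model/new" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "my_awesome_model",
    "litellm_params": {"model": "azure/gpt-3.5-turbo", "api_key": "os.environ/AZURE_API_KEY", "api_base": "my-azure-api-base"},
    "model_info": {"version": "1.0", "description": "A brief description of the model."}
  }'
```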

---

Keep in mind that as both endpoints are in [BETA], you may need to visit the associated GitHub issues linked in the API descriptions to check for updates or provide feedback:

- Get Model Information: [Issue #933](https://github.com/BerriAI/litellm/issues/933)
- Add a New Model: [Issue #964](https://github.com/BerriAI/litellm/issues/964)

Feedback on the beta endpoints is valuable and helps improve the API for all users.

@@ -1,5 +1,4 @@

# Cost Tracking & Virtual Keys
# Key Management
Track Spend and create virtual keys for the proxy

Grant others temporary access to your proxy, with keys that expire after a set duration.

@@ -99,6 +99,7 @@ const sidebars = {
"proxy/configs",
"proxy/load_balancing",
"proxy/virtual_keys",
"proxy/model_management",
"proxy/caching",
"proxy/logging",
"proxy/cli",