import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; # Virtual Keys Track Spend, and control model access via virtual keys for the proxy :::info - 🔑 [UI to Generate, Edit, Delete Keys (with SSO)](https://docs.litellm.ai/docs/proxy/ui) - [Deploy LiteLLM Proxy with Key Management](https://docs.litellm.ai/docs/proxy/deploy#deploy-with-database) - [Dockerfile.database for LiteLLM Proxy + Key Management](https://github.com/BerriAI/litellm/blob/main/docker/Dockerfile.database) ::: ## Setup Requirements: - Need a postgres database (e.g. [Supabase](https://supabase.com/), [Neon](https://neon.tech/), etc) - Set `DATABASE_URL=postgresql://:@:/` in your env - Set a `master key`, this is your Proxy Admin key - you can use this to create other keys (🚨 must start with `sk-`). - ** Set on config.yaml** set your master key under `general_settings:master_key`, example below - ** Set env variable** set `LITELLM_MASTER_KEY` (the proxy Dockerfile checks if the `DATABASE_URL` is set and then intializes the DB connection) ```shell export DATABASE_URL=postgresql://:@:/ ``` You can then generate keys by hitting the `/key/generate` endpoint. [**See code**](https://github.com/BerriAI/litellm/blob/7a669a36d2689c7f7890bc9c93e04ff3c2641299/litellm/proxy/proxy_server.py#L672) ## **Quick Start - Generate a Key** **Step 1: Save postgres db url** ```yaml model_list: - model_name: gpt-4 litellm_params: model: ollama/llama2 - model_name: gpt-3.5-turbo litellm_params: model: ollama/llama2 general_settings: master_key: sk-1234 database_url: "postgresql://:@:/" # 👈 KEY CHANGE ``` **Step 2: Start litellm** ```shell litellm --config /path/to/config.yaml ``` **Step 3: Generate keys** ```shell curl 'http://0.0.0.0:4000/key/generate' \ --header 'Authorization: Bearer ' \ --header 'Content-Type: application/json' \ --data-raw '{"models": ["gpt-3.5-turbo", "gpt-4"], "metadata": {"user": "ishaan@berri.ai"}}' ``` ## Spend Tracking Get spend per: - key - via `/key/info` [Swagger](https://litellm-api.up.railway.app/#/key%20management/info_key_fn_key_info_get) - user - via `/user/info` [Swagger](https://litellm-api.up.railway.app/#/user%20management/user_info_user_info_get) - team - via `/team/info` [Swagger](https://litellm-api.up.railway.app/#/team%20management/team_info_team_info_get) - ⏳ end-users - via `/end_user/info` - [Comment on this issue for end-user cost tracking](https://github.com/BerriAI/litellm/issues/2633) **How is it calculated?** The cost per model is stored [here](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json) and calculated by the [`completion_cost`](https://github.com/BerriAI/litellm/blob/db7974f9f216ee50b53c53120d1e3fc064173b60/litellm/utils.py#L3771) function. **How is it tracking?** Spend is automatically tracked for the key in the "LiteLLM_VerificationTokenTable". If the key has an attached 'user_id' or 'team_id', the spend for that user is tracked in the "LiteLLM_UserTable", and team in the "LiteLLM_TeamTable". You can get spend for a key by using the `/key/info` endpoint. ```bash curl 'http://0.0.0.0:4000/key/info?key=' \ -X GET \ -H 'Authorization: Bearer ' ``` This is automatically updated (in USD) when calls are made to /completions, /chat/completions, /embeddings using litellm's completion_cost() function. [**See Code**](https://github.com/BerriAI/litellm/blob/1a6ea20a0bb66491968907c2bfaabb7fe45fc064/litellm/utils.py#L1654). **Sample response** ```python { "key": "sk-tXL0wt5-lOOVK9sfY2UacA", "info": { "token": "sk-tXL0wt5-lOOVK9sfY2UacA", "spend": 0.0001065, # 👈 SPEND "expires": "2023-11-24T23:19:11.131000Z", "models": [ "gpt-3.5-turbo", "gpt-4", "claude-2" ], "aliases": { "mistral-7b": "gpt-3.5-turbo" }, "config": {} } } ``` **1. Create a user** ```bash curl --location 'http://localhost:4000/user/new' \ --header 'Authorization: Bearer ' \ --header 'Content-Type: application/json' \ --data-raw '{user_email: "krrish@berri.ai"}' ``` **Expected Response** ```bash { ... "expires": "2023-12-22T09:53:13.861000Z", "user_id": "my-unique-id", # 👈 unique id "max_budget": 0.0 } ``` **2. Create a key for that user** ```bash curl 'http://0.0.0.0:4000/key/generate' \ --header 'Authorization: Bearer ' \ --header 'Content-Type: application/json' \ --data-raw '{"models": ["gpt-3.5-turbo", "gpt-4"], "user_id": "my-unique-id"}' ``` Returns a key - `sk-...`. **3. See spend for user** ```bash curl 'http://0.0.0.0:4000/user/info?user_id=my-unique-id' \ -X GET \ -H 'Authorization: Bearer ' ``` Expected Response ```bash { ... "spend": 0 # 👈 SPEND } ``` Use teams, if you want keys to be owned by multiple people (e.g. for a production app). **1. Create a team** ```bash curl --location 'http://localhost:4000/team/new' \ --header 'Authorization: Bearer ' \ --header 'Content-Type: application/json' \ --data-raw '{"team_alias": "my-awesome-team"}' ``` **Expected Response** ```bash { ... "expires": "2023-12-22T09:53:13.861000Z", "team_id": "my-unique-id", # 👈 unique id "max_budget": 0.0 } ``` **2. Create a key for that team** ```bash curl 'http://0.0.0.0:4000/key/generate' \ --header 'Authorization: Bearer ' \ --header 'Content-Type: application/json' \ --data-raw '{"models": ["gpt-3.5-turbo", "gpt-4"], "team_id": "my-unique-id"}' ``` Returns a key - `sk-...`. **3. See spend for team** ```bash curl 'http://0.0.0.0:4000/team/info?team_id=my-unique-id' \ -X GET \ -H 'Authorization: Bearer ' ``` Expected Response ```bash { ... "spend": 0 # 👈 SPEND } ``` ## **Model Access** ### **Restrict models by Virtual Key** Set allowed models for a key using the `models` param ```shell curl 'http://0.0.0.0:4000/key/generate' \ --header 'Authorization: Bearer ' \ --header 'Content-Type: application/json' \ --data-raw '{"models": ["gpt-3.5-turbo", "gpt-4"]}' ``` :::info This key can only make requests to `models` that are `gpt-3.5-turbo` or `gpt-4` ::: Verify this is set correctly by ```shell curl -i http://localhost:4000/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer sk-1234" \ -d '{ "model": "gpt-4", "messages": [ {"role": "user", "content": "Hello"} ] }' ``` :::info Expect this to fail since gpt-4o is not in the `models` for the key generated ::: ```shell curl -i http://localhost:4000/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer sk-1234" \ -d '{ "model": "gpt-4o", "messages": [ {"role": "user", "content": "Hello"} ] }' ``` ### **Restrict models by `team_id`** `litellm-dev` can only access `azure-gpt-3.5` **1. Create a team via `/team/new`** ```shell curl --location 'http://localhost:4000/team/new' \ --header 'Authorization: Bearer ' \ --header 'Content-Type: application/json' \ --data-raw '{ "team_alias": "litellm-dev", "models": ["azure-gpt-3.5"] }' # returns {...,"team_id": "my-unique-id"} ``` **2. Create a key for team** ```shell curl --location 'http://localhost:4000/key/generate' \ --header 'Authorization: Bearer sk-1234' \ --header 'Content-Type: application/json' \ --data-raw '{"team_id": "my-unique-id"}' ``` **3. Test it** ```shell curl --location 'http://0.0.0.0:4000/chat/completions' \ --header 'Content-Type: application/json' \ --header 'Authorization: Bearer sk-qo992IjKOC2CHKZGRoJIGA' \ --data '{ "model": "BEDROCK_GROUP", "messages": [ { "role": "user", "content": "hi" } ] }' ``` ```shell {"error":{"message":"Invalid model for team litellm-dev: BEDROCK_GROUP. Valid models for team are: ['azure-gpt-3.5']\n\n\nTraceback (most recent call last):\n File \"/Users/ishaanjaffer/Github/litellm/litellm/proxy/proxy_server.py\", line 2298, in chat_completion\n _is_valid_team_configs(\n File \"/Users/ishaanjaffer/Github/litellm/litellm/proxy/utils.py\", line 1296, in _is_valid_team_configs\n raise Exception(\nException: Invalid model for team litellm-dev: BEDROCK_GROUP. Valid models for team are: ['azure-gpt-3.5']\n\n","type":"None","param":"None","code":500}}% ``` ### **Grant Access to new model (Access Groups)** Use model access groups to give users access to select models, and add new ones to it over time (e.g. mistral, llama-2, etc.) **Step 1. Assign model, access group in config.yaml** ```yaml model_list: - model_name: gpt-4 litellm_params: model: openai/fake api_key: fake-key api_base: https://exampleopenaiendpoint-production.up.railway.app/ model_info: access_groups: ["beta-models"] # 👈 Model Access Group - model_name: fireworks-llama-v3-70b-instruct litellm_params: model: fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct api_key: "os.environ/FIREWORKS" model_info: access_groups: ["beta-models"] # 👈 Model Access Group ``` **Create key with access group** ```bash curl --location 'http://localhost:4000/key/generate' \ -H 'Authorization: Bearer ' \ -H 'Content-Type: application/json' \ -d '{"models": ["beta-models"], # 👈 Model Access Group "max_budget": 0,}' ``` Test Key ```shell curl -i http://localhost:4000/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer sk-" \ -d '{ "model": "gpt-4", "messages": [ {"role": "user", "content": "Hello"} ] }' ``` :::info Expect this to fail since gpt-4o is not in the `beta-models` access group ::: ```shell curl -i http://localhost:4000/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer sk-" \ -d '{ "model": "gpt-4o", "messages": [ {"role": "user", "content": "Hello"} ] }' ``` Create Team ```shell curl --location 'http://localhost:4000/team/new' \ -H 'Authorization: Bearer sk-' \ -H 'Content-Type: application/json' \ -d '{"models": ["beta-models"]}' ``` Create Key for Team ```shell curl --location 'http://0.0.0.0:4000/key/generate' \ --header 'Authorization: Bearer sk-' \ --header 'Content-Type: application/json' \ --data '{"team_id": "0ac97648-c194-4c90-8cd6-40af7b0d2d2a"} ``` Test Key ```shell curl -i http://localhost:4000/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer sk-" \ -d '{ "model": "gpt-4", "messages": [ {"role": "user", "content": "Hello"} ] }' ``` :::info Expect this to fail since gpt-4o is not in the `beta-models` access group ::: ```shell curl -i http://localhost:4000/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer sk-" \ -d '{ "model": "gpt-4o", "messages": [ {"role": "user", "content": "Hello"} ] }' ``` ### Model Aliases If a user is expected to use a given model (i.e. gpt3-5), and you want to: - try to upgrade the request (i.e. GPT4) - or downgrade it (i.e. Mistral) - OR rotate the API KEY (i.e. open AI) - OR access the same model through different end points (i.e. openAI vs openrouter vs Azure) Here's how you can do that: **Step 1: Create a model group in config.yaml (save model name, api keys, etc.)** ```yaml model_list: - model_name: my-free-tier litellm_params: model: huggingface/HuggingFaceH4/zephyr-7b-beta api_base: http://0.0.0.0:8001 - model_name: my-free-tier litellm_params: model: huggingface/HuggingFaceH4/zephyr-7b-beta api_base: http://0.0.0.0:8002 - model_name: my-free-tier litellm_params: model: huggingface/HuggingFaceH4/zephyr-7b-beta api_base: http://0.0.0.0:8003 - model_name: my-paid-tier litellm_params: model: gpt-4 api_key: my-api-key ``` **Step 2: Generate a user key - enabling them access to specific models, custom model aliases, etc.** ```bash curl -X POST "https://0.0.0.0:4000/key/generate" \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ -d '{ "models": ["my-free-tier"], "aliases": {"gpt-3.5-turbo": "my-free-tier"}, "duration": "30min" }' ``` - **How to upgrade / downgrade request?** Change the alias mapping - **How are routing between diff keys/api bases done?** litellm handles this by shuffling between different models in the model list with the same model_name. [**See Code**](https://github.com/BerriAI/litellm/blob/main/litellm/router.py) ## Advanced ### Pass LiteLLM Key in custom header Use this to make LiteLLM proxy look for the virtual key in a custom header instead of the default `"Authorization"` header **Step 1** Define `litellm_key_header_name` name on litellm config.yaml ```yaml model_list: - model_name: fake-openai-endpoint litellm_params: model: openai/fake api_key: fake-key api_base: https://exampleopenaiendpoint-production.up.railway.app/ general_settings: master_key: sk-1234 litellm_key_header_name: "X-Litellm-Key" # 👈 Key Change ``` **Step 2** Test it In this request, litellm will use the Virtual key in the `X-Litellm-Key` header ```shell curl http://localhost:4000/v1/chat/completions \ -H "Content-Type: application/json" \ -H "X-Litellm-Key: Bearer sk-1234" \ -H "Authorization: Bearer bad-key" \ -d '{ "model": "fake-openai-endpoint", "messages": [ {"role": "user", "content": "Hello, Claude gm!"} ] }' ``` **Expected Response** Expect to see a successfull response from the litellm proxy since the key passed in `X-Litellm-Key` is valid ```shell {"id":"chatcmpl-f9b2b79a7c30477ab93cd0e717d1773e","choices":[{"finish_reason":"stop","index":0,"message":{"content":"\n\nHello there, how may I assist you today?","role":"assistant","tool_calls":null,"function_call":null}}],"created":1677652288,"model":"gpt-3.5-turbo-0125","object":"chat.completion","system_fingerprint":"fp_44709d6fcb","usage":{"completion_tokens":12,"prompt_tokens":9,"total_tokens":21} ``` ```python client = openai.OpenAI( api_key="not-used", base_url="https://api-gateway-url.com/llmservc/api/litellmp", default_headers={ "Authorization": f"Bearer {API_GATEWAY_TOKEN}", # (optional) For your API Gateway "X-Litellm-Key": f"Bearer sk-1234" # For LiteLLM Proxy } ) ``` ### Enable/Disable Virtual Keys **Disable Keys** ```bash curl -L -X POST 'http://0.0.0.0:4000/key/block' \ -H 'Authorization: Bearer LITELLM_MASTER_KEY' \ -H 'Content-Type: application/json' \ -d '{"key": "KEY-TO-BLOCK"}' ``` Expected Response: ```bash { ... "blocked": true } ``` **Enable Keys** ```bash curl -L -X POST 'http://0.0.0.0:4000/key/unblock' \ -H 'Authorization: Bearer LITELLM_MASTER_KEY' \ -H 'Content-Type: application/json' \ -d '{"key": "KEY-TO-UNBLOCK"}' ``` ```bash { ... "blocked": false } ``` ### Custom Auth You can now override the default api key auth. Here's how: #### 1. Create a custom auth file. Make sure the response type follows the `UserAPIKeyAuth` pydantic object. This is used by for logging usage specific to that user key. ```python from litellm.proxy._types import UserAPIKeyAuth async def user_api_key_auth(request: Request, api_key: str) -> UserAPIKeyAuth: try: modified_master_key = "sk-my-master-key" if api_key == modified_master_key: return UserAPIKeyAuth(api_key=api_key) raise Exception except: raise Exception ``` #### 2. Pass the filepath (relative to the config.yaml) Pass the filepath to the config.yaml e.g. if they're both in the same dir - `./config.yaml` and `./custom_auth.py`, this is what it looks like: ```yaml model_list: - model_name: "openai-model" litellm_params: model: "gpt-3.5-turbo" litellm_settings: drop_params: True set_verbose: True general_settings: custom_auth: custom_auth.user_api_key_auth ``` [**Implementation Code**](https://github.com/BerriAI/litellm/blob/caf2a6b279ddbe89ebd1d8f4499f65715d684851/litellm/proxy/utils.py#L122) #### 3. Start the proxy ```shell $ litellm --config /path/to/config.yaml ``` ### Custom /key/generate If you need to add custom logic before generating a Proxy API Key (Example Validating `team_id`) #### 1. Write a custom `custom_generate_key_fn` The input to the custom_generate_key_fn function is a single parameter: `data` [(Type: GenerateKeyRequest)](https://github.com/BerriAI/litellm/blob/main/litellm/proxy/_types.py#L125) The output of your `custom_generate_key_fn` should be a dictionary with the following structure ```python { "decision": False, "message": "This violates LiteLLM Proxy Rules. No team id provided.", } ``` - decision (Type: bool): A boolean value indicating whether the key generation is allowed (True) or not (False). - message (Type: str, Optional): An optional message providing additional information about the decision. This field is included when the decision is False. ```python async def custom_generate_key_fn(data: GenerateKeyRequest)-> dict: """ Asynchronous function for generating a key based on the input data. Args: data (GenerateKeyRequest): The input data for key generation. Returns: dict: A dictionary containing the decision and an optional message. { "decision": False, "message": "This violates LiteLLM Proxy Rules. No team id provided.", } """ # decide if a key should be generated or not print("using custom auth function!") data_json = data.json() # type: ignore # Unpacking variables team_id = data_json.get("team_id") duration = data_json.get("duration") models = data_json.get("models") aliases = data_json.get("aliases") config = data_json.get("config") spend = data_json.get("spend") user_id = data_json.get("user_id") max_parallel_requests = data_json.get("max_parallel_requests") metadata = data_json.get("metadata") tpm_limit = data_json.get("tpm_limit") rpm_limit = data_json.get("rpm_limit") if team_id is not None and team_id == "litellm-core-infra@gmail.com": # only team_id="litellm-core-infra@gmail.com" can make keys return { "decision": True, } else: print("Failed custom auth") return { "decision": False, "message": "This violates LiteLLM Proxy Rules. No team id provided.", } ``` #### 2. Pass the filepath (relative to the config.yaml) Pass the filepath to the config.yaml e.g. if they're both in the same dir - `./config.yaml` and `./custom_auth.py`, this is what it looks like: ```yaml model_list: - model_name: "openai-model" litellm_params: model: "gpt-3.5-turbo" litellm_settings: drop_params: True set_verbose: True general_settings: custom_key_generate: custom_auth.custom_generate_key_fn ``` ### Upperbound /key/generate params Use this, if you need to set default upperbounds for `max_budget`, `budget_duration` or any `key/generate` param per key. Set `litellm_settings:upperbound_key_generate_params`: ```yaml litellm_settings: upperbound_key_generate_params: max_budget: 100 # Optional[float], optional): upperbound of $100, for all /key/generate requests budget_duration: "10d" # Optional[str], optional): upperbound of 10 days for budget_duration values duration: "30d" # Optional[str], optional): upperbound of 30 days for all /key/generate requests max_parallel_requests: 1000 # (Optional[int], optional): Max number of requests that can be made in parallel. Defaults to None. tpm_limit: 1000 #(Optional[int], optional): Tpm limit. Defaults to None. rpm_limit: 1000 #(Optional[int], optional): Rpm limit. Defaults to None. ``` ** Expected Behavior ** - Send a `/key/generate` request with `max_budget=200` - Key will be created with `max_budget=100` since 100 is the upper bound ### Default /key/generate params Use this, if you need to control the default `max_budget` or any `key/generate` param per key. When a `/key/generate` request does not specify `max_budget`, it will use the `max_budget` specified in `default_key_generate_params` Set `litellm_settings:default_key_generate_params`: ```yaml litellm_settings: default_key_generate_params: max_budget: 1.5000 models: ["azure-gpt-3.5"] duration: # blank means `null` metadata: {"setting":"default"} team_id: "core-infra" ``` ## **Next Steps - Set Budgets, Rate Limits per Virtual Key** [Follow this doc to set budgets, rate limiters per virtual key with LiteLLM](users) ## Endpoint Reference (Spec) ### Keys #### [**👉 API REFERENCE DOCS**](https://litellm-api.up.railway.app/#/key%20management/) ### Users #### [**👉 API REFERENCE DOCS**](https://litellm-api.up.railway.app/#/user%20management/) ### Teams #### [**👉 API REFERENCE DOCS**](https://litellm-api.up.railway.app/#/team%20management)