litellm/docs/my-website/docs/proxy/virtual_keys.md
Krish Dholakia 424b8b0231
Litellm dev 11 23 2024 (#6881)
* build(ui/create_key_button.tsx): support adding tags for cost tracking/routing when making key

* LiteLLM Minor Fixes & Improvements (11/23/2024)  (#6870)

* feat(pass_through_endpoints/): support logging anthropic/gemini pass through calls to langfuse/s3/etc.

* fix(utils.py): allow disabling end user cost tracking with new param

Allows proxy admin to disable cost tracking for end user - keeps prometheus metrics small

* docs(configs.md): add disable_end_user_cost_tracking reference to docs

* feat(key_management_endpoints.py): add support for restricting access to `/key/generate` by team/proxy level role

Enables admin to restrict key creation, and assign team admins to handle distributing keys

* test(test_key_management.py): add unit testing for personal / team key restriction checks

* docs: add docs on restricting key creation

* docs(finetuned_models.md): add new guide on calling finetuned models

* docs(input.md): cleanup anthropic supported params

Closes https://github.com/BerriAI/litellm/issues/6856

* test(test_embedding.py): add test for passing extra headers via embedding

* feat(cohere/embed): pass client to async embedding

* feat(rerank.py): add `/v1/rerank` if missing for cohere base url

Closes https://github.com/BerriAI/litellm/issues/6844

* fix(main.py): pass extra_headers param to openai

Fixes https://github.com/BerriAI/litellm/issues/6836

* fix(litellm_logging.py): don't disable global callbacks when dynamic callbacks are set

Fixes issue where global callbacks - e.g. prometheus were overriden when langfuse was set dynamically

* fix(handler.py): fix linting error

* fix: fix typing

* build: add conftest to proxy_admin_ui_tests/

* test: fix test

* fix: fix linting errors

* test: fix test

* fix: fix pass through testing

* feat(key_management_endpoints.py): allow proxy_admin to enforce params on key creation

allows admin to force team keys to have tags

* build(ui/): show teams in leftnav + allow team admin to add new members

* build(ui/): show created tags in dropdown

makes it easier for admin to add tags to keys

* test(test_key_management.py): fix test

* test: fix test

* fix playwright e2e ui test

* fix e2e ui testing deps

* fix: fix linting errors

* fix e2e ui testing

* fix e2e ui testing, only run e2e ui testing in playwright

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2024-11-23 22:37:16 +05:30

24 KiB

import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';

Virtual Keys

Track Spend, and control model access via virtual keys for the proxy

:::info

:::

Setup

Requirements:

  • Need a postgres database (e.g. Supabase, Neon, etc)
  • Set DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname> in your env
  • Set a master key, this is your Proxy Admin key - you can use this to create other keys (🚨 must start with sk-).
    • ** Set on config.yaml** set your master key under general_settings:master_key, example below
    • ** Set env variable** set LITELLM_MASTER_KEY

(the proxy Dockerfile checks if the DATABASE_URL is set and then intializes the DB connection)

export DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname>

You can then generate keys by hitting the /key/generate endpoint.

See code

Quick Start - Generate a Key

Step 1: Save postgres db url

model_list:
  - model_name: gpt-4
    litellm_params:
        model: ollama/llama2
  - model_name: gpt-3.5-turbo
    litellm_params:
        model: ollama/llama2

general_settings: 
  master_key: sk-1234 
  database_url: "postgresql://<user>:<password>@<host>:<port>/<dbname>" # 👈 KEY CHANGE

Step 2: Start litellm

litellm --config /path/to/config.yaml

Step 3: Generate keys

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["gpt-3.5-turbo", "gpt-4"], "metadata": {"user": "ishaan@berri.ai"}}'

Spend Tracking

Get spend per:

How is it calculated?

The cost per model is stored here and calculated by the completion_cost function.

How is it tracking?

Spend is automatically tracked for the key in the "LiteLLM_VerificationTokenTable". If the key has an attached 'user_id' or 'team_id', the spend for that user is tracked in the "LiteLLM_UserTable", and team in the "LiteLLM_TeamTable".

You can get spend for a key by using the /key/info endpoint.

curl 'http://0.0.0.0:4000/key/info?key=<user-key>' \
     -X GET \
     -H 'Authorization: Bearer <your-master-key>'

This is automatically updated (in USD) when calls are made to /completions, /chat/completions, /embeddings using litellm's completion_cost() function. See Code.

Sample response

{
    "key": "sk-tXL0wt5-lOOVK9sfY2UacA",
    "info": {
        "token": "sk-tXL0wt5-lOOVK9sfY2UacA",
        "spend": 0.0001065, # 👈 SPEND
        "expires": "2023-11-24T23:19:11.131000Z",
        "models": [
            "gpt-3.5-turbo",
            "gpt-4",
            "claude-2"
        ],
        "aliases": {
            "mistral-7b": "gpt-3.5-turbo"
        },
        "config": {}
    }
}

1. Create a user

curl --location 'http://localhost:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{user_email: "krrish@berri.ai"}' 

Expected Response

{
    ...
    "expires": "2023-12-22T09:53:13.861000Z",
    "user_id": "my-unique-id", # 👈 unique id
    "max_budget": 0.0
}

2. Create a key for that user

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["gpt-3.5-turbo", "gpt-4"], "user_id": "my-unique-id"}'

Returns a key - sk-....

3. See spend for user

curl 'http://0.0.0.0:4000/user/info?user_id=my-unique-id' \
     -X GET \
     -H 'Authorization: Bearer <your-master-key>'

Expected Response

{
  ...
  "spend": 0 # 👈 SPEND
}

Use teams, if you want keys to be owned by multiple people (e.g. for a production app).

1. Create a team

curl --location 'http://localhost:4000/team/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{"team_alias": "my-awesome-team"}' 

Expected Response

{
    ...
    "expires": "2023-12-22T09:53:13.861000Z",
    "team_id": "my-unique-id", # 👈 unique id
    "max_budget": 0.0
}

2. Create a key for that team

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["gpt-3.5-turbo", "gpt-4"], "team_id": "my-unique-id"}'

Returns a key - sk-....

3. See spend for team

curl 'http://0.0.0.0:4000/team/info?team_id=my-unique-id' \
     -X GET \
     -H 'Authorization: Bearer <your-master-key>'

Expected Response

{
  ...
  "spend": 0 # 👈 SPEND
}

Model Access

Restrict models by Virtual Key

Set allowed models for a key using the models param

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["gpt-3.5-turbo", "gpt-4"]}'

:::info

This key can only make requests to models that are gpt-3.5-turbo or gpt-4

:::

Verify this is set correctly by

curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

:::info

Expect this to fail since gpt-4o is not in the models for the key generated

:::

curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

Restrict models by team_id

litellm-dev can only access azure-gpt-3.5

1. Create a team via /team/new

curl --location 'http://localhost:4000/team/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_alias": "litellm-dev",
  "models": ["azure-gpt-3.5"]
}' 

# returns {...,"team_id": "my-unique-id"}

2. Create a key for team

curl --location 'http://localhost:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data-raw '{"team_id": "my-unique-id"}'

3. Test it

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer sk-qo992IjKOC2CHKZGRoJIGA' \
    --data '{
        "model": "BEDROCK_GROUP",
        "messages": [
            {
                "role": "user",
                "content": "hi"
            }
        ]
    }'
{"error":{"message":"Invalid model for team litellm-dev: BEDROCK_GROUP.  Valid models for team are: ['azure-gpt-3.5']\n\n\nTraceback (most recent call last):\n  File \"/Users/ishaanjaffer/Github/litellm/litellm/proxy/proxy_server.py\", line 2298, in chat_completion\n    _is_valid_team_configs(\n  File \"/Users/ishaanjaffer/Github/litellm/litellm/proxy/utils.py\", line 1296, in _is_valid_team_configs\n    raise Exception(\nException: Invalid model for team litellm-dev: BEDROCK_GROUP.  Valid models for team are: ['azure-gpt-3.5']\n\n","type":"None","param":"None","code":500}}%            

Grant Access to new model (Access Groups)

Use model access groups to give users access to select models, and add new ones to it over time (e.g. mistral, llama-2, etc.)

Step 1. Assign model, access group in config.yaml

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/
    model_info:
      access_groups: ["beta-models"] # 👈 Model Access Group
  - model_name: fireworks-llama-v3-70b-instruct
    litellm_params:
      model: fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct
      api_key: "os.environ/FIREWORKS"
    model_info:
      access_groups: ["beta-models"] # 👈 Model Access Group

Create key with access group

curl --location 'http://localhost:4000/key/generate' \
-H 'Authorization: Bearer <your-master-key>' \
-H 'Content-Type: application/json' \
-d '{"models": ["beta-models"], # 👈 Model Access Group
			"max_budget": 0,}'

Test Key

curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-<key-from-previous-step>" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

:::info

Expect this to fail since gpt-4o is not in the beta-models access group

:::

curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-<key-from-previous-step>" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

Create Team

curl --location 'http://localhost:4000/team/new' \
-H 'Authorization: Bearer sk-<key-from-previous-step>' \
-H 'Content-Type: application/json' \
-d '{"models": ["beta-models"]}'

Create Key for Team

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-<key-from-previous-step>' \
--header 'Content-Type: application/json' \
--data '{"team_id": "0ac97648-c194-4c90-8cd6-40af7b0d2d2a"}

Test Key

curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-<key-from-previous-step>" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

:::info

Expect this to fail since gpt-4o is not in the beta-models access group

:::

curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-<key-from-previous-step>" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

Model Aliases

If a user is expected to use a given model (i.e. gpt3-5), and you want to:

  • try to upgrade the request (i.e. GPT4)
  • or downgrade it (i.e. Mistral)
  • OR rotate the API KEY (i.e. open AI)
  • OR access the same model through different end points (i.e. openAI vs openrouter vs Azure)

Here's how you can do that:

Step 1: Create a model group in config.yaml (save model name, api keys, etc.)

model_list:
  - model_name: my-free-tier
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8001
  - model_name: my-free-tier
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8002
  - model_name: my-free-tier
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8003
	- model_name: my-paid-tier
    litellm_params:
        model: gpt-4
        api_key: my-api-key

Step 2: Generate a user key - enabling them access to specific models, custom model aliases, etc.

curl -X POST "https://0.0.0.0:4000/key/generate" \
-H "Authorization: Bearer <your-master-key>" \
-H "Content-Type: application/json" \
-d '{
	"models": ["my-free-tier"], 
	"aliases": {"gpt-3.5-turbo": "my-free-tier"}, 
	"duration": "30min"
}'
  • How to upgrade / downgrade request? Change the alias mapping
  • How are routing between diff keys/api bases done? litellm handles this by shuffling between different models in the model list with the same model_name. See Code

Advanced

Pass LiteLLM Key in custom header

Use this to make LiteLLM proxy look for the virtual key in a custom header instead of the default "Authorization" header

Step 1 Define litellm_key_header_name name on litellm config.yaml

model_list:
  - model_name: fake-openai-endpoint
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/

general_settings: 
  master_key: sk-1234 
  litellm_key_header_name: "X-Litellm-Key" # 👈 Key Change

Step 2 Test it

In this request, litellm will use the Virtual key in the X-Litellm-Key header

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Litellm-Key: Bearer sk-1234" \
  -H "Authorization: Bearer bad-key" \
  -d '{
    "model": "fake-openai-endpoint",
    "messages": [
      {"role": "user", "content": "Hello, Claude gm!"}
    ]
  }'

Expected Response

Expect to see a successfull response from the litellm proxy since the key passed in X-Litellm-Key is valid

{"id":"chatcmpl-f9b2b79a7c30477ab93cd0e717d1773e","choices":[{"finish_reason":"stop","index":0,"message":{"content":"\n\nHello there, how may I assist you today?","role":"assistant","tool_calls":null,"function_call":null}}],"created":1677652288,"model":"gpt-3.5-turbo-0125","object":"chat.completion","system_fingerprint":"fp_44709d6fcb","usage":{"completion_tokens":12,"prompt_tokens":9,"total_tokens":21}
client = openai.OpenAI(
    api_key="not-used",
    base_url="https://api-gateway-url.com/llmservc/api/litellmp",
    default_headers={
        "Authorization": f"Bearer {API_GATEWAY_TOKEN}", # (optional) For your API Gateway
        "X-Litellm-Key": f"Bearer sk-1234"              # For LiteLLM Proxy
    }
)

Enable/Disable Virtual Keys

Disable Keys

curl -L -X POST 'http://0.0.0.0:4000/key/block' \
-H 'Authorization: Bearer LITELLM_MASTER_KEY' \
-H 'Content-Type: application/json' \
-d '{"key": "KEY-TO-BLOCK"}'

Expected Response:

{
  ...
  "blocked": true
}

Enable Keys

curl -L -X POST 'http://0.0.0.0:4000/key/unblock' \
-H 'Authorization: Bearer LITELLM_MASTER_KEY' \
-H 'Content-Type: application/json' \
-d '{"key": "KEY-TO-UNBLOCK"}'
{
  ...
  "blocked": false
}

Custom Auth

You can now override the default api key auth.

Here's how:

1. Create a custom auth file.

Make sure the response type follows the UserAPIKeyAuth pydantic object. This is used by for logging usage specific to that user key.

from litellm.proxy._types import UserAPIKeyAuth

async def user_api_key_auth(request: Request, api_key: str) -> UserAPIKeyAuth: 
    try: 
        modified_master_key = "sk-my-master-key"
        if api_key == modified_master_key:
            return UserAPIKeyAuth(api_key=api_key)
        raise Exception
    except: 
        raise Exception

2. Pass the filepath (relative to the config.yaml)

Pass the filepath to the config.yaml

e.g. if they're both in the same dir - ./config.yaml and ./custom_auth.py, this is what it looks like:

model_list: 
  - model_name: "openai-model"
    litellm_params: 
      model: "gpt-3.5-turbo"

litellm_settings:
  drop_params: True
  set_verbose: True

general_settings:
  custom_auth: custom_auth.user_api_key_auth

Implementation Code

3. Start the proxy

$ litellm --config /path/to/config.yaml 

Custom /key/generate

If you need to add custom logic before generating a Proxy API Key (Example Validating team_id)

1. Write a custom custom_generate_key_fn

The input to the custom_generate_key_fn function is a single parameter: data (Type: GenerateKeyRequest)

The output of your custom_generate_key_fn should be a dictionary with the following structure

{
    "decision": False,
    "message": "This violates LiteLLM Proxy Rules. No team id provided.",
}

  • decision (Type: bool): A boolean value indicating whether the key generation is allowed (True) or not (False).

  • message (Type: str, Optional): An optional message providing additional information about the decision. This field is included when the decision is False.

async def custom_generate_key_fn(data: GenerateKeyRequest)-> dict:
        """
        Asynchronous function for generating a key based on the input data.

        Args:
            data (GenerateKeyRequest): The input data for key generation.

        Returns:
            dict: A dictionary containing the decision and an optional message.
            {
                "decision": False,
                "message": "This violates LiteLLM Proxy Rules. No team id provided.",
            }
        """
        
        # decide if a key should be generated or not
        print("using custom auth function!")
        data_json = data.json()  # type: ignore

        # Unpacking variables
        team_id = data_json.get("team_id")
        duration = data_json.get("duration")
        models = data_json.get("models")
        aliases = data_json.get("aliases")
        config = data_json.get("config")
        spend = data_json.get("spend")
        user_id = data_json.get("user_id")
        max_parallel_requests = data_json.get("max_parallel_requests")
        metadata = data_json.get("metadata")
        tpm_limit = data_json.get("tpm_limit")
        rpm_limit = data_json.get("rpm_limit")

        if team_id is not None and team_id == "litellm-core-infra@gmail.com":
            # only team_id="litellm-core-infra@gmail.com" can make keys
            return {
                "decision": True,
            }
        else:
            print("Failed custom auth")
            return {
                "decision": False,
                "message": "This violates LiteLLM Proxy Rules. No team id provided.",
            }

2. Pass the filepath (relative to the config.yaml)

Pass the filepath to the config.yaml

e.g. if they're both in the same dir - ./config.yaml and ./custom_auth.py, this is what it looks like:

model_list: 
  - model_name: "openai-model"
    litellm_params: 
      model: "gpt-3.5-turbo"

litellm_settings:
  drop_params: True
  set_verbose: True

general_settings:
  custom_key_generate: custom_auth.custom_generate_key_fn

Upperbound /key/generate params

Use this, if you need to set default upperbounds for max_budget, budget_duration or any key/generate param per key.

Set litellm_settings:upperbound_key_generate_params:

litellm_settings:
  upperbound_key_generate_params:
    max_budget: 100 # Optional[float], optional): upperbound of $100, for all /key/generate requests
    budget_duration: "10d" # Optional[str], optional): upperbound of 10 days for budget_duration values
    duration: "30d" # Optional[str], optional): upperbound of 30 days for all /key/generate requests
    max_parallel_requests: 1000 # (Optional[int], optional): Max number of requests that can be made in parallel. Defaults to None.
    tpm_limit: 1000 #(Optional[int], optional): Tpm limit. Defaults to None.
    rpm_limit: 1000 #(Optional[int], optional): Rpm limit. Defaults to None.

** Expected Behavior **

  • Send a /key/generate request with max_budget=200
  • Key will be created with max_budget=100 since 100 is the upper bound

Default /key/generate params

Use this, if you need to control the default max_budget or any key/generate param per key.

When a /key/generate request does not specify max_budget, it will use the max_budget specified in default_key_generate_params

Set litellm_settings:default_key_generate_params:

litellm_settings:
  default_key_generate_params:
    max_budget: 1.5000
    models: ["azure-gpt-3.5"]
    duration:     # blank means `null`
    metadata: {"setting":"default"}
    team_id: "core-infra"

Restricting Key Generation

Use this to control who can generate keys. Useful when letting others create keys on the UI.

litellm_settings:
  key_generation_settings:
    team_key_generation:
      allowed_team_member_roles: ["admin"]
      required_params: ["tags"] # require team admins to set tags for cost-tracking when generating a team key
    personal_key_generation: # maps to 'Default Team' on UI 
      allowed_user_roles: ["proxy_admin"]

Spec

class TeamUIKeyGenerationConfig(TypedDict):
    allowed_team_member_roles: List[str]
    required_params: List[str] # require params on `/key/generate` to be set if a team key (team_id in request) is being generated


class PersonalUIKeyGenerationConfig(TypedDict):
    allowed_user_roles: List[LitellmUserRoles] 
    required_params: List[str] # require params on `/key/generate` to be set if a personal key (no team_id in request) is being generated


class StandardKeyGenerationConfig(TypedDict, total=False):
    team_key_generation: TeamUIKeyGenerationConfig
    personal_key_generation: PersonalUIKeyGenerationConfig


class LitellmUserRoles(str, enum.Enum):
    """
    Admin Roles:
    PROXY_ADMIN: admin over the platform
    PROXY_ADMIN_VIEW_ONLY: can login, view all own keys, view all spend
    ORG_ADMIN: admin over a specific organization, can create teams, users only within their organization

    Internal User Roles:
    INTERNAL_USER: can login, view/create/delete their own keys, view their spend
    INTERNAL_USER_VIEW_ONLY: can login, view their own keys, view their own spend


    Team Roles:
    TEAM: used for JWT auth


    Customer Roles:
    CUSTOMER: External users -> these are customers

    """

    # Admin Roles
    PROXY_ADMIN = "proxy_admin"
    PROXY_ADMIN_VIEW_ONLY = "proxy_admin_viewer"

    # Organization admins
    ORG_ADMIN = "org_admin"

    # Internal User Roles
    INTERNAL_USER = "internal_user"
    INTERNAL_USER_VIEW_ONLY = "internal_user_viewer"

    # Team Roles
    TEAM = "team"

    # Customer Roles - External users of proxy
    CUSTOMER = "customer"

Next Steps - Set Budgets, Rate Limits per Virtual Key

Follow this doc to set budgets, rate limiters per virtual key with LiteLLM

Endpoint Reference (Spec)

Keys

👉 API REFERENCE DOCS

Users

👉 API REFERENCE DOCS

Teams

👉 API REFERENCE DOCS