phoenix/litellm

Fork 0

forked from phoenix/litellm-mirror

ishaan-jaff ea6f42216c (docs) use port 4000

2024-03-08 21:59:00 -08:00

23 KiB

Raw Blame History

🔑 Virtual Keys, Users

Track Spend, Set budgets and create virtual keys for the proxy

Grant other's temporary access to your proxy, with keys that expire after a set duration.

:::info

🔑 UI to Generate, Edit, Delete Keys (with SSO)
Deploy LiteLLM Proxy with Key Management
Dockerfile.database for LiteLLM Proxy + Key Management here

:::

Setup

Requirements:

Need a postgres database (e.g. Supabase, Neon, etc)
Set DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname> in your env
Set a master key, this is your Proxy Admin key - you can use this to create other keys
- ** Set on config.yaml** set your master key under general_settings:master_key, example below
- ** Set env variable** set LITELLM_MASTER_KEY (Note: either set this on the config.yaml or in your env whatever is more convenient for you)

(the proxy Dockerfile checks if the DATABASE_URL is set and then intializes the DB connection)

export DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname>

You can then generate temporary keys by hitting the /key/generate endpoint.

See code

Step 1: Save postgres db url

model_list:
  - model_name: gpt-4
    litellm_params:
        model: ollama/llama2
  - model_name: gpt-3.5-turbo
    litellm_params:
        model: ollama/llama2

general_settings: 
  master_key: sk-1234 # [OPTIONAL] if set all calls to proxy will require either this key or a valid generated token
  database_url: "postgresql://<user>:<password>@<host>:<port>/<dbname>"

Step 2: Start litellm

litellm --config /path/to/config.yaml

Step 3: Generate temporary keys

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["gpt-3.5-turbo", "gpt-4", "claude-2"], "duration": "20m","metadata": {"user": "ishaan@berri.ai"}}'

/key/generate

Request

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "models": ["gpt-3.5-turbo", "gpt-4", "claude-2"],
  "duration": "20m",
  "metadata": {"user": "ishaan@berri.ai"},
  "team_id": "core-infra",
  "max_budget": 10,
  "soft_budget": 5,
}'

Request Params:

duration: Optional[str] - Specify the length of time the token is valid for. You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").
key_alias: Optional[str] - User defined key alias
team_id: Optional[str] - The team id of the user
models: Optional[list] - Model_name's a user is allowed to call. (if empty, key is allowed to call all models)
aliases: Optional[dict] - Any alias mappings, on top of anything in the config.yaml model list. - https://docs.litellm.ai/docs/proxy/virtual_keys#managing-auth---upgradedowngrade-models
config: Optional[dict] - any key-specific configs, overrides config in config.yaml
spend: Optional[int] - Amount spent by key. Default is 0. Will be updated by proxy whenever key is used. https://docs.litellm.ai/docs/proxy/virtual_keys#managing-auth---tracking-spend
max_budget: Optional[float] - Specify max budget for a given key.
soft_budget: Optional[float] - Specify soft limit budget for a given key. Get Alerts when key hits its soft budget
model_max_budget: Optional[dict[str, float]] - Specify max budget for each model, model_max_budget={"gpt4": 0.5, "gpt-5": 0.01}
max_parallel_requests: Optional[int] - Rate limit a user based on the number of parallel requests. Raises 429 error, if user's parallel requests > x.
metadata: Optional[dict] - Metadata for key, store information for key. Example metadata = {"team": "core-infra", "app": "app2", "email": "ishaan@berri.ai" }

Response

{
    "key": "sk-kdEXbIqZRwEeEiHwdg7sFA", # Bearer token
    "expires": "2023-11-19T01:38:25.834000+00:00" # datetime object
    "key_name": "sk-...7sFA" # abbreviated key string, ONLY stored in db if `allow_user_auth: true` set - [see](./ui.md)
    ...
}

Upgrade/Downgrade Models

If a user is expected to use a given model (i.e. gpt3-5), and you want to:

try to upgrade the request (i.e. GPT4)
or downgrade it (i.e. Mistral)
OR rotate the API KEY (i.e. open AI)
OR access the same model through different end points (i.e. openAI vs openrouter vs Azure)

Here's how you can do that:

Step 1: Create a model group in config.yaml (save model name, api keys, etc.)

model_list:
  - model_name: my-free-tier
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8001
  - model_name: my-free-tier
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8002
  - model_name: my-free-tier
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8003
	- model_name: my-paid-tier
    litellm_params:
        model: gpt-4
        api_key: my-api-key

Step 2: Generate a user key - enabling them access to specific models, custom model aliases, etc.

curl -X POST "https://0.0.0.0:4000/key/generate" \
-H "Authorization: Bearer <your-master-key>" \
-H "Content-Type: application/json" \
-d '{
	"models": ["my-free-tier"], 
	"aliases": {"gpt-3.5-turbo": "my-free-tier"}, 
	"duration": "30min"
}'

How to upgrade / downgrade request? Change the alias mapping
How are routing between diff keys/api bases done? litellm handles this by shuffling between different models in the model list with the same model_name. See Code

Grant Access to new model

Use model access groups to give users access to select models, and add new ones to it over time (e.g. mistral, llama-2, etc.)

Step 1. Assign model, access group in config.yaml

model_list:
  - model_name: text-embedding-ada-002
    litellm_params:
      model: azure/azure-embedding-model
      api_base: "os.environ/AZURE_API_BASE"
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2023-07-01-preview"
    model_info:
      access_groups: ["beta-models"] # 👈 Model Access Group

Step 2. Create key with access group

curl --location 'http://localhost:4000/key/generate' \
-H 'Authorization: Bearer <your-master-key>' \
-H 'Content-Type: application/json' \
-d '{"models": ["beta-models"], # 👈 Model Access Group
			"max_budget": 0,}'

/key/info

Request

curl -X GET "http://0.0.0.0:4000/key/info?key=sk-02Wr4IAlN3NvPXvL5JVvDA" \
-H "Authorization: Bearer sk-1234"

Request Params:

key: str - The key you want the info for

Response

token is the hashed key (The DB stores the hashed key for security)

{
  "key": "sk-02Wr4IAlN3NvPXvL5JVvDA",
  "info": {
    "token": "80321a12d03412c527f2bd9db5fabd746abead2e1d50b435a534432fbaca9ef5",
    "spend": 0.0,
    "expires": "2024-01-18T23:52:09.125000+00:00",
    "models": ["azure-gpt-3.5", "azure-embedding-model"],
    "aliases": {},
    "config": {},
    "user_id": "ishaan2@berri.ai",
    "team_id": "None",
    "max_parallel_requests": null,
    "metadata": {}
  }
}

/key/update

Request

curl 'http://0.0.0.0:4000/key/update' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "key": "sk-kdEXbIqZRwEeEiHwdg7sFA",
  "models": ["gpt-3.5-turbo", "gpt-4", "claude-2"],
  "metadata": {"user": "ishaan@berri.ai"},
  "team_id": "core-infra"
}'

Request Params:

key: str - The key that needs to be updated.
models: list or null (optional) - Specify the models a token has access to. If null, then the token has access to all models on the server.
metadata: dict or null (optional) - Pass metadata for the updated token. If null, defaults to an empty dictionary.
team_id: str or null (optional) - Specify the team_id for the associated key.

Response

{
  "key": "sk-kdEXbIqZRwEeEiHwdg7sFA",
  "models": ["gpt-3.5-turbo", "gpt-4", "claude-2"],
  "metadata": {
    "user": "ishaan@berri.ai"
  }
}

/key/delete

Request

curl 'http://0.0.0.0:4000/key/delete' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "keys": ["sk-kdEXbIqZRwEeEiHwdg7sFA"]
}'

Request Params:

keys: List[str] - List of keys to delete

Response

{
  "deleted_keys": ["sk-kdEXbIqZRwEeEiHwdg7sFA"]
}

/user/new

Request

All key/generate params supported for creating a user

curl 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data-raw '{
  "user_id": "ishaan1",
  "user_email": "ishaan@litellm.ai",
  "user_role": "admin",
  "team_id": "cto-team",
  "max_budget": 20,
  "budget_duration": "1h"

}'

Request Params:

user_id: str (optional - defaults to uuid) - The unique identifier for the user.
user_email: str (optional - defaults to "") - The email address associated with the user.
user_role: str (optional - defaults to "app_user") - The role assigned to the user. Can be "admin", "app_owner", "app_user"

Possible user_role values

"admin" - Maintaining the proxy and owning the overall budget
"app_owner" - employees maintaining the apps, each owner may own more than one app
"app_user" - users who know nothing about the proxy. These users get created when you pass `user` to /chat/completions

team_id: str (optional - defaults to "") - The identifier for the team to which the user belongs.
max_budget: float (optional - defaults to null) - The maximum budget allocated for the user. No budget checks done if max_budget==null
budget_duration: str (optional - defaults to null) - The duration for which the budget is valid, e.g., "1h", "1d"

Response

A key will be generated for the new user created

{
  "models": [],
  "spend": 0.0,
  "max_budget": null,
  "user_id": "ishaan1",
  "team_id": null,
  "max_parallel_requests": null,
  "metadata": {},
  "tpm_limit": null,
  "rpm_limit": null,
  "budget_duration": null,
  "allowed_cache_controls": [],
  "key_alias": null,
  "duration": null,
  "aliases": {},
  "config": {},
  "key": "sk-JflB33ucTqc2NYvNAgiBCA",
  "key_name": null,
  "expires": null
}

/user/info

Request

View all Users

If you're trying to view all users, we recommend using pagination with the following args

view_all=true
page=0 Optional(int) min = 0, default=0
page_size=25 Optional(int) min = 1, default = 25

curl -X GET "http://0.0.0.0:4000/user/info?view_all=true&page=0&page_size=25" -H "Authorization: Bearer sk-1234"

View specific user_id

curl -X GET "http://0.0.0.0:4000/user/info?user_id=228da235-eef0-4c30-bf53-5d6ac0d278c2" -H "Authorization: Bearer sk-1234"

Response

View user spend, budget, models, keys and teams

{
  "user_id": "228da235-eef0-4c30-bf53-5d6ac0d278c2",
  "user_info": {
    "user_id": "228da235-eef0-4c30-bf53-5d6ac0d278c2",
    "team_id": null,
    "teams": [],
    "user_role": "app_user",
    "max_budget": null,
    "spend": 200000.0,
    "user_email": null,
    "models": [],
    "max_parallel_requests": null,
    "tpm_limit": null,
    "rpm_limit": null,
    "budget_duration": null,
    "budget_reset_at": null,
    "allowed_cache_controls": [],
    "model_spend": {
      "chatgpt-v-2": 200000
    },
    "model_max_budget": {}
  },
  "keys": [
    {
      "token": "16c337f9df00a0e6472627e39a2ed02e67bc9a8a760c983c4e9b8cad7954f3c0",
      "key_name": null,
      "key_alias": null,
      "spend": 200000.0,
      "expires": null,
      "models": [],
      "aliases": {},
      "config": {},
      "user_id": "228da235-eef0-4c30-bf53-5d6ac0d278c2",
      "team_id": null,
      "permissions": {},
      "max_parallel_requests": null,
      "metadata": {},
      "tpm_limit": null,
      "rpm_limit": null,
      "max_budget": null,
      "budget_duration": null,
      "budget_reset_at": null,
      "allowed_cache_controls": [],
      "model_spend": {
        "chatgpt-v-2": 200000
      },
      "model_max_budget": {}
    }
  ],
  "teams": []
}

Advanced

Upperbound /key/generate params

Use this, if you need to control the upperbound that users can use for max_budget, budget_duration or any key/generate param per key.

Set litellm_settings:upperbound_key_generate_params:

litellm_settings:
  upperbound_key_generate_params:
    max_budget: 100 # upperbound of $100, for all /key/generate requests
    duration: "30d" # upperbound of 30 days for all /key/generate requests

** Expected Behavior **

Send a /key/generate request with max_budget=200
Key will be created with max_budget=100 since 100 is the upper bound

Default /key/generate params

Use this, if you need to control the default max_budget or any key/generate param per key.

When a /key/generate request does not specify max_budget, it will use the max_budget specified in default_key_generate_params

Set litellm_settings:default_key_generate_params:

litellm_settings:
  default_key_generate_params:
    max_budget: 1.5000
    models: ["azure-gpt-3.5"]
    duration:     # blank means `null`
    metadata: {"setting":"default"}
    team_id: "core-infra"

Restrict models by `team_id`

litellm-dev can only access azure-gpt-3.5

litellm_settings:
  default_team_settings:
    - team_id: litellm-dev
      models: ["azure-gpt-3.5"]

Create key with team_id="litellm-dev"

curl --location 'http://localhost:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data-raw '{"team_id": "litellm-dev"}'

Use Key to call invalid model - Fails

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer sk-qo992IjKOC2CHKZGRoJIGA' \
    --data '{
        "model": "BEDROCK_GROUP",
        "messages": [
            {
                "role": "user",
                "content": "hi"
            }
        ]
    }'

{"error":{"message":"Invalid model for team litellm-dev: BEDROCK_GROUP.  Valid models for team are: ['azure-gpt-3.5']\n\n\nTraceback (most recent call last):\n  File \"/Users/ishaanjaffer/Github/litellm/litellm/proxy/proxy_server.py\", line 2298, in chat_completion\n    _is_valid_team_configs(\n  File \"/Users/ishaanjaffer/Github/litellm/litellm/proxy/utils.py\", line 1296, in _is_valid_team_configs\n    raise Exception(\nException: Invalid model for team litellm-dev: BEDROCK_GROUP.  Valid models for team are: ['azure-gpt-3.5']\n\n","type":"None","param":"None","code":500}}%

Set Budgets - Per Key

Set max_budget in (USD $) param in the key/generate request. By default the max_budget is set to null and is not checked for keys

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "metadata": {"user": "ishaan@berri.ai"},
  "team_id": "core-infra",
  "max_budget": 10,
}'

Expected Behaviour

Costs Per key get auto-populated in LiteLLM_VerificationToken Table
After the key crosses it's max_budget, requests fail

Example Request to /chat/completions when key has crossed budget

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer sk-ULl_IKCVFy2EZRzQB16RUA' \
  --data ' {
  "model": "azure-gpt-3.5",
  "user": "e09b4da8-ed80-4b05-ac93-e16d9eb56fca",
  "messages": [
      {
      "role": "user",
      "content": "respond in 50 lines"
      }
  ],
}'

Expected Response from /chat/completions when key has crossed budget

{
  "detail":"Authentication Error, ExceededTokenBudget: Current spend for token: 7.2e-05; Max Budget for Token: 2e-07"
}

Set Budgets - Per User

LiteLLM exposes a /user/new endpoint to create budgets for users, that persist across multiple keys.

This is documented in the swagger (live on your server root endpoint - e.g. http://0.0.0.0:4000/). Here's an example request.

curl --location 'http://localhost:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["azure-models"], "max_budget": 0, "user_id": "krrish3@berri.ai"}'

The request is a normal /key/generate request body + a max_budget field.

Sample Response

{
    "key": "sk-YF2OxDbrgd1y2KgwxmEA2w",
    "expires": "2023-12-22T09:53:13.861000Z",
    "user_id": "krrish3@berri.ai",
    "max_budget": 0.0
}

Tracking Spend

You can get spend for a key by using the /key/info endpoint.

curl 'http://0.0.0.0:4000/key/info?key=<user-key>' \
     -X GET \
     -H 'Authorization: Bearer <your-master-key>'

This is automatically updated (in USD) when calls are made to /completions, /chat/completions, /embeddings using litellm's completion_cost() function. See Code.

Sample response

{
    "key": "sk-tXL0wt5-lOOVK9sfY2UacA",
    "info": {
        "token": "sk-tXL0wt5-lOOVK9sfY2UacA",
        "spend": 0.0001065,
        "expires": "2023-11-24T23:19:11.131000Z",
        "models": [
            "gpt-3.5-turbo",
            "gpt-4",
            "claude-2"
        ],
        "aliases": {
            "mistral-7b": "gpt-3.5-turbo"
        },
        "config": {}
    }
}

Custom Auth

You can now override the default api key auth.

Here's how:

1. Create a custom auth file.

Make sure the response type follows the UserAPIKeyAuth pydantic object. This is used by for logging usage specific to that user key.

from litellm.proxy._types import UserAPIKeyAuth

async def user_api_key_auth(request: Request, api_key: str) -> UserAPIKeyAuth: 
    try: 
        modified_master_key = "sk-my-master-key"
        if api_key == modified_master_key:
            return UserAPIKeyAuth(api_key=api_key)
        raise Exception
    except: 
        raise Exception

2. Pass the filepath (relative to the config.yaml)

Pass the filepath to the config.yaml

e.g. if they're both in the same dir - ./config.yaml and ./custom_auth.py, this is what it looks like:

model_list: 
  - model_name: "openai-model"
    litellm_params: 
      model: "gpt-3.5-turbo"

litellm_settings:
  drop_params: True
  set_verbose: True

general_settings:
  custom_auth: custom_auth.user_api_key_auth

Implementation Code

3. Start the proxy

$ litellm --config /path/to/config.yaml

Custom /key/generate

If you need to add custom logic before generating a Proxy API Key (Example Validating team_id)

1. Write a custom `custom_generate_key_fn`

The input to the custom_generate_key_fn function is a single parameter: data (Type: GenerateKeyRequest)

The output of your custom_generate_key_fn should be a dictionary with the following structure

{
    "decision": False,
    "message": "This violates LiteLLM Proxy Rules. No team id provided.",
}

decision (Type: bool): A boolean value indicating whether the key generation is allowed (True) or not (False).
message (Type: str, Optional): An optional message providing additional information about the decision. This field is included when the decision is False.

async def custom_generate_key_fn(data: GenerateKeyRequest)-> dict:
        """
        Asynchronous function for generating a key based on the input data.

        Args:
            data (GenerateKeyRequest): The input data for key generation.

        Returns:
            dict: A dictionary containing the decision and an optional message.
            {
                "decision": False,
                "message": "This violates LiteLLM Proxy Rules. No team id provided.",
            }
        """
        
        # decide if a key should be generated or not
        print("using custom auth function!")
        data_json = data.json()  # type: ignore

        # Unpacking variables
        team_id = data_json.get("team_id")
        duration = data_json.get("duration")
        models = data_json.get("models")
        aliases = data_json.get("aliases")
        config = data_json.get("config")
        spend = data_json.get("spend")
        user_id = data_json.get("user_id")
        max_parallel_requests = data_json.get("max_parallel_requests")
        metadata = data_json.get("metadata")
        tpm_limit = data_json.get("tpm_limit")
        rpm_limit = data_json.get("rpm_limit")

        if team_id is not None and team_id == "litellm-core-infra@gmail.com":
            # only team_id="litellm-core-infra@gmail.com" can make keys
            return {
                "decision": True,
            }
        else:
            print("Failed custom auth")
            return {
                "decision": False,
                "message": "This violates LiteLLM Proxy Rules. No team id provided.",
            }

2. Pass the filepath (relative to the config.yaml)

Pass the filepath to the config.yaml

e.g. if they're both in the same dir - ./config.yaml and ./custom_auth.py, this is what it looks like:

model_list: 
  - model_name: "openai-model"
    litellm_params: 
      model: "gpt-3.5-turbo"

litellm_settings:
  drop_params: True
  set_verbose: True

general_settings:
  custom_key_generate: custom_auth.custom_generate_key_fn

[BETA] Dynamo DB

Step 1. Save keys to env

AWS_ACCESS_KEY_ID = "your-aws-access-key-id"
AWS_SECRET_ACCESS_KEY = "your-aws-secret-access-key"

Step 2. Add details to config

general_settings: 
  master_key: sk-1234
  database_type: "dynamo_db" 
  database_args: { # 👈  all args - https://github.com/BerriAI/litellm/blob/befbcbb7ac8f59835ce47415c128decf37aac328/litellm/proxy/_types.py#L190
    "billing_mode": "PAY_PER_REQUEST", 
    "region_name": "us-west-2" 
    "user_table_name": "your-user-table",
    "key_table_name": "your-token-table",
    "config_table_name": "your-config-table",
    "aws_role_name": "your-aws_role_name",
    "aws_session_name": "your-aws_session_name",
  }

Step 3. Generate Key

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"models": ["azure-models"], "aliases": {"mistral-7b": "gpt-3.5-turbo"}, "duration": null}'

23 KiB Raw Blame History

🔑 Virtual Keys, Users

Setup

/key/generate

Request

Response

Upgrade/Downgrade Models

Grant Access to new model

/key/info

Request

Response

/key/update

Request

Response

/key/delete

Request

Response

/user/new

Request

Response

/user/info

Request

View all Users

View specific user_id

Response

Advanced

Upperbound /key/generate params

Default /key/generate params

Restrict models by team_id

Create key with team_id="litellm-dev"

Use Key to call invalid model - Fails

Set Budgets - Per Key

Expected Behaviour

Set Budgets - Per User

Tracking Spend

Custom Auth

1. Create a custom auth file.

2. Pass the filepath (relative to the config.yaml)

3. Start the proxy

Custom /key/generate

1. Write a custom custom_generate_key_fn

2. Pass the filepath (relative to the config.yaml)

[BETA] Dynamo DB

Step 1. Save keys to env

Step 2. Add details to config

Step 3. Generate Key

23 KiB

Raw Blame History

Restrict models by `team_id`

1. Write a custom `custom_generate_key_fn`