import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# 💰 Budgets, Rate Limits

Requirements:

- A Postgres database (e.g. [Supabase](https://supabase.com/), [Neon](https://neon.tech/), etc.)

## Set Budgets

You can set budgets at the following levels:

- For the proxy
- For a user
- For a team
- For a 'user' passed to `/chat/completions`, `/embeddings` etc.
- For a key
- For a key (model-specific budgets)

### For the proxy

Apply a budget across all calls on the proxy.

**Step 1. Modify config.yaml**

```yaml
general_settings:
  master_key: sk-1234

litellm_settings:
  # other litellm settings
  max_budget: 0 # (float) sets max budget as $0 USD
  budget_duration: 30d # (str) frequency of reset: seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d")
```

**Step 2. Start proxy**

```bash
litellm --config /path/to/config.yaml
```

**Step 3. Send test call**

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }'
```

### For a user

Apply a budget across multiple keys. LiteLLM exposes a `/user/new` endpoint to create budgets for this.
You can:

- Add budgets to users [**Jump**](#add-budgets-to-users)
- Add budget durations, to reset spend [**Jump**](#add-budget-duration-to-users)

By default, `max_budget` is `null`, so no budget is enforced.

#### **Add budgets to users**

```shell
curl --location 'http://localhost:4000/user/new' \
  --header 'Authorization: Bearer <your-master-key>' \
  --header 'Content-Type: application/json' \
  --data-raw '{"models": ["azure-models"], "max_budget": 0, "user_id": "krrish3@berri.ai"}'
```

[**See Swagger**](https://litellm-api.up.railway.app/#/user%20management/new_user_user_new_post)

**Sample Response**

```json
{
  "key": "sk-YF2OxDbrgd1y2KgwxmEA2w",
  "expires": "2023-12-22T09:53:13.861000Z",
  "user_id": "krrish3@berri.ai",
  "max_budget": 0.0
}
```

#### **Add budget duration to users**

`budget_duration`: Budget is reset at the end of the specified duration. If not set, the budget is never reset. You can set the duration in seconds ("30s"), minutes ("30m"), hours ("30h"), or days ("30d").

```shell
curl 'http://0.0.0.0:4000/user/new' \
  --header 'Authorization: Bearer <your-master-key>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "team_id": "core-infra",
    "max_budget": 10,
    "budget_duration": "10s"
  }'
```

`team_id` is optional.

#### Create new keys for existing user

Now you can call `/key/generate` with that `user_id` (i.e. `krrish3@berri.ai`), and:

- **Budget Check**: krrish3@berri.ai's budget (the `max_budget` set via `/user/new`) will be checked for this key
- **Spend Tracking**: spend for this key will update krrish3@berri.ai's spend as well

```bash
curl --location 'http://0.0.0.0:4000/key/generate' \
  --header 'Authorization: Bearer <your-master-key>' \
  --header 'Content-Type: application/json' \
  --data '{"models": ["azure-models"], "user_id": "krrish3@berri.ai"}'
```

### For a team

You can:

- Add budgets to teams

#### **Add budgets to teams**

```shell
curl --location 'http://localhost:4000/team/new' \
  --header 'Authorization: Bearer <your-master-key>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "team_alias": "my-new-team_4",
    "members_with_roles": [{"role": "admin", "user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"}],
    "rpm_limit": 99
  }'
```

[**See Swagger**](https://litellm-api.up.railway.app/#/team%20management/new_team_team_new_post)

**Sample Response**

```json
{
  "team_alias": "my-new-team_4",
  "team_id": "13e83b19-f851-43fe-8e93-f96e21033100",
  "admins": [],
  "members": [],
  "members_with_roles": [
    {
      "role": "admin",
      "user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"
    }
  ],
  "metadata": {},
  "tpm_limit": null,
  "rpm_limit": 99,
  "max_budget": null,
  "models": [],
  "spend": 0.0,
  "max_parallel_requests": null,
  "budget_duration": null,
  "budget_reset_at": null
}
```

The response includes the team's `max_budget` and `budget_duration` fields, both `null` here because the request only sets an `rpm_limit`.

### For a 'user' passed to `/chat/completions`

Use this to budget the `user` passed to `/chat/completions`, **without needing to create a key for every user**.

**Step 1. Modify config.yaml**

Define `litellm.max_user_budget`:

```yaml
general_settings:
  master_key: sk-1234

litellm_settings:
  max_budget: 10 # global budget for proxy
  max_user_budget: 0.0001 # budget for 'user' passed to /chat/completions
```

**Step 2. Make a `/chat/completions` call, passing `user`: the first call works**

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
  --data '{
    "model": "azure-gpt-3.5",
    "user": "ishaan3",
    "messages": [
      {
        "role": "user",
        "content": "what time is it"
      }
    ]
  }'
```

**Step 3. Make another `/chat/completions` call, passing `user`: the call fails, since `ishaan3` is over budget**

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
  --data '{
    "model": "azure-gpt-3.5",
    "user": "ishaan3",
    "messages": [
      {
        "role": "user",
        "content": "what time is it"
      }
    ]
  }'
```

Error

```json
{"error":{"message":"Authentication Error, ExceededBudget: User ishaan3 has exceeded their budget. Current spend: 0.0008869999999999999; Max Budget: 0.0001","type":"auth_error","param":"None","code":401}}
```

### For a key

Apply a budget on a key.

You can:

- Add budgets to keys [**Jump**](#add-budgets-to-keys)
- Add budget durations, to reset spend [**Jump**](#add-budget-duration-to-keys)

**Expected Behaviour**

- Costs per key get auto-populated in the `LiteLLM_VerificationToken` table
- After the key crosses its `max_budget`, requests fail
- If a duration is set, spend is reset at the end of the duration

By default, `max_budget` is set to `null` and is not checked for keys.

#### **Add budgets to keys**

```bash
curl 'http://0.0.0.0:4000/key/generate' \
  --header 'Authorization: Bearer <your-master-key>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "team_id": "core-infra",
    "max_budget": 10
  }'
```

`team_id` is optional.

Example request to `/chat/completions` when the key has crossed its budget:

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <generated-key>' \
  --data '{
    "model": "azure-gpt-3.5",
    "user": "e09b4da8-ed80-4b05-ac93-e16d9eb56fca",
    "messages": [
      {
        "role": "user",
        "content": "respond in 50 lines"
      }
    ]
  }'
```

Expected response from `/chat/completions` when the key has crossed its budget:

```json
{
  "detail": "Authentication Error, ExceededTokenBudget: Current spend for token: 7.2e-05; Max Budget for Token: 2e-07"
}
```

#### **Add budget duration to keys**

`budget_duration`: Budget is reset at the end of the specified duration.
If not set, the budget is never reset. You can set the duration in seconds ("30s"), minutes ("30m"), hours ("30h"), or days ("30d").

```shell
curl 'http://0.0.0.0:4000/key/generate' \
  --header 'Authorization: Bearer <your-master-key>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "team_id": "core-infra",
    "max_budget": 10,
    "budget_duration": "10s"
  }'
```

`team_id` is optional.

### For a key (model-specific budgets)

Apply model-specific budgets on a key.

**Expected Behaviour**

- `model_spend` gets auto-populated in the `LiteLLM_VerificationToken` table
- After the key crosses the budget set for a `model` in `model_max_budget`, calls to that model fail

By default, `model_max_budget` is set to `{}` and is not checked for keys.

:::info

- LiteLLM tracks the cost/budgets for the `model` passed to LLM endpoints (`/chat/completions`, `/embeddings`)

:::

#### **Add model specific budgets to keys**

```bash
curl 'http://0.0.0.0:4000/key/generate' \
  --header 'Authorization: Bearer <your-master-key>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model_max_budget": {"gpt4": 0.5, "gpt-5": 0.01}
  }'
```

## Set Rate Limits

You can set:

- tpm limits (tokens per minute)
- rpm limits (requests per minute)
- max parallel requests

Use `/user/new` to persist rate limits across multiple keys.

```shell
curl --location 'http://0.0.0.0:4000/user/new' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{"user_id": "krrish@berri.ai", "max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}'
```

[**See Swagger**](https://litellm-api.up.railway.app/#/user%20management/new_user_user_new_post)

**Expected Response**

```json
{
  "key": "sk-sA7VDkyhlQ7m8Gt77Mbt3Q",
  "expires": "2024-01-19T01:21:12.816168",
  "user_id": "krrish@berri.ai"
}
```

Use `/key/generate` if you want the limits to apply to just that key.
```shell
curl --location 'http://0.0.0.0:4000/key/generate' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{"max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}'
```

**Expected Response**

```json
{
  "key": "sk-ulGNRXWtv7M0lFnnsQk0wQ",
  "expires": "2024-01-18T20:48:44.297973",
  "user_id": "78c2c8fc-c233-43b9-b0c3-eb931da27b84" // 👈 auto-generated
}
```

## Grant Access to new model

Use model access groups to give users access to select models, and add new ones to the group over time (e.g. mistral, llama-2, etc.).

What is the difference between doing this with `/key/generate` vs. `/user/new`? If you do it on `/user/new`, it persists across multiple keys generated for that user.

**Step 1. Assign model, access group in config.yaml**

```yaml
model_list:
  - model_name: text-embedding-ada-002
    litellm_params:
      model: azure/azure-embedding-model
      api_base: "os.environ/AZURE_API_BASE"
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2023-07-01-preview"
    model_info:
      access_groups: ["beta-models"] # 👈 Model Access Group
```

**Step 2. Create a user (and key) with the access group**

Pass the access group name (`beta-models`) in `models`:

```bash
curl --location 'http://localhost:4000/user/new' \
  -H 'Authorization: Bearer <your-master-key>' \
  -H 'Content-Type: application/json' \
  -d '{"models": ["beta-models"], "max_budget": 0}'
```

## Create new keys for existing user

Just include `user_id` in the `/key/generate` request.

```bash
curl --location 'http://0.0.0.0:4000/key/generate' \
  --header 'Authorization: Bearer <your-master-key>' \
  --header 'Content-Type: application/json' \
  --data '{"models": ["azure-models"], "user_id": "krrish@berri.ai"}'
```
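The `max_budget` and `budget_duration` fields recur throughout this page. As a rough offline sanity check, the following POSIX shell sketch models their documented behaviour: a duration string given in seconds, minutes, hours or days, and a spend-versus-budget comparison in which an unset (`null`) budget is never enforced. `duration_to_seconds` and `within_budget` are hypothetical helpers written for illustration; they are not part of LiteLLM, and the strict `<` comparison is an assumption, so the proxy's exact boundary handling may differ.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LiteLLM) that model the budget
# semantics described on this page. Function bodies run in subshells,
# so their variables do not leak into the caller.

# Convert a budget_duration string ("30s", "30m", "30h", "30d") to seconds.
duration_to_seconds() (
  duration=$1
  value=${duration%?}          # everything but the last character, e.g. "30"
  unit=${duration#"$value"}    # the last character, e.g. "d"
  case $value in
    '' | *[!0-9]*) echo "invalid duration: $duration" >&2; exit 1 ;;
  esac
  value=$(expr "$value" + 0)   # strip leading zeros so "08" is not read as octal
  case $unit in
    s) echo "$value" ;;
    m) echo $((value * 60)) ;;
    h) echo $((value * 3600)) ;;
    d) echo $((value * 86400)) ;;
    *) echo "invalid unit in: $duration" >&2; exit 1 ;;
  esac
)

# Succeed (exit 0) while spend is below max_budget. An empty max_budget
# stands in for the default `null`, meaning no budget is enforced.
within_budget() (
  spend=$1
  max_budget=$2
  [ -z "$max_budget" ] && exit 0
  # awk does the floating-point comparison that POSIX sh arithmetic cannot.
  awk -v s="$spend" -v m="$max_budget" 'BEGIN { exit !((s + 0) < (m + 0)) }'
)
```

For example, `duration_to_seconds 30d` prints `2592000`, and `within_budget 0.0008869999999999999 0.0001` exits non-zero, which lines up with the `ExceededBudget` error shown for user `ishaan3` earlier on this page.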