mirror of https://github.com/BerriAI/litellm.git (synced 2025-04-27 11:43:54 +00:00)
(chore) - enforce model budgets on virtual keys as enterprise feature (#7353)
* docs - enforce model budget as enterprise feature
* docs link to correct place
parent d80307b7bb
commit 23d277f167
4 changed files with 132 additions and 136 deletions
|
@@ -31,9 +31,10 @@ This covers:
|
||||||
- ✅ [Team Based Logging](./proxy/team_logging.md) - Allow each team to use their own Langfuse Project / custom callbacks
|
- ✅ [Team Based Logging](./proxy/team_logging.md) - Allow each team to use their own Langfuse Project / custom callbacks
|
||||||
- ✅ [Disable Logging for a Team](./proxy/team_logging.md#disable-logging-for-a-team) - Switch off all logging for a team/project (GDPR Compliance)
|
- ✅ [Disable Logging for a Team](./proxy/team_logging.md#disable-logging-for-a-team) - Switch off all logging for a team/project (GDPR Compliance)
|
||||||
- **Controlling Guardrails by Virtual Keys**
|
- **Controlling Guardrails by Virtual Keys**
|
||||||
- **Spend Tracking & Data Exports**
|
- **Spend Tracking, Budgets & Data Exports**
|
||||||
- ✅ [Tracking Spend for Custom Tags](./proxy/enterprise#tracking-spend-for-custom-tags)
|
- ✅ [Tracking Spend for Custom Tags](./proxy/enterprise#tracking-spend-for-custom-tags)
|
||||||
- ✅ [Set USD Budgets Spend for Custom Tags](./proxy/provider_budget_routing#-tag-budgets)
|
- ✅ [Set USD Budgets Spend for Custom Tags](./proxy/provider_budget_routing#-tag-budgets)
|
||||||
|
- ✅ [Set Model budgets for Virtual Keys](./proxy/users#-virtual-key-model-specific)
|
||||||
- ✅ [Exporting LLM Logs to GCS Bucket, Azure Blob Storage](./proxy/bucket#🪣-logging-gcs-s3-buckets)
|
- ✅ [Exporting LLM Logs to GCS Bucket, Azure Blob Storage](./proxy/bucket#🪣-logging-gcs-s3-buckets)
|
||||||
- ✅ [API Endpoints to get Spend Reports per Team, API Key, Customer](./proxy/cost_tracking.md#✨-enterprise-api-endpoints-to-get-spend)
|
- ✅ [API Endpoints to get Spend Reports per Team, API Key, Customer](./proxy/cost_tracking.md#✨-enterprise-api-endpoints-to-get-spend)
|
||||||
- **Prometheus Metrics**
|
- **Prometheus Metrics**
|
||||||
|
|
|
@@ -29,6 +29,7 @@ Features:
|
||||||
- **Spend Tracking & Data Exports**
|
- **Spend Tracking & Data Exports**
|
||||||
- ✅ [Tracking Spend for Custom Tags](#tracking-spend-for-custom-tags)
|
- ✅ [Tracking Spend for Custom Tags](#tracking-spend-for-custom-tags)
|
||||||
- ✅ [Set USD Budgets Spend for Custom Tags](./provider_budget_routing#-tag-budgets)
|
- ✅ [Set USD Budgets Spend for Custom Tags](./provider_budget_routing#-tag-budgets)
|
||||||
|
- ✅ [Set Model budgets for Virtual Keys](./users#-virtual-key-model-specific)
|
||||||
- ✅ [Exporting LLM Logs to GCS Bucket, Azure Blob Storage](./proxy/bucket#🪣-logging-gcs-s3-buckets)
|
- ✅ [Exporting LLM Logs to GCS Bucket, Azure Blob Storage](./proxy/bucket#🪣-logging-gcs-s3-buckets)
|
||||||
- ✅ [`/spend/report` API endpoint](cost_tracking.md#✨-enterprise-api-endpoints-to-get-spend)
|
- ✅ [`/spend/report` API endpoint](cost_tracking.md#✨-enterprise-api-endpoints-to-get-spend)
|
||||||
- **Prometheus Metrics**
|
- **Prometheus Metrics**
|
||||||
|
|
|
@@ -10,16 +10,7 @@ Requirements:
|
||||||
|
|
||||||
## Set Budgets
|
## Set Budgets
|
||||||
|
|
||||||
You can set budgets at 5 levels:
|
### Global Proxy
|
||||||
- For the proxy
|
|
||||||
- For an internal user
|
|
||||||
- For a customer (end-user)
|
|
||||||
- For a key
|
|
||||||
- For a key (model specific budgets)
|
|
||||||
|
|
||||||
|
|
||||||
<Tabs>
|
|
||||||
<TabItem value="proxy" label="For Proxy">
|
|
||||||
|
|
||||||
Apply a budget across all calls on the proxy
|
Apply a budget across all calls on the proxy
|
||||||
|
|
||||||
|
@@ -57,8 +48,9 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \
|
||||||
],
|
],
|
||||||
}'
|
}'
|
||||||
```
|
```
|
||||||
</TabItem>
|
|
||||||
<TabItem value="per-team" label="For Team">
|
### Team
|
||||||
|
|
||||||
You can:
|
You can:
|
||||||
- Add budgets to Teams
|
- Add budgets to Teams
|
||||||
|
|
||||||
|
@@ -126,8 +118,7 @@ curl 'http://0.0.0.0:4000/team/new' \
|
||||||
}'
|
}'
|
||||||
```
|
```
|
||||||
|
|
||||||
</TabItem>
|
### Team Members
|
||||||
<TabItem value="per-team-member" label="For Team Members">
|
|
||||||
|
|
||||||
Use this when you want to budget a users spend within a Team
|
Use this when you want to budget a users spend within a Team
|
||||||
|
|
||||||
|
@@ -196,62 +187,75 @@ curl --location 'http://localhost:4000/chat/completions' \
|
||||||
}'
|
}'
|
||||||
```
|
```
|
||||||
|
|
||||||
</TabItem>
|
|
||||||
<TabItem value="per-user-chat" label="For Customers">
|
|
||||||
|
|
||||||
Use this to budget `user` passed to `/chat/completions`, **without needing to create a key for every user**
|
### Internal User
|
||||||
|
|
||||||
**Step 1. Modify config.yaml**
|
Apply a budget across all calls an internal user (key owner) can make on the proxy.
|
||||||
Define `litellm.max_end_user_budget`
|
|
||||||
```yaml
|
|
||||||
general_settings:
|
|
||||||
master_key: sk-1234
|
|
||||||
|
|
||||||
litellm_settings:
|
:::info
|
||||||
max_end_user_budget: 0.0001 # budget for 'user' passed to /chat/completions
|
|
||||||
|
For most use-cases, we recommend setting team-member budgets
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
LiteLLM exposes a `/user/new` endpoint to create budgets for this.
|
||||||
|
|
||||||
|
You can:
|
||||||
|
- Add budgets to users [**Jump**](#add-budgets-to-users)
|
||||||
|
- Add budget durations, to reset spend [**Jump**](#add-budget-duration-to-users)
|
||||||
|
|
||||||
|
By default the `max_budget` is set to `null` and is not checked for keys
|
||||||
|
|
||||||
|
#### **Add budgets to users**
|
||||||
|
```shell
|
||||||
|
curl --location 'http://localhost:4000/user/new' \
|
||||||
|
--header 'Authorization: Bearer <your-master-key>' \
|
||||||
|
--header 'Content-Type: application/json' \
|
||||||
|
--data-raw '{"models": ["azure-models"], "max_budget": 0, "user_id": "krrish3@berri.ai"}'
|
||||||
```
|
```
|
||||||
|
|
||||||
2. Make a /chat/completions call, pass 'user' - First call Works
|
[**See Swagger**](https://litellm-api.up.railway.app/#/user%20management/new_user_user_new_post)
|
||||||
|
|
||||||
|
**Sample Response**
|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
curl --location 'http://0.0.0.0:4000/chat/completions' \
|
{
|
||||||
--header 'Content-Type: application/json' \
|
"key": "sk-YF2OxDbrgd1y2KgwxmEA2w",
|
||||||
--header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
|
"expires": "2023-12-22T09:53:13.861000Z",
|
||||||
--data ' {
|
"user_id": "krrish3@berri.ai",
|
||||||
"model": "azure-gpt-3.5",
|
"max_budget": 0.0
|
||||||
"user": "ishaan3",
|
}
|
||||||
"messages": [
|
|
||||||
{
|
|
||||||
"role": "user",
|
|
||||||
"content": "what time is it"
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}'
|
|
||||||
```
|
```
|
||||||
|
|
||||||
3. Make a /chat/completions call, pass 'user' - Call Fails, since 'ishaan3' over budget
|
#### **Add budget duration to users**
|
||||||
```shell
|
|
||||||
curl --location 'http://0.0.0.0:4000/chat/completions' \
|
`budget_duration`: Budget is reset at the end of specified duration. If not set, budget is never reset. You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").
|
||||||
--header 'Content-Type: application/json' \
|
|
||||||
--header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
|
```
|
||||||
--data ' {
|
curl 'http://0.0.0.0:4000/user/new' \
|
||||||
"model": "azure-gpt-3.5",
|
--header 'Authorization: Bearer <your-master-key>' \
|
||||||
"user": "ishaan3",
|
--header 'Content-Type: application/json' \
|
||||||
"messages": [
|
--data-raw '{
|
||||||
{
|
"team_id": "core-infra", # [OPTIONAL]
|
||||||
"role": "user",
|
"max_budget": 10,
|
||||||
"content": "what time is it"
|
"budget_duration": 10s,
|
||||||
}
|
}'
|
||||||
]
|
|
||||||
}'
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Error
|
#### Create new keys for existing user
|
||||||
```shell
|
|
||||||
{"error":{"message":"Budget has been exceeded: User ishaan3 has exceeded their budget. Current spend: 0.0008869999999999999; Max Budget: 0.0001","type":"auth_error","param":"None","code":401}}%
|
Now you can just call `/key/generate` with that user_id (i.e. krrish3@berri.ai) and:
|
||||||
|
- **Budget Check**: krrish3@berri.ai's budget (i.e. $10) will be checked for this key
|
||||||
|
- **Spend Tracking**: spend for this key will update krrish3@berri.ai's spend as well
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl --location 'http://0.0.0.0:4000/key/generate' \
|
||||||
|
--header 'Authorization: Bearer <your-master-key>' \
|
||||||
|
--header 'Content-Type: application/json' \
|
||||||
|
--data '{"models": ["azure-models"], "user_id": "krrish3@berri.ai"}'
|
||||||
```
|
```
|
||||||
|
|
||||||
</TabItem>
|
### Virtual Key
|
||||||
<TabItem value="per-key" label="For Key">
|
|
||||||
|
|
||||||
Apply a budget on a key.
|
Apply a budget on a key.
|
||||||
|
|
||||||
|
@@ -319,84 +323,19 @@ curl 'http://0.0.0.0:4000/key/generate' \
|
||||||
}'
|
}'
|
||||||
```
|
```
|
||||||
|
|
||||||
</TabItem>
|
|
||||||
|
|
||||||
<TabItem value="per-user" label="For Internal User (Global)">
|
### ✨ Virtual Key (Model Specific)
|
||||||
|
|
||||||
Apply a budget across all calls an internal user (key owner) can make on the proxy.
|
|
||||||
|
|
||||||
:::info
|
|
||||||
|
|
||||||
For most use-cases, we recommend setting team-member budgets
|
|
||||||
|
|
||||||
:::
|
|
||||||
|
|
||||||
LiteLLM exposes a `/user/new` endpoint to create budgets for this.
|
|
||||||
|
|
||||||
You can:
|
|
||||||
- Add budgets to users [**Jump**](#add-budgets-to-users)
|
|
||||||
- Add budget durations, to reset spend [**Jump**](#add-budget-duration-to-users)
|
|
||||||
|
|
||||||
By default the `max_budget` is set to `null` and is not checked for keys
|
|
||||||
|
|
||||||
#### **Add budgets to users**
|
|
||||||
```shell
|
|
||||||
curl --location 'http://localhost:4000/user/new' \
|
|
||||||
--header 'Authorization: Bearer <your-master-key>' \
|
|
||||||
--header 'Content-Type: application/json' \
|
|
||||||
--data-raw '{"models": ["azure-models"], "max_budget": 0, "user_id": "krrish3@berri.ai"}'
|
|
||||||
```
|
|
||||||
|
|
||||||
[**See Swagger**](https://litellm-api.up.railway.app/#/user%20management/new_user_user_new_post)
|
|
||||||
|
|
||||||
**Sample Response**
|
|
||||||
|
|
||||||
```shell
|
|
||||||
{
|
|
||||||
"key": "sk-YF2OxDbrgd1y2KgwxmEA2w",
|
|
||||||
"expires": "2023-12-22T09:53:13.861000Z",
|
|
||||||
"user_id": "krrish3@berri.ai",
|
|
||||||
"max_budget": 0.0
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
#### **Add budget duration to users**
|
|
||||||
|
|
||||||
`budget_duration`: Budget is reset at the end of specified duration. If not set, budget is never reset. You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").
|
|
||||||
|
|
||||||
```
|
|
||||||
curl 'http://0.0.0.0:4000/user/new' \
|
|
||||||
--header 'Authorization: Bearer <your-master-key>' \
|
|
||||||
--header 'Content-Type: application/json' \
|
|
||||||
--data-raw '{
|
|
||||||
"team_id": "core-infra", # [OPTIONAL]
|
|
||||||
"max_budget": 10,
|
|
||||||
"budget_duration": 10s,
|
|
||||||
}'
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Create new keys for existing user
|
|
||||||
|
|
||||||
Now you can just call `/key/generate` with that user_id (i.e. krrish3@berri.ai) and:
|
|
||||||
- **Budget Check**: krrish3@berri.ai's budget (i.e. $10) will be checked for this key
|
|
||||||
- **Spend Tracking**: spend for this key will update krrish3@berri.ai's spend as well
|
|
||||||
|
|
||||||
```bash
|
|
||||||
curl --location 'http://0.0.0.0:4000/key/generate' \
|
|
||||||
--header 'Authorization: Bearer <your-master-key>' \
|
|
||||||
--header 'Content-Type: application/json' \
|
|
||||||
--data '{"models": ["azure-models"], "user_id": "krrish3@berri.ai"}'
|
|
||||||
```
|
|
||||||
|
|
||||||
</TabItem>
|
|
||||||
|
|
||||||
<TabItem value="per-model-key" label="For Key (model specific)">
|
|
||||||
|
|
||||||
Apply model specific budgets on a key. Example:
|
Apply model specific budgets on a key. Example:
|
||||||
- Budget for `gpt-4o` is $0.0000001, for time period `1d` for `key = "sk-12345"`
|
- Budget for `gpt-4o` is $0.0000001, for time period `1d` for `key = "sk-12345"`
|
||||||
- Budget for `gpt-4o-mini` is $10, for time period `30d` for `key = "sk-12345"`
|
- Budget for `gpt-4o-mini` is $10, for time period `30d` for `key = "sk-12345"`
|
||||||
|
|
||||||
#### **Add model specific budgets to keys**
|
:::info
|
||||||
|
|
||||||
|
✨ This is an Enterprise only feature [Get Started with Enterprise here](https://www.litellm.ai/#pricing)
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
|
||||||
The spec for `model_max_budget` is **[`Dict[str, GenericBudgetInfo]`](#genericbudgetinfo)**
|
The spec for `model_max_budget` is **[`Dict[str, GenericBudgetInfo]`](#genericbudgetinfo)**
|
||||||
|
|
||||||
|
@@ -470,14 +409,63 @@ Expected response on failure
|
||||||
```
|
```
|
||||||
|
|
||||||
</TabItem>
|
</TabItem>
|
||||||
|
|
||||||
</Tabs>
|
</Tabs>
|
||||||
|
|
||||||
|
|
||||||
</TabItem>
|
### Customers
|
||||||
</Tabs>
|
|
||||||
|
|
||||||
### Reset Budgets
|
Use this to budget `user` passed to `/chat/completions`, **without needing to create a key for every user**
|
||||||
|
|
||||||
|
**Step 1. Modify config.yaml**
|
||||||
|
Define `litellm.max_end_user_budget`
|
||||||
|
```yaml
|
||||||
|
general_settings:
|
||||||
|
master_key: sk-1234
|
||||||
|
|
||||||
|
litellm_settings:
|
||||||
|
max_end_user_budget: 0.0001 # budget for 'user' passed to /chat/completions
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Make a /chat/completions call, pass 'user' - First call Works
|
||||||
|
```shell
|
||||||
|
curl --location 'http://0.0.0.0:4000/chat/completions' \
|
||||||
|
--header 'Content-Type: application/json' \
|
||||||
|
--header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
|
||||||
|
--data ' {
|
||||||
|
"model": "azure-gpt-3.5",
|
||||||
|
"user": "ishaan3",
|
||||||
|
"messages": [
|
||||||
|
{
|
||||||
|
"role": "user",
|
||||||
|
"content": "what time is it"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}'
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Make a /chat/completions call, pass 'user' - Call Fails, since 'ishaan3' over budget
|
||||||
|
```shell
|
||||||
|
curl --location 'http://0.0.0.0:4000/chat/completions' \
|
||||||
|
--header 'Content-Type: application/json' \
|
||||||
|
--header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
|
||||||
|
--data ' {
|
||||||
|
"model": "azure-gpt-3.5",
|
||||||
|
"user": "ishaan3",
|
||||||
|
"messages": [
|
||||||
|
{
|
||||||
|
"role": "user",
|
||||||
|
"content": "what time is it"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}'
|
||||||
|
```
|
||||||
|
|
||||||
|
Error
|
||||||
|
```shell
|
||||||
|
{"error":{"message":"Budget has been exceeded: User ishaan3 has exceeded their budget. Current spend: 0.0008869999999999999; Max Budget: 0.0001","type":"auth_error","param":"None","code":401}}%
|
||||||
|
```
|
||||||
|
|
||||||
|
## Reset Budgets
|
||||||
|
|
||||||
Reset budgets across keys/internal users/teams/customers
|
Reset budgets across keys/internal users/teams/customers
|
||||||
|
|
||||||
|
|
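The `budget_duration` values described in the user-budget docs above ("30s", "30m", "30h", "30d") can be read as a number followed by a unit suffix. The sketch below shows one way such a string could be converted to seconds. The helper name `duration_to_seconds` and the unit table are illustrative assumptions, not LiteLLM's own parsing code, and any formats beyond the four named in the docs are not handled here.

```python
import re

# Hypothetical unit table covering only the four suffixes named in the docs.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(duration: str) -> int:
    """Convert a string such as "30s", "30m", "30h" or "30d" to seconds.

    Illustrative helper only; this is not LiteLLM's implementation.
    """
    match = re.fullmatch(r"(\d+)([smhd])", duration.strip())
    if match is None:
        raise ValueError(f"Unsupported budget_duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]
```

Under this reading, a budget created with `"budget_duration": "30d"` would be reset 2,592,000 seconds after it starts.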
|
@@ -1933,7 +1933,7 @@ async def _enforce_unique_key_alias(
|
||||||
|
|
||||||
def validate_model_max_budget(model_max_budget: Optional[Dict]) -> None:
|
def validate_model_max_budget(model_max_budget: Optional[Dict]) -> None:
|
||||||
"""
|
"""
|
||||||
Validate the model_max_budget is GenericBudgetConfigType
|
Validate the model_max_budget is GenericBudgetConfigType + enforce user has an enterprise license
|
||||||
|
|
||||||
Raises:
|
Raises:
|
||||||
Exception: If model_max_budget is not a valid GenericBudgetConfigType
|
Exception: If model_max_budget is not a valid GenericBudgetConfigType
|
||||||
|
@@ -1944,6 +1944,12 @@ def validate_model_max_budget(model_max_budget: Optional[Dict]) -> None:
|
||||||
if len(model_max_budget) == 0:
|
if len(model_max_budget) == 0:
|
||||||
return
|
return
|
||||||
if model_max_budget is not None:
|
if model_max_budget is not None:
|
||||||
|
from litellm.proxy.proxy_server import CommonProxyErrors, premium_user
|
||||||
|
|
||||||
|
if premium_user is not True:
|
||||||
|
raise ValueError(
|
||||||
|
f"You must have an enterprise license to set model_max_budget. {CommonProxyErrors.not_premium_user.value}"
|
||||||
|
)
|
||||||
for _model, _budget_info in model_max_budget.items():
|
for _model, _budget_info in model_max_budget.items():
|
||||||
assert isinstance(_model, str)
|
assert isinstance(_model, str)
|
||||||
|
|
||||||
|
|
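The license check added in the last hunk can be read as the standalone sketch below. It is a simplification under stated assumptions: the license flag is passed in as a `premium_user` argument (the actual change imports `premium_user` and `CommonProxyErrors` from `litellm.proxy.proxy_server`), the function name `check_model_max_budget` is hypothetical, and the per-model budget entries are treated as opaque values rather than validated against `GenericBudgetInfo`.

```python
from typing import Dict, Optional


def check_model_max_budget(
    model_max_budget: Optional[Dict], premium_user: bool
) -> None:
    """Reject model-specific budgets unless an enterprise license is present.

    Hypothetical stand-in for the gating logic shown in the diff.
    """
    # Nothing to validate when no model-specific budgets were supplied.
    if model_max_budget is None or len(model_max_budget) == 0:
        return

    # Mirrors the `premium_user is not True` check added by the commit.
    if premium_user is not True:
        raise ValueError(
            "You must have an enterprise license to set model_max_budget."
        )

    # Keys are expected to be model names; values are left unchecked here.
    for model in model_max_budget:
        if not isinstance(model, str):
            raise ValueError(f"Model name must be a string, got {model!r}")
```

Because empty and missing budgets return before the license check, this sketch only rejects requests that actually set a model-specific budget without a license.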