Merge pull request #5432 from BerriAI/litellm_add_tag_control_team

[Feat-Proxy] Set tags per team - (use tag based routing for team)

commit 9444b34711
8 changed files with 382 additions and 31 deletions
@@ -1,7 +1,11 @@
# Tag Based Routing

Route requests based on tags.

-This is useful for implementing free / paid tiers for users
+This is useful for
+
+- Implementing free / paid tiers for users
+- Controlling model access per team, e.g. Team A can access gpt-4 deployment A and Team B can access gpt-4 deployment B

## Quick Start

### 1. Define tags on config.yaml
@@ -131,3 +135,124 @@ Response
}
}
```

## ✨ Team based tag routing (Enterprise)

LiteLLM Proxy supports team-based tag routing, allowing you to associate specific tags with teams and route requests accordingly. For example, **Team A can access gpt-4 deployment A while Team B can access gpt-4 deployment B**.

:::info

This is an enterprise feature, [Contact us here to get a free trial](https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat)

:::

Here's how to set up and use team-based tag routing with curl:

1. **Enable tag filtering in your proxy configuration:**

In your `proxy_config.yaml`, ensure you have the following settings:

```yaml
model_list:
  - model_name: fake-openai-endpoint
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/
      tags: ["teamA"] # 👈 Key Change
    model_info:
      id: "team-a-model" # used for identifying model in response headers
  - model_name: fake-openai-endpoint
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/
      tags: ["teamB"] # 👈 Key Change
    model_info:
      id: "team-b-model" # used for identifying model in response headers

router_settings:
  enable_tag_filtering: True # 👈 Key Change

general_settings:
  master_key: sk-1234
```

2. **Create teams with tags:**

Use the `/team/new` endpoint to create teams with specific tags:

```shell
# Create Team A
curl -X POST http://0.0.0.0:4000/team/new \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"tags": ["teamA"]}'
```

```shell
# Create Team B
curl -X POST http://0.0.0.0:4000/team/new \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"tags": ["teamB"]}'
```

These commands return JSON responses containing the `team_id` for each team.
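
If you prefer scripting the setup, here is a minimal Python sketch of the step 2 calls (assumptions: the proxy from step 1 is running on `0.0.0.0:4000` and the third-party `requests` package is installed):

```python
import requests

BASE_URL = "http://0.0.0.0:4000"
HEADERS = {"Authorization": "Bearer sk-1234", "Content-Type": "application/json"}

# Create both teams and capture the generated team_id from each JSON response
team_ids = {}
for tag in ("teamA", "teamB"):
    resp = requests.post(f"{BASE_URL}/team/new", headers=HEADERS, json={"tags": [tag]})
    resp.raise_for_status()
    team_ids[tag] = resp.json()["team_id"]

print(team_ids)  # e.g. {'teamA': '<uuid>', 'teamB': '<uuid>'}
```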

3. **Generate keys for team members:**

Use the `/key/generate` endpoint to create keys associated with specific teams:

```shell
# Generate key for Team A
curl -X POST http://0.0.0.0:4000/key/generate \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"team_id": "team_a_id_here"}'
```

```shell
# Generate key for Team B
curl -X POST http://0.0.0.0:4000/key/generate \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"team_id": "team_b_id_here"}'
```

Replace `team_a_id_here` and `team_b_id_here` with the actual team IDs received in step 2.
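
Continuing the Python sketch from step 2, the same key-generation calls look like this (the `key` field of the response holds the team-scoped virtual key):

```python
# Generate a team-scoped virtual key for each team created in step 2
team_keys = {}
for tag, team_id in team_ids.items():
    resp = requests.post(
        f"{BASE_URL}/key/generate", headers=HEADERS, json={"team_id": team_id}
    )
    resp.raise_for_status()
    team_keys[tag] = resp.json()["key"]
```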

4. **Verify routing:**

Check the `x-litellm-model-id` header in the response to confirm that the request was routed to the correct model based on the team's tags. You can use the `-i` flag with curl to include the response headers.

Request with Team A's key (including headers):

```shell
curl -i -X POST http://0.0.0.0:4000/chat/completions \
  -H "Authorization: Bearer team_a_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fake-openai-endpoint",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

In the response headers, you should see:

```
x-litellm-model-id: team-a-model
```

Similarly, when using Team B's key, you should see:

```
x-litellm-model-id: team-b-model
```
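
The same verification in the Python sketch, using the keys captured in step 3:

```python
# Requests made with Team A's key should be routed to the teamA-tagged deployment
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={
        "Authorization": f"Bearer {team_keys['teamA']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "fake-openai-endpoint",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
assert resp.headers["x-litellm-model-id"] == "team-a-model"
```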

Following these steps, you can implement and test team-based tag routing in your LiteLLM Proxy setup, with each team routed to the appropriate model or deployment based on its assigned tags.

## Other Tag Based Features

- [Track spend per tag](cost_tracking#-custom-tags)
- [Setup Budgets per Virtual Key, Team](users)
@@ -813,6 +813,7 @@ class TeamBase(LiteLLMBase):
class NewTeamRequest(TeamBase):
    model_aliases: Optional[dict] = None
+    tags: Optional[list] = None

    model_config = ConfigDict(protected_namespaces=())

@@ -883,6 +884,7 @@ class UpdateTeamRequest(LiteLLMBase):
    models: Optional[list] = None
    blocked: Optional[bool] = None
    budget_duration: Optional[str] = None
+    tags: Optional[list] = None


class ResetTeamBudgetRequest(LiteLLMBase):
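For reference, a quick usage sketch of the two request models above (assuming they live in `litellm.proxy._types`, as in this PR; the IDs are placeholders):

```python
from litellm.proxy._types import NewTeamRequest, UpdateTeamRequest

# /team/new copies `tags` into the new team's metadata (see new_team below)
create_req = NewTeamRequest(team_alias="test-teamA", tags=["teamA"])

# /team/update pops `tags` and nests it under metadata["tags"] (see update_team below)
update_req = UpdateTeamRequest(
    team_id="<team_id returned by /team/new>",
    tags=["teamA", "teamB"],
)
```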
@@ -4,11 +4,23 @@ model_list:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/
+      tags: ["teamA"]
+    model_info:
+      id: "team-a-model"
+  - model_name: fake-openai-endpoint
+    litellm_params:
+      model: openai/fake
+      api_key: fake-key
+      api_base: https://exampleopenaiendpoint-production.up.railway.app/
+      tags: ["teamB"]
+    model_info:
+      id: "team-b-model"
-  - model_name: rerank-english-v3.0
+  - model_name: rerank-english-v3.0 # Fixed indentation here
    litellm_params:
      model: cohere/rerank-english-v3.0
      api_key: os.environ/COHERE_API_KEY

litellm_settings:
  cache: true
  callbacks: ["otel"]
@@ -224,6 +224,13 @@ async def new_team(
            model_id=_model_id,
        )

+    # Set tags on the new team
+    if data.tags is not None:
+        if complete_team_data.metadata is None:
+            complete_team_data.metadata = {"tags": data.tags}
+        else:
+            complete_team_data.metadata["tags"] = data.tags
+
    # If budget_duration is set, set `budget_reset_at`
    if complete_team_data.budget_duration is not None:
        duration_s = _duration_in_seconds(duration=complete_team_data.budget_duration)
@@ -365,6 +372,15 @@ async def update_team(
        # set the budget_reset_at in DB
        updated_kv["budget_reset_at"] = reset_at

+    # check if user is trying to update tags for team
+    if "tags" in updated_kv and updated_kv["tags"] is not None:
+        # remove tags from updated_kv
+        _tags = updated_kv.pop("tags")
+        if "metadata" in updated_kv and updated_kv["metadata"] is not None:
+            updated_kv["metadata"]["tags"] = _tags
+        else:
+            updated_kv["metadata"] = {"tags": _tags}
+
    updated_kv = prisma_client.jsonify_object(data=updated_kv)
    team_row: Optional[
        LiteLLM_TeamTable
@@ -4,16 +4,20 @@ model_list:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/
-  - model_name: Salesforce/Llama-Rank-V1
-    litellm_params:
-      model: together_ai/Salesforce/Llama-Rank-V1
-      api_key: os.environ/TOGETHERAI_API_KEY
-  - model_name: rerank-english-v3.0
-    litellm_params:
-      model: cohere/rerank-english-v3.0
-      api_key: os.environ/COHERE_API_KEY
-
-# default off mode
-litellm_settings:
-  set_verbose: True
+      tags: ["teamA"] # 👈 Key Change
+    model_info:
+      id: "team-a-model" # used for identifying model in response headers
+  - model_name: fake-openai-endpoint
+    litellm_params:
+      model: openai/fake
+      api_key: fake-key
+      api_base: https://exampleopenaiendpoint-production.up.railway.app/
+      tags: ["teamB"] # 👈 Key Change
+    model_info:
+      id: "team-b-model" # used for identifying model in response headers
+
+router_settings:
+  enable_tag_filtering: True # 👈 Key Change
+
+general_settings:
+  master_key: sk-1234
@@ -20,9 +20,6 @@ async def get_deployments_for_tag(
    request_kwargs: Optional[Dict[Any, Any]] = None,
    healthy_deployments: Optional[Union[List[Any], Dict[Any, Any]]] = None,
):
-    """
-    if request_kwargs contains {"metadata": {"tier": "free"}} or {"metadata": {"tier": "paid"}}, then routes the request to free/paid tier models
-    """
    if llm_router_instance.enable_tag_filtering is not True:
        return healthy_deployments
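The removed docstring described the older free/paid `tier` behavior. The core idea of tag-based filtering, as a minimal standalone sketch with a hypothetical `filter_deployments_by_tags` helper (illustrative only; the actual router logic includes fallbacks not shown in this diff):

```python
from typing import Any, Dict, List, Optional


def filter_deployments_by_tags(
    healthy_deployments: List[Dict[str, Any]],
    request_tags: Optional[List[str]],
) -> List[Dict[str, Any]]:
    # Keep only deployments whose configured tags overlap the request's tags;
    # with no request tags, leave the candidate list untouched.
    if not request_tags:
        return healthy_deployments
    return [
        deployment
        for deployment in healthy_deployments
        if set(deployment.get("litellm_params", {}).get("tags", []))
        & set(request_tags)
    ]


# A request tagged ["teamA"] then only sees deployments tagged "teamA",
# e.g. the one whose model_info.id is "team-a-model" in the config above.
```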
@@ -3033,3 +3033,62 @@ async def test_regenerate_api_key(prisma_client):
    assert new_key.key_name == f"sk-...{new_key.key[-4:]}"

    pass
+
+
+@pytest.mark.asyncio()
+async def test_team_tags(prisma_client):
+    """
+    - Test setting tags on a team
+    - Assert tags are returned when calling /team/info
+    - /team/update with tags should update the tags
+    - Assert new tags are returned when calling /team/info
+    """
+    litellm.set_verbose = True
+    setattr(litellm.proxy.proxy_server, "prisma_client", prisma_client)
+    setattr(litellm.proxy.proxy_server, "master_key", "sk-1234")
+    await litellm.proxy.proxy_server.prisma_client.connect()
+
+    _new_team = NewTeamRequest(
+        team_alias="test-teamA",
+        tags=["teamA"],
+    )
+
+    new_team_response = await new_team(
+        data=_new_team,
+        user_api_key_dict=UserAPIKeyAuth(user_role=LitellmUserRoles.PROXY_ADMIN),
+        http_request=Request(scope={"type": "http"}),
+    )
+
+    print("new_team_response", new_team_response)
+
+    # call /team/info
+    team_info_response = await team_info(
+        team_id=new_team_response["team_id"],
+        user_api_key_dict=UserAPIKeyAuth(user_role=LitellmUserRoles.PROXY_ADMIN),
+        http_request=Request(scope={"type": "http"}),
+    )
+    print("team_info_response", team_info_response)
+
+    assert team_info_response["team_info"].metadata["tags"] == ["teamA"]
+
+    # team update with tags
+    team_update_response = await update_team(
+        data=UpdateTeamRequest(
+            team_id=new_team_response["team_id"],
+            tags=["teamA", "teamB"],
+        ),
+        user_api_key_dict=UserAPIKeyAuth(user_role=LitellmUserRoles.PROXY_ADMIN),
+        http_request=Request(scope={"type": "http"}),
+    )
+
+    print("team_update_response", team_update_response)
+
+    # call /team/info again
+    team_info_response = await team_info(
+        team_id=new_team_response["team_id"],
+        user_api_key_dict=UserAPIKeyAuth(user_role=LitellmUserRoles.PROXY_ADMIN),
+        http_request=Request(scope={"type": "http"}),
+    )
+
+    print("team_info_response", team_info_response)
+    assert team_info_response["team_info"].metadata["tags"] == ["teamA", "teamB"]
tests/otel_tests/test_team_tag_routing.py (new file, 136 lines)
@@ -0,0 +1,136 @@
# What this tests?
## Set tags on a team and then make a request to /chat/completions
import pytest
import asyncio
import aiohttp, openai
from openai import OpenAI, AsyncOpenAI
from typing import Optional, List, Union
import uuid

LITELLM_MASTER_KEY = "sk-1234"


async def chat_completion(
    session, key, model: Union[str, List] = "fake-openai-endpoint"
):
    url = "http://0.0.0.0:4000/chat/completions"
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    data = {
        "model": model,
        "messages": [
            {"role": "user", "content": f"Hello! {str(uuid.uuid4())}"},
        ],
    }

    async with session.post(url, headers=headers, json=data) as response:
        status = response.status
        response_text = await response.text()

        if status != 200:
            raise Exception(response_text)

        return await response.json(), response.headers


async def create_team_with_tags(session, key, tags: List[str]):
    url = "http://0.0.0.0:4000/team/new"
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    data = {
        "tags": tags,
    }

    async with session.post(url, headers=headers, json=data) as response:
        status = response.status
        response_text = await response.text()

        if status != 200:
            raise Exception(response_text)

        return await response.json()


async def create_key_with_team(session, key, team_id: str):
    url = "http://0.0.0.0:4000/key/generate"
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    data = {
        "team_id": team_id,
    }
    async with session.post(url, headers=headers, json=data) as response:
        status = response.status
        response_text = await response.text()

        if status != 200:
            raise Exception(response_text)

        return await response.json()


async def model_info_get_call(session, key, model_id: str):
    # make get call, pass "litellm_model_id" in query params
    url = f"http://0.0.0.0:4000/model/info?litellm_model_id={model_id}"
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    async with session.get(url, headers=headers) as response:
        status = response.status
        response_text = await response.text()

        if status != 200:
            raise Exception(response_text)

        return await response.json()


@pytest.mark.asyncio()
async def test_team_tag_routing():
    async with aiohttp.ClientSession() as session:
        key = LITELLM_MASTER_KEY
        team_a_data = await create_team_with_tags(session, key, ["teamA"])
        team_a_id = team_a_data["team_id"]

        team_b_data = await create_team_with_tags(session, key, ["teamB"])
        team_b_id = team_b_data["team_id"]

        key_with_team_a = await create_key_with_team(session, key, team_a_id)
        print(key_with_team_a)
        _key_with_team_a = key_with_team_a["key"]
        for _ in range(5):
            response_a, headers = await chat_completion(session, _key_with_team_a)
            headers = dict(headers)
            print(response_a)
            print(headers)
            assert (
                headers["x-litellm-model-id"] == "team-a-model"
            ), "Model ID should be teamA"

        key_with_team_b = await create_key_with_team(session, key, team_b_id)
        _key_with_team_b = key_with_team_b["key"]
        for _ in range(5):
            response_b, headers = await chat_completion(session, _key_with_team_b)
            headers = dict(headers)
            print(response_b)
            print(headers)
            assert (
                headers["x-litellm-model-id"] == "team-b-model"
            ), "Model ID should be teamB"


@pytest.mark.asyncio()
async def test_chat_completion_with_no_tags():
    async with aiohttp.ClientSession() as session:
        key = LITELLM_MASTER_KEY
        response, headers = await chat_completion(session, key)
        headers = dict(headers)
        print(response)
        print(headers)
        assert response is not None