Merge pull request #5432 from BerriAI/litellm_add_tag_control_team

[Feat-Proxy] Set tags per team - (use tag based routing for team)

commit 9444b34711
8 changed files with 382 additions and 31 deletions
@@ -1,7 +1,11 @@
# Tag Based Routing

Route requests based on tags.

-This is useful for implementing free / paid tiers for users
+This is useful for
+
+- Implementing free / paid tiers for users
+- Controlling model access per team, e.g. Team A can access gpt-4 deployment A and Team B can access gpt-4 deployment B

## Quick Start

### 1. Define tags on config.yaml
@@ -131,3 +135,124 @@ Response
}
}
```

## ✨ Team based tag routing (Enterprise)

LiteLLM Proxy supports team-based tag routing, allowing you to associate specific tags with teams and route requests accordingly. For example, **Team A can access gpt-4 deployment A while Team B can access gpt-4 deployment B**.

:::info

This is an enterprise feature, [Contact us here to get a free trial](https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat)

:::

Here's how to set up and use team-based tag routing with curl:

1. **Enable tag filtering in your proxy configuration:**

In your `proxy_config.yaml`, ensure you have the following settings:

```yaml
model_list:
  - model_name: fake-openai-endpoint
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/
      tags: ["teamA"] # 👈 Key Change
    model_info:
      id: "team-a-model" # used for identifying model in response headers
  - model_name: fake-openai-endpoint
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/
      tags: ["teamB"] # 👈 Key Change
    model_info:
      id: "team-b-model" # used for identifying model in response headers

router_settings:
  enable_tag_filtering: True # 👈 Key Change

general_settings:
  master_key: sk-1234
```

2. **Create teams with tags:**

Use the `/team/new` endpoint to create teams with specific tags:

```shell
# Create Team A
curl -X POST http://0.0.0.0:4000/team/new \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"tags": ["teamA"]}'
```

```shell
# Create Team B
curl -X POST http://0.0.0.0:4000/team/new \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"tags": ["teamB"]}'
```

These commands return JSON responses containing the `team_id` for each team.
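
If you prefer scripting the setup, here is a minimal Python sketch of the step 2 calls (assumptions: the proxy from step 1 is running on `0.0.0.0:4000` and the third-party `requests` package is installed):

```python
import requests

BASE_URL = "http://0.0.0.0:4000"
HEADERS = {"Authorization": "Bearer sk-1234", "Content-Type": "application/json"}

# Create both teams and capture the generated team_id from each JSON response
team_ids = {}
for tag in ("teamA", "teamB"):
    resp = requests.post(f"{BASE_URL}/team/new", headers=HEADERS, json={"tags": [tag]})
    resp.raise_for_status()
    team_ids[tag] = resp.json()["team_id"]

print(team_ids)  # e.g. {'teamA': '<uuid>', 'teamB': '<uuid>'}
```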

3. **Generate keys for team members:**

Use the `/key/generate` endpoint to create keys associated with specific teams:

```shell
# Generate key for Team A
curl -X POST http://0.0.0.0:4000/key/generate \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"team_id": "team_a_id_here"}'
```

```shell
# Generate key for Team B
curl -X POST http://0.0.0.0:4000/key/generate \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"team_id": "team_b_id_here"}'
```

Replace `team_a_id_here` and `team_b_id_here` with the actual team IDs received in step 2.
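
Continuing the Python sketch from step 2, the same key-generation calls look like this (the `key` field of the response holds the team-scoped virtual key):

```python
# Generate a team-scoped virtual key for each team created in step 2
team_keys = {}
for tag, team_id in team_ids.items():
    resp = requests.post(
        f"{BASE_URL}/key/generate", headers=HEADERS, json={"team_id": team_id}
    )
    resp.raise_for_status()
    team_keys[tag] = resp.json()["key"]
```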

4. **Verify routing:**

Check the `x-litellm-model-id` header in the response to confirm that the request was routed to the correct model based on the team's tags. You can use the `-i` flag with curl to include the response headers.

Request with Team A's key (including headers):

```shell
curl -i -X POST http://0.0.0.0:4000/chat/completions \
  -H "Authorization: Bearer team_a_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fake-openai-endpoint",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

In the response headers, you should see:

```
x-litellm-model-id: team-a-model
```

Similarly, when using Team B's key, you should see:

```
x-litellm-model-id: team-b-model
```
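
The same verification in the Python sketch, using the keys captured in step 3:

```python
# Requests made with Team A's key should be routed to the teamA-tagged deployment
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={
        "Authorization": f"Bearer {team_keys['teamA']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "fake-openai-endpoint",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
assert resp.headers["x-litellm-model-id"] == "team-a-model"
```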

Following these steps, you can implement and test team-based tag routing in your LiteLLM Proxy setup, with each team routed to the appropriate model or deployment based on its assigned tags.

## Other Tag Based Features

- [Track spend per tag](cost_tracking#-custom-tags)
- [Setup Budgets per Virtual Key, Team](users)
@@ -813,6 +813,7 @@ class TeamBase(LiteLLMBase):
class NewTeamRequest(TeamBase):
    model_aliases: Optional[dict] = None
+    tags: Optional[list] = None

    model_config = ConfigDict(protected_namespaces=())

@@ -883,6 +884,7 @@ class UpdateTeamRequest(LiteLLMBase):
    models: Optional[list] = None
    blocked: Optional[bool] = None
    budget_duration: Optional[str] = None
+    tags: Optional[list] = None


class ResetTeamBudgetRequest(LiteLLMBase):
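For reference, a quick usage sketch of the two request models above (assuming they live in `litellm.proxy._types`, as in this PR; the IDs are placeholders):

```python
from litellm.proxy._types import NewTeamRequest, UpdateTeamRequest

# /team/new copies `tags` into the new team's metadata (see new_team below)
create_req = NewTeamRequest(team_alias="test-teamA", tags=["teamA"])

# /team/update pops `tags` and nests it under metadata["tags"] (see update_team below)
update_req = UpdateTeamRequest(
    team_id="<team_id returned by /team/new>",
    tags=["teamA", "teamB"],
)
```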
@@ -4,11 +4,23 @@ model_list:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/
+      tags: ["teamA"]
+    model_info:
+      id: "team-a-model"
+  - model_name: fake-openai-endpoint
+    litellm_params:
+      model: openai/fake
+      api_key: fake-key
+      api_base: https://exampleopenaiendpoint-production.up.railway.app/
+      tags: ["teamB"]
+    model_info:
+      id: "team-b-model"
-  - model_name: rerank-english-v3.0
+  - model_name: rerank-english-v3.0 # Fixed indentation here
    litellm_params:
      model: cohere/rerank-english-v3.0
      api_key: os.environ/COHERE_API_KEY

litellm_settings:
  cache: true
  callbacks: ["otel"]
@@ -224,6 +224,13 @@ async def new_team(
            model_id=_model_id,
        )

+    # Set tags on the new team
+    if data.tags is not None:
+        if complete_team_data.metadata is None:
+            complete_team_data.metadata = {"tags": data.tags}
+        else:
+            complete_team_data.metadata["tags"] = data.tags
+
    # If budget_duration is set, set `budget_reset_at`
    if complete_team_data.budget_duration is not None:
        duration_s = _duration_in_seconds(duration=complete_team_data.budget_duration)
@@ -365,6 +372,15 @@ async def update_team(
        # set the budget_reset_at in DB
        updated_kv["budget_reset_at"] = reset_at

+    # check if user is trying to update tags for team
+    if "tags" in updated_kv and updated_kv["tags"] is not None:
+        # remove tags from updated_kv
+        _tags = updated_kv.pop("tags")
+        if "metadata" in updated_kv and updated_kv["metadata"] is not None:
+            updated_kv["metadata"]["tags"] = _tags
+        else:
+            updated_kv["metadata"] = {"tags": _tags}
+
    updated_kv = prisma_client.jsonify_object(data=updated_kv)
    team_row: Optional[
        LiteLLM_TeamTable
@@ -4,16 +4,20 @@ model_list:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/
-  - model_name: Salesforce/Llama-Rank-V1
-    litellm_params:
-      model: together_ai/Salesforce/Llama-Rank-V1
-      api_key: os.environ/TOGETHERAI_API_KEY
-  - model_name: rerank-english-v3.0
-    litellm_params:
-      model: cohere/rerank-english-v3.0
-      api_key: os.environ/COHERE_API_KEY
-
-# default off mode
-litellm_settings:
-  set_verbose: True
+      tags: ["teamA"] # 👈 Key Change
+    model_info:
+      id: "team-a-model" # used for identifying model in response headers
+  - model_name: fake-openai-endpoint
+    litellm_params:
+      model: openai/fake
+      api_key: fake-key
+      api_base: https://exampleopenaiendpoint-production.up.railway.app/
+      tags: ["teamB"] # 👈 Key Change
+    model_info:
+      id: "team-b-model" # used for identifying model in response headers
+
+router_settings:
+  enable_tag_filtering: True # 👈 Key Change
+
+general_settings:
+  master_key: sk-1234
@@ -20,9 +20,6 @@ async def get_deployments_for_tag(
    request_kwargs: Optional[Dict[Any, Any]] = None,
    healthy_deployments: Optional[Union[List[Any], Dict[Any, Any]]] = None,
):
-    """
-    if request_kwargs contains {"metadata": {"tier": "free"}} or {"metadata": {"tier": "paid"}}, then routes the request to free/paid tier models
-    """
    if llm_router_instance.enable_tag_filtering is not True:
        return healthy_deployments
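The removed docstring described the older free/paid `tier` behavior. The core idea of tag-based filtering, as a minimal standalone sketch with a hypothetical `filter_deployments_by_tags` helper (illustrative only; the actual router logic includes fallbacks not shown in this diff):

```python
from typing import Any, Dict, List, Optional


def filter_deployments_by_tags(
    healthy_deployments: List[Dict[str, Any]],
    request_tags: Optional[List[str]],
) -> List[Dict[str, Any]]:
    # Keep only deployments whose configured tags overlap the request's tags;
    # with no request tags, leave the candidate list untouched.
    if not request_tags:
        return healthy_deployments
    return [
        deployment
        for deployment in healthy_deployments
        if set(deployment.get("litellm_params", {}).get("tags", []))
        & set(request_tags)
    ]


# A request tagged ["teamA"] then only sees deployments tagged "teamA",
# e.g. the one whose model_info.id is "team-a-model" in the config above.
```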
@@ -3033,3 +3033,62 @@ async def test_regenerate_api_key(prisma_client):
    assert new_key.key_name == f"sk-...{new_key.key[-4:]}"

    pass
+
+
+@pytest.mark.asyncio()
+async def test_team_tags(prisma_client):
+    """
+    - Test setting tags on a team
+    - Assert tags are returned when calling /team/info
+    - /team/update with tags should update the tags
+    - Assert new tags are returned when calling /team/info
+    """
+    litellm.set_verbose = True
+    setattr(litellm.proxy.proxy_server, "prisma_client", prisma_client)
+    setattr(litellm.proxy.proxy_server, "master_key", "sk-1234")
+    await litellm.proxy.proxy_server.prisma_client.connect()
+
+    _new_team = NewTeamRequest(
+        team_alias="test-teamA",
+        tags=["teamA"],
+    )
+
+    new_team_response = await new_team(
+        data=_new_team,
+        user_api_key_dict=UserAPIKeyAuth(user_role=LitellmUserRoles.PROXY_ADMIN),
+        http_request=Request(scope={"type": "http"}),
+    )
+
+    print("new_team_response", new_team_response)
+
+    # call /team/info
+    team_info_response = await team_info(
+        team_id=new_team_response["team_id"],
+        user_api_key_dict=UserAPIKeyAuth(user_role=LitellmUserRoles.PROXY_ADMIN),
+        http_request=Request(scope={"type": "http"}),
+    )
+    print("team_info_response", team_info_response)
+
+    assert team_info_response["team_info"].metadata["tags"] == ["teamA"]
+
+    # team update with tags
+    team_update_response = await update_team(
+        data=UpdateTeamRequest(
+            team_id=new_team_response["team_id"],
+            tags=["teamA", "teamB"],
+        ),
+        user_api_key_dict=UserAPIKeyAuth(user_role=LitellmUserRoles.PROXY_ADMIN),
+        http_request=Request(scope={"type": "http"}),
+    )
+
+    print("team_update_response", team_update_response)
+
+    # call /team/info again
+    team_info_response = await team_info(
+        team_id=new_team_response["team_id"],
+        user_api_key_dict=UserAPIKeyAuth(user_role=LitellmUserRoles.PROXY_ADMIN),
+        http_request=Request(scope={"type": "http"}),
+    )
+
+    print("team_info_response", team_info_response)
+    assert team_info_response["team_info"].metadata["tags"] == ["teamA", "teamB"]
tests/otel_tests/test_team_tag_routing.py (new file, 136 lines)
@@ -0,0 +1,136 @@
# What this tests?
## Set tags on a team and then make a request to /chat/completions
import pytest
import asyncio
import aiohttp, openai
from openai import OpenAI, AsyncOpenAI
from typing import Optional, List, Union
import uuid

LITELLM_MASTER_KEY = "sk-1234"


async def chat_completion(
    session, key, model: Union[str, List] = "fake-openai-endpoint"
):
    url = "http://0.0.0.0:4000/chat/completions"
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    data = {
        "model": model,
        "messages": [
            {"role": "user", "content": f"Hello! {str(uuid.uuid4())}"},
        ],
    }

    async with session.post(url, headers=headers, json=data) as response:
        status = response.status
        response_text = await response.text()

        if status != 200:
            raise Exception(response_text)

        return await response.json(), response.headers


async def create_team_with_tags(session, key, tags: List[str]):
    url = "http://0.0.0.0:4000/team/new"
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    data = {
        "tags": tags,
    }

    async with session.post(url, headers=headers, json=data) as response:
        status = response.status
        response_text = await response.text()

        if status != 200:
            raise Exception(response_text)

        return await response.json()


async def create_key_with_team(session, key, team_id: str):
    url = "http://0.0.0.0:4000/key/generate"
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    data = {
        "team_id": team_id,
    }
    async with session.post(url, headers=headers, json=data) as response:
        status = response.status
        response_text = await response.text()

        if status != 200:
            raise Exception(response_text)

        return await response.json()


async def model_info_get_call(session, key, model_id: str):
    # make get call, pass "litellm_model_id" in query params
    url = f"http://0.0.0.0:4000/model/info?litellm_model_id={model_id}"
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    async with session.get(url, headers=headers) as response:
        status = response.status
        response_text = await response.text()

        if status != 200:
            raise Exception(response_text)

        return await response.json()


@pytest.mark.asyncio()
async def test_team_tag_routing():
    async with aiohttp.ClientSession() as session:
        key = LITELLM_MASTER_KEY
        team_a_data = await create_team_with_tags(session, key, ["teamA"])
        team_a_id = team_a_data["team_id"]

        team_b_data = await create_team_with_tags(session, key, ["teamB"])
        team_b_id = team_b_data["team_id"]

        key_with_team_a = await create_key_with_team(session, key, team_a_id)
        print(key_with_team_a)
        _key_with_team_a = key_with_team_a["key"]
        for _ in range(5):
            response_a, headers = await chat_completion(session, _key_with_team_a)
            headers = dict(headers)
            print(response_a)
            print(headers)
            assert (
                headers["x-litellm-model-id"] == "team-a-model"
            ), "Model ID should be teamA"

        key_with_team_b = await create_key_with_team(session, key, team_b_id)
        _key_with_team_b = key_with_team_b["key"]
        for _ in range(5):
            response_b, headers = await chat_completion(session, _key_with_team_b)
            headers = dict(headers)
            print(response_b)
            print(headers)
            assert (
                headers["x-litellm-model-id"] == "team-b-model"
            ), "Model ID should be teamB"


@pytest.mark.asyncio()
async def test_chat_completion_with_no_tags():
    async with aiohttp.ClientSession() as session:
        key = LITELLM_MASTER_KEY
        response, headers = await chat_completion(session, key)
        headers = dict(headers)
        print(response)
        print(headers)
        assert response is not None