Mirror of https://github.com/BerriAI/litellm.git (synced 2025-04-24 18:24:20 +00:00)
[Docs] v1.66.0-stable fixes (#9953)
* add categories for spend tracking improvements
* xai reasoning usage
* docs tag management
* docs tag based routing
* [Beta] Routing based on request metadata
* docs tag routing
* docs enterprise web search
This commit is contained in:
parent eb998ee1c0
commit c86e678809

9 changed files with 335 additions and 13 deletions

@@ -347,7 +347,7 @@ Return a `list[Recipe]`

completion(model="vertex_ai/gemini-1.5-flash-preview-0514", messages=messages, response_format={ "type": "json_object" })
```

-### **Grounding**
+### **Grounding - Web Search**

Add Google Search result grounding to Vertex AI calls.

@@ -358,7 +358,7 @@ See the grounding metadata with `response_obj._hidden_params["vertex_ai_grounding_metadata"]`

<Tabs>
<TabItem value="sdk" label="SDK">

-```python
+```python showLineNumbers
from litellm import completion

## SETUP ENVIRONMENT

@@ -377,14 +377,36 @@ print(resp)

</TabItem>
<TabItem value="proxy" label="PROXY">

-```bash
<Tabs>
<TabItem value="openai" label="OpenAI Python SDK">

```python showLineNumbers
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
)

response = client.chat.completions.create(
    model="gemini-pro",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=[{"googleSearchRetrieval": {}}],
)

print(response)
```
</TabItem>
<TabItem value="curl" label="cURL">

```bash showLineNumbers
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gemini-pro",
    "messages": [
-     {"role": "user", "content": "Hello, Claude!"}
+     {"role": "user", "content": "Who won the world cup?"}
    ],
    "tools": [
      {

@@ -394,12 +416,82 @@ curl http://localhost:4000/v1/chat/completions \

  }'

```
</TabItem>
</Tabs>

</TabItem>
</Tabs>

You can also use the `enterpriseWebSearch` tool for an [enterprise-compliant search](https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/web-grounding-enterprise).

<Tabs>
<TabItem value="sdk" label="SDK">

```python showLineNumbers
from litellm import completion

## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env

tools = [{"enterpriseWebSearch": {}}] # 👈 ADD GOOGLE ENTERPRISE SEARCH

resp = completion(
    model="vertex_ai/gemini-1.0-pro-001",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=tools,
)

print(resp)
```
</TabItem>
<TabItem value="proxy" label="PROXY">
|
||||
|
||||
<Tabs>
|
||||
<TabItem value="openai" label="OpenAI Python SDK">
|
||||
|
||||
```python showLineNumbers
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
|
||||
base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
|
||||
)
|
||||
|
||||
response = client.chat.completions.create(
|
||||
model="gemini-pro",
|
||||
messages=[{"role": "user", "content": "Who won the world cup?"}],
|
||||
tools=[{"enterpriseWebSearch": {}}],
|
||||
)
|
||||
|
||||
print(response)
|
||||
```
|
||||
</TabItem>
|
||||
<TabItem value="curl" label="cURL">
|
||||
|
||||
```bash showLineNumbers
|
||||
curl http://localhost:4000/v1/chat/completions \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "Authorization: Bearer sk-1234" \
|
||||
-d '{
|
||||
"model": "gemini-pro",
|
||||
"messages": [
|
||||
{"role": "user", "content": "Who won the world cup?"}
|
||||
],
|
||||
"tools": [
|
||||
{
|
||||
"enterpriseWebSearch": {}
|
||||
}
|
||||
]
|
||||
}'
|
||||
|
||||
```
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
|
||||
#### **Moving from Vertex AI SDK to LiteLLM (GROUNDING)**
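
For reference, here is a minimal before/after sketch, assuming the `vertexai` SDK's `Tool.from_google_search_retrieval` helper and placeholder project/model names; adapt it to your own setup.

```python showLineNumbers
# BEFORE: grounding with the Vertex AI SDK (sketch - assumes google-cloud-aiplatform is installed)
import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

vertexai.init(project="your-project-id", location="us-central1")  # placeholder project

model = GenerativeModel("gemini-1.5-flash-001")
vertex_resp = model.generate_content(
    "Who won the world cup?",
    tools=[Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())],
)

# AFTER: the same call through LiteLLM (matches the SDK example above)
from litellm import completion

litellm_resp = completion(
    model="vertex_ai/gemini-1.5-flash-001",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=[{"googleSearchRetrieval": {}}],  # 👈 ADD GOOGLE SEARCH
)
```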
@@ -176,3 +176,81 @@ Here's how to call a XAI model with the LiteLLM Proxy Server

</Tabs>


## Reasoning Usage

LiteLLM supports `reasoning_effort` for xAI models and returns the reasoning content and reasoning token usage in the response.

<Tabs>

<TabItem value="python" label="LiteLLM Python SDK">

```python showLineNumbers title="reasoning with xai/grok-3-mini-beta"
import litellm

response = litellm.completion(
    model="xai/grok-3-mini-beta",
    messages=[{"role": "user", "content": "What is 101*3?"}],
    reasoning_effort="low",
)

print("Reasoning Content:")
print(response.choices[0].message.reasoning_content)

print("\nFinal Response:")
print(response.choices[0].message.content)

print("\nNumber of completion tokens (output):")
print(response.usage.completion_tokens)

print("\nNumber of reasoning tokens (output):")
print(response.usage.completion_tokens_details.reasoning_tokens)
```
</TabItem>

<TabItem value="curl" label="LiteLLM Proxy - OpenAI SDK Usage">
|
||||
|
||||
```python showLineNumbers title="reasoning with xai/grok-3-mini-beta"
|
||||
import openai
|
||||
client = openai.OpenAI(
|
||||
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
|
||||
base_url="http://0.0.0.0:4000" # litellm-proxy-base url
|
||||
)
|
||||
|
||||
response = client.chat.completions.create(
|
||||
model="xai/grok-3-mini-beta",
|
||||
messages=[{"role": "user", "content": "What is 101*3?"}],
|
||||
reasoning_effort="low",
|
||||
)
|
||||
|
||||
print("Reasoning Content:")
|
||||
print(response.choices[0].message.reasoning_content)
|
||||
|
||||
print("\nFinal Response:")
|
||||
print(completion.choices[0].message.content)
|
||||
|
||||
print("\nNumber of completion tokens (input):")
|
||||
print(completion.usage.completion_tokens)
|
||||
|
||||
print("\nNumber of reasoning tokens (input):")
|
||||
print(completion.usage.completion_tokens_details.reasoning_tokens)
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
**Example Response:**

```shell
Reasoning Content:
Let me calculate 101 multiplied by 3:
101 * 3 = 303.
I can double-check that: 100 * 3 is 300, and 1 * 3 is 3, so 300 + 3 = 303. Yes, that's correct.

Final Response:
The result of 101 multiplied by 3 is 303.

Number of completion tokens (output):
14

Number of reasoning tokens (output):
310
```

145 docs/my-website/docs/tutorials/tag_management.md (new file)

@@ -0,0 +1,145 @@
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# [Beta] Routing based on request metadata

Create routing rules based on request metadata.

## Setup

Add the following to your LiteLLM proxy `config.yaml` file.

```yaml showLineNumbers title="litellm proxy config.yaml"
router_settings:
  enable_tag_filtering: True # 👈 Key Change
```
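
For context, here is a sketch of where `router_settings` sits in a fuller `config.yaml`; the model entries below are placeholders for whatever deployments you already run, and only the `router_settings` block above is required for this tutorial.

```yaml showLineNumbers title="example config.yaml (sketch)"
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY  # placeholder key reference
  - model_name: us.anthropic.claude-3-7-sonnet-20250219-v1:0
    litellm_params:
      model: bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0

router_settings:
  enable_tag_filtering: True # 👈 Key Change
```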

## 1. Create a tag

On the LiteLLM UI, navigate to Experimental > Tag Management > Create Tag.

Create a tag called `private-data` and select only the models that are allowed for requests with this tag. Once created, you will see the tag on the Tag Management page.

<Image img={require('../../img/tag_create.png')} style={{ width: '800px', height: 'auto' }} />
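
If you prefer to script this step instead of using the UI, the proxy also exposes tag management endpoints. The sketch below assumes a `/tag/new` route and payload fields (`name`, `description`, `models`); verify the exact path and schema against your proxy's `/docs` (Swagger) page before relying on it.

```bash
# Sketch only - endpoint path and payload fields are assumptions, check your proxy's /docs page
curl -L -X POST 'http://0.0.0.0:4000/tag/new' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "name": "private-data",
    "description": "Requests carrying private data",
    "models": ["<allowed-model-id>"]
  }'
```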

## 2. Test Tag Routing

Now we will test the tag-based routing rules.

### 2.1 Invalid model

This request will fail since we send `tags=private-data` but the model `gpt-4o` is not in the allowed models for the `private-data` tag.

<Image img={require('../../img/tag_invalid.png')} style={{ width: '800px', height: 'auto' }} />

<br />

Here is an example of sending the same request using the OpenAI Python SDK or cURL.

<Tabs>
<TabItem value="python" label="OpenAI Python SDK">

```python showLineNumbers
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",
    base_url="http://0.0.0.0:4000/v1/"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
    extra_body={
        "tags": "private-data"
    }
)
```

</TabItem>
<TabItem value="curl" label="cURL">

```bash
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "tags": "private-data"
  }'
```

</TabItem>
</Tabs>

<br />

### 2.2 Valid model

This request will succeed since we send `tags=private-data` and the model `us.anthropic.claude-3-7-sonnet-20250219-v1:0` is in the allowed models for the `private-data` tag.

<Image img={require('../../img/tag_valid.png')} style={{ width: '800px', height: 'auto' }} />

Here is an example of sending the same request using the OpenAI Python SDK or cURL.

<Tabs>
<TabItem value="python" label="OpenAI Python SDK">

```python showLineNumbers
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",
    base_url="http://0.0.0.0:4000/v1/"
)

response = client.chat.completions.create(
    model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
    extra_body={
        "tags": "private-data"
    }
)
```

</TabItem>
<TabItem value="curl" label="cURL">

```bash
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "tags": "private-data"
  }'
```

</TabItem>
</Tabs>

## Additional Tag Features
- [Sending tags in request headers](https://docs.litellm.ai/docs/proxy/tag_routing#calling-via-request-header) (see the sketch below)
- [Tag based routing](https://docs.litellm.ai/docs/proxy/tag_routing)
- [Track spend per tag](cost_tracking#-custom-tags)
- [Setup Budgets per Virtual Key, Team](users)

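As a quick sketch of the first item above, tags can also be sent in a request header instead of the body, assuming the `x-litellm-tags` header described in the linked tag routing doc (double-check the header name against your LiteLLM version).

```bash
# Sketch - assumes the x-litellm-tags header from the tag routing doc linked above
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'x-litellm-tags: private-data' \
  -d '{
    "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'
```
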
BIN docs/my-website/img/tag_create.png (new binary file, not shown, 250 KiB)
BIN docs/my-website/img/tag_invalid.png (new binary file, not shown, 237 KiB)
BIN docs/my-website/img/tag_valid.png (new binary file, not shown, 319 KiB)

@@ -46,7 +46,8 @@ v1.66.0-stable is live now, here are the key highlights of this release

## Key Highlights
- **Microsoft SSO Auto-sync**: Auto-sync groups and group members from Azure Entra ID to LiteLLM
- **Unified File IDs**: Use the same file id across LLM API providers.
-- **New Models**: `xAI grok-3` support, `realtime api` cost tracking and logging
+- **Realtime API Cost Tracking**: Track cost of realtime api calls
+- **xAI grok-3**: Added support for `xai/grok-3` models
- **Security Fixes**: Fixed [CVE-2025-0330](https://www.cve.org/CVERecord?id=CVE-2025-0330) and [CVE-2024-6825](https://www.cve.org/CVERecord?id=CVE-2024-6825) vulnerabilities

Let's dive in.

@@ -79,7 +80,7 @@ Get started with this [here](https://docs.litellm.ai/docs/tutorials/msft_sso)

2. Added reasoning_effort support for `xai/grok-3-mini-beta` model family [PR](https://github.com/BerriAI/litellm/pull/9932)

- Hugging Face
-    1. Hugging Face - Added inference providers support [PR](https://github.com/BerriAI/litellm/pull/9773)
+    1. Hugging Face - Added inference providers support [Getting Started](https://docs.litellm.ai/docs/providers/huggingface#serverless-inference-providers)

- Azure
    1. Azure - Added azure/gpt-4o-realtime-audio cost tracking [PR](https://github.com/BerriAI/litellm/pull/9893)

@@ -109,13 +110,15 @@ Get started with this [here](https://docs.litellm.ai/docs/tutorials/msft_sso)


## Spend Tracking Improvements

-1. Realtime API Cost tracking with token usage metrics in spend logs [PR](https://github.com/BerriAI/litellm/pull/9795)
-2. Fixed Claude Haiku cache read pricing per token [PR](https://github.com/BerriAI/litellm/pull/9834)
-3. Added cost tracking for Claude responses with base_model [PR](https://github.com/BerriAI/litellm/pull/9897)
-4. Fixed Anthropic prompt caching cost calculation and trimmed logged message in db [PR](https://github.com/BerriAI/litellm/pull/9838)
-5. Added token tracking and log usage object in spend logs [PR](https://github.com/BerriAI/litellm/pull/9843)
-6. Handle custom pricing at deployment level [PR](https://github.com/BerriAI/litellm/pull/9855)
+- OpenAI, Azure
+    1. Realtime API Cost tracking with token usage metrics in spend logs [PR](https://github.com/BerriAI/litellm/pull/9795)
+- Anthropic
+    1. Fixed Claude Haiku cache read pricing per token [PR](https://github.com/BerriAI/litellm/pull/9834)
+    2. Added cost tracking for Claude responses with base_model [PR](https://github.com/BerriAI/litellm/pull/9897)
+    3. Fixed Anthropic prompt caching cost calculation and trimmed logged message in db [PR](https://github.com/BerriAI/litellm/pull/9838)
+- General
+    1. Added token tracking and log usage object in spend logs [PR](https://github.com/BerriAI/litellm/pull/9843)
+    2. Handle custom pricing at deployment level [PR](https://github.com/BerriAI/litellm/pull/9855)


## Management Endpoints / UI

@@ -444,6 +444,7 @@ const sidebars = {

      items: [
        "tutorials/openweb_ui",
        "tutorials/msft_sso",
        "tutorials/tag_management",
        'tutorials/litellm_proxy_aporia',
        {
          type: "category",

@@ -19,3 +19,6 @@ litellm_settings:

  success_callback: ["langfuse", "s3"]
  langfuse_secret: secret-workflows-key
  langfuse_public_key: public-workflows-key

router_settings:
  enable_tag_filtering: True # 👈 Key Change