diff --git a/docs/my-website/docs/providers/vertex.md b/docs/my-website/docs/providers/vertex.md
index cdd3fce6c6..476cc8a453 100644
--- a/docs/my-website/docs/providers/vertex.md
+++ b/docs/my-website/docs/providers/vertex.md
@@ -347,7 +347,7 @@ Return a `list[Recipe]`
completion(model="vertex_ai/gemini-1.5-flash-preview-0514", messages=messages, response_format={ "type": "json_object" })
```
-### **Grounding**
+### **Grounding - Web Search**
Add Google Search Result grounding to vertex ai calls.
@@ -358,7 +358,7 @@ See the grounding metadata with `response_obj._hidden_params["vertex_ai_groundin
-```python
+```python showLineNumbers
from litellm import completion
## SETUP ENVIRONMENT
@@ -377,14 +377,36 @@ print(resp)
-```bash
+
+
+
+```python showLineNumbers
+from openai import OpenAI
+
+client = OpenAI(
+ api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
+ base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
+)
+
+response = client.chat.completions.create(
+ model="gemini-pro",
+ messages=[{"role": "user", "content": "Who won the world cup?"}],
+ tools=[{"googleSearchRetrieval": {}}],
+)
+
+print(response)
+```
+
+
+
+```bash showLineNumbers
curl http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gemini-pro",
"messages": [
- {"role": "user", "content": "Hello, Claude!"}
+ {"role": "user", "content": "Who won the world cup?"}
],
"tools": [
{
@@ -394,12 +416,82 @@ curl http://localhost:4000/v1/chat/completions \
}'
```
+
+
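+To inspect the grounding metadata mentioned above, read `_hidden_params["vertex_ai_grounding_metadata"]` on the SDK response after the call. A minimal sketch, reusing the same setup as the SDK example above:
+
+```python showLineNumbers
+from litellm import completion
+
+## SETUP ENVIRONMENT
+# !gcloud auth application-default login - run this to add vertex credentials to your env
+
+resp = completion(
+    model="vertex_ai/gemini-1.5-flash-preview-0514",
+    messages=[{"role": "user", "content": "Who won the world cup?"}],
+    tools=[{"googleSearchRetrieval": {}}],  # 👈 ADD GOOGLE SEARCH
+)
+
+# Grounding metadata (search queries, citations, etc.) returned by Vertex AI
+print(resp._hidden_params["vertex_ai_grounding_metadata"])
+```
+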
You can also use the `enterpriseWebSearch` tool for an [enterprise compliant search](https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/web-grounding-enterprise).
+
+
+
+```python showLineNumbers
+from litellm import completion
+
+## SETUP ENVIRONMENT
+# !gcloud auth application-default login - run this to add vertex credentials to your env
+
+tools = [{"enterpriseWebSearch": {}}] # 👈 ADD GOOGLE ENTERPRISE SEARCH
+
+resp = completion(
+    model="vertex_ai/gemini-1.0-pro-001",
+    messages=[{"role": "user", "content": "Who won the world cup?"}],
+    tools=tools,
+)
+
+print(resp)
+```
+
+
+
+
+
+
+```python showLineNumbers
+from openai import OpenAI
+
+client = OpenAI(
+ api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
+ base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
+)
+
+response = client.chat.completions.create(
+ model="gemini-pro",
+ messages=[{"role": "user", "content": "Who won the world cup?"}],
+ tools=[{"enterpriseWebSearch": {}}],
+)
+
+print(response)
+```
+
+
+
+```bash showLineNumbers
+curl http://localhost:4000/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -H "Authorization: Bearer sk-1234" \
+ -d '{
+ "model": "gemini-pro",
+ "messages": [
+ {"role": "user", "content": "Who won the world cup?"}
+ ],
+ "tools": [
+ {
+ "enterpriseWebSearch": {}
+ }
+ ]
+ }'
+```
+
+
+
+
+
+
+
#### **Moving from Vertex AI SDK to LiteLLM (GROUNDING)**
diff --git a/docs/my-website/docs/providers/xai.md b/docs/my-website/docs/providers/xai.md
index a951b6bb9e..49a3640991 100644
--- a/docs/my-website/docs/providers/xai.md
+++ b/docs/my-website/docs/providers/xai.md
@@ -176,3 +176,81 @@ Here's how to call a XAI model with the LiteLLM Proxy Server
+## Reasoning Usage
+
+LiteLLM supports reasoning for xAI models: pass `reasoning_effort` in your request and read the reasoning content and reasoning token usage back from the response.
+
+
+
+
+
+```python showLineNumbers title="reasoning with xai/grok-3-mini-beta"
+import litellm
+response = litellm.completion(
+ model="xai/grok-3-mini-beta",
+ messages=[{"role": "user", "content": "What is 101*3?"}],
+ reasoning_effort="low",
+)
+
+print("Reasoning Content:")
+print(response.choices[0].message.reasoning_content)
+
+print("\nFinal Response:")
+print(response.choices[0].message.content)
+
+print("\nNumber of completion tokens:")
+print(response.usage.completion_tokens)
+
+print("\nNumber of reasoning tokens:")
+print(response.usage.completion_tokens_details.reasoning_tokens)
+```
+
+
+
+
+```python showLineNumbers title="reasoning with xai/grok-3-mini-beta"
+import openai
+client = openai.OpenAI(
+ api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
+ base_url="http://0.0.0.0:4000" # litellm-proxy-base url
+)
+
+response = client.chat.completions.create(
+ model="xai/grok-3-mini-beta",
+ messages=[{"role": "user", "content": "What is 101*3?"}],
+ reasoning_effort="low",
+)
+
+print("Reasoning Content:")
+print(response.choices[0].message.reasoning_content)
+
+print("\nFinal Response:")
+print(response.choices[0].message.content)
+
+print("\nNumber of completion tokens:")
+print(response.usage.completion_tokens)
+
+print("\nNumber of reasoning tokens:")
+print(response.usage.completion_tokens_details.reasoning_tokens)
+```
+
+
+
+
+**Example Response:**
+
+```shell
+Reasoning Content:
+Let me calculate 101 multiplied by 3:
+101 * 3 = 303.
+I can double-check that: 100 * 3 is 300, and 1 * 3 is 3, so 300 + 3 = 303. Yes, that's correct.
+
+Final Response:
+The result of 101 multiplied by 3 is 303.
+
+Number of completion tokens:
+14
+
+Number of reasoning tokens:
+310
+```
diff --git a/docs/my-website/docs/tutorials/tag_management.md b/docs/my-website/docs/tutorials/tag_management.md
new file mode 100644
index 0000000000..9b00db47d1
--- /dev/null
+++ b/docs/my-website/docs/tutorials/tag_management.md
@@ -0,0 +1,145 @@
+import Image from '@theme/IdealImage';
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# [Beta] Routing based on request metadata
+
+Create routing rules based on request metadata.
+
+## Setup
+
+Add the following to your LiteLLM proxy `config.yaml` file.
+
+```yaml showLineNumbers title="litellm proxy config.yaml"
+router_settings:
+ enable_tag_filtering: True # 👈 Key Change
+```
+
+## 1. Create a tag
+
+On the LiteLLM UI, navigate to Experimental > Tag Management > Create Tag.
+
+Create a tag called `private-data` and select only the models that requests with this tag are allowed to use. Once created, you will see the tag on the Tag Management page.
+
+
+
+
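+If you prefer to create the tag programmatically instead of through the UI, the proxy also exposes tag management endpoints. The sketch below is hypothetical: the `/tag/new` path and the payload fields are assumptions, so check your proxy's API reference for the exact schema.
+
+```python showLineNumbers
+import requests
+
+# Hypothetical sketch - endpoint path and payload fields are assumptions,
+# verify against your proxy's API reference before using.
+response = requests.post(
+    "http://0.0.0.0:4000/tag/new",
+    headers={"Authorization": "Bearer sk-1234"},
+    json={
+        "name": "private-data",
+        "description": "Requests containing private data",
+        "models": ["us.anthropic.claude-3-7-sonnet-20250219-v1:0"],  # allowed models (assumption)
+    },
+)
+print(response.json())
+```
+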
+## 2. Test Tag Routing
+
+Now we will test the tag-based routing rules.
+
+### 2.1 Invalid model
+
+This request will fail since we send `tags=private-data` but the model `gpt-4o` is not in the allowed models for the `private-data` tag.
+
+
+
+
+
+Here is an example of sending the same request using the OpenAI Python SDK or curl.
+
+
+
+```python showLineNumbers
+from openai import OpenAI
+
+client = OpenAI(
+ api_key="sk-1234",
+ base_url="http://0.0.0.0:4000/v1/"
+)
+
+response = client.chat.completions.create(
+ model="gpt-4o",
+ messages=[
+ {"role": "user", "content": "Hello, how are you?"}
+ ],
+ extra_body={
+ "tags": "private-data"
+ }
+)
+```
+
+
+
+
+```bash
+curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
+-H 'Content-Type: application/json' \
+-H 'Authorization: Bearer sk-1234' \
+-d '{
+ "model": "gpt-4o",
+ "messages": [
+ {
+ "role": "user",
+ "content": "Hello, how are you?"
+ }
+ ],
+ "tags": "private-data"
+}'
+```
+
+
+
+
+
+
+### 2.2 Valid model
+
+This request will succeed since we send `tags=private-data` and the model `us.anthropic.claude-3-7-sonnet-20250219-v1:0` is in the allowed models for the `private-data` tag.
+
+
+
+Here is an example of sending the same request using the OpenAI Python SDK or curl.
+
+
+
+
+```python showLineNumbers
+from openai import OpenAI
+
+client = OpenAI(
+ api_key="sk-1234",
+ base_url="http://0.0.0.0:4000/v1/"
+)
+
+response = client.chat.completions.create(
+ model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
+ messages=[
+ {"role": "user", "content": "Hello, how are you?"}
+ ],
+ extra_body={
+ "tags": "private-data"
+ }
+)
+```
+
+
+
+
+```bash
+curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
+-H 'Content-Type: application/json' \
+-H 'Authorization: Bearer sk-1234' \
+-d '{
+ "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
+ "messages": [
+ {
+ "role": "user",
+ "content": "Hello, how are you?"
+ }
+ ],
+ "tags": "private-data"
+}'
+```
+
+
+
+
+
+
+## Additional Tag Features
+- [Sending tags in request headers](https://docs.litellm.ai/docs/proxy/tag_routing#calling-via-request-header) (see the example below)
+- [Tag based routing](https://docs.litellm.ai/docs/proxy/tag_routing)
+- [Track spend per tag](cost_tracking#-custom-tags)
+- [Setup Budgets per Virtual Key, Team](users)
+
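+As an example of the first item above, tags can also be attached via a request header instead of the request body. A minimal sketch using the OpenAI Python SDK, assuming the comma-separated `x-litellm-tags` header described in the linked tag routing docs:
+
+```python showLineNumbers
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="sk-1234",
+    base_url="http://0.0.0.0:4000/v1/"
+)
+
+# Assumption: tags are passed as a comma-separated `x-litellm-tags` header,
+# see "Sending tags in request headers" above for details
+response = client.chat.completions.create(
+    model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
+    messages=[
+        {"role": "user", "content": "Hello, how are you?"}
+    ],
+    extra_headers={"x-litellm-tags": "private-data"}
+)
+print(response)
+```
+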
diff --git a/docs/my-website/img/tag_create.png b/docs/my-website/img/tag_create.png
new file mode 100644
index 0000000000..d515b3a9f4
Binary files /dev/null and b/docs/my-website/img/tag_create.png differ
diff --git a/docs/my-website/img/tag_invalid.png b/docs/my-website/img/tag_invalid.png
new file mode 100644
index 0000000000..e12f7197b1
Binary files /dev/null and b/docs/my-website/img/tag_invalid.png differ
diff --git a/docs/my-website/img/tag_valid.png b/docs/my-website/img/tag_valid.png
new file mode 100644
index 0000000000..3b6e121d12
Binary files /dev/null and b/docs/my-website/img/tag_valid.png differ
diff --git a/docs/my-website/release_notes/v1.66.0-stable/index.md b/docs/my-website/release_notes/v1.66.0-stable/index.md
index 503970d1ee..65024792cd 100644
--- a/docs/my-website/release_notes/v1.66.0-stable/index.md
+++ b/docs/my-website/release_notes/v1.66.0-stable/index.md
@@ -46,7 +46,8 @@ v1.66.0-stable is live now, here are the key highlights of this release
## Key Highlights
- **Microsoft SSO Auto-sync**: Auto-sync groups and group members from Azure Entra ID to LiteLLM
- **Unified File IDs**: Use the same file id across LLM API providers.
-- **New Models**: `xAI grok-3` support, `realtime api` cost tracking and logging
+- **Realtime API Cost Tracking**: Track the cost of Realtime API calls
+- **xAI grok-3**: Added support for `xai/grok-3` models
- **Security Fixes**: Fixed [CVE-2025-0330](https://www.cve.org/CVERecord?id=CVE-2025-0330) and [CVE-2024-6825](https://www.cve.org/CVERecord?id=CVE-2024-6825) vulnerabilities
Let's dive in.
@@ -79,7 +80,7 @@ Get started with this [here](https://docs.litellm.ai/docs/tutorials/msft_sso)
2. Added reasoning_effort support for `xai/grok-3-mini-beta` model family [PR](https://github.com/BerriAI/litellm/pull/9932)
- Hugging Face
- 1. Hugging Face - Added inference providers support [PR](https://github.com/BerriAI/litellm/pull/9773)
+ 1. Hugging Face - Added inference providers support [Getting Started](https://docs.litellm.ai/docs/providers/huggingface#serverless-inference-providers)
- Azure
1. Azure - Added azure/gpt-4o-realtime-audio cost tracking [PR](https://github.com/BerriAI/litellm/pull/9893)
@@ -109,13 +110,15 @@ Get started with this [here](https://docs.litellm.ai/docs/tutorials/msft_sso)
## Spend Tracking Improvements
-
-1. Realtime API Cost tracking with token usage metrics in spend logs [PR](https://github.com/BerriAI/litellm/pull/9795)
-2. Fixed Claude Haiku cache read pricing per token [PR](https://github.com/BerriAI/litellm/pull/9834)
-3. Added cost tracking for Claude responses with base_model [PR](https://github.com/BerriAI/litellm/pull/9897)
-4. Fixed Anthropic prompt caching cost calculation and trimmed logged message in db [PR](https://github.com/BerriAI/litellm/pull/9838)
-5. Added token tracking and log usage object in spend logs [PR](https://github.com/BerriAI/litellm/pull/9843)
-6. Handle custom pricing at deployment level [PR](https://github.com/BerriAI/litellm/pull/9855)
+- OpenAI, Azure
+ 1. Realtime API Cost tracking with token usage metrics in spend logs [PR](https://github.com/BerriAI/litellm/pull/9795)
+- Anthropic
+ 1. Fixed Claude Haiku cache read pricing per token [PR](https://github.com/BerriAI/litellm/pull/9834)
+ 2. Added cost tracking for Claude responses with base_model [PR](https://github.com/BerriAI/litellm/pull/9897)
+ 3. Fixed Anthropic prompt caching cost calculation and trimmed logged message in db [PR](https://github.com/BerriAI/litellm/pull/9838)
+- General
+ 1. Added token tracking and log usage object in spend logs [PR](https://github.com/BerriAI/litellm/pull/9843)
+ 2. Handle custom pricing at deployment level [PR](https://github.com/BerriAI/litellm/pull/9855)
## Management Endpoints / UI
diff --git a/docs/my-website/sidebars.js b/docs/my-website/sidebars.js
index 9057f9ac88..3d4e09fda3 100644
--- a/docs/my-website/sidebars.js
+++ b/docs/my-website/sidebars.js
@@ -444,6 +444,7 @@ const sidebars = {
items: [
"tutorials/openweb_ui",
"tutorials/msft_sso",
+ "tutorials/tag_management",
'tutorials/litellm_proxy_aporia',
{
type: "category",
diff --git a/litellm/proxy/proxy_config.yaml b/litellm/proxy/proxy_config.yaml
index 23de923db7..1d32d2d71e 100644
--- a/litellm/proxy/proxy_config.yaml
+++ b/litellm/proxy/proxy_config.yaml
@@ -19,3 +19,6 @@ litellm_settings:
success_callback: ["langfuse", "s3"]
langfuse_secret: secret-workflows-key
langfuse_public_key: public-workflows-key
+
+router_settings:
+ enable_tag_filtering: True # 👈 Key Change
\ No newline at end of file