[Docs] v1.66.0-stable fixes (#9953)

* add categories for spend tracking improvements

* xai reasoning usage

* docs tag management

* docs tag based routing

* [Beta] Routing based on request metadata

* docs tag based routing

* docs tag routing

* docs enterprise web search
Ishaan Jaff 2025-04-12 16:57:25 -07:00 committed by GitHub
parent eb998ee1c0
commit c86e678809
9 changed files with 335 additions and 13 deletions


@@ -347,7 +347,7 @@ Return a `list[Recipe]`
completion(model="vertex_ai/gemini-1.5-flash-preview-0514", messages=messages, response_format={ "type": "json_object" })
```
### **Grounding**
### **Grounding - Web Search**
Add Google Search result grounding to Vertex AI calls.
@@ -358,7 +358,7 @@ See the grounding metadata with `response_obj._hidden_params["vertex_ai_grounding_metadata"]`
<Tabs>
<TabItem value="sdk" label="SDK">
```python
```python showLineNumbers
from litellm import completion
## SETUP ENVIRONMENT
@@ -377,14 +377,36 @@ print(resp)
</TabItem>
<TabItem value="proxy" label="PROXY">
```bash
<Tabs>
<TabItem value="openai" label="OpenAI Python SDK">
```python showLineNumbers
from openai import OpenAI
client = OpenAI(
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
)
response = client.chat.completions.create(
model="gemini-pro",
messages=[{"role": "user", "content": "Who won the world cup?"}],
tools=[{"googleSearchRetrieval": {}}],
)
print(response)
```
</TabItem>
<TabItem value="curl" label="cURL">
```bash showLineNumbers
curl http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gemini-pro",
"messages": [
{"role": "user", "content": "Hello, Claude!"}
{"role": "user", "content": "Who won the world cup?"}
],
"tools": [
{
@@ -394,12 +416,82 @@ curl http://localhost:4000/v1/chat/completions \
}'
```
</TabItem>
</Tabs>
</TabItem>
</Tabs>
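The grounding metadata mentioned above is exposed on the LiteLLM response object's `_hidden_params`. Here is a minimal sketch of reading it after an SDK call (assumes Vertex AI credentials are already configured in your environment):
```python
import litellm

# minimal sketch: make a grounded call, then read the grounding metadata
# referenced earlier in this doc from the response's hidden params
resp = litellm.completion(
    model="vertex_ai/gemini-1.5-flash-preview-0514",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=[{"googleSearchRetrieval": {}}],  # 👈 ADD GOOGLE SEARCH
)

print(resp._hidden_params["vertex_ai_grounding_metadata"])
```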
You can also use the `enterpriseWebSearch` tool for an [enterprise-compliant search](https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/web-grounding-enterprise).
<Tabs>
<TabItem value="sdk" label="SDK">
```python showLineNumbers
from litellm import completion
## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env
tools = [{"enterpriseWebSearch": {}}] # 👈 ADD GOOGLE ENTERPRISE SEARCH
resp = completion(
model="vertex_ai/gemini-1.0-pro-001",
messages=[{"role": "user", "content": "Who won the world cup?"}],
tools=tools,
)
print(resp)
```
</TabItem>
<TabItem value="proxy" label="PROXY">
<Tabs>
<TabItem value="openai" label="OpenAI Python SDK">
```python showLineNumbers
from openai import OpenAI
client = OpenAI(
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
)
response = client.chat.completions.create(
model="gemini-pro",
messages=[{"role": "user", "content": "Who won the world cup?"}],
tools=[{"enterpriseWebSearch": {}}],
)
print(response)
```
</TabItem>
<TabItem value="curl" label="cURL">
```bash showLineNumbers
curl http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gemini-pro",
"messages": [
{"role": "user", "content": "Who won the world cup?"}
],
"tools": [
{
"enterpriseWebSearch": {}
}
]
}'
```
</TabItem>
</Tabs>
</TabItem>
</Tabs>
#### **Moving from Vertex AI SDK to LiteLLM (GROUNDING)**


@@ -176,3 +176,81 @@ Here's how to call a XAI model with the LiteLLM Proxy Server
</Tabs>
## Reasoning Usage
LiteLLM supports reasoning content and reasoning token usage for xAI models.
<Tabs>
<TabItem value="python" label="LiteLLM Python SDK">
```python showLineNumbers title="reasoning with xai/grok-3-mini-beta"
import litellm
response = litellm.completion(
model="xai/grok-3-mini-beta",
messages=[{"role": "user", "content": "What is 101*3?"}],
reasoning_effort="low",
)
print("Reasoning Content:")
print(response.choices[0].message.reasoning_content)
print("\nFinal Response:")
print(completion.choices[0].message.content)
print("\nNumber of completion tokens (input):")
print(completion.usage.completion_tokens)
print("\nNumber of reasoning tokens (input):")
print(completion.usage.completion_tokens_details.reasoning_tokens)
```
</TabItem>
<TabItem value="curl" label="LiteLLM Proxy - OpenAI SDK Usage">
```python showLineNumbers title="reasoning with xai/grok-3-mini-beta"
import openai
client = openai.OpenAI(
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
base_url="http://0.0.0.0:4000" # litellm-proxy-base url
)
response = client.chat.completions.create(
model="xai/grok-3-mini-beta",
messages=[{"role": "user", "content": "What is 101*3?"}],
reasoning_effort="low",
)
print("Reasoning Content:")
print(response.choices[0].message.reasoning_content)
print("\nFinal Response:")
print(completion.choices[0].message.content)
print("\nNumber of completion tokens (input):")
print(completion.usage.completion_tokens)
print("\nNumber of reasoning tokens (input):")
print(completion.usage.completion_tokens_details.reasoning_tokens)
```
</TabItem>
</Tabs>
**Example Response:**
```shell
Reasoning Content:
Let me calculate 101 multiplied by 3:
101 * 3 = 303.
I can double-check that: 100 * 3 is 300, and 1 * 3 is 3, so 300 + 3 = 303. Yes, that's correct.
Final Response:
The result of 101 multiplied by 3 is 303.
Number of completion tokens:
14
Number of reasoning tokens:
310
```
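Note that in the example above the 310 reasoning tokens are reported separately from the 14 completion tokens. A rough sketch of totalling the output-side tokens, under the assumption that this separate accounting holds:
```python
def total_output_tokens(usage) -> int:
    # rough total of output-side tokens, assuming reasoning tokens are
    # reported separately from completion_tokens (as in the example above)
    details = getattr(usage, "completion_tokens_details", None)
    reasoning = (getattr(details, "reasoning_tokens", 0) or 0) if details else 0
    return usage.completion_tokens + reasoning

# with the example response above: 14 + 310 = 324
```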


@@ -0,0 +1,145 @@
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# [Beta] Routing based on request metadata
Create routing rules based on request metadata.
## Setup
Add the following to your litellm proxy config yaml file.
```yaml showLineNumbers title="litellm proxy config.yaml"
router_settings:
enable_tag_filtering: True # 👈 Key Change
```
## 1. Create a tag
On the LiteLLM UI, navigate to Experimental > Tag Management > Create Tag.
Create a tag called `private-data` and select only the models that requests with this tag are allowed to use. Once created, you will see the tag on the Tag Management page.
<Image img={require('../../img/tag_create.png')} style={{ width: '800px', height: 'auto' }} />
## 2. Test Tag Routing
Now we will test the tag-based routing rules.
### 2.1 Invalid model
This request will fail since we send `tags=private-data` but the model `gpt-4o` is not among the allowed models for the `private-data` tag.
<Image img={require('../../img/tag_invalid.png')} style={{ width: '800px', height: 'auto' }} />
<br />
Here is an example of sending the same request using the OpenAI Python SDK or cURL.
<Tabs>
<TabItem value="python" label="OpenAI Python SDK">
```python showLineNumbers
from openai import OpenAI
client = OpenAI(
api_key="sk-1234",
base_url="http://0.0.0.0:4000/v1/"
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "user", "content": "Hello, how are you?"}
],
extra_body={
"tags": "private-data"
}
)
```
</TabItem>
<TabItem value="curl" label="cURL">
```bash
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
],
"tags": "private-data"
}'
```
</TabItem>
</Tabs>
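Programmatically, the proxy rejects this request with an API error. A minimal sketch of catching it with the OpenAI SDK (the exact error class the proxy raises is an assumption here):
```python
import openai

client = openai.OpenAI(
    api_key="sk-1234",
    base_url="http://0.0.0.0:4000/v1/"
)

try:
    client.chat.completions.create(
        model="gpt-4o",  # not an allowed model for the `private-data` tag
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        extra_body={"tags": "private-data"},
    )
except openai.APIError as e:
    # the proxy refuses to route the request to a disallowed model
    print(f"Request rejected: {e}")
```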
<br />
### 2.2 Valid model
This request will succeed since we send `tags=private-data` and the model `us.anthropic.claude-3-7-sonnet-20250219-v1:0` is among the allowed models for the `private-data` tag.
<Image img={require('../../img/tag_valid.png')} style={{ width: '800px', height: 'auto' }} />
Here is an example of sending the same request using the OpenAI Python SDK or cURL.
<Tabs>
<TabItem value="python" label="OpenAI Python SDK">
```python showLineNumbers
from openai import OpenAI
client = OpenAI(
api_key="sk-1234",
base_url="http://0.0.0.0:4000/v1/"
)
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=[
{"role": "user", "content": "Hello, how are you?"}
],
extra_body={
"tags": "private-data"
}
)
```
</TabItem>
<TabItem value="curl" label="cURL">
```bash
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
],
"tags": "private-data"
}'
```
</TabItem>
</Tabs>
## Additional Tag Features
- [Sending tags in request headers](https://docs.litellm.ai/docs/proxy/tag_routing#calling-via-request-header) (see the sketch below)
- [Tag-based routing](https://docs.litellm.ai/docs/proxy/tag_routing)
- [Track spend per tag](cost_tracking#-custom-tags)
- [Setup Budgets per Virtual Key, Team](users)
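As a quick sketch of the header-based approach linked above, tags can also be attached via the `x-litellm-tags` request header instead of the request body (header name per the tag routing docs; treat this as a sketch, not the canonical usage):
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",
    base_url="http://0.0.0.0:4000/v1/"
)

# sketch: send the tag in the `x-litellm-tags` header instead of the body
response = client.chat.completions.create(
    model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    extra_headers={"x-litellm-tags": "private-data"},
)
print(response)
```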

Binary file added (250 KiB)

Binary file added (237 KiB)

Binary file added (319 KiB)


@@ -46,7 +46,8 @@ v1.66.0-stable is live now, here are the key highlights of this release
## Key Highlights
- **Microsoft SSO Auto-sync**: Auto-sync groups and group members from Azure Entra ID to LiteLLM
- **Unified File IDs**: Use the same file id across LLM API providers.
- **New Models**: `xAI grok-3` support, `realtime api` cost tracking and logging
- **Realtime API Cost Tracking**: Track cost of realtime api calls
- **xAI grok-3**: Added support for `xai/grok-3` models
- **Security Fixes**: Fixed [CVE-2025-0330](https://www.cve.org/CVERecord?id=CVE-2025-0330) and [CVE-2024-6825](https://www.cve.org/CVERecord?id=CVE-2024-6825) vulnerabilities
Let's dive in.
@@ -79,7 +80,7 @@ Get started with this [here](https://docs.litellm.ai/docs/tutorials/msft_sso)
2. Added reasoning_effort support for `xai/grok-3-mini-beta` model family [PR](https://github.com/BerriAI/litellm/pull/9932)
- Hugging Face
1. Hugging Face - Added inference providers support [PR](https://github.com/BerriAI/litellm/pull/9773)
1. Hugging Face - Added inference providers support [Getting Started](https://docs.litellm.ai/docs/providers/huggingface#serverless-inference-providers)
- Azure
1. Azure - Added azure/gpt-4o-realtime-audio cost tracking [PR](https://github.com/BerriAI/litellm/pull/9893)
@@ -109,13 +110,15 @@ Get started with this [here](https://docs.litellm.ai/docs/tutorials/msft_sso)
## Spend Tracking Improvements
1. Realtime API Cost tracking with token usage metrics in spend logs [PR](https://github.com/BerriAI/litellm/pull/9795)
2. Fixed Claude Haiku cache read pricing per token [PR](https://github.com/BerriAI/litellm/pull/9834)
3. Added cost tracking for Claude responses with base_model [PR](https://github.com/BerriAI/litellm/pull/9897)
4. Fixed Anthropic prompt caching cost calculation and trimmed logged message in db [PR](https://github.com/BerriAI/litellm/pull/9838)
5. Added token tracking and log usage object in spend logs [PR](https://github.com/BerriAI/litellm/pull/9843)
6. Handle custom pricing at deployment level [PR](https://github.com/BerriAI/litellm/pull/9855)
- OpenAI, Azure
1. Realtime API Cost tracking with token usage metrics in spend logs [PR](https://github.com/BerriAI/litellm/pull/9795)
- Anthropic
1. Fixed Claude Haiku cache read pricing per token [PR](https://github.com/BerriAI/litellm/pull/9834)
2. Added cost tracking for Claude responses with base_model [PR](https://github.com/BerriAI/litellm/pull/9897)
3. Fixed Anthropic prompt caching cost calculation and trimmed logged message in db [PR](https://github.com/BerriAI/litellm/pull/9838)
- General
1. Added token tracking and log usage object in spend logs [PR](https://github.com/BerriAI/litellm/pull/9843)
2. Handle custom pricing at deployment level [PR](https://github.com/BerriAI/litellm/pull/9855)
## Management Endpoints / UI


@@ -444,6 +444,7 @@ const sidebars = {
items: [
"tutorials/openweb_ui",
"tutorials/msft_sso",
"tutorials/tag_management",
'tutorials/litellm_proxy_aporia',
{
type: "category",


@@ -19,3 +19,6 @@ litellm_settings:
success_callback: ["langfuse", "s3"]
langfuse_secret: secret-workflows-key
langfuse_public_key: public-workflows-key
router_settings:
enable_tag_filtering: True # 👈 Key Change