Mirror of https://github.com/BerriAI/litellm.git (synced 2025-04-24 18:24:20 +00:00)
[Docs] v1.66.0-stable fixes (#9953)
* add categories for spend tracking improvements
* xai reasoning usage
* docs tag management
* docs tag based routing
* [Beta] Routing based on request metadata
* docs tag routing
* docs enterprise web search
This commit is contained in:
parent eb998ee1c0
commit c86e678809

9 changed files with 335 additions and 13 deletions

@@ -347,7 +347,7 @@ Return a `list[Recipe]`

completion(model="vertex_ai/gemini-1.5-flash-preview-0514", messages=messages, response_format={ "type": "json_object" })
```

-### **Grounding**
+### **Grounding - Web Search**

Add Google Search result grounding to Vertex AI calls.

@@ -358,7 +358,7 @@ See the grounding metadata with `response_obj._hidden_params["vertex_ai_grounding_metadata"]`

<Tabs>
<TabItem value="sdk" label="SDK">

-```python
+```python showLineNumbers
from litellm import completion

## SETUP ENVIRONMENT

@@ -377,14 +377,36 @@ print(resp)

</TabItem>
<TabItem value="proxy" label="PROXY">

-```bash
<Tabs>
<TabItem value="openai" label="OpenAI Python SDK">

```python showLineNumbers
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
)

response = client.chat.completions.create(
    model="gemini-pro",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=[{"googleSearchRetrieval": {}}],
)

print(response)
```
</TabItem>
<TabItem value="curl" label="cURL">

```bash showLineNumbers
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gemini-pro",
    "messages": [
-     {"role": "user", "content": "Hello, Claude!"}
+     {"role": "user", "content": "Who won the world cup?"}
    ],
    "tools": [
      {

@@ -394,12 +416,82 @@ curl http://localhost:4000/v1/chat/completions \

  }'

```
</TabItem>
</Tabs>

</TabItem>
</Tabs>

You can also use the `enterpriseWebSearch` tool for an [enterprise-compliant search](https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/web-grounding-enterprise).

<Tabs>
<TabItem value="sdk" label="SDK">

```python showLineNumbers
from litellm import completion

## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env

tools = [{"enterpriseWebSearch": {}}] # 👈 ADD GOOGLE ENTERPRISE SEARCH

resp = completion(
    model="vertex_ai/gemini-1.0-pro-001",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=tools,
)

print(resp)
```
</TabItem>
<TabItem value="proxy" label="PROXY">
|
||||
|
||||
<Tabs>
|
||||
<TabItem value="openai" label="OpenAI Python SDK">
|
||||
|
||||
```python showLineNumbers
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
|
||||
base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
|
||||
)
|
||||
|
||||
response = client.chat.completions.create(
|
||||
model="gemini-pro",
|
||||
messages=[{"role": "user", "content": "Who won the world cup?"}],
|
||||
tools=[{"enterpriseWebSearch": {}}],
|
||||
)
|
||||
|
||||
print(response)
|
||||
```
|
||||
</TabItem>
|
||||
<TabItem value="curl" label="cURL">
|
||||
|
||||
```bash showLineNumbers
|
||||
curl http://localhost:4000/v1/chat/completions \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "Authorization: Bearer sk-1234" \
|
||||
-d '{
|
||||
"model": "gemini-pro",
|
||||
"messages": [
|
||||
{"role": "user", "content": "Who won the world cup?"}
|
||||
],
|
||||
"tools": [
|
||||
{
|
||||
"enterpriseWebSearch": {}
|
||||
}
|
||||
]
|
||||
}'
|
||||
|
||||
```
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
|
||||
#### **Moving from Vertex AI SDK to LiteLLM (GROUNDING)**
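
For reference, here is a minimal before/after sketch, assuming the `vertexai` SDK's `Tool.from_google_search_retrieval` helper and placeholder project/model names; adapt it to your own setup.

```python showLineNumbers
# BEFORE: grounding with the Vertex AI SDK (sketch - assumes google-cloud-aiplatform is installed)
import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

vertexai.init(project="your-project-id", location="us-central1")  # placeholder project

model = GenerativeModel("gemini-1.5-flash-001")
vertex_resp = model.generate_content(
    "Who won the world cup?",
    tools=[Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())],
)

# AFTER: the same call through LiteLLM (matches the SDK example above)
from litellm import completion

litellm_resp = completion(
    model="vertex_ai/gemini-1.5-flash-001",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=[{"googleSearchRetrieval": {}}],  # 👈 ADD GOOGLE SEARCH
)
```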
@@ -176,3 +176,81 @@ Here's how to call a XAI model with the LiteLLM Proxy Server

</Tabs>


## Reasoning Usage

LiteLLM supports `reasoning_effort` for xAI models and returns the reasoning content and reasoning token usage in the response.

<Tabs>

<TabItem value="python" label="LiteLLM Python SDK">

```python showLineNumbers title="reasoning with xai/grok-3-mini-beta"
import litellm

response = litellm.completion(
    model="xai/grok-3-mini-beta",
    messages=[{"role": "user", "content": "What is 101*3?"}],
    reasoning_effort="low",
)

print("Reasoning Content:")
print(response.choices[0].message.reasoning_content)

print("\nFinal Response:")
print(response.choices[0].message.content)

print("\nNumber of completion tokens (output):")
print(response.usage.completion_tokens)

print("\nNumber of reasoning tokens (output):")
print(response.usage.completion_tokens_details.reasoning_tokens)
```
</TabItem>

<TabItem value="curl" label="LiteLLM Proxy - OpenAI SDK Usage">
|
||||
|
||||
```python showLineNumbers title="reasoning with xai/grok-3-mini-beta"
|
||||
import openai
|
||||
client = openai.OpenAI(
|
||||
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
|
||||
base_url="http://0.0.0.0:4000" # litellm-proxy-base url
|
||||
)
|
||||
|
||||
response = client.chat.completions.create(
|
||||
model="xai/grok-3-mini-beta",
|
||||
messages=[{"role": "user", "content": "What is 101*3?"}],
|
||||
reasoning_effort="low",
|
||||
)
|
||||
|
||||
print("Reasoning Content:")
|
||||
print(response.choices[0].message.reasoning_content)
|
||||
|
||||
print("\nFinal Response:")
|
||||
print(completion.choices[0].message.content)
|
||||
|
||||
print("\nNumber of completion tokens (input):")
|
||||
print(completion.usage.completion_tokens)
|
||||
|
||||
print("\nNumber of reasoning tokens (input):")
|
||||
print(completion.usage.completion_tokens_details.reasoning_tokens)
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
**Example Response:**

```shell
Reasoning Content:
Let me calculate 101 multiplied by 3:
101 * 3 = 303.
I can double-check that: 100 * 3 is 300, and 1 * 3 is 3, so 300 + 3 = 303. Yes, that's correct.

Final Response:
The result of 101 multiplied by 3 is 303.

Number of completion tokens (output):
14

Number of reasoning tokens (output):
310
```

145 docs/my-website/docs/tutorials/tag_management.md (new file)

@@ -0,0 +1,145 @@
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# [Beta] Routing based on request metadata

Create routing rules based on request metadata.

## Setup

Add the following to your LiteLLM proxy `config.yaml` file.

```yaml showLineNumbers title="litellm proxy config.yaml"
router_settings:
  enable_tag_filtering: True # 👈 Key Change
```
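
For context, here is a sketch of where `router_settings` sits in a fuller `config.yaml`; the model entries below are placeholders for whatever deployments you already run, and only the `router_settings` block above is required for this tutorial.

```yaml showLineNumbers title="example config.yaml (sketch)"
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY  # placeholder key reference
  - model_name: us.anthropic.claude-3-7-sonnet-20250219-v1:0
    litellm_params:
      model: bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0

router_settings:
  enable_tag_filtering: True # 👈 Key Change
```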

## 1. Create a tag

On the LiteLLM UI, navigate to Experimental > Tag Management > Create Tag.

Create a tag called `private-data` and select only the models that are allowed for requests with this tag. Once created, you will see the tag on the Tag Management page.

<Image img={require('../../img/tag_create.png')} style={{ width: '800px', height: 'auto' }} />
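
If you prefer to script this step instead of using the UI, the proxy also exposes tag management endpoints. The sketch below assumes a `/tag/new` route and payload fields (`name`, `description`, `models`); verify the exact path and schema against your proxy's `/docs` (Swagger) page before relying on it.

```bash
# Sketch only - endpoint path and payload fields are assumptions, check your proxy's /docs page
curl -L -X POST 'http://0.0.0.0:4000/tag/new' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "name": "private-data",
    "description": "Requests carrying private data",
    "models": ["<allowed-model-id>"]
  }'
```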

## 2. Test Tag Routing

Now we will test the tag-based routing rules.

### 2.1 Invalid model

This request will fail since we send `tags=private-data` but the model `gpt-4o` is not in the allowed models for the `private-data` tag.

<Image img={require('../../img/tag_invalid.png')} style={{ width: '800px', height: 'auto' }} />

<br />

Here is an example of sending the same request using the OpenAI Python SDK or cURL.

<Tabs>
<TabItem value="python" label="OpenAI Python SDK">

```python showLineNumbers
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",
    base_url="http://0.0.0.0:4000/v1/"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
    extra_body={
        "tags": "private-data"
    }
)
```

</TabItem>
<TabItem value="curl" label="cURL">

```bash
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "tags": "private-data"
  }'
```

</TabItem>
</Tabs>

<br />

### 2.2 Valid model

This request will succeed since we send `tags=private-data` and the model `us.anthropic.claude-3-7-sonnet-20250219-v1:0` is in the allowed models for the `private-data` tag.

<Image img={require('../../img/tag_valid.png')} style={{ width: '800px', height: 'auto' }} />

Here is an example of sending the same request using the OpenAI Python SDK or cURL.

<Tabs>
<TabItem value="python" label="OpenAI Python SDK">

```python showLineNumbers
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",
    base_url="http://0.0.0.0:4000/v1/"
)

response = client.chat.completions.create(
    model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
    extra_body={
        "tags": "private-data"
    }
)
```

</TabItem>
<TabItem value="curl" label="cURL">

```bash
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "tags": "private-data"
  }'
```

</TabItem>
</Tabs>

## Additional Tag Features
- [Sending tags in request headers](https://docs.litellm.ai/docs/proxy/tag_routing#calling-via-request-header) (see the sketch below)
- [Tag based routing](https://docs.litellm.ai/docs/proxy/tag_routing)
- [Track spend per tag](cost_tracking#-custom-tags)
- [Setup Budgets per Virtual Key, Team](users)

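As a quick sketch of the first item above, tags can also be sent in a request header instead of the body, assuming the `x-litellm-tags` header described in the linked tag routing doc (double-check the header name against your LiteLLM version).

```bash
# Sketch - assumes the x-litellm-tags header from the tag routing doc linked above
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'x-litellm-tags: private-data' \
  -d '{
    "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'
```
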
BIN docs/my-website/img/tag_create.png (new binary file, not shown, 250 KiB)
BIN docs/my-website/img/tag_invalid.png (new binary file, not shown, 237 KiB)
BIN docs/my-website/img/tag_valid.png (new binary file, not shown, 319 KiB)

@@ -46,7 +46,8 @@ v1.66.0-stable is live now, here are the key highlights of this release

## Key Highlights
- **Microsoft SSO Auto-sync**: Auto-sync groups and group members from Azure Entra ID to LiteLLM
- **Unified File IDs**: Use the same file id across LLM API providers.
-- **New Models**: `xAI grok-3` support, `realtime api` cost tracking and logging
+- **Realtime API Cost Tracking**: Track cost of realtime api calls
+- **xAI grok-3**: Added support for `xai/grok-3` models
- **Security Fixes**: Fixed [CVE-2025-0330](https://www.cve.org/CVERecord?id=CVE-2025-0330) and [CVE-2024-6825](https://www.cve.org/CVERecord?id=CVE-2024-6825) vulnerabilities

Let's dive in.

@@ -79,7 +80,7 @@ Get started with this [here](https://docs.litellm.ai/docs/tutorials/msft_sso)

2. Added reasoning_effort support for `xai/grok-3-mini-beta` model family [PR](https://github.com/BerriAI/litellm/pull/9932)

- Hugging Face
-    1. Hugging Face - Added inference providers support [PR](https://github.com/BerriAI/litellm/pull/9773)
+    1. Hugging Face - Added inference providers support [Getting Started](https://docs.litellm.ai/docs/providers/huggingface#serverless-inference-providers)

- Azure
    1. Azure - Added azure/gpt-4o-realtime-audio cost tracking [PR](https://github.com/BerriAI/litellm/pull/9893)

@@ -109,13 +110,15 @@ Get started with this [here](https://docs.litellm.ai/docs/tutorials/msft_sso)


## Spend Tracking Improvements

-1. Realtime API Cost tracking with token usage metrics in spend logs [PR](https://github.com/BerriAI/litellm/pull/9795)
-2. Fixed Claude Haiku cache read pricing per token [PR](https://github.com/BerriAI/litellm/pull/9834)
-3. Added cost tracking for Claude responses with base_model [PR](https://github.com/BerriAI/litellm/pull/9897)
-4. Fixed Anthropic prompt caching cost calculation and trimmed logged message in db [PR](https://github.com/BerriAI/litellm/pull/9838)
-5. Added token tracking and log usage object in spend logs [PR](https://github.com/BerriAI/litellm/pull/9843)
-6. Handle custom pricing at deployment level [PR](https://github.com/BerriAI/litellm/pull/9855)
+- OpenAI, Azure
+    1. Realtime API Cost tracking with token usage metrics in spend logs [PR](https://github.com/BerriAI/litellm/pull/9795)
+- Anthropic
+    1. Fixed Claude Haiku cache read pricing per token [PR](https://github.com/BerriAI/litellm/pull/9834)
+    2. Added cost tracking for Claude responses with base_model [PR](https://github.com/BerriAI/litellm/pull/9897)
+    3. Fixed Anthropic prompt caching cost calculation and trimmed logged message in db [PR](https://github.com/BerriAI/litellm/pull/9838)
+- General
+    1. Added token tracking and log usage object in spend logs [PR](https://github.com/BerriAI/litellm/pull/9843)
+    2. Handle custom pricing at deployment level [PR](https://github.com/BerriAI/litellm/pull/9855)


## Management Endpoints / UI

@@ -444,6 +444,7 @@ const sidebars = {

      items: [
        "tutorials/openweb_ui",
        "tutorials/msft_sso",
        "tutorials/tag_management",
        'tutorials/litellm_proxy_aporia',
        {
          type: "category",

@@ -19,3 +19,6 @@ litellm_settings:

  success_callback: ["langfuse", "s3"]
  langfuse_secret: secret-workflows-key
  langfuse_public_key: public-workflows-key

router_settings:
  enable_tag_filtering: True # 👈 Key Change