diff --git a/docs/my-website/docs/providers/vertex.md b/docs/my-website/docs/providers/vertex.md index cdd3fce6c6..476cc8a453 100644 --- a/docs/my-website/docs/providers/vertex.md +++ b/docs/my-website/docs/providers/vertex.md @@ -347,7 +347,7 @@ Return a `list[Recipe]` completion(model="vertex_ai/gemini-1.5-flash-preview-0514", messages=messages, response_format={ "type": "json_object" }) ``` -### **Grounding** +### **Grounding - Web Search** Add Google Search Result grounding to vertex ai calls. @@ -358,7 +358,7 @@ See the grounding metadata with `response_obj._hidden_params["vertex_ai_groundin -```python +```python showLineNumbers from litellm import completion ## SETUP ENVIRONMENT @@ -377,14 +377,36 @@ print(resp) -```bash + + + +```python showLineNumbers +from openai import OpenAI + +client = OpenAI( + api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys + base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy +) + +response = client.chat.completions.create( + model="gemini-pro", + messages=[{"role": "user", "content": "Who won the world cup?"}], + tools=[{"googleSearchRetrieval": {}}], +) + +print(response) +``` + + + +```bash showLineNumbers curl http://localhost:4000/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer sk-1234" \ -d '{ "model": "gemini-pro", "messages": [ - {"role": "user", "content": "Hello, Claude!"} + {"role": "user", "content": "Who won the world cup?"} ], "tools": [ { @@ -394,12 +416,82 @@ curl http://localhost:4000/v1/chat/completions \ }' ``` + + You can also use the `enterpriseWebSearch` tool for an [enterprise compliant search](https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/web-grounding-enterprise). + + + +```python showLineNumbers +from litellm import completion + +## SETUP ENVIRONMENT +# !gcloud auth application-default login - run this to add vertex credentials to your env + +tools = [{"enterpriseWebSearch": {}}] # 👈 ADD GOOGLE ENTERPRISE SEARCH + +resp = litellm.completion( + model="vertex_ai/gemini-1.0-pro-001", + messages=[{"role": "user", "content": "Who won the world cup?"}], + tools=tools, + ) + +print(resp) +``` + + + + + + +```python showLineNumbers +from openai import OpenAI + +client = OpenAI( + api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys + base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy +) + +response = client.chat.completions.create( + model="gemini-pro", + messages=[{"role": "user", "content": "Who won the world cup?"}], + tools=[{"enterpriseWebSearch": {}}], +) + +print(response) +``` + + + +```bash showLineNumbers +curl http://localhost:4000/v1/chat/completions \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer sk-1234" \ + -d '{ + "model": "gemini-pro", + "messages": [ + {"role": "user", "content": "Who won the world cup?"} + ], + "tools": [ + { + "enterpriseWebSearch": {} + } + ] + }' + +``` + + + + + + + #### **Moving from Vertex AI SDK to LiteLLM (GROUNDING)** diff --git a/docs/my-website/docs/providers/xai.md b/docs/my-website/docs/providers/xai.md index a951b6bb9e..49a3640991 100644 --- a/docs/my-website/docs/providers/xai.md +++ b/docs/my-website/docs/providers/xai.md @@ -176,3 +176,81 @@ Here's how to call a XAI model with the LiteLLM Proxy Server +## Reasoning Usage + +LiteLLM supports reasoning usage for xAI models. 
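+
+The examples below pass `reasoning_effort` and read both the reasoning content and the reasoning token usage from the response. If you call through the LiteLLM proxy (the second example below) and haven't added the model yet, it needs an entry in your proxy config. A minimal sketch, assuming your xAI key is exported as `XAI_API_KEY` (the alias and env var name are illustrative):
+
+```yaml showLineNumbers title="config.yaml (sketch)"
+model_list:
+  - model_name: xai/grok-3-mini-beta      # alias clients will request
+    litellm_params:
+      model: xai/grok-3-mini-beta         # provider/model LiteLLM routes to
+      api_key: os.environ/XAI_API_KEY     # read the xAI key from the environment
+```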
+
+
+
+```python showLineNumbers title="reasoning with xai/grok-3-mini-beta"
+import litellm
+response = litellm.completion(
+    model="xai/grok-3-mini-beta",
+    messages=[{"role": "user", "content": "What is 101*3?"}],
+    reasoning_effort="low",
+)
+
+print("Reasoning Content:")
+print(response.choices[0].message.reasoning_content)
+
+print("\nFinal Response:")
+print(response.choices[0].message.content)
+
+print("\nNumber of completion tokens:")
+print(response.usage.completion_tokens)
+
+print("\nNumber of reasoning tokens:")
+print(response.usage.completion_tokens_details.reasoning_tokens)
+```
+
+
+
+```python showLineNumbers title="reasoning with xai/grok-3-mini-beta"
+import openai
+client = openai.OpenAI(
+    api_key="sk-1234",              # pass litellm proxy key, if you're using virtual keys
+    base_url="http://0.0.0.0:4000"  # litellm-proxy-base url
+)
+
+response = client.chat.completions.create(
+    model="xai/grok-3-mini-beta",
+    messages=[{"role": "user", "content": "What is 101*3?"}],
+    reasoning_effort="low",
+)
+
+print("Reasoning Content:")
+print(response.choices[0].message.reasoning_content)
+
+print("\nFinal Response:")
+print(response.choices[0].message.content)
+
+print("\nNumber of completion tokens:")
+print(response.usage.completion_tokens)
+
+print("\nNumber of reasoning tokens:")
+print(response.usage.completion_tokens_details.reasoning_tokens)
+```
+
+
+
+**Example Response:**
+
+```shell
+Reasoning Content:
+Let me calculate 101 multiplied by 3:
+101 * 3 = 303.
+I can double-check that: 100 * 3 is 300, and 1 * 3 is 3, so 300 + 3 = 303. Yes, that's correct.
+
+Final Response:
+The result of 101 multiplied by 3 is 303.
+
+Number of completion tokens:
+14
+
+Number of reasoning tokens:
+310
+```
diff --git a/docs/my-website/docs/tutorials/tag_management.md b/docs/my-website/docs/tutorials/tag_management.md
new file mode 100644
index 0000000000..9b00db47d1
--- /dev/null
+++ b/docs/my-website/docs/tutorials/tag_management.md
@@ -0,0 +1,145 @@
+import Image from '@theme/IdealImage';
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# [Beta] Routing based on request metadata
+
+Create routing rules based on request metadata: requests sent with a given tag are only routed to the models allowed for that tag.
+
+## Setup
+
+Add the following to your LiteLLM proxy `config.yaml` file.
+
+```yaml showLineNumbers title="litellm proxy config.yaml"
+router_settings:
+  enable_tag_filtering: True # 👈 Key Change
+```
+
+## 1. Create a tag
+
+On the LiteLLM UI, navigate to Experimental > Tag Management > Create Tag.
+
+Create a tag called `private-data` and select only the allowed models for requests with this tag. Once created, you will see the tag on the Tag Management page.
+
+
+## 2. Test Tag Routing
+
+Now we will test the tag-based routing rules.
+
+### 2.1 Invalid model
+
+This request will fail since we send `tags=private-data` but the model `gpt-4o` is not in the allowed models for the `private-data` tag.
+
+
+ +Here is an example sending the same request using the OpenAI Python SDK. + + + +```python showLineNumbers +from openai import OpenAI + +client = OpenAI( + api_key="sk-1234", + base_url="http://0.0.0.0:4000/v1/" +) + +response = client.chat.completions.create( + model="gpt-4o", + messages=[ + {"role": "user", "content": "Hello, how are you?"} + ], + extra_body={ + "tags": "private-data" + } +) +``` + + + + +```bash +curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \ +-H 'Content-Type: application/json' \ +-H 'Authorization: Bearer sk-1234' \ +-d '{ + "model": "gpt-4o", + "messages": [ + { + "role": "user", + "content": "Hello, how are you?" + } + ], + "tags": "private-data" +}' +``` + + + + +
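+
+Optionally, you can turn this check into a small script that reports whether a model is allowed for a tag by catching the error the proxy returns. This is a sketch, not part of LiteLLM: the helper name is made up, it reuses the proxy URL and key from above, and it catches the OpenAI SDK's base `APIError` since the exact error type returned for a tag mismatch may vary.
+
+```python showLineNumbers
+import openai
+
+client = openai.OpenAI(
+    api_key="sk-1234",
+    base_url="http://0.0.0.0:4000/v1/"
+)
+
+def is_model_allowed_for_tag(model: str, tag: str) -> bool:
+    """Send a minimal tagged request and report whether the proxy accepts it."""
+    try:
+        client.chat.completions.create(
+            model=model,
+            messages=[{"role": "user", "content": "ping"}],
+            extra_body={"tags": tag},
+        )
+        return True
+    except openai.APIError as e:
+        # The proxy rejects requests whose tag does not allow the model.
+        print(f"{model} rejected for tag '{tag}': {e}")
+        return False
+
+print(is_model_allowed_for_tag("gpt-4o", "private-data"))  # expected: False with the setup above
+```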
+ +### 2.2 Valid model + +This request will succeed since we send `tags=private-data` and the model `us.anthropic.claude-3-7-sonnet-20250219-v1:0` is in the allowed models for the `private-data` tag. + + + +Here is an example sending the same request using the OpenAI Python SDK. + + + + +```python showLineNumbers +from openai import OpenAI + +client = OpenAI( + api_key="sk-1234", + base_url="http://0.0.0.0:4000/v1/" +) + +response = client.chat.completions.create( + model="us.anthropic.claude-3-7-sonnet-20250219-v1:0", + messages=[ + {"role": "user", "content": "Hello, how are you?"} + ], + extra_body={ + "tags": "private-data" + } +) +``` + + + + +```bash +curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \ +-H 'Content-Type: application/json' \ +-H 'Authorization: Bearer sk-1234' \ +-d '{ + "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0", + "messages": [ + { + "role": "user", + "content": "Hello, how are you?" + } + ], + "tags": "private-data" +}' +``` + + + + + + +## Additional Tag Features +- [Sending tags in request headers](https://docs.litellm.ai/docs/proxy/tag_routing#calling-via-request-header) +- [Tag based routing](https://docs.litellm.ai/docs/proxy/tag_routing) +- [Track spend per tag](cost_tracking#-custom-tags) +- [Setup Budgets per Virtual Key, Team](users) + diff --git a/docs/my-website/img/tag_create.png b/docs/my-website/img/tag_create.png new file mode 100644 index 0000000000..d515b3a9f4 Binary files /dev/null and b/docs/my-website/img/tag_create.png differ diff --git a/docs/my-website/img/tag_invalid.png b/docs/my-website/img/tag_invalid.png new file mode 100644 index 0000000000..e12f7197b1 Binary files /dev/null and b/docs/my-website/img/tag_invalid.png differ diff --git a/docs/my-website/img/tag_valid.png b/docs/my-website/img/tag_valid.png new file mode 100644 index 0000000000..3b6e121d12 Binary files /dev/null and b/docs/my-website/img/tag_valid.png differ diff --git a/docs/my-website/release_notes/v1.66.0-stable/index.md b/docs/my-website/release_notes/v1.66.0-stable/index.md index 503970d1ee..65024792cd 100644 --- a/docs/my-website/release_notes/v1.66.0-stable/index.md +++ b/docs/my-website/release_notes/v1.66.0-stable/index.md @@ -46,7 +46,8 @@ v1.66.0-stable is live now, here are the key highlights of this release ## Key Highlights - **Microsoft SSO Auto-sync**: Auto-sync groups and group members from Azure Entra ID to LiteLLM - **Unified File IDs**: Use the same file id across LLM API providers. -- **New Models**: `xAI grok-3` support, `realtime api` cost tracking and logging +- **Realtime API Cost Tracking**: Track cost of realtime api calls +- **xAI grok-3**: Added support for `xai/grok-3` models - **Security Fixes**: Fixed [CVE-2025-0330](https://www.cve.org/CVERecord?id=CVE-2025-0330) and [CVE-2024-6825](https://www.cve.org/CVERecord?id=CVE-2024-6825) vulnerabilities Let's dive in. @@ -79,7 +80,7 @@ Get started with this [here](https://docs.litellm.ai/docs/tutorials/msft_sso) 2. Added reasoning_effort support for `xai/grok-3-mini-beta` model family [PR](https://github.com/BerriAI/litellm/pull/9932) - Hugging Face - 1. Hugging Face - Added inference providers support [PR](https://github.com/BerriAI/litellm/pull/9773) + 1. Hugging Face - Added inference providers support [Getting Started](https://docs.litellm.ai/docs/providers/huggingface#serverless-inference-providers) - Azure 1. 
Azure - Added azure/gpt-4o-realtime-audio cost tracking [PR](https://github.com/BerriAI/litellm/pull/9893) @@ -109,13 +110,15 @@ Get started with this [here](https://docs.litellm.ai/docs/tutorials/msft_sso) ## Spend Tracking Improvements - -1. Realtime API Cost tracking with token usage metrics in spend logs [PR](https://github.com/BerriAI/litellm/pull/9795) -2. Fixed Claude Haiku cache read pricing per token [PR](https://github.com/BerriAI/litellm/pull/9834) -3. Added cost tracking for Claude responses with base_model [PR](https://github.com/BerriAI/litellm/pull/9897) -4. Fixed Anthropic prompt caching cost calculation and trimmed logged message in db [PR](https://github.com/BerriAI/litellm/pull/9838) -5. Added token tracking and log usage object in spend logs [PR](https://github.com/BerriAI/litellm/pull/9843) -6. Handle custom pricing at deployment level [PR](https://github.com/BerriAI/litellm/pull/9855) +- OpenAI, Azure + 1. Realtime API Cost tracking with token usage metrics in spend logs [PR](https://github.com/BerriAI/litellm/pull/9795) +- Anthropic + 1. Fixed Claude Haiku cache read pricing per token [PR](https://github.com/BerriAI/litellm/pull/9834) + 2. Added cost tracking for Claude responses with base_model [PR](https://github.com/BerriAI/litellm/pull/9897) + 3. Fixed Anthropic prompt caching cost calculation and trimmed logged message in db [PR](https://github.com/BerriAI/litellm/pull/9838) +- General + 1. Added token tracking and log usage object in spend logs [PR](https://github.com/BerriAI/litellm/pull/9843) + 2. Handle custom pricing at deployment level [PR](https://github.com/BerriAI/litellm/pull/9855) ## Management Endpoints / UI diff --git a/docs/my-website/sidebars.js b/docs/my-website/sidebars.js index 9057f9ac88..3d4e09fda3 100644 --- a/docs/my-website/sidebars.js +++ b/docs/my-website/sidebars.js @@ -444,6 +444,7 @@ const sidebars = { items: [ "tutorials/openweb_ui", "tutorials/msft_sso", + "tutorials/tag_management", 'tutorials/litellm_proxy_aporia', { type: "category", diff --git a/litellm/proxy/proxy_config.yaml b/litellm/proxy/proxy_config.yaml index 23de923db7..1d32d2d71e 100644 --- a/litellm/proxy/proxy_config.yaml +++ b/litellm/proxy/proxy_config.yaml @@ -19,3 +19,6 @@ litellm_settings: success_callback: ["langfuse", "s3"] langfuse_secret: secret-workflows-key langfuse_public_key: public-workflows-key + +router_settings: + enable_tag_filtering: True # 👈 Key Change \ No newline at end of file