Fix UI Flicker in Dashboard (#10261)

test: handle service unavailable error
bump: version 1.67.2 → 1.67.3
2025-04-24 18:24:20 +00:00 · 2025-04-23 23:27:44 -07:00 · 2025-04-23 22:10:46 -07:00 · 2025-04-23 22:09:25 -07:00 · 2025-04-23 22:09:14 -07:00 · 2025-04-23 22:02:02 -07:00
240 changed files with 9293 additions and 882 deletions
--- a/.env.example
+++ b/.env.example
@@ -20,6 +20,8 @@ REPLICATE_API_TOKEN = ""
 ANTHROPIC_API_KEY = ""
 # Infisical
 INFISICAL_TOKEN = ""
+# INFINITY
+INFINITY_API_KEY = ""

 # Development Configs
 LITELLM_MASTER_KEY = "sk-1234"
--- a/.gitignore
+++ b/.gitignore
@@ -86,4 +86,5 @@ litellm/proxy/db/migrations/0_init/migration.sql
 litellm/proxy/db/migrations/*
 litellm/proxy/migrations/*config.yaml
 litellm/proxy/migrations/*
+config.yaml
 tests/litellm/litellm_core_utils/llm_cost_calc/log.txt
--- a/deploy/charts/litellm-helm/templates/migrations-job.yaml
+++ b/deploy/charts/litellm-helm/templates/migrations-job.yaml
@@ -16,6 +16,7 @@ spec:
        {{- toYaml . | nindent 8 }}
        {{- end }}
    spec:
+      serviceAccountName: {{ include "litellm.serviceAccountName" . }}
      containers:
        - name: prisma-migrations
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default (printf "main-%s" .Chart.AppVersion) }}"
--- a/docs/my-website/docs/completion/audio.md
+++ b/docs/my-website/docs/completion/audio.md
@@ -3,7 +3,7 @@ import TabItem from '@theme/TabItem';

 # Using Audio Models

-How to send / receieve audio to a `/chat/completions` endpoint
+How to send / receive audio to a `/chat/completions` endpoint


 ## Audio Output from a model
--- a/docs/my-website/docs/completion/document_understanding.md
+++ b/docs/my-website/docs/completion/document_understanding.md
@@ -3,7 +3,7 @@ import TabItem from '@theme/TabItem';

 # Using PDF Input

-How to send / receieve pdf's (other document types) to a `/chat/completions` endpoint
+How to send / receive pdf's (other document types) to a `/chat/completions` endpoint

 Works for:
 - Vertex AI models (Gemini + Anthropic)
--- a/docs/my-website/docs/completion/vision.md
+++ b/docs/my-website/docs/completion/vision.md
@@ -194,7 +194,7 @@ Expected Response

 ## Explicitly specify image type 

-If you have images without a mime-type, or if litellm is incorrectly inferring the mime type of your image (e.g. calling `gs://` url's with vertex ai), you can set this explicity via the `format` param. 
+If you have images without a mime-type, or if litellm is incorrectly inferring the mime type of your image (e.g. calling `gs://` url's with vertex ai), you can set this explicitly via the `format` param. 

 ```python
 "image_url": {
--- a/docs/my-website/docs/image_generation.md
+++ b/docs/my-website/docs/image_generation.md
@@ -20,9 +20,9 @@ print(f"response: {response}")

 ```yaml
 model_list:
-  - model_name: dall-e-2 ### RECEIVED MODEL NAME ###
+  - model_name: gpt-image-1 ### RECEIVED MODEL NAME ###
    litellm_params: # all params accepted by litellm.image_generation()
-      model: azure/dall-e-2 ### MODEL NAME sent to `litellm.image_generation()` ###
+      model: azure/gpt-image-1 ### MODEL NAME sent to `litellm.image_generation()` ###
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key: "os.environ/AZURE_API_KEY_EU" # does os.getenv("AZURE_API_KEY_EU")
      rpm: 6      # [OPTIONAL] Rate limit for this deployment: in requests per minute (rpm)
@@ -47,7 +47,7 @@ curl -X POST 'http://0.0.0.0:4000/v1/images/generations' \
 -H 'Content-Type: application/json' \
 -H 'Authorization: Bearer sk-1234' \
 -D '{
-    "model": "dall-e-2",
+    "model": "gpt-image-1",
    "prompt": "A cute baby sea otter",
    "n": 1,
    "size": "1024x1024"
@@ -104,7 +104,7 @@ Any non-openai params, will be treated as provider-specific params, and sent in
    litellm_logging_obj=None,
    custom_llm_provider=None,

- `model`: *string (optional)* The model to use for image generation. Defaults to openai/dall-e-2
+- `model`: *string (optional)* The model to use for image generation. Defaults to openai/gpt-image-1

 - `n`: *int (optional)* The number of images to generate. Must be between 1 and 10. For dall-e-3, only n=1 is supported.

@@ -112,7 +112,7 @@ Any non-openai params, will be treated as provider-specific params, and sent in

 - `response_format`: *string (optional)* The format in which the generated images are returned. Must be one of url or b64_json.

- `size`: *string (optional)* The size of the generated images. Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2. Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
+- `size`: *string (optional)* The size of the generated images. Must be one of 256x256, 512x512, or 1024x1024 for gpt-image-1. Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.

 - `timeout`: *integer* - The maximum time, in seconds, to wait for the API to respond. Defaults to 600 seconds (10 minutes).

@@ -148,13 +148,14 @@ Any non-openai params, will be treated as provider-specific params, and sent in
 from litellm import image_generation
 import os
 os.environ['OPENAI_API_KEY'] = ""
-response = image_generation(model='dall-e-2', prompt="cute baby otter")
+response = image_generation(model='gpt-image-1', prompt="cute baby otter")
 ```

 | Model Name           | Function Call                               | Required OS Variables                |
 |----------------------|---------------------------------------------|--------------------------------------|
-| dall-e-2 | `image_generation(model='dall-e-2', prompt="cute baby otter")` | `os.environ['OPENAI_API_KEY']`       |
+| gpt-image-1 | `image_generation(model='gpt-image-1', prompt="cute baby otter")` | `os.environ['OPENAI_API_KEY']`       |
 | dall-e-3 | `image_generation(model='dall-e-3', prompt="cute baby otter")` | `os.environ['OPENAI_API_KEY']`       |
+| dall-e-2 | `image_generation(model='dall-e-2', prompt="cute baby otter")` | `os.environ['OPENAI_API_KEY']`       |

 ## Azure OpenAI Image Generation Models

@@ -182,8 +183,9 @@ print(response)

 | Model Name           | Function Call                               |
 |----------------------|---------------------------------------------|
-| dall-e-2 | `image_generation(model="azure/<your deployment name>", prompt="cute baby otter")` |
+| gpt-image-1 | `image_generation(model="azure/<your deployment name>", prompt="cute baby otter")` |
 | dall-e-3 | `image_generation(model="azure/<your deployment name>", prompt="cute baby otter")` |
+| dall-e-2 | `image_generation(model="azure/<your deployment name>", prompt="cute baby otter")` |


 ## OpenAI Compatible Image Generation Models
--- a/docs/my-website/docs/observability/agentops_integration.md
+++ b/docs/my-website/docs/observability/agentops_integration.md
@@ -0,0 +1,83 @@
+# 🖇️ AgentOps - LLM Observability Platform
+
+:::tip
+
+This is community maintained. Please make an issue if you run into a bug:
+https://github.com/BerriAI/litellm
+
+:::
+
+[AgentOps](https://docs.agentops.ai) is an observability platform that enables tracing and monitoring of LLM calls, providing detailed insights into your AI operations.
+
+## Using AgentOps with LiteLLM
+
+LiteLLM provides `success_callback` and `failure_callback` hooks, allowing you to easily integrate AgentOps for comprehensive tracing and monitoring of your LLM operations.
+
+### Integration
+
+Get your AgentOps API key from https://app.agentops.ai/.
+Then use just a few lines of code to instantly trace your responses **across all providers** with AgentOps:
+```python
+import litellm
+
+# Configure LiteLLM to use AgentOps
+litellm.success_callback = ["agentops"]
+
+# Make your LLM calls as usual
+response = litellm.completion(
+    model="gpt-3.5-turbo",
+    messages=[{"role": "user", "content": "Hello, how are you?"}],
+)
+```
+
+Complete Code:
+
+```python
+import os
+import litellm
+
+# Set env variables
+os.environ["OPENAI_API_KEY"] = "your-openai-key"
+os.environ["AGENTOPS_API_KEY"] = "your-agentops-api-key"
+
+# Configure LiteLLM to use AgentOps
+litellm.success_callback = ["agentops"]
+
+# OpenAI call
+response = litellm.completion(
+    model="gpt-4",
+    messages=[{"role": "user", "content": "Hi 👋 - I'm OpenAI"}],
+)
+
+print(response)
+```
+
+### Configuration Options
+
+The AgentOps integration can be configured through environment variables:
+
+- `AGENTOPS_API_KEY` (str, optional): Your AgentOps API key
+- `AGENTOPS_ENVIRONMENT` (str, optional): Deployment environment (defaults to "production")
+- `AGENTOPS_SERVICE_NAME` (str, optional): Service name for tracing (defaults to "agentops")
+
+### Advanced Usage
+
+You can configure additional settings through environment variables:
+
+```python
+import os
+import litellm
+# Configure AgentOps settings
+os.environ["AGENTOPS_API_KEY"] = "your-agentops-api-key"
+os.environ["AGENTOPS_ENVIRONMENT"] = "staging"
+os.environ["AGENTOPS_SERVICE_NAME"] = "my-service"
+
+# Enable AgentOps tracing
+litellm.success_callback = ["agentops"]
+```
+
+### Support
+
+For issues or questions, please refer to:
+- [AgentOps Documentation](https://docs.agentops.ai)
+- [LiteLLM Documentation](https://docs.litellm.ai) 
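
The configuration options listed in `agentops_integration.md` above can be sketched as a small settings loader. This is only an illustration of the documented defaults, assuming the integration reads these variables from the environment; `load_agentops_settings` is a hypothetical helper and is not part of LiteLLM or AgentOps.

```python
import os
from typing import Mapping, Optional


def load_agentops_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: resolve the three documented AgentOps variables."""
    env = os.environ if env is None else env
    return {
        # Documented as optional, so this is None when the variable is unset.
        "api_key": env.get("AGENTOPS_API_KEY"),
        # Defaults below are the ones stated in the doc.
        "environment": env.get("AGENTOPS_ENVIRONMENT", "production"),
        "service_name": env.get("AGENTOPS_SERVICE_NAME", "agentops"),
    }


# Example: only the environment is overridden, the rest fall back to defaults.
print(load_agentops_settings({"AGENTOPS_ENVIRONMENT": "staging"}))
```

Passing a mapping instead of reading `os.environ` directly keeps the sketch easy to check without touching the real environment.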
--- a/docs/my-website/docs/observability/greenscale_integration.md
+++ b/docs/my-website/docs/observability/greenscale_integration.md
@@ -53,7 +53,7 @@ response = completion(

 ## Additional information in metadata

-You can send any additional information to Greenscale by using the `metadata` field in completion and `greenscale_` prefix. This can be useful for sending metadata about the request, such as the project and application name, customer_id, enviornment, or any other information you want to track usage. `greenscale_project` and `greenscale_application` are required fields.
+You can send any additional information to Greenscale by using the `metadata` field in completion and `greenscale_` prefix. This can be useful for sending metadata about the request, such as the project and application name, customer_id, environment, or any other information you want to track usage. `greenscale_project` and `greenscale_application` are required fields.

 ```python
 #openai call with additional metadata
--- a/docs/my-website/docs/observability/langfuse_integration.md
+++ b/docs/my-website/docs/observability/langfuse_integration.md
@@ -185,7 +185,7 @@ curl --location --request POST 'http://0.0.0.0:4000/chat/completions' \
 * `trace_release`  - Release for the trace, defaults to `None`
 * `trace_metadata` - Metadata for the trace, defaults to `None`
 * `trace_user_id`  - User identifier for the trace, defaults to completion argument `user`
-* `tags`           - Tags for the trace, defeaults to `None`
+* `tags`           - Tags for the trace, defaults to `None`

 ##### Updatable Parameters on Continuation

--- a/docs/my-website/docs/pass_through/cohere.md
+++ b/docs/my-website/docs/pass_through/cohere.md
@@ -4,7 +4,7 @@ Pass-through endpoints for Cohere - call provider-specific endpoint, in native f

 | Feature | Supported | Notes | 
 |-------|-------|-------|
-| Cost Tracking | ✅ | works across all integrations |
+| Cost Tracking | ✅ | Supported for `/v1/chat` and `/v2/chat` |
 | Logging | ✅ | works across all integrations |
 | End-user Tracking | ❌ | [Tell us if you need this](https://github.com/BerriAI/litellm/issues/new) |
 | Streaming | ✅ | |
--- a/docs/my-website/docs/pass_through/mistral.md
+++ b/docs/my-website/docs/pass_through/mistral.md
@@ -0,0 +1,217 @@
+# Mistral
+
+Pass-through endpoints for Mistral - call provider-specific endpoint, in native format (no translation).
+
+| Feature | Supported | Notes | 
+|-------|-------|-------|
+| Cost Tracking | ❌ | Not supported |
+| Logging | ✅ | works across all integrations |
+| End-user Tracking | ❌ | [Tell us if you need this](https://github.com/BerriAI/litellm/issues/new) |
+| Streaming | ✅ | |
+
+Just replace `https://api.mistral.ai` with `LITELLM_PROXY_BASE_URL/mistral` 🚀
+
+#### **Example Usage**
+
+```bash
+curl -L -X POST 'http://0.0.0.0:4000/mistral/v1/ocr' \
+-H 'Content-Type: application/json' \
+-H 'Authorization: Bearer sk-1234' \
+-d '{
+    "model": "mistral-ocr-latest",
+    "document": {
+        "type": "image_url",
+        "image_url": "https://raw.githubusercontent.com/mistralai/cookbook/refs/heads/main/mistral/ocr/receipt.png"
+    }
+
+}'
+```
+
+Supports **ALL** Mistral Endpoints (including streaming).
+
+## Quick Start
+
+Let's call the Mistral `/ocr` endpoint through the proxy (the same steps work for other routes, such as the [`/chat/completions` endpoint](https://docs.mistral.ai/api/#tag/chat/operation/chat_completion_v1_chat_completions_post)).
+
+1. Add MISTRAL_API_KEY to your environment 
+
+```bash
+export MISTRAL_API_KEY="sk-1234"
+```
+
+2. Start LiteLLM Proxy 
+
+```bash
+litellm
+
+# RUNNING on http://0.0.0.0:4000
+```
+
+3. Test it! 
+
+Let's call the Mistral `/ocr` endpoint
+
+```bash
+curl -L -X POST 'http://0.0.0.0:4000/mistral/v1/ocr' \
+-H 'Content-Type: application/json' \
+-H 'Authorization: Bearer sk-1234' \
+-d '{
+    "model": "mistral-ocr-latest",
+    "document": {
+        "type": "image_url",
+        "image_url": "https://raw.githubusercontent.com/mistralai/cookbook/refs/heads/main/mistral/ocr/receipt.png"
+    }
+
+}'
+```
+
+
+## Examples
+
+Anything after `http://0.0.0.0:4000/mistral` is treated as a provider-specific route, and handled accordingly.
+
+Key Changes: 
+
+| **Original Endpoint**                                | **Replace With**                  |
+|------------------------------------------------------|-----------------------------------|
+| `https://api.mistral.ai`          | `http://0.0.0.0:4000/mistral` (LITELLM_PROXY_BASE_URL="http://0.0.0.0:4000")      |
+| `bearer $MISTRAL_API_KEY`                                 | `bearer anything` (use `bearer LITELLM_VIRTUAL_KEY` if Virtual Keys are set up on the proxy)                    |
+
+
+### **Example 1: OCR endpoint**
+
+#### LiteLLM Proxy Call 
+
+```bash
+curl -L -X POST 'http://0.0.0.0:4000/mistral/v1/ocr' \
+-H 'Content-Type: application/json' \
+-H "Authorization: Bearer $LITELLM_VIRTUAL_KEY" \
+-d '{
+    "model": "mistral-ocr-latest",
+    "document": {
+        "type": "image_url",
+        "image_url": "https://raw.githubusercontent.com/mistralai/cookbook/refs/heads/main/mistral/ocr/receipt.png"
+    }
+}'
+```
+
+
+#### Direct Mistral API Call 
+
+```bash
+curl https://api.mistral.ai/v1/ocr \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer ${MISTRAL_API_KEY}" \
+  -d '{
+    "model": "mistral-ocr-latest",
+    "document": {
+        "type": "document_url",
+        "document_url": "https://arxiv.org/pdf/2201.04234"
+    },
+    "include_image_base64": true
+  }'
+```
+
+### **Example 2: Chat API**
+
+#### LiteLLM Proxy Call 
+
+```bash
+curl -L -X POST 'http://0.0.0.0:4000/mistral/v1/chat/completions' \
+-H 'Content-Type: application/json' \
+-H "Authorization: Bearer $LITELLM_VIRTUAL_KEY" \
+-d '{
+    "messages": [
+        {
+            "role": "user",
+            "content": "I am going to Paris, what should I see?"
+        }
+    ],
+    "max_tokens": 2048,
+    "temperature": 0.8,
+    "top_p": 0.1,
+    "model": "mistral-large-latest"
+}'
+```
+
+#### Direct Mistral API Call 
+
+```bash
+curl -L -X POST 'https://api.mistral.ai/v1/chat/completions' \
+-H 'Content-Type: application/json' -H "Authorization: Bearer $MISTRAL_API_KEY" \
+-d '{
+    "messages": [
+        {
+            "role": "user",
+            "content": "I am going to Paris, what should I see?"
+        }
+    ],
+    "max_tokens": 2048,
+    "temperature": 0.8,
+    "top_p": 0.1,
+    "model": "mistral-large-latest"
+}'
+```
+
+
+## Advanced - Use with Virtual Keys 
+
+Pre-requisites
+- [Setup proxy with DB](../proxy/virtual_keys.md#setup)
+
+Use this to avoid giving developers the raw Mistral API key while still letting them use Mistral endpoints.
+
+### Usage
+
+1. Setup environment
+
+```bash
+export DATABASE_URL=""
+export LITELLM_MASTER_KEY=""
+export MISTRAL_API_KEY=""
+```
+
+```bash
+litellm
+
+# RUNNING on http://0.0.0.0:4000
+```
+
+2. Generate virtual key 
+
+```bash
+curl -X POST 'http://0.0.0.0:4000/key/generate' \
+-H 'Authorization: Bearer sk-1234' \
+-H 'Content-Type: application/json' \
+-d '{}'
+```
+
+Expected Response 
+
+```bash
+{
+    ...
+    "key": "sk-1234ewknldferwedojwojw"
+}
+```
+
+3. Test it! 
+
+
+```bash
+curl -L -X POST 'http://0.0.0.0:4000/mistral/v1/chat/completions' \
+-H 'Content-Type: application/json' \
+-H 'Authorization: Bearer sk-1234ewknldferwedojwojw' \
+  --data '{
+    "messages": [
+        {
+            "role": "user",
+            "content": "I am going to Paris, what should I see?"
+        }
+    ],
+    "max_tokens": 2048,
+    "temperature": 0.8,
+    "top_p": 0.1,
+    "model": "mistral-large-latest"
+}'
+```
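
The URL substitution described in `mistral.md` above can be sketched as a small helper. This is only an illustration inferred from the paired examples in the doc (direct `https://api.mistral.ai/v1/ocr` versus proxy `http://0.0.0.0:4000/mistral/v1/ocr`); `to_proxy_url` is a hypothetical name, and the sketch does not reflect how LiteLLM routes requests internally.

```python
MISTRAL_HOST = "https://api.mistral.ai"  # upstream host used in the direct-call examples


def to_proxy_url(direct_url: str, proxy_base: str = "http://0.0.0.0:4000") -> str:
    """Hypothetical helper: map a direct Mistral URL to its pass-through URL."""
    if not direct_url.startswith(MISTRAL_HOST + "/"):
        raise ValueError(f"not a Mistral API URL: {direct_url!r}")
    # Keep everything after the host (e.g. "/v1/ocr") and mount it under /mistral.
    route = direct_url[len(MISTRAL_HOST):]
    return proxy_base.rstrip("/") + "/mistral" + route


print(to_proxy_url("https://api.mistral.ai/v1/ocr"))
```

The same rewrite applies to the chat example: only the host portion changes, and the request body is passed through in Mistral's native format.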
--- a/docs/my-website/docs/pass_through/vertex_ai.md
+++ b/docs/my-website/docs/pass_through/vertex_ai.md
@@ -222,7 +222,7 @@ curl http://localhost:4000/vertex-ai/v1/projects/${PROJECT_ID}/locations/us-cent

 LiteLLM Proxy Server supports two methods of authentication to Vertex AI:

-1. Pass Vertex Credetials client side to proxy server
+1. Pass Vertex Credentials client side to proxy server

 2. Set Vertex AI credentials on proxy server

--- a/docs/my-website/docs/pass_through/vllm.md
+++ b/docs/my-website/docs/pass_through/vllm.md
@@ -0,0 +1,185 @@
+# VLLM
+
+Pass-through endpoints for VLLM - call provider-specific endpoint, in native format (no translation).
+
+| Feature | Supported | Notes | 
+|-------|-------|-------|
+| Cost Tracking | ❌ | Not supported |
+| Logging | ✅ | works across all integrations |
+| End-user Tracking | ❌ | [Tell us if you need this](https://github.com/BerriAI/litellm/issues/new) |
+| Streaming | ✅ | |
+
+Just replace `https://my-vllm-server.com` with `LITELLM_PROXY_BASE_URL/vllm` 🚀
+
+#### **Example Usage**
+
+```bash
+curl -L -X GET 'http://0.0.0.0:4000/vllm/metrics' \
+-H 'Content-Type: application/json' \
+-H 'Authorization: Bearer sk-1234'
+```
+
+Supports **ALL** VLLM Endpoints (including streaming).
+
+## Quick Start
+
+Let's call the VLLM [`/metrics` endpoint](https://vllm.readthedocs.io/en/latest/api_reference/api_reference.html)
+
+1. Add `HOSTED_VLLM_API_BASE` to your environment 
+
+```bash
+export HOSTED_VLLM_API_BASE="https://my-vllm-server.com"
+```
+
+2. Start LiteLLM Proxy 
+
+```bash
+litellm
+
+# RUNNING on http://0.0.0.0:4000
+```
+
+3. Test it! 
+
+Let's call the VLLM `/metrics` endpoint
+
+```bash
+curl -L -X GET 'http://0.0.0.0:4000/vllm/metrics' \
+-H 'Content-Type: application/json' \
+-H 'Authorization: Bearer sk-1234'
+```
+
+
+## Examples
+
+Anything after `http://0.0.0.0:4000/vllm` is treated as a provider-specific route, and handled accordingly.
+
+Key Changes: 
+
+| **Original Endpoint**                                | **Replace With**                  |
+|------------------------------------------------------|-----------------------------------|
+| `https://my-vllm-server.com`          | `http://0.0.0.0:4000/vllm` (LITELLM_PROXY_BASE_URL="http://0.0.0.0:4000")      |
+| `bearer $VLLM_API_KEY`                                 | `bearer anything` (use `bearer LITELLM_VIRTUAL_KEY` if Virtual Keys are set up on the proxy)                    |
+
+
+### **Example 1: Metrics endpoint**
+
+#### LiteLLM Proxy Call 
+
+```bash
+curl -L -X GET 'http://0.0.0.0:4000/vllm/metrics' \
+-H 'Content-Type: application/json' \
+-H "Authorization: Bearer $LITELLM_VIRTUAL_KEY"
+```
+
+
+#### Direct VLLM API Call 
+
+```bash
+curl -L -X GET 'https://my-vllm-server.com/metrics' \
+-H 'Content-Type: application/json'
+```
+
+### **Example 2: Chat API**
+
+#### LiteLLM Proxy Call 
+
+```bash
+curl -L -X POST 'http://0.0.0.0:4000/vllm/chat/completions' \
+-H 'Content-Type: application/json' \
+-H "Authorization: Bearer $LITELLM_VIRTUAL_KEY" \
+-d '{
+    "messages": [
+        {
+            "role": "user",
+            "content": "I am going to Paris, what should I see?"
+        }
+    ],
+    "max_tokens": 2048,
+    "temperature": 0.8,
+    "top_p": 0.1,
+    "model": "qwen2.5-7b-instruct"
+}'
+```
+
+#### Direct VLLM API Call 
+
+```bash
+curl -L -X POST 'https://my-vllm-server.com/chat/completions' \
+-H 'Content-Type: application/json' \
+-d '{
+    "messages": [
+        {
+            "role": "user",
+            "content": "I am going to Paris, what should I see?"
+        }
+    ],
+    "max_tokens": 2048,
+    "temperature": 0.8,
+    "top_p": 0.1,
+    "model": "qwen2.5-7b-instruct"
+}'
+```
+
+
+## Advanced - Use with Virtual Keys 
+
+Pre-requisites
+- [Setup proxy with DB](../proxy/virtual_keys.md#setup)
+
+Use this to avoid giving developers the raw VLLM API key while still letting them use VLLM endpoints.
+
+### Usage
+
+1. Setup environment
+
+```bash
+export DATABASE_URL=""
+export LITELLM_MASTER_KEY=""
+export HOSTED_VLLM_API_BASE=""
+```
+
+```bash
+litellm
+
+# RUNNING on http://0.0.0.0:4000
+```
+
+2. Generate virtual key 
+
+```bash
+curl -X POST 'http://0.0.0.0:4000/key/generate' \
+-H 'Authorization: Bearer sk-1234' \
+-H 'Content-Type: application/json' \
+-d '{}'
+```
+
+Expected Response 
+
+```bash
+{
+    ...
+    "key": "sk-1234ewknldferwedojwojw"
+}
+```
+
+3. Test it! 
+
+
+```bash
+curl -L -X POST 'http://0.0.0.0:4000/vllm/chat/completions' \
+-H 'Content-Type: application/json' \
+-H 'Authorization: Bearer sk-1234ewknldferwedojwojw' \
+  --data '{
+    "messages": [
+        {
+            "role": "user",
+            "content": "I am going to Paris, what should I see?"
+        }
+    ],
+    "max_tokens": 2048,
+    "temperature": 0.8,
+    "top_p": 0.1,
+    "model": "qwen2.5-7b-instruct"
+}'
+```
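
The final curl call in `vllm.md` above can also be built from Python. The sketch below only constructs the request and does not send it, so it runs without a proxy; `build_chat_request` is a hypothetical helper, and the URL, key, and model are the placeholder values from the doc. Building the body with `json.dumps` avoids hand-written JSON mistakes such as trailing commas.

```python
import json
import urllib.request


def build_chat_request(proxy_base: str, virtual_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Hypothetical helper: build the POST for the /vllm/chat/completions pass-through route."""
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 2048,
        "temperature": 0.8,
        "top_p": 0.1,
        "model": model,
    }
    return urllib.request.Request(
        url=proxy_base.rstrip("/") + "/vllm/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {virtual_key}",
        },
        method="POST",
    )


req = build_chat_request(
    "http://0.0.0.0:4000",
    "sk-1234ewknldferwedojwojw",
    "qwen2.5-7b-instruct",
    "I am going to Paris, what should I see?",
)
print(req.get_method(), req.full_url)
# With a proxy running, the request could be sent with urllib.request.urlopen(req).
```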
--- a/docs/my-website/docs/providers/anthropic.md
+++ b/docs/my-website/docs/providers/anthropic.md
@@ -1095,7 +1095,7 @@ response = completion(
 print(response.choices[0])
 ```
 </TabItem>
-<TabItem value="proxy" lable="PROXY">
+<TabItem value="proxy" label="PROXY">

 1. Add model to config 

--- a/docs/my-website/docs/providers/azure.md
+++ b/docs/my-website/docs/providers/azure.md
@@ -483,7 +483,7 @@ response.stream_to_file(speech_file_path)
 This is a walkthrough on how to use Azure Active Directory Tokens - Microsoft Entra ID to make `litellm.completion()` calls 

 Step 1 - Download Azure CLI 
-Installation instructons: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli
+Installation instructions: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli
 ```shell
 brew update && brew install azure-cli
 ```
@@ -1011,8 +1011,7 @@ Expected Response:
 | Supported Operations | `/v1/responses`|
 | Azure OpenAI Responses API | [Azure OpenAI Responses API ↗](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses?tabs=python-secure) |
 | Cost Tracking, Logging Support | ✅ LiteLLM will log, track cost for Responses API Requests |
-
-
+| Supported OpenAI Params | ✅ All OpenAI params are supported, [See here](https://github.com/BerriAI/litellm/blob/0717369ae6969882d149933da48eeb8ab0e691bd/litellm/llms/openai/responses/transformation.py#L23) |

 ## Usage

--- a/docs/my-website/docs/providers/gemini.md
+++ b/docs/my-website/docs/providers/gemini.md
@@ -39,14 +39,164 @@ response = completion(
 - temperature
 - top_p
 - max_tokens
+- max_completion_tokens
 - stream
 - tools
 - tool_choice
+- functions
 - response_format
 - n
 - stop
+- logprobs
+- frequency_penalty
+- modalities
+- reasoning_effort
+
+**Anthropic Params**
+- thinking (used to set max budget tokens across anthropic/gemini models)
+
+[**See Updated List**](https://github.com/BerriAI/litellm/blob/main/litellm/llms/gemini/chat/transformation.py#L70)
+
+
+
+## Usage - Thinking / `reasoning_content`
+
+LiteLLM translates OpenAI's `reasoning_effort` to Gemini's `thinking` parameter. [Code](https://github.com/BerriAI/litellm/blob/620664921902d7a9bfb29897a7b27c1a7ef4ddfb/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py#L362)
+
+**Mapping**
+
+| reasoning_effort | thinking |
+| ---------------- | -------- |
+| "low"            | "budget_tokens": 1024 |
+| "medium"         | "budget_tokens": 2048 |
+| "high"           | "budget_tokens": 4096 |
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+from litellm import completion
+
+resp = completion(
+    model="gemini/gemini-2.5-flash-preview-04-17",
+    messages=[{"role": "user", "content": "What is the capital of France?"}],
+    reasoning_effort="low",
+)
+
+```
+
+</TabItem>
+
+<TabItem value="proxy" label="PROXY">
+
+1. Setup config.yaml
+
+```yaml
+model_list:
+  - model_name: gemini-2.5-flash
+    litellm_params:
+      model: gemini/gemini-2.5-flash-preview-04-17
+      api_key: os.environ/GEMINI_API_KEY
+```
+2. Start proxy
+
+```bash
+litellm --config /path/to/config.yaml
+```
+
+3. Test it! 
+
+```bash
+curl http://0.0.0.0:4000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
+  -d '{
+    "model": "gemini-2.5-flash",
+    "messages": [{"role": "user", "content": "What is the capital of France?"}],
+    "reasoning_effort": "low"
+  }'
+```
+
+</TabItem>
+</Tabs>
+
+
+**Expected Response**
+
+```python
+ModelResponse(
+    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
+    created=1740470510,
+    model='gemini-2.5-flash-preview-04-17',
+    object='chat.completion',
+    system_fingerprint=None,
+    choices=[
+        Choices(
+            finish_reason='stop',
+            index=0,
+            message=Message(
+                content="The capital of France is Paris.",
+                role='assistant',
+                tool_calls=None,
+                function_call=None,
+                reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
+            ),
+        )
+    ],
+    usage=Usage(
+        completion_tokens=68,
+        prompt_tokens=42,
+        total_tokens=110,
+        completion_tokens_details=None,
+        prompt_tokens_details=PromptTokensDetailsWrapper(
+            audio_tokens=None,
+            cached_tokens=0,
+            text_tokens=None,
+            image_tokens=None
+        ),
+        cache_creation_input_tokens=0,
+        cache_read_input_tokens=0
+    )
+)
+```
+
+### Pass `thinking` to Gemini models
+
+You can also pass the `thinking` parameter to Gemini models.
+
+This is translated to Gemini's [`thinkingConfig` parameter](https://ai.google.dev/gemini-api/docs/thinking#set-budget).
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+response = litellm.completion(
+  model="gemini/gemini-2.5-flash-preview-04-17",
+  messages=[{"role": "user", "content": "What is the capital of France?"}],
+  thinking={"type": "enabled", "budget_tokens": 1024},
+)
+```
+
+</TabItem>
+<TabItem value="proxy" label="PROXY">
+
+```bash
+curl http://0.0.0.0:4000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $LITELLM_KEY" \
+  -d '{
+    "model": "gemini/gemini-2.5-flash-preview-04-17",
+    "messages": [{"role": "user", "content": "What is the capital of France?"}],
+    "thinking": {"type": "enabled", "budget_tokens": 1024}
+  }'
+```
+
+</TabItem>
+</Tabs>
+
+
+

-[**See Updated List**](https://github.com/BerriAI/litellm/blob/1c747f3ad372399c5b95cc5696b06a5fbe53186b/litellm/llms/vertex_httpx.py#L122)

 ## Passing Gemini Specific Params
 ### Response schema 
@@ -505,7 +655,7 @@ import os

 os.environ["GEMINI_API_KEY"] = ".."

-tools = [{"googleSearchRetrieval": {}}] # 👈 ADD GOOGLE SEARCH
+tools = [{"googleSearch": {}}] # 👈 ADD GOOGLE SEARCH

 response = completion(
    model="gemini/gemini-2.0-flash",
@@ -541,7 +691,7 @@ curl -X POST 'http://0.0.0.0:4000/chat/completions' \
 -d '{
  "model": "gemini-2.0-flash",
  "messages": [{"role": "user", "content": "What is the weather in San Francisco?"}],
-  "tools": [{"googleSearchRetrieval": {}}]
+  "tools": [{"googleSearch": {}}]
 }
 '
 ```
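
The `reasoning_effort` mapping table added to `gemini.md` in this diff can be sketched as a plain lookup. This is a minimal illustration using the budgets from the table and the `thinking` shape shown in the doc's own example; `reasoning_effort_to_thinking` is a hypothetical name and is not LiteLLM's actual implementation.

```python
# Budgets copied from the mapping table in the doc.
REASONING_EFFORT_TO_BUDGET = {"low": 1024, "medium": 2048, "high": 4096}


def reasoning_effort_to_thinking(reasoning_effort: str) -> dict:
    """Hypothetical helper: translate reasoning_effort into a thinking dict."""
    try:
        budget = REASONING_EFFORT_TO_BUDGET[reasoning_effort]
    except KeyError:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}") from None
    # Same shape as the `thinking` argument passed in the SDK example.
    return {"type": "enabled", "budget_tokens": budget}


print(reasoning_effort_to_thinking("low"))
```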
--- a/docs/my-website/docs/providers/infinity.md
+++ b/docs/my-website/docs/providers/infinity.md
@@ -4,17 +4,16 @@ import TabItem from '@theme/TabItem';
 # Infinity

 | Property                  | Details                                                                                                    |
-|-------|-------|
-| Description | Infinity is a high-throughput, low-latency REST API for serving text-embeddings, reranking models and clip|
+| ------------------------- | ---------------------------------------------------------------------------------------------------------- |
+| Description               | Infinity is a high-throughput, low-latency REST API for serving text-embeddings, reranking models and clip |
 | Provider Route on LiteLLM | `infinity/`                                                                                                |
-| Supported Operations | `/rerank` |
+| Supported Operations      | `/rerank`, `/embeddings`                                                                                   |
 | Link to Provider Doc      | [Infinity ↗](https://github.com/michaelfeil/infinity)                                                      |

-
 ## **Usage - LiteLLM Python SDK**

 ```python
-from litellm import rerank
+from litellm import rerank, embedding
 import os

 os.environ["INFINITY_API_BASE"] = "http://localhost:8080"
@@ -39,8 +38,8 @@ model_list:
  - model_name: custom-infinity-rerank
    litellm_params:
      model: infinity/rerank
-      api_key: os.environ/INFINITY_API_KEY
      api_base: https://localhost:8080
+      api_key: os.environ/INFINITY_API_KEY
 ```

 Start litellm
@@ -51,7 +50,9 @@ litellm --config /path/to/config.yaml
 # RUNNING on http://0.0.0.0:4000
 ```

-Test request
+## Test request:
+
+### Rerank

 ```bash
 curl http://0.0.0.0:4000/rerank \
@@ -70,11 +71,10 @@ curl http://0.0.0.0:4000/rerank \
  }'
 ```

-
-## Supported Cohere Rerank API Params
+#### Supported Cohere Rerank API Params

 | Param              | Type        | Description                                     |
-|-------|-------|-------|
+| ------------------ | ----------- | ----------------------------------------------- |
 | `query`            | `str`       | The query to rerank the documents against       |
 | `documents`        | `list[str]` | The documents to rerank                         |
 | `top_n`            | `int`       | The number of documents to return               |
@@ -138,6 +138,7 @@ response = rerank(
    raw_scores=True, # 👈 PROVIDER-SPECIFIC PARAM
 )
 ```
+
 </TabItem>

 <TabItem value="proxy" label="PROXY">
@@ -179,6 +180,121 @@ curl http://0.0.0.0:4000/rerank \
    "raw_scores": True # 👈 PROVIDER-SPECIFIC PARAM
  }'
 ```
+
 </TabItem>

 </Tabs>
+
+## Embeddings
+
+LiteLLM provides an OpenAI API-compatible `/embeddings` endpoint for embedding calls.
+
+**Setup**
+
+Add this to your litellm proxy config.yaml
+
+```yaml
+model_list:
+  - model_name: custom-infinity-embedding
+    litellm_params:
+      model: infinity/provider/custom-embedding-v1
+      api_base: http://localhost:8080
+      api_key: os.environ/INFINITY_API_KEY
+```
+
+### Test request:
+
+```bash
+curl http://0.0.0.0:4000/embeddings \
+  -H "Authorization: Bearer sk-1234" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "custom-infinity-embedding",
+    "input": ["hello"]
+  }'
+```
+
+#### Supported Embedding API Params
+
+| Param             | Type        | Description                                                 |
+| ----------------- | ----------- | ----------------------------------------------------------- |
+| `model`           | `str`       | The embedding model to use                                  |
+| `input`           | `list[str]` | The text inputs to generate embeddings for                  |
+| `encoding_format` | `str`       | The format to return embeddings in (e.g. "float", "base64") |
+| `modality`        | `str`       | The type of input (e.g. "text", "image", "audio")           |
+
+### Usage - Basic Examples
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+from litellm import embedding
+import os
+
+os.environ["INFINITY_API_BASE"] = "http://localhost:8080"
+
+response = embedding(
+    model="infinity/bge-small",
+    input=["good morning from litellm"]
+)
+
+print(response.data[0]['embedding'])
+```
+
+</TabItem>
+
+<TabItem value="proxy" label="PROXY">
+
+```bash
+curl http://0.0.0.0:4000/embeddings \
+  -H "Authorization: Bearer sk-1234" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "custom-infinity-embedding",
+    "input": ["hello"]
+  }'
+```
+
+</TabItem>
+</Tabs>
+
+### Usage - OpenAI Client
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+  api_key="<LITELLM_MASTER_KEY>",
+  base_url="<LITELLM_URL>"
+)
+
+response = client.embeddings.create(
+  model="custom-infinity-embedding",
+  input=["The food was delicious and the waiter..."],
+  encoding_format="float"
+)
+
+print(response.data[0].embedding)
+```
+
+</TabItem>
+
+<TabItem value="proxy" label="PROXY">
+
+```bash
+curl http://0.0.0.0:4000/embeddings \
+  -H "Authorization: Bearer sk-1234" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "custom-infinity-embedding",
+    "input": ["The food was delicious and the waiter..."],
+    "encoding_format": "float"
+  }'
+```
+
+</TabItem>
+</Tabs>
--- a/docs/my-website/docs/providers/openai.md
+++ b/docs/my-website/docs/providers/openai.md
@ -163,6 +163,12 @@ os.environ["OPENAI_API_BASE"] = "openaiai-api-base"     # OPTIONAL

 | Model Name            | Function Call                                                   |
 |-----------------------|-----------------------------------------------------------------|
+| gpt-4.1 | `response = completion(model="gpt-4.1", messages=messages)` |
+| gpt-4.1-mini | `response = completion(model="gpt-4.1-mini", messages=messages)` |
+| gpt-4.1-nano | `response = completion(model="gpt-4.1-nano", messages=messages)` |
+| o4-mini | `response = completion(model="o4-mini", messages=messages)` |
+| o3-mini | `response = completion(model="o3-mini", messages=messages)` |
+| o3 | `response = completion(model="o3", messages=messages)` |
 | o1-mini | `response = completion(model="o1-mini", messages=messages)` |
 | o1-preview | `response = completion(model="o1-preview", messages=messages)` |
 | gpt-4o-mini  | `response = completion(model="gpt-4o-mini", messages=messages)` |
--- a/docs/my-website/docs/providers/vertex.md
+++ b/docs/my-website/docs/providers/vertex.md
@ -364,7 +364,7 @@ from litellm import completion
 ## SETUP ENVIRONMENT
 # !gcloud auth application-default login - run this to add vertex credentials to your env

-tools = [{"googleSearchRetrieval": {}}] # 👈 ADD GOOGLE SEARCH
+tools = [{"googleSearch": {}}] # 👈 ADD GOOGLE SEARCH

 resp = litellm.completion(
                    model="vertex_ai/gemini-1.0-pro-001",
@ -391,7 +391,7 @@ client = OpenAI(
 response = client.chat.completions.create(
    model="gemini-pro",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
-    tools=[{"googleSearchRetrieval": {}}],
+    tools=[{"googleSearch": {}}],
 )

 print(response)
@ -410,7 +410,7 @@ curl http://localhost:4000/v1/chat/completions \
    ],
   "tools": [
        {
-            "googleSearchRetrieval": {} 
+            "googleSearch": {} 
        }
    ]
  }'
@ -529,7 +529,7 @@ from litellm import completion

 # !gcloud auth application-default login - run this to add vertex credentials to your env

-tools = [{"googleSearchRetrieval": {"disable_attributon": False}}] # 👈 ADD GOOGLE SEARCH
+tools = [{"googleSearch": {"disable_attributon": False}}] # 👈 ADD GOOGLE SEARCH

 resp = litellm.completion(
                    model="vertex_ai/gemini-1.0-pro-001",
@ -542,9 +542,157 @@ print(resp)
 ```


+### **Thinking / `reasoning_content`**
+
+LiteLLM translates OpenAI's `reasoning_effort` to Gemini's `thinking` parameter. [Code](https://github.com/BerriAI/litellm/blob/620664921902d7a9bfb29897a7b27c1a7ef4ddfb/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py#L362)
+
+**Mapping**
+
+| reasoning_effort | thinking |
+| ---------------- | -------- |
+| "low"            | "budget_tokens": 1024 |
+| "medium"         | "budget_tokens": 2048 |
+| "high"           | "budget_tokens": 4096 |
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+from litellm import completion
+
+# !gcloud auth application-default login - run this to add vertex credentials to your env
+
+resp = completion(
+    model="vertex_ai/gemini-2.5-flash-preview-04-17",
+    messages=[{"role": "user", "content": "What is the capital of France?"}],
+    reasoning_effort="low",
+    vertex_project="project-id",
+    vertex_location="us-central1"
+)
+
+```
+
+</TabItem>
+
+<TabItem value="proxy" label="PROXY">
+
+1. Setup config.yaml
+
+```yaml
+model_list:
+  - model_name: gemini-2.5-flash
+    litellm_params:
+      model: vertex_ai/gemini-2.5-flash-preview-04-17
+      vertex_credentials: {"project_id": "project-id", "location": "us-central1", "project_key": "project-key"}
+      vertex_project: "project-id"
+      vertex_location: "us-central1"
+```
+
+2. Start proxy
+
+```bash
+litellm --config /path/to/config.yaml
+```
+
+3. Test it! 
+
+```bash
+curl http://0.0.0.0:4000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
+  -d '{
+    "model": "gemini-2.5-flash",
+    "messages": [{"role": "user", "content": "What is the capital of France?"}],
+    "reasoning_effort": "low"
+  }'
+```
+
+</TabItem>
+</Tabs>
+
+
+**Expected Response**
+
+```python
+ModelResponse(
+    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
+    created=1740470510,
+    model='gemini-2.5-flash-preview-04-17',
+    object='chat.completion',
+    system_fingerprint=None,
+    choices=[
+        Choices(
+            finish_reason='stop',
+            index=0,
+            message=Message(
+                content="The capital of France is Paris.",
+                role='assistant',
+                tool_calls=None,
+                function_call=None,
+                reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
+            ),
+        )
+    ],
+    usage=Usage(
+        completion_tokens=68,
+        prompt_tokens=42,
+        total_tokens=110,
+        completion_tokens_details=None,
+        prompt_tokens_details=PromptTokensDetailsWrapper(
+            audio_tokens=None,
+            cached_tokens=0,
+            text_tokens=None,
+            image_tokens=None
+        ),
+        cache_creation_input_tokens=0,
+        cache_read_input_tokens=0
+    )
+)
+```
+
+#### Pass `thinking` to Gemini models
+
+You can also pass the `thinking` parameter to Gemini models.
+
+This is translated to Gemini's [`thinkingConfig` parameter](https://ai.google.dev/gemini-api/docs/thinking#set-budget).
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+from litellm import completion
+
+# !gcloud auth application-default login - run this to add vertex credentials to your env
+
+response = completion(
+  model="vertex_ai/gemini-2.5-flash-preview-04-17",
+  messages=[{"role": "user", "content": "What is the capital of France?"}],
+  thinking={"type": "enabled", "budget_tokens": 1024},
+  vertex_project="project-id",
+  vertex_location="us-central1"
+)
+```
+
+</TabItem>
+<TabItem value="proxy" label="PROXY">
+
+```bash
+curl http://0.0.0.0:4000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $LITELLM_KEY" \
+  -d '{
+    "model": "vertex_ai/gemini-2.5-flash-preview-04-17",
+    "messages": [{"role": "user", "content": "What is the capital of France?"}],
+    "thinking": {"type": "enabled", "budget_tokens": 1024}
+  }'
+```
+
+</TabItem>
+</Tabs>
+
+
 ### **Context Caching**

-Use Vertex AI context caching is supported by calling provider api directly. (Unified Endpoint support comin soon.).
+Vertex AI context caching is supported by calling the provider API directly. (Unified endpoint support is coming soon.)

 [**Go straight to provider**](../pass_through/vertex_ai.md#context-caching)

@ -762,7 +910,7 @@ export VERTEXAI_PROJECT="my-test-project" # ONLY use if model project is differe


 ## Specifying Safety Settings 
-In certain use-cases you may need to make calls to the models and pass [safety settigns](https://ai.google.dev/docs/safety_setting_gemini) different from the defaults. To do so, simple pass the `safety_settings` argument to `completion` or `acompletion`. For example:
+In certain use cases you may need to call the models with [safety settings](https://ai.google.dev/docs/safety_setting_gemini) different from the defaults. To do so, simply pass the `safety_settings` argument to `completion` or `acompletion`. For example:

 ### Set per model/request

@ -1902,7 +2050,7 @@ response = completion(
 print(response.choices[0])
 ```
 </TabItem>
-<TabItem value="proxy" lable="PROXY">
+<TabItem value="proxy" label="PROXY">

 1. Add model to config 

--- a/docs/my-website/docs/providers/vllm.md
+++ b/docs/my-website/docs/providers/vllm.md
@ -161,6 +161,120 @@ curl -L -X POST 'http://0.0.0.0:4000/embeddings' \

 Example Implementation from VLLM [here](https://github.com/vllm-project/vllm/pull/10020)

+<Tabs>
+<TabItem value="files_message" label="(Unified) Files Message">
+
+Use this to send a video URL to VLLM and Gemini in the same format, using OpenAI's `file` message type.
+
+There are two ways to send a video url to VLLM:
+
+1. Pass the video url directly
+
+```
+{"type": "file", "file": {"file_id": video_url}},
+```
+
+2. Pass the video data as base64
+
+```
+{"type": "file", "file": {"file_data": f"data:video/mp4;base64,{video_data_base64}"}}
+```
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+import os
+from litellm import completion
+
+messages=[
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "text",
+                "text": "Summarize the following video"
+            },
+            {
+                "type": "file",
+                "file": {
+                    "file_id": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
+                }
+            }
+        ]
+    }
+]
+
+# call vllm 
+os.environ["HOSTED_VLLM_API_BASE"] = "https://hosted-vllm-api.co"
+os.environ["HOSTED_VLLM_API_KEY"] = "" # [optional], if your VLLM server requires an API key
+response = completion(
+    model="hosted_vllm/qwen", # pass the vllm model name
+    messages=messages,
+)
+
+# call gemini 
+os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"
+response = completion(
+    model="gemini/gemini-1.5-flash", # pass the gemini model name
+    messages=messages,
+)
+
+print(response)
+```
+
+</TabItem>
+<TabItem value="proxy" label="PROXY">
+
+1. Setup config.yaml
+
+```yaml
+model_list:
+    - model_name: my-model
+      litellm_params:
+        model: hosted_vllm/qwen  # add hosted_vllm/ prefix to route as OpenAI provider
+        api_base: https://hosted-vllm-api.co      # add api base for OpenAI compatible provider
+    - model_name: my-gemini-model
+      litellm_params:
+        model: gemini/gemini-1.5-flash  # add gemini/ prefix to route as Google AI Studio provider
+        api_key: os.environ/GEMINI_API_KEY
+```
+
+2. Start the proxy 
+
+```bash
+$ litellm --config /path/to/config.yaml
+
+# RUNNING on http://0.0.0.0:4000
+```
+
+3. Test it! 
+
+```bash
+curl -X POST http://0.0.0.0:4000/chat/completions \
+-H "Authorization: Bearer sk-1234" \
+-H "Content-Type: application/json" \
+-d '{
+    "model": "my-model",
+    "messages": [
+        {"role": "user", "content": 
+            [
+                {"type": "text", "text": "Summarize the following video"},
+                {"type": "file", "file": {"file_id": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}}
+            ]
+        }
+    ]
+}'
+```
+
+</TabItem>
+</Tabs>
+
+
+</TabItem>
+<TabItem value="video_url" label="(VLLM-specific) Video Message">
+
+Use this to send a video URL to VLLM in its native message format (`video_url`).
+
 There are two ways to send a video url to VLLM:

 1. Pass the video url directly
@ -249,6 +363,10 @@ curl -X POST http://0.0.0.0:4000/chat/completions \
 </Tabs>


+</TabItem>
+</Tabs>
+
+
 ## (Deprecated) for `vllm pip package` 
 ### Using - `litellm.completion`

--- a/docs/my-website/docs/proxy/admin_ui_sso.md
+++ b/docs/my-website/docs/proxy/admin_ui_sso.md
@ -243,12 +243,12 @@ We allow you to pass a local image or a an http/https url of your image

 Set `UI_LOGO_PATH` on your env. We recommend using a hosted image, it's a lot easier to set up and configure / debug

-Exaple setting Hosted image
+Example setting Hosted image
 ```shell
 UI_LOGO_PATH="https://litellm-logo-aws-marketplace.s3.us-west-2.amazonaws.com/berriai-logo-github.png"
 ```

-Exaple setting a local image (on your container)
+Example setting a local image (on your container)
 ```shell
 UI_LOGO_PATH="ui_images/logo.jpg"
 ```
--- a/docs/my-website/docs/proxy/alerting.md
+++ b/docs/my-website/docs/proxy/alerting.md
@ -213,7 +213,7 @@ model_list:
 general_settings: 
  master_key: sk-1234
  alerting: ["slack"]
-  alerting_threshold: 0.0001 # (Seconds) set an artifically low threshold for testing alerting
+  alerting_threshold: 0.0001 # (Seconds) set an artificially low threshold for testing alerting
  alert_to_webhook_url: {
    "llm_exceptions": "https://hooks.slack.com/services/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",
    "llm_too_slow": "https://hooks.slack.com/services/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",
@ -247,7 +247,7 @@ model_list:
 general_settings: 
  master_key: sk-1234
  alerting: ["slack"]
-  alerting_threshold: 0.0001 # (Seconds) set an artifically low threshold for testing alerting
+  alerting_threshold: 0.0001 # (Seconds) set an artificially low threshold for testing alerting
  alert_to_webhook_url: {
    "llm_exceptions": ["os.environ/SLACK_WEBHOOK_URL", "os.environ/SLACK_WEBHOOK_URL_2"],
    "llm_too_slow": ["https://webhook.site/7843a980-a494-4967-80fb-d502dbc16886", "https://webhook.site/28cfb179-f4fb-4408-8129-729ff55cf213"],
@ -425,7 +425,7 @@ curl -X GET --location 'http://0.0.0.0:4000/health/services?service=webhook' \
 - `projected_exceeded_date` *str or null*: The date when the budget is projected to be exceeded, returned when 'soft_budget' is set for key (optional).
 - `projected_spend` *float or null*: The projected spend amount, returned when 'soft_budget' is set for key (optional).
 - `event` *Literal["budget_crossed", "threshold_crossed", "projected_limit_exceeded"]*: The type of event that triggered the webhook. Possible values are:
-    * "spend_tracked": Emitted whenver spend is tracked for a customer id. 
+    * "spend_tracked": Emitted whenever spend is tracked for a customer id. 
    * "budget_crossed": Indicates that the spend has exceeded the max budget.
    * "threshold_crossed": Indicates that spend has crossed a threshold (currently sent when 85% and 95% of budget is reached).
    * "projected_limit_exceeded": For "key" only - Indicates that the projected spend is expected to exceed the soft budget threshold.
@ -480,7 +480,7 @@ LLM-related Alerts
 | `cooldown_deployment` | Alerts when a deployment is put into cooldown | ✅ |
 | `new_model_added` | Notifications when a new model is added to litellm proxy through /model/new| ✅ |
 | `outage_alerts` | Alerts when a specific LLM deployment is facing an outage | ✅ |
-| `region_outage_alerts` | Alerts when a specfic LLM region is facing an outage. Example us-east-1 | ✅ |
+| `region_outage_alerts` | Alerts when a specific LLM region is facing an outage. Example us-east-1 | ✅ |

 Budget and Spend Alerts

--- a/docs/my-website/docs/proxy/config_settings.md
+++ b/docs/my-website/docs/proxy/config_settings.md
@ -299,6 +299,9 @@ router_settings:
 |------|-------------|
 | ACTIONS_ID_TOKEN_REQUEST_TOKEN | Token for requesting ID in GitHub Actions
 | ACTIONS_ID_TOKEN_REQUEST_URL | URL for requesting ID token in GitHub Actions
+| AGENTOPS_ENVIRONMENT | Environment for AgentOps logging integration
+| AGENTOPS_API_KEY | API Key for AgentOps logging integration
+| AGENTOPS_SERVICE_NAME | Service Name for AgentOps logging integration
 | AISPEND_ACCOUNT_ID | Account ID for AI Spend
 | AISPEND_API_KEY | API Key for AI Spend
 | ALLOWED_EMAIL_DOMAINS | List of email domains allowed for access
--- a/docs/my-website/docs/proxy/custom_pricing.md
+++ b/docs/my-website/docs/proxy/custom_pricing.md
@ -56,7 +56,7 @@ model_list:
      model: azure/<your_deployment_name>
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
-      api_version: os.envrion/AZURE_API_VERSION
+      api_version: os.environ/AZURE_API_VERSION
    model_info:
      input_cost_per_token: 0.000421 # 👈 ONLY to track cost per token
      output_cost_per_token: 0.000520 # 👈 ONLY to track cost per token
--- a/docs/my-website/docs/proxy/db_deadlocks.md
+++ b/docs/my-website/docs/proxy/db_deadlocks.md
@ -19,7 +19,7 @@ LiteLLM writes `UPDATE` and `UPSERT` queries to the DB. When using 10+ instances

 ### Stage 1. Each instance writes updates to redis

-Each instance will accumlate the spend updates for a key, user, team, etc and write the updates to a redis queue. 
+Each instance will accumulate the spend updates for a key, user, team, etc and write the updates to a redis queue. 

 <Image img={require('../../img/deadlock_fix_1.png')}  style={{ width: '900px', height: 'auto' }} />
 <p style={{textAlign: 'left', color: '#666'}}>
--- a/docs/my-website/docs/proxy/deploy.md
+++ b/docs/my-website/docs/proxy/deploy.md
@ -22,7 +22,7 @@ echo 'LITELLM_MASTER_KEY="sk-1234"' > .env

 # Add the litellm salt key - you cannot change this after adding a model
 # It is used to encrypt / decrypt your LLM API Key credentials
-# We recommned - https://1password.com/password-generator/ 
+# We recommend - https://1password.com/password-generator/ 
 # password generator to get a random hash for litellm salt key
 echo 'LITELLM_SALT_KEY="sk-1234"' >> .env

@ -125,7 +125,7 @@ CMD ["--port", "4000", "--config", "config.yaml", "--detailed_debug"]

 ### Build from litellm `pip` package

-Follow these instructons to build a docker container from the litellm pip package. If your company has a strict requirement around security / building images you can follow these steps.
+Follow these instructions to build a docker container from the litellm pip package. If your company has a strict requirement around security / building images you can follow these steps.

 Dockerfile 

@ -999,7 +999,7 @@ services:
      - "4000:4000" # Map the container port to the host, change the host port if necessary
    volumes:
      - ./litellm-config.yaml:/app/config.yaml # Mount the local configuration file
-    # You can change the port or number of workers as per your requirements or pass any new supported CLI augument. Make sure the port passed here matches with the container port defined above in `ports` value
+    # You can change the port or number of workers as per your requirements or pass any new supported CLI argument. Make sure the port passed here matches with the container port defined above in `ports` value
    command: [ "--config", "/app/config.yaml", "--port", "4000", "--num_workers", "8" ]

 # ...rest of your docker-compose config if any
--- a/docs/my-website/docs/proxy/enterprise.md
+++ b/docs/my-website/docs/proxy/enterprise.md
@ -691,7 +691,7 @@ curl --request POST \
 <TabItem value="admin_only_routes" label="Test `admin_only_routes`">


-**Successfull Request**
+**Successful Request**

 ```shell
 curl --location 'http://0.0.0.0:4000/key/generate' \
@ -729,7 +729,7 @@ curl --location 'http://0.0.0.0:4000/key/generate' \
 <TabItem value="allowed_routes" label="Test `allowed_routes`">


-**Successfull Request**
+**Successful Request**

 ```shell
 curl http://localhost:4000/chat/completions \
--- a/docs/my-website/docs/proxy/guardrails/quick_start.md
+++ b/docs/my-website/docs/proxy/guardrails/quick_start.md
@ -164,7 +164,7 @@ curl -i http://localhost:4000/v1/chat/completions \

 **Expected response**

-Your response headers will incude `x-litellm-applied-guardrails` with the guardrail applied 
+Your response headers will include `x-litellm-applied-guardrails` with the guardrail applied 

 ```
 x-litellm-applied-guardrails: aporia-pre-guard
--- a/docs/my-website/docs/proxy/logging.md
+++ b/docs/my-website/docs/proxy/logging.md
@ -277,7 +277,7 @@ Found under `kwargs["standard_logging_object"]`. This is a standard payload, log

 ## Langfuse

-We will use the `--config` to set `litellm.success_callback = ["langfuse"]` this will log all successfull LLM calls to langfuse. Make sure to set `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in your environment
+We will use the `--config` to set `litellm.success_callback = ["langfuse"]` this will log all successful LLM calls to langfuse. Make sure to set `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in your environment

 **Step 1** Install langfuse

@ -535,11 +535,11 @@ print(response)
 Use this if you want to control which LiteLLM-specific fields are logged as tags by the LiteLLM proxy. By default LiteLLM Proxy logs no LiteLLM-specific fields

 | LiteLLM specific field    | Description                                                                             | Example Value                                  |
-|------------------------|-------------------------------------------------------|------------------------------------------------|
-| `cache_hit`            | Indicates whether a cache hit occured (True) or not (False)   | `true`, `false`                                |
-| `cache_key`            | The Cache key used for this request                | `d2b758c****`|
-| `proxy_base_url`       | The base URL for the proxy server, the value of env var `PROXY_BASE_URL` on your server                | `https://proxy.example.com`|
-| `user_api_key_alias`   | An alias for the LiteLLM Virtual Key.| `prod-app1`        |
+|---------------------------|-----------------------------------------------------------------------------------------|------------------------------------------------|
+| `cache_hit`               | Indicates whether a cache hit occurred (True) or not (False)                            | `true`, `false`                                |
+| `cache_key`               | The Cache key used for this request                                                     | `d2b758c****`                                  |
+| `proxy_base_url`          | The base URL for the proxy server, the value of env var `PROXY_BASE_URL` on your server | `https://proxy.example.com`                    |
+| `user_api_key_alias`      | An alias for the LiteLLM Virtual Key.                                                   | `prod-app1`                                    |
 | `user_api_key_user_id`    | The unique ID associated with a user's API key.                                         | `user_123`, `user_456`                         |
 | `user_api_key_user_email` | The email associated with a user's API key.                                             | `user@example.com`, `admin@example.com`        |
 | `user_api_key_team_alias` | An alias for a team associated with an API key.                                         | `team_alpha`, `dev_team`                       |
@ -1190,7 +1190,7 @@ We will use the `--config` to set

 - `litellm.success_callback = ["s3"]` 

-This will log all successfull LLM calls to s3 Bucket
+This will log all successful LLM calls to s3 Bucket

 **Step 1** Set AWS Credentials in .env

@ -1279,7 +1279,7 @@ Log LLM Logs to [Azure Data Lake Storage](https://learn.microsoft.com/en-us/azur

 | Property | Details |
 |----------|---------|
-| Description | Log LLM Input/Output to Azure Blob Storag (Bucket) |
+| Description | Log LLM Input/Output to Azure Blob Storage (Bucket) |
 | Azure Docs on Data Lake Storage | [Azure Data Lake Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction) |


@ -1360,7 +1360,7 @@ LiteLLM Supports logging to the following Datdog Integrations:
 <Tabs>
 <TabItem value="datadog" label="Datadog Logs">

-We will use the `--config` to set `litellm.callbacks = ["datadog"]` this will log all successfull LLM calls to DataDog
+We will use the `--config` to set `litellm.callbacks = ["datadog"]` this will log all successful LLM calls to DataDog

 **Step 1**: Create a `config.yaml` file and set `litellm_settings`: `success_callback`

@ -1636,7 +1636,7 @@ class MyCustomHandler(CustomLogger):
            litellm_params = kwargs.get("litellm_params", {})
            metadata = litellm_params.get("metadata", {})   # headers passed to LiteLLM proxy, can be found here

-            # Acess Exceptions & Traceback
+            # Access Exceptions & Traceback
            exception_event = kwargs.get("exception", None)
            traceback_event = kwargs.get("traceback_exception", None)

@ -2205,7 +2205,7 @@ We will use the `--config` to set
 - `litellm.success_callback = ["dynamodb"]` 
 - `litellm.dynamodb_table_name = "your-table-name"`

-This will log all successfull LLM calls to DynamoDB
+This will log all successful LLM calls to DynamoDB

 **Step 1** Set AWS Credentials in .env

@ -2370,7 +2370,7 @@ litellm --test

 [Athina](https://athina.ai/) allows you to log LLM Input/Output for monitoring, analytics, and observability.

-We will use the `--config` to set `litellm.success_callback = ["athina"]` this will log all successfull LLM calls to athina
+We will use the `--config` to set `litellm.success_callback = ["athina"]` this will log all successful LLM calls to athina

 **Step 1** Set Athina API key

--- a/docs/my-website/docs/proxy/model_discovery.md
+++ b/docs/my-website/docs/proxy/model_discovery.md
@ -0,0 +1,108 @@
+# Model Discovery
+
+Use this to give users an accurate list of the models available behind a provider endpoint when they call `/v1/models` for wildcard models.
+
+## Supported Providers
+
+- Fireworks AI
+- OpenAI
+- Gemini
+- LiteLLM Proxy
+- Topaz
+- Anthropic
+- XAI
+- VLLM
+- Vertex AI
+
+### Usage
+
+**1. Setup config.yaml**
+
+```yaml
+model_list:
+    - model_name: xai/*
+      litellm_params:
+        model: xai/*
+        api_key: os.environ/XAI_API_KEY
+
+litellm_settings:
+    check_provider_endpoint: true # 👈 Enable checking provider endpoint for wildcard models
+```
+
+**2. Start proxy**
+
+```bash
+litellm --config /path/to/config.yaml
+
+# RUNNING on http://0.0.0.0:4000
+```
+
+**3. Call `/v1/models`**
+
+```bash
+curl -X GET "http://localhost:4000/v1/models" -H "Authorization: Bearer $LITELLM_KEY"
+```
+
+Expected response
+
+```json
+{
+    "data": [
+        {
+            "id": "xai/grok-2-1212",
+            "object": "model",
+            "created": 1677610602,
+            "owned_by": "openai"
+        },
+        {
+            "id": "xai/grok-2-vision-1212",
+            "object": "model",
+            "created": 1677610602,
+            "owned_by": "openai"
+        },
+        {
+            "id": "xai/grok-3-beta",
+            "object": "model",
+            "created": 1677610602,
+            "owned_by": "openai"
+        },
+        {
+            "id": "xai/grok-3-fast-beta",
+            "object": "model",
+            "created": 1677610602,
+            "owned_by": "openai"
+        },
+        {
+            "id": "xai/grok-3-mini-beta",
+            "object": "model",
+            "created": 1677610602,
+            "owned_by": "openai"
+        },
+        {
+            "id": "xai/grok-3-mini-fast-beta",
+            "object": "model",
+            "created": 1677610602,
+            "owned_by": "openai"
+        },
+        {
+            "id": "xai/grok-beta",
+            "object": "model",
+            "created": 1677610602,
+            "owned_by": "openai"
+        },
+        {
+            "id": "xai/grok-vision-beta",
+            "object": "model",
+            "created": 1677610602,
+            "owned_by": "openai"
+        },
+        {
+            "id": "xai/grok-2-image-1212",
+            "object": "model",
+            "created": 1677610602,
+            "owned_by": "openai"
+        }
+    ],
+    "object": "list"
+}
+```
--- a/docs/my-website/docs/proxy/prod.md
+++ b/docs/my-website/docs/proxy/prod.md
@ -61,7 +61,7 @@ CMD ["--port", "4000", "--config", "./proxy_server_config.yaml"]

 ## 3. Use Redis 'port','host', 'password'. NOT 'redis_url'

-If you decide to use Redis, DO NOT use 'redis_url'. We recommend usig redis port, host, and password params. 
+If you decide to use Redis, DO NOT use 'redis_url'. We recommend using redis port, host, and password params. 

 `redis_url`is 80 RPS slower

@ -169,7 +169,7 @@ If you plan on using the DB, set a salt key for encrypting/decrypting variables

 Do not change this after adding a model. It is used to encrypt / decrypt your LLM API Key credentials

-We recommned - https://1password.com/password-generator/ password generator to get a random hash for litellm salt key.
+We recommend - https://1password.com/password-generator/ password generator to get a random hash for litellm salt key.

 ```bash
 export LITELLM_SALT_KEY="sk-1234"
--- a/docs/my-website/docs/proxy/temporary_budget_increase.md
+++ b/docs/my-website/docs/proxy/temporary_budget_increase.md
@ -3,7 +3,7 @@
 Set temporary budget increase for a LiteLLM Virtual Key. Use this if you get asked to increase the budget for a key temporarily.


-| Heirarchy | Supported | 
+| Hierarchy | Supported | 
 |-----------|-----------|
 | LiteLLM Virtual Key | ✅ |
 | User | ❌ |
--- a/docs/my-website/docs/proxy/ui_credentials.md
+++ b/docs/my-website/docs/proxy/ui_credentials.md
@ -4,7 +4,7 @@ import TabItem from '@theme/TabItem';

 # Adding LLM Credentials

-You can add LLM provider credentials on the UI. Once you add credentials you can re-use them when adding new models
+You can add LLM provider credentials on the UI. Once you add credentials you can reuse them when adding new models

 ## Add a credential + model

--- a/docs/my-website/docs/proxy/virtual_keys.md
+++ b/docs/my-website/docs/proxy/virtual_keys.md
@ -23,7 +23,7 @@ Requirements:
  - ** Set on config.yaml** set your master key under `general_settings:master_key`, example below
  - ** Set env variable** set `LITELLM_MASTER_KEY`

-(the proxy Dockerfile checks if the `DATABASE_URL` is set and then intializes the DB connection)
+(the proxy Dockerfile checks if the `DATABASE_URL` is set and then initializes the DB connection)

 ```shell
 export DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname>
@ -333,7 +333,7 @@ curl http://localhost:4000/v1/chat/completions \

 **Expected Response**

-Expect to see a successfull response from the litellm proxy since the key passed in `X-Litellm-Key` is valid
+Expect to see a successful response from the litellm proxy since the key passed in `X-Litellm-Key` is valid
 ```shell
 {"id":"chatcmpl-f9b2b79a7c30477ab93cd0e717d1773e","choices":[{"finish_reason":"stop","index":0,"message":{"content":"\n\nHello there, how may I assist you today?","role":"assistant","tool_calls":null,"function_call":null}}],"created":1677652288,"model":"gpt-3.5-turbo-0125","object":"chat.completion","system_fingerprint":"fp_44709d6fcb","usage":{"completion_tokens":12,"prompt_tokens":9,"total_tokens":21}
 ```
--- a/docs/my-website/docs/reasoning_content.md
+++ b/docs/my-website/docs/reasoning_content.md
@ -16,6 +16,8 @@ Supported Providers:
 - Vertex AI (Anthropic) (`vertexai/`)
 - OpenRouter (`openrouter/`)
 - XAI (`xai/`)
+- Google AI Studio (`gemini/`)
+- Vertex AI (`vertex_ai/`)

 LiteLLM will standardize the `reasoning_content` in the response and `thinking_blocks` in the assistant message.

@ -23,7 +25,7 @@ LiteLLM will standardize the `reasoning_content` in the response and `thinking_b
 "message": {
    ...
    "reasoning_content": "The capital of France is Paris.",
-    "thinking_blocks": [
+    "thinking_blocks": [ # only returned for Anthropic models
        {
            "type": "thinking",
            "thinking": "The capital of France is Paris.",
--- a/docs/my-website/docs/response_api.md
+++ b/docs/my-website/docs/response_api.md
@ -14,22 +14,22 @@ LiteLLM provides a BETA endpoint in the spec of [OpenAI's `/responses` API](http
 | Fallbacks | ✅ | Works between supported models |
 | Loadbalancing | ✅ | Works between supported models |
 | Supported LiteLLM Versions | 1.63.8+ | |
-| Supported LLM providers | `openai` | |
+| Supported LLM providers | **All LiteLLM supported providers** | `openai`, `anthropic`, `bedrock`, `vertex_ai`, `gemini`, `azure`, `azure_ai` etc. |

 ## Usage

-## Create a model response
+### LiteLLM Python SDK

 <Tabs>
-<TabItem value="litellm-sdk" label="LiteLLM SDK">
+<TabItem value="openai" label="OpenAI">

 #### Non-streaming
-```python showLineNumbers
+```python showLineNumbers title="OpenAI Non-streaming Response"
 import litellm

 # Non-streaming response
 response = litellm.responses(
-    model="o1-pro",
+    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
 )
@ -38,12 +38,12 @@ print(response)
 ```

 #### Streaming
-```python showLineNumbers
+```python showLineNumbers title="OpenAI Streaming Response"
 import litellm

 # Streaming response
 response = litellm.responses(
-    model="o1-pro",
+    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
 )
@ -53,58 +53,169 @@ for event in response:
 ```

 </TabItem>
-<TabItem value="proxy" label="OpenAI SDK with LiteLLM Proxy">

-First, add this to your litellm proxy config.yaml:
-```yaml showLineNumbers
-model_list:
-  - model_name: o1-pro
-    litellm_params:
-      model: openai/o1-pro
-      api_key: os.environ/OPENAI_API_KEY
-```
-
-Start your LiteLLM proxy:
-```bash
-litellm --config /path/to/config.yaml
-
-# RUNNING on http://0.0.0.0:4000
-```
-
-Then use the OpenAI SDK pointed to your proxy:
+<TabItem value="anthropic" label="Anthropic">

 #### Non-streaming
-```python showLineNumbers
-from openai import OpenAI
+```python showLineNumbers title="Anthropic Non-streaming Response"
+import litellm
+import os

-# Initialize client with your proxy URL
-client = OpenAI(
-    base_url="http://localhost:4000",  # Your proxy URL
-    api_key="your-api-key"             # Your proxy API key
-)
+# Set API key
+os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-api-key"

 # Non-streaming response
-response = client.responses.create(
-    model="o1-pro",
-    input="Tell me a three sentence bedtime story about a unicorn."
+response = litellm.responses(
+    model="anthropic/claude-3-5-sonnet-20240620",
+    input="Tell me a three sentence bedtime story about a unicorn.",
+    max_output_tokens=100
 )

 print(response)
 ```

 #### Streaming
-```python showLineNumbers
-from openai import OpenAI
+```python showLineNumbers title="Anthropic Streaming Response"
+import litellm
+import os

-# Initialize client with your proxy URL
-client = OpenAI(
-    base_url="http://localhost:4000",  # Your proxy URL
-    api_key="your-api-key"             # Your proxy API key
-)
+# Set API key
+os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-api-key"

 # Streaming response
-response = client.responses.create(
-    model="o1-pro",
+response = litellm.responses(
+    model="anthropic/claude-3-5-sonnet-20240620",
+    input="Tell me a three sentence bedtime story about a unicorn.",
+    stream=True
+)
+
+for event in response:
+    print(event)
+```
+
+</TabItem>
+
+<TabItem value="vertex" label="Vertex AI">
+
+#### Non-streaming
+```python showLineNumbers title="Vertex AI Non-streaming Response"
+import litellm
+import os
+
+# Set credentials - Vertex AI uses application default credentials
+# Run 'gcloud auth application-default login' to authenticate
+os.environ["VERTEXAI_PROJECT"] = "your-gcp-project-id"
+os.environ["VERTEXAI_LOCATION"] = "us-central1"
+
+# Non-streaming response
+response = litellm.responses(
+    model="vertex_ai/gemini-1.5-pro",
+    input="Tell me a three sentence bedtime story about a unicorn.",
+    max_output_tokens=100
+)
+
+print(response)
+```
+
+#### Streaming
+```python showLineNumbers title="Vertex AI Streaming Response"
+import litellm
+import os
+
+# Set credentials - Vertex AI uses application default credentials
+# Run 'gcloud auth application-default login' to authenticate
+os.environ["VERTEXAI_PROJECT"] = "your-gcp-project-id"
+os.environ["VERTEXAI_LOCATION"] = "us-central1"
+
+# Streaming response
+response = litellm.responses(
+    model="vertex_ai/gemini-1.5-pro",
+    input="Tell me a three sentence bedtime story about a unicorn.",
+    stream=True
+)
+
+for event in response:
+    print(event)
+```
+
+</TabItem>
+
+<TabItem value="bedrock" label="AWS Bedrock">
+
+#### Non-streaming
+```python showLineNumbers title="AWS Bedrock Non-streaming Response"
+import litellm
+import os
+
+# Set AWS credentials
+os.environ["AWS_ACCESS_KEY_ID"] = "your-access-key-id"
+os.environ["AWS_SECRET_ACCESS_KEY"] = "your-secret-access-key"
+os.environ["AWS_REGION_NAME"] = "us-west-2"  # or your AWS region
+
+# Non-streaming response
+response = litellm.responses(
+    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
+    input="Tell me a three sentence bedtime story about a unicorn.",
+    max_output_tokens=100
+)
+
+print(response)
+```
+
+#### Streaming
+```python showLineNumbers title="AWS Bedrock Streaming Response"
+import litellm
+import os
+
+# Set AWS credentials
+os.environ["AWS_ACCESS_KEY_ID"] = "your-access-key-id"
+os.environ["AWS_SECRET_ACCESS_KEY"] = "your-secret-access-key"
+os.environ["AWS_REGION_NAME"] = "us-west-2"  # or your AWS region
+
+# Streaming response
+response = litellm.responses(
+    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
+    input="Tell me a three sentence bedtime story about a unicorn.",
+    stream=True
+)
+
+for event in response:
+    print(event)
+```
+
+</TabItem>
+
+<TabItem value="gemini" label="Google AI Studio">
+
+#### Non-streaming
+```python showLineNumbers title="Google AI Studio Non-streaming Response"
+import litellm
+import os
+
+# Set API key for Google AI Studio
+os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"
+
+# Non-streaming response
+response = litellm.responses(
+    model="gemini/gemini-1.5-flash",
+    input="Tell me a three sentence bedtime story about a unicorn.",
+    max_output_tokens=100
+)
+
+print(response)
+```
+
+#### Streaming
+```python showLineNumbers title="Google AI Studio Streaming Response"
+import litellm
+import os
+
+# Set API key for Google AI Studio
+os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"
+
+# Streaming response
+response = litellm.responses(
+    model="gemini/gemini-1.5-flash",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
 )
@ -116,10 +227,407 @@ for event in response:
 </TabItem>
 </Tabs>

+### LiteLLM Proxy with OpenAI SDK

-## **Supported Providers**
+First, set up and start your LiteLLM proxy server.

-| Provider    | Link to Usage      |
-|-------------|--------------------|
-| OpenAI|   [Usage](#usage)                 |
-| Azure OpenAI|   [Usage](../docs/providers/azure#responses-api)                 |  
+```bash title="Start LiteLLM Proxy Server"
+litellm --config /path/to/config.yaml
+
+# RUNNING on http://0.0.0.0:4000
+```
+
+<Tabs>
+<TabItem value="openai" label="OpenAI">
+
+First, add this to your litellm proxy config.yaml:
+```yaml showLineNumbers title="OpenAI Proxy Configuration"
+model_list:
+  - model_name: openai/o1-pro
+    litellm_params:
+      model: openai/o1-pro
+      api_key: os.environ/OPENAI_API_KEY
+```
+
+#### Non-streaming
+```python showLineNumbers title="OpenAI Proxy Non-streaming Response"
+from openai import OpenAI
+
+# Initialize client with your proxy URL
+client = OpenAI(
+    base_url="http://localhost:4000",  # Your proxy URL
+    api_key="your-api-key"             # Your proxy API key
+)
+
+# Non-streaming response
+response = client.responses.create(
+    model="openai/o1-pro",
+    input="Tell me a three sentence bedtime story about a unicorn."
+)
+
+print(response)
+```
+
+#### Streaming
+```python showLineNumbers title="OpenAI Proxy Streaming Response"
+from openai import OpenAI
+
+# Initialize client with your proxy URL
+client = OpenAI(
+    base_url="http://localhost:4000",  # Your proxy URL
+    api_key="your-api-key"             # Your proxy API key
+)
+
+# Streaming response
+response = client.responses.create(
+    model="openai/o1-pro",
+    input="Tell me a three sentence bedtime story about a unicorn.",
+    stream=True
+)
+
+for event in response:
+    print(event)
+```
+
+</TabItem>
+
+<TabItem value="anthropic" label="Anthropic">
+
+First, add this to your litellm proxy config.yaml:
+```yaml showLineNumbers title="Anthropic Proxy Configuration"
+model_list:
+  - model_name: anthropic/claude-3-5-sonnet-20240620
+    litellm_params:
+      model: anthropic/claude-3-5-sonnet-20240620
+      api_key: os.environ/ANTHROPIC_API_KEY
+```
+
+#### Non-streaming
+```python showLineNumbers title="Anthropic Proxy Non-streaming Response"
+from openai import OpenAI
+
+# Initialize client with your proxy URL
+client = OpenAI(
+    base_url="http://localhost:4000",  # Your proxy URL
+    api_key="your-api-key"             # Your proxy API key
+)
+
+# Non-streaming response
+response = client.responses.create(
+    model="anthropic/claude-3-5-sonnet-20240620",
+    input="Tell me a three sentence bedtime story about a unicorn."
+)
+
+print(response)
+```
+
+#### Streaming
+```python showLineNumbers title="Anthropic Proxy Streaming Response"
+from openai import OpenAI
+
+# Initialize client with your proxy URL
+client = OpenAI(
+    base_url="http://localhost:4000",  # Your proxy URL
+    api_key="your-api-key"             # Your proxy API key
+)
+
+# Streaming response
+response = client.responses.create(
+    model="anthropic/claude-3-5-sonnet-20240620",
+    input="Tell me a three sentence bedtime story about a unicorn.",
+    stream=True
+)
+
+for event in response:
+    print(event)
+```
+
+</TabItem>
+
+<TabItem value="vertex" label="Vertex AI">
+
+First, add this to your litellm proxy config.yaml:
+```yaml showLineNumbers title="Vertex AI Proxy Configuration"
+model_list:
+  - model_name: vertex_ai/gemini-1.5-pro
+    litellm_params:
+      model: vertex_ai/gemini-1.5-pro
+      vertex_project: your-gcp-project-id
+      vertex_location: us-central1
+```
+
+#### Non-streaming
+```python showLineNumbers title="Vertex AI Proxy Non-streaming Response"
+from openai import OpenAI
+
+# Initialize client with your proxy URL
+client = OpenAI(
+    base_url="http://localhost:4000",  # Your proxy URL
+    api_key="your-api-key"             # Your proxy API key
+)
+
+# Non-streaming response
+response = client.responses.create(
+    model="vertex_ai/gemini-1.5-pro",
+    input="Tell me a three sentence bedtime story about a unicorn."
+)
+
+print(response)
+```
+
+#### Streaming
+```python showLineNumbers title="Vertex AI Proxy Streaming Response"
+from openai import OpenAI
+
+# Initialize client with your proxy URL
+client = OpenAI(
+    base_url="http://localhost:4000",  # Your proxy URL
+    api_key="your-api-key"             # Your proxy API key
+)
+
+# Streaming response
+response = client.responses.create(
+    model="vertex_ai/gemini-1.5-pro",
+    input="Tell me a three sentence bedtime story about a unicorn.",
+    stream=True
+)
+
+for event in response:
+    print(event)
+```
+
+</TabItem>
+
+<TabItem value="bedrock" label="AWS Bedrock">
+
+First, add this to your litellm proxy config.yaml:
+```yaml showLineNumbers title="AWS Bedrock Proxy Configuration"
+model_list:
+  - model_name: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
+    litellm_params:
+      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
+      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
+      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
+      aws_region_name: us-west-2
+```
+
+#### Non-streaming
+```python showLineNumbers title="AWS Bedrock Proxy Non-streaming Response"
+from openai import OpenAI
+
+# Initialize client with your proxy URL
+client = OpenAI(
+    base_url="http://localhost:4000",  # Your proxy URL
+    api_key="your-api-key"             # Your proxy API key
+)
+
+# Non-streaming response
+response = client.responses.create(
+    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
+    input="Tell me a three sentence bedtime story about a unicorn."
+)
+
+print(response)
+```
+
+#### Streaming
+```python showLineNumbers title="AWS Bedrock Proxy Streaming Response"
+from openai import OpenAI
+
+# Initialize client with your proxy URL
+client = OpenAI(
+    base_url="http://localhost:4000",  # Your proxy URL
+    api_key="your-api-key"             # Your proxy API key
+)
+
+# Streaming response
+response = client.responses.create(
+    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
+    input="Tell me a three sentence bedtime story about a unicorn.",
+    stream=True
+)
+
+for event in response:
+    print(event)
+```
+
+</TabItem>
+
+<TabItem value="gemini" label="Google AI Studio">
+
+First, add this to your litellm proxy config.yaml:
+```yaml showLineNumbers title="Google AI Studio Proxy Configuration"
+model_list:
+  - model_name: gemini/gemini-1.5-flash
+    litellm_params:
+      model: gemini/gemini-1.5-flash
+      api_key: os.environ/GEMINI_API_KEY
+```
+
+#### Non-streaming
+```python showLineNumbers title="Google AI Studio Proxy Non-streaming Response"
+from openai import OpenAI
+
+# Initialize client with your proxy URL
+client = OpenAI(
+    base_url="http://localhost:4000",  # Your proxy URL
+    api_key="your-api-key"             # Your proxy API key
+)
+
+# Non-streaming response
+response = client.responses.create(
+    model="gemini/gemini-1.5-flash",
+    input="Tell me a three sentence bedtime story about a unicorn."
+)
+
+print(response)
+```
+
+#### Streaming
+```python showLineNumbers title="Google AI Studio Proxy Streaming Response"
+from openai import OpenAI
+
+# Initialize client with your proxy URL
+client = OpenAI(
+    base_url="http://localhost:4000",  # Your proxy URL
+    api_key="your-api-key"             # Your proxy API key
+)
+
+# Streaming response
+response = client.responses.create(
+    model="gemini/gemini-1.5-flash",
+    input="Tell me a three sentence bedtime story about a unicorn.",
+    stream=True
+)
+
+for event in response:
+    print(event)
+```
+
+</TabItem>
+</Tabs>
+
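The streaming examples above print every event as it arrives. A minimal sketch of assembling the streamed text instead is shown below. It assumes the events follow OpenAI's Responses streaming format, where text chunks arrive as events whose `type` is `response.output_text.delta` and which carry a `delta` string. `collect_output_text` is a hypothetical helper, and the stand-in events replace a real stream so the snippet runs without credentials.

```python
from types import SimpleNamespace


def collect_output_text(events):
    """Join the text deltas from an iterable of Responses API stream events."""
    chunks = []
    for event in events:
        # Assumption: text chunks use the OpenAI event type below and expose `.delta`.
        if getattr(event, "type", None) == "response.output_text.delta":
            chunks.append(event.delta)
    return "".join(chunks)


# Stand-in events for illustration only. In practice, pass the stream returned by
# litellm.responses(..., stream=True) or client.responses.create(..., stream=True).
fake_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Once upon "),
    SimpleNamespace(type="response.output_text.delta", delta="a time."),
    SimpleNamespace(type="response.completed"),
]

story = collect_output_text(fake_stream)
print(story)
```

Events of other types are skipped rather than treated as errors, so the helper keeps working if a provider emits lifecycle events that are not listed here.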
+## Supported Responses API Parameters
+
+| Provider | Supported Parameters |
+|----------|---------------------|
+| `openai` | [All Responses API parameters are supported](https://github.com/BerriAI/litellm/blob/7c3df984da8e4dff9201e4c5353fdc7a2b441831/litellm/llms/openai/responses/transformation.py#L23) |
+| `azure` | [All Responses API parameters are supported](https://github.com/BerriAI/litellm/blob/7c3df984da8e4dff9201e4c5353fdc7a2b441831/litellm/llms/openai/responses/transformation.py#L23) |
+| `anthropic` | [See supported parameters here](https://github.com/BerriAI/litellm/blob/f39d9178868662746f159d5ef642c7f34f9bfe5f/litellm/responses/litellm_completion_transformation/transformation.py#L57) |
+| `bedrock` | [See supported parameters here](https://github.com/BerriAI/litellm/blob/f39d9178868662746f159d5ef642c7f34f9bfe5f/litellm/responses/litellm_completion_transformation/transformation.py#L57) |
+| `gemini` | [See supported parameters here](https://github.com/BerriAI/litellm/blob/f39d9178868662746f159d5ef642c7f34f9bfe5f/litellm/responses/litellm_completion_transformation/transformation.py#L57) |
+| `vertex_ai` | [See supported parameters here](https://github.com/BerriAI/litellm/blob/f39d9178868662746f159d5ef642c7f34f9bfe5f/litellm/responses/litellm_completion_transformation/transformation.py#L57) |
+| `azure_ai` | [See supported parameters here](https://github.com/BerriAI/litellm/blob/f39d9178868662746f159d5ef642c7f34f9bfe5f/litellm/responses/litellm_completion_transformation/transformation.py#L57) |
+| All other llm api providers | [See supported parameters here](https://github.com/BerriAI/litellm/blob/f39d9178868662746f159d5ef642c7f34f9bfe5f/litellm/responses/litellm_completion_transformation/transformation.py#L57) |
+
+## Load Balancing with Routing Affinity
+
+When using the Responses API with multiple deployments of the same model (e.g., multiple Azure OpenAI endpoints), LiteLLM provides routing affinity for conversations. This ensures that follow-up requests using a `previous_response_id` are routed to the same deployment that generated the original response.
+
+
+#### Example Usage
+
+<Tabs>
+<TabItem value="python-sdk" label="Python SDK">
+
+```python showLineNumbers title="Python SDK with Routing Affinity"
+import litellm
+
+# Set up router with multiple deployments of the same model
+router = litellm.Router(
+    model_list=[
+        {
+            "model_name": "azure-gpt4-turbo",
+            "litellm_params": {
+                "model": "azure/gpt-4-turbo",
+                "api_key": "your-api-key-1",
+                "api_version": "2024-06-01",
+                "api_base": "https://endpoint1.openai.azure.com",
+            },
+        },
+        {
+            "model_name": "azure-gpt4-turbo",
+            "litellm_params": {
+                "model": "azure/gpt-4-turbo",
+                "api_key": "your-api-key-2",
+                "api_version": "2024-06-01",
+                "api_base": "https://endpoint2.openai.azure.com",
+            },
+        },
+    ],
+    optional_pre_call_checks=["responses_api_deployment_check"],
+)
+
+# Initial request
+response = await router.aresponses(
+    model="azure-gpt4-turbo",
+    input="Hello, who are you?",
+    truncation="auto",
+)
+
+# Store the response ID
+response_id = response.id
+
+# Follow-up request - will be automatically routed to the same deployment
+follow_up = await router.aresponses(
+    model="azure-gpt4-turbo",
+    input="Tell me more about yourself",
+    truncation="auto",
+    previous_response_id=response_id  # This ensures routing to the same deployment
+)
+```
+
+</TabItem>
+<TabItem value="proxy-server" label="Proxy Server">
+
+#### 1. Setup routing affinity on proxy config.yaml
+
+To enable routing affinity for Responses API in your LiteLLM proxy, set `optional_pre_call_checks: ["responses_api_deployment_check"]` in your proxy config.yaml.
+
+```yaml showLineNumbers title="config.yaml with Responses API Routing Affinity"
+model_list:
+  - model_name: azure-gpt4-turbo
+    litellm_params:
+      model: azure/gpt-4-turbo
+      api_key: your-api-key-1
+      api_version: "2024-06-01"
+      api_base: https://endpoint1.openai.azure.com
+  - model_name: azure-gpt4-turbo
+    litellm_params:
+      model: azure/gpt-4-turbo
+      api_key: your-api-key-2
+      api_version: "2024-06-01"
+      api_base: https://endpoint2.openai.azure.com
+
+router_settings:
+  optional_pre_call_checks: ["responses_api_deployment_check"]
+```
+
+#### 2. Use the OpenAI Python SDK to make requests to LiteLLM Proxy
+
+```python showLineNumbers title="OpenAI Client with Proxy Server"
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:4000",
+    api_key="your-api-key"
+)
+
+# Initial request
+response = client.responses.create(
+    model="azure-gpt4-turbo",
+    input="Hello, who are you?"
+)
+
+response_id = response.id
+
+# Follow-up request - will be automatically routed to the same deployment
+follow_up = client.responses.create(
+    model="azure-gpt4-turbo",
+    input="Tell me more about yourself",
+    previous_response_id=response_id  # This ensures routing to the same deployment
+)
+```
+
+</TabItem>
+</Tabs>
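
The routing affinity described in the `response_api.md` changes above can be pictured with a toy model. This is a conceptual sketch only, not LiteLLM's implementation: `ToyAffinityRouter`, its in-memory dictionary, and the random choice of deployment are all assumptions made for illustration, and the real router may track the owning deployment differently.

```python
import random
import uuid


class ToyAffinityRouter:
    """Toy router: follow-ups go to the deployment that produced the earlier response."""

    def __init__(self, deployments):
        self.deployments = list(deployments)
        self._owner = {}  # response_id -> deployment that generated it

    def respond(self, prompt, previous_response_id=None):
        if previous_response_id in self._owner:
            # Affinity: reuse the deployment that generated the earlier response.
            deployment = self._owner[previous_response_id]
        else:
            # No known history: any deployment will do.
            deployment = random.choice(self.deployments)
        response_id = f"resp_{uuid.uuid4().hex}"
        self._owner[response_id] = deployment
        return {"id": response_id, "deployment": deployment, "input": prompt}


router = ToyAffinityRouter(
    ["https://endpoint1.openai.azure.com", "https://endpoint2.openai.azure.com"]
)
first = router.respond("Hello, who are you?")
follow_up = router.respond(
    "Tell me more about yourself", previous_response_id=first["id"]
)
print(first["deployment"] == follow_up["deployment"])
```

Without the lookup on `previous_response_id`, the follow-up could land on either endpoint, which is the situation the `responses_api_deployment_check` setting is meant to avoid.
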
--- a/docs/my-website/docs/simple_proxy_old_doc.md
+++ b/docs/my-website/docs/simple_proxy_old_doc.md
@ -994,16 +994,16 @@ litellm --health

 ## Logging Proxy Input/Output - OpenTelemetry

-### Step 1 Start OpenTelemetry Collecter Docker Container
+### Step 1 Start OpenTelemetry Collector Docker Container
 This container sends logs to your selected destination 

-#### Install OpenTelemetry Collecter Docker Image
+#### Install OpenTelemetry Collector Docker Image
 ```shell
 docker pull otel/opentelemetry-collector:0.90.0
 docker run -p 127.0.0.1:4317:4317 -p 127.0.0.1:55679:55679 otel/opentelemetry-collector:0.90.0
 ```

-#### Set Destination paths on OpenTelemetry Collecter
+#### Set Destination paths on OpenTelemetry Collector

 Here's the OpenTelemetry yaml config to use with Elastic Search
 ```yaml
@ -1077,7 +1077,7 @@ general_settings:
 LiteLLM will read the `OTEL_ENDPOINT` environment variable to send data to your OTEL collector 

 ```python
-os.environ['OTEL_ENDPOINT'] # defauls to 127.0.0.1:4317 if not provided
+os.environ['OTEL_ENDPOINT'] # defaults to 127.0.0.1:4317 if not provided
 ```

 #### Start LiteLLM Proxy
@ -1101,8 +1101,8 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \
 ```


-#### Test & View Logs on OpenTelemetry Collecter
-On successfull logging you should be able to see this log on your `OpenTelemetry Collecter` Docker Container
+#### Test & View Logs on OpenTelemetry Collector
+On successful logging you should be able to see this log on your `OpenTelemetry Collector` Docker Container
 ```shell
 Events:
 SpanEvent #0
@ -1149,7 +1149,7 @@ Here's the log view on Elastic Search. You can see the request `input`, `output`
 <Image img={require('../img/elastic_otel.png')} />

 ## Logging Proxy Input/Output - Langfuse
-We will use the `--config` to set `litellm.success_callback = ["langfuse"]` this will log all successfull LLM calls to langfuse
+We will use `--config` to set `litellm.success_callback = ["langfuse"]`, which logs all successful LLM calls to Langfuse

 **Step 1** Install langfuse

--- a/docs/my-website/docs/tutorials/compare_llms.md
+++ b/docs/my-website/docs/tutorials/compare_llms.md
@ -117,7 +117,7 @@ response = completion("command-nightly", messages)
 """


-# qustions/logs you want to run the LLM on
+# questions/logs you want to run the LLM on
 questions = [
    "what is litellm?",
    "why should I use LiteLLM",
--- a/docs/my-website/docs/tutorials/gradio_integration.md
+++ b/docs/my-website/docs/tutorials/gradio_integration.md
@ -30,7 +30,7 @@ def inference(message, history):
            yield partial_message
    except Exception as e:
        print("Exception encountered:", str(e))
-        yield f"An Error occured please 'Clear' the error and try your question again"
+        yield f"An Error occurred please 'Clear' the error and try your question again"
 ```

 ### Define Chat Interface
--- a/docs/my-website/docs/tutorials/scim_litellm.md
+++ b/docs/my-website/docs/tutorials/scim_litellm.md
@ -0,0 +1,74 @@
+
+import Image from '@theme/IdealImage';
+
+# SCIM with LiteLLM
+
+SCIM lets identity providers (Okta, Azure AD, OneLogin, etc.) automate user and team (group) provisioning, updates, and deprovisioning on LiteLLM.
+
+
+This tutorial will walk you through the steps to connect your IDP to LiteLLM SCIM Endpoints.
+
+### Supported SSO Providers for SCIM
+Below is a list of supported SSO providers for connecting to LiteLLM SCIM Endpoints.
+- Microsoft Entra ID (Azure AD)
+- Okta
+- Google Workspace
+- OneLogin
+- Keycloak
+- Auth0
+
+
+## 1. Get your SCIM Tenant URL and Bearer Token
+
+On LiteLLM, navigate to Settings > Admin Settings > SCIM. On this page, create a SCIM token; it allows your IDP to authenticate to the LiteLLM `/scim` endpoints.
+
+<Image img={require('../../img/scim_2.png')}  style={{ width: '800px', height: 'auto' }} />
+
+## 2. Connect your IDP to LiteLLM SCIM Endpoints
+
+In your IDP, navigate to your SSO application and select `Provisioning` > `New provisioning configuration`.
+
+On this page, paste in your LiteLLM SCIM tenant URL and bearer token.
+
+Once these are pasted in, click `Test Connection` to ensure your IDP can authenticate to the LiteLLM SCIM endpoints.
+
+<Image img={require('../../img/scim_4.png')}  style={{ width: '800px', height: 'auto' }} />
+
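What `Test Connection` checks can be approximated by hand: send a request to the tenant URL with the SCIM token as a bearer token. The sketch below only builds that request and does not send it. The tenant URL, the token value, and the `/Users` path (the standard SCIM 2.0 user resource) are assumptions; use the values shown on your SCIM settings page.

```python
import urllib.request

SCIM_TENANT_URL = "http://localhost:4000/scim/v2"  # hypothetical tenant URL
SCIM_TOKEN = "your-scim-token"  # placeholder token


def build_scim_request(tenant_url, token, resource="Users"):
    """Build a GET request for a SCIM resource, authenticated with a bearer token."""
    url = f"{tenant_url.rstrip('/')}/{resource}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/scim+json",
        },
        method="GET",
    )


request = build_scim_request(SCIM_TENANT_URL, SCIM_TOKEN)
# Inspect the request; call urllib.request.urlopen(request) to actually send it.
print(request.get_method(), request.full_url)
```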
+
+## 3. Test SCIM Connection
+
+### 3.1 Assign the group to your LiteLLM Enterprise App
+
+In your IDP portal, navigate to `Enterprise Applications` and select your LiteLLM app.
+
+<Image img={require('../../img/msft_enterprise_app.png')}  style={{ width: '800px', height: 'auto' }} />
+
+<br />
+<br />
+
+Once you've selected your litellm app, click on `Users and Groups` > `Add user/group` 
+
+<Image img={require('../../img/msft_enterprise_assign_group.png')}  style={{ width: '800px', height: 'auto' }} />
+
+<br />
+
+Now select the group you created in step 1.1 and add it to the LiteLLM Enterprise App. At this point we have added `Production LLM Evals Group` to the LiteLLM Enterprise App. The next step is having LiteLLM automatically create the `Production LLM Evals Group` in the LiteLLM DB when a new user signs in.
+
+<Image img={require('../../img/msft_enterprise_select_group.png')}  style={{ width: '800px', height: 'auto' }} />
+
+
+### 3.2 Sign in to LiteLLM UI via SSO
+
+Sign in to the LiteLLM UI via SSO. You should be redirected to the Entra ID SSO page. This SSO sign-in flow triggers LiteLLM to fetch the latest groups and members from Microsoft Entra ID.
+
+<Image img={require('../../img/msft_sso_sign_in.png')}  style={{ width: '800px', height: 'auto' }} />
+
+### 3.3 Check the new team on LiteLLM UI
+
+On the LiteLLM UI, navigate to `Teams`. You should see the new team `Production LLM Evals Group` auto-created on LiteLLM.
+
+<Image img={require('../../img/msft_auto_team.png')}  style={{ width: '900px', height: 'auto' }} />
+
+
+
+
--- a/docs/my-website/img/release_notes/new_tag_usage.png
+++ b/docs/my-website/img/release_notes/new_tag_usage.png
--- a/docs/my-website/img/release_notes/new_team_usage.png
+++ b/docs/my-website/img/release_notes/new_team_usage.png
--- a/docs/my-website/img/release_notes/new_team_usage_highlight.jpg
+++ b/docs/my-website/img/release_notes/new_team_usage_highlight.jpg
--- a/docs/my-website/img/release_notes/unified_responses_api_rn.png
+++ b/docs/my-website/img/release_notes/unified_responses_api_rn.png
--- a/docs/my-website/img/scim_0.png
+++ b/docs/my-website/img/scim_0.png
--- a/docs/my-website/img/scim_1.png
+++ b/docs/my-website/img/scim_1.png
--- a/docs/my-website/img/scim_2.png
+++ b/docs/my-website/img/scim_2.png
--- a/docs/my-website/img/scim_3.png
+++ b/docs/my-website/img/scim_3.png
--- a/docs/my-website/img/scim_4.png
+++ b/docs/my-website/img/scim_4.png
--- a/docs/my-website/img/scim_integration.png
+++ b/docs/my-website/img/scim_integration.png
--- a/docs/my-website/release_notes/v1.67.0-stable/index.md
+++ b/docs/my-website/release_notes/v1.67.0-stable/index.md
@ -0,0 +1,153 @@
+---
+title: v1.67.0-stable - SCIM Integration
+slug: v1.67.0-stable
+date: 2025-04-19T10:00:00
+authors:
+  - name: Krrish Dholakia
+    title: CEO, LiteLLM
+    url: https://www.linkedin.com/in/krish-d/
+    image_url: https://media.licdn.com/dms/image/v2/D4D03AQGrlsJ3aqpHmQ/profile-displayphoto-shrink_400_400/B4DZSAzgP7HYAg-/0/1737327772964?e=1749686400&v=beta&t=Hkl3U8Ps0VtvNxX0BNNq24b4dtX5wQaPFp6oiKCIHD8
+  - name: Ishaan Jaffer
+    title: CTO, LiteLLM
+    url: https://www.linkedin.com/in/reffajnaahsi/
+    image_url: https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg
+
+tags: ["sso", "unified_file_id", "cost_tracking", "security"]
+hide_table_of_contents: false
+---
+import Image from '@theme/IdealImage';
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Key Highlights
+
+- **SCIM Integration**: Enables identity providers (Okta, Azure AD, OneLogin, etc.) to automate user and team (group) provisioning, updates, and deprovisioning
+- **Team and Tag based usage tracking**: You can now see usage and spend by team and tag at 1M+ spend logs.
+- **Unified Responses API**: Support for calling Anthropic, Gemini, Groq, etc. via OpenAI's new Responses API.
+
+Let's dive in.
+
+## SCIM Integration
+
+<Image img={require('../../img/scim_integration.png')}/>
+
+This release adds SCIM support to LiteLLM. This allows your SSO provider (Okta, Azure AD, etc) to automatically create/delete users, teams, and memberships on LiteLLM. This means that when you remove a team on your SSO provider, your SSO provider will automatically delete the corresponding team on LiteLLM. 
+
+[Read more](../../docs/tutorials/scim_litellm)
+## Team and Tag based usage tracking
+
+<Image img={require('../../img/release_notes/new_team_usage_highlight.jpg')}/>
+
+
+This release improves team and tag based usage tracking at 1M+ spend logs, making it easy to monitor your LLM API spend in production. This covers:
+
+- View **daily spend** by teams + tags
+- View **usage / spend by key**, within teams
+- View **spend by multiple tags**
+- Allow **internal users** to view spend of teams they're a member of
+
+[Read more](#management-endpoints--ui)
+
+## Unified Responses API
+
+This release allows you to call Azure OpenAI, Anthropic, AWS Bedrock, and Google Vertex AI models via the POST /v1/responses endpoint on LiteLLM. This means you can now use popular tools like [OpenAI Codex](https://docs.litellm.ai/docs/tutorials/openai_codex) with your own models. 
+
+<Image img={require('../../img/release_notes/unified_responses_api_rn.png')}/>
+
+
+[Read more](https://docs.litellm.ai/docs/response_api)
+
+
+## New Models / Updated Models
+
+- **OpenAI**
+    1. gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3, o3-mini, o4-mini pricing - [Get Started](../../docs/providers/openai#usage), [PR](https://github.com/BerriAI/litellm/pull/9990)
+    2. o4 - correctly map o4 to openai o_series model
+- **Azure AI**
+    1. Phi-4 output cost per token fix - [PR](https://github.com/BerriAI/litellm/pull/9880)
+    2. Responses API support - [Get Started](../../docs/providers/azure#azure-responses-api), [PR](https://github.com/BerriAI/litellm/pull/10116)
+- **Anthropic**
+    1. redacted message thinking support - [Get Started](../../docs/providers/anthropic#usage---thinking--reasoning_content), [PR](https://github.com/BerriAI/litellm/pull/10129)
+- **Cohere**
+    1. `/v2/chat` Passthrough endpoint support w/ cost tracking - [Get Started](../../docs/pass_through/cohere), [PR](https://github.com/BerriAI/litellm/pull/9997)
+- **Azure**
+    1. Support azure tenant_id/client_id env vars - [Get Started](../../docs/providers/azure#entra-id---use-tenant_id-client_id-client_secret), [PR](https://github.com/BerriAI/litellm/pull/9993)
+    2. Fix response_format check for 2025+ api versions - [PR](https://github.com/BerriAI/litellm/pull/9993)
+    3. Add gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3, o3-mini, o4-mini pricing
+- **VLLM**
+    1. Files - Support 'file' message type for VLLM video url's - [Get Started](../../docs/providers/vllm#send-video-url-to-vllm), [PR](https://github.com/BerriAI/litellm/pull/10129)
+    2. Passthrough - new `/vllm/` passthrough endpoint support [Get Started](../../docs/pass_through/vllm), [PR](https://github.com/BerriAI/litellm/pull/10002)
+- **Mistral**
+    1. new `/mistral` passthrough endpoint support [Get Started](../../docs/pass_through/mistral), [PR](https://github.com/BerriAI/litellm/pull/10002)
+- **AWS**
+    1. New mapped bedrock regions - [PR](https://github.com/BerriAI/litellm/pull/9430)
+- **VertexAI / Google AI Studio**
+    1. Gemini - Response format - Retain schema field ordering for google gemini and vertex by specifying propertyOrdering - [Get Started](../../docs/providers/vertex#json-schema), [PR](https://github.com/BerriAI/litellm/pull/9828)
+    2. Gemini-2.5-flash - return reasoning content [Google AI Studio](../../docs/providers/gemini#usage---thinking--reasoning_content), [Vertex AI](../../docs/providers/vertex#thinking--reasoning_content)
+    3. Gemini-2.5-flash - pricing + model information [PR](https://github.com/BerriAI/litellm/pull/10125)
+    4. Passthrough - new `/vertex_ai/discovery` route - enables calling AgentBuilder API routes [Get Started](../../docs/pass_through/vertex_ai#supported-api-endpoints), [PR](https://github.com/BerriAI/litellm/pull/10084)
+- **Fireworks AI**
+    1. Return tool calling responses in the `tool_calls` field (Fireworks incorrectly returns these as a JSON string in `content`) - [PR](https://github.com/BerriAI/litellm/pull/10130)
+- **Triton**
+    1. Remove fixed bad_words / stop words from `/generate` call - [Get Started](../../docs/providers/triton-inference-server#triton-generate---chat-completion), [PR](https://github.com/BerriAI/litellm/pull/10163)
+- **Other**
+    1. Support for all litellm providers on Responses API (works with Codex) - [Get Started](../../docs/tutorials/openai_codex), [PR](https://github.com/BerriAI/litellm/pull/10132)
+    2. Fix combining multiple tool calls in streaming response - [Get Started](../../docs/completion/stream#helper-function), [PR](https://github.com/BerriAI/litellm/pull/10040)
+
+
+## Spend Tracking Improvements
+
+- **Cost Control** - inject cache control points in prompt for cost reduction [Get Started](../../docs/tutorials/prompt_caching), [PR](https://github.com/BerriAI/litellm/pull/10000)
+- **Spend Tags** - spend tags in headers - support x-litellm-tags even if tag based routing not enabled [Get Started](../../docs/proxy/request_headers#litellm-headers), [PR](https://github.com/BerriAI/litellm/pull/10000)
+- **Gemini-2.5-flash** - support cost calculation for reasoning tokens [PR](https://github.com/BerriAI/litellm/pull/10141)
+
+## Management Endpoints / UI
+- **Users**
+    1. Show created_at and updated_at on users page - [PR](https://github.com/BerriAI/litellm/pull/10033)
+- **Virtual Keys**
+    1. Filter by key alias - [PR](https://github.com/BerriAI/litellm/pull/10085)
+- **Usage Tab**
+
+    1. Team based usage
+        
+        - New `LiteLLM_DailyTeamSpend` Table for aggregate team based usage logging - [PR](https://github.com/BerriAI/litellm/pull/10039)
+        
+        - New Team based usage dashboard + new `/team/daily/activity` API - [PR](https://github.com/BerriAI/litellm/pull/10081)
+        - Return team alias on /team/daily/activity API - [PR](https://github.com/BerriAI/litellm/pull/10157)
+        - Allow internal users to view spend for teams they belong to - [PR](https://github.com/BerriAI/litellm/pull/10157)
+        - Allow viewing top keys by team - [PR](https://github.com/BerriAI/litellm/pull/10157)
+
+        <Image img={require('../../img/release_notes/new_team_usage.png')}/>
+
+    2. Tag Based Usage
+        - New `LiteLLM_DailyTagSpend` Table for aggregate tag based usage logging - [PR](https://github.com/BerriAI/litellm/pull/10071)
+        - Restrict to only Proxy Admins - [PR](https://github.com/BerriAI/litellm/pull/10157)
+        - Allow viewing top keys by tag
+        - Return tags passed in request (i.e. dynamic tags) on `/tag/list` API - [PR](https://github.com/BerriAI/litellm/pull/10157)
+        <Image img={require('../../img/release_notes/new_tag_usage.png')}/>
+    3. Track prompt caching metrics in daily user, team, tag tables - [PR](https://github.com/BerriAI/litellm/pull/10029)
+    4. Show usage by key (on all up, team, and tag usage dashboards) - [PR](https://github.com/BerriAI/litellm/pull/10157)
+    5. Swap old usage tab with new usage tab
+- **Models**
+    1. Make columns resizable/hideable - [PR](https://github.com/BerriAI/litellm/pull/10119)
+- **API Playground**
+    1. Allow internal user to call api playground - [PR](https://github.com/BerriAI/litellm/pull/10157)
+- **SCIM**
+    1. Add LiteLLM SCIM Integration for Team and User management - [Get Started](../../docs/tutorials/scim_litellm), [PR](https://github.com/BerriAI/litellm/pull/10072)
+
+
+## Logging / Guardrail Integrations
+- **GCS**
+    1. Fix gcs pub sub logging with env var GCS_PROJECT_ID - [Get Started](../../docs/observability/gcs_bucket_integration#usage), [PR](https://github.com/BerriAI/litellm/pull/10042)
+- **AIM**
+    1. Add litellm call id passing to Aim guardrails on pre and post-hooks calls - [Get Started](../../docs/proxy/guardrails/aim_security), [PR](https://github.com/BerriAI/litellm/pull/10021)
+- **Azure blob storage**
+    1. Ensure logging works in high throughput scenarios - [Get Started](../../docs/proxy/logging#azure-blob-storage), [PR](https://github.com/BerriAI/litellm/pull/9962)
+
+## General Proxy Improvements
+
+- **Support setting `litellm.modify_params` via env var** [PR](https://github.com/BerriAI/litellm/pull/9964)
+- **Model Discovery** - Check provider’s `/models` endpoints when calling proxy’s `/v1/models` endpoint - [Get Started](../../docs/proxy/model_discovery), [PR](https://github.com/BerriAI/litellm/pull/9958)
+- **`/utils/token_counter`** - fix retrieving custom tokenizer for db models - [Get Started](../../docs/proxy/configs#set-custom-tokenizer), [PR](https://github.com/BerriAI/litellm/pull/10047)
+- **Prisma migrate** - handle existing columns in db table - [PR](https://github.com/BerriAI/litellm/pull/10138)
+
--- a/docs/my-website/sidebars.js
+++ b/docs/my-website/sidebars.js
@ -69,6 +69,7 @@ const sidebars = {
            "proxy/clientside_auth",
            "proxy/request_headers",
            "proxy/response_headers",
+            "proxy/model_discovery",
          ],
        },
        {
@ -101,6 +102,7 @@ const sidebars = {
            "proxy/admin_ui_sso",
            "proxy/self_serve",
            "proxy/public_teams",
+            "tutorials/scim_litellm",
            "proxy/custom_sso",
            "proxy/ui_credentials",
            "proxy/ui_logs"
@ -330,6 +332,8 @@ const sidebars = {
            "pass_through/vertex_ai",
            "pass_through/google_ai_studio",
            "pass_through/cohere",
+            "pass_through/vllm",
+            "pass_through/mistral",
            "pass_through/openai_passthrough",
            "pass_through/anthropic_completion",
            "pass_through/bedrock",
@ -407,6 +411,7 @@ const sidebars = {
      type: "category",
      label: "Logging & Observability",
      items: [
+        "observability/agentops_integration",
        "observability/langfuse_integration",
        "observability/lunary_integration",
        "observability/mlflow",
--- a/litellm/__init__.py
+++ b/litellm/__init__.py
@ -113,6 +113,7 @@ _custom_logger_compatible_callbacks_literal = Literal[
    "pagerduty",
    "humanloop",
    "gcs_pubsub",
+    "agentops",
    "anthropic_cache_control_hook",
 ]
 logged_real_time_event_types: Optional[Union[List[str], Literal["*"]]] = None
@ -415,6 +416,7 @@ deepseek_models: List = []
 azure_ai_models: List = []
 jina_ai_models: List = []
 voyage_models: List = []
+infinity_models: List = []
 databricks_models: List = []
 cloudflare_models: List = []
 codestral_models: List = []
@ -556,6 +558,8 @@ def add_known_models():
            azure_ai_models.append(key)
        elif value.get("litellm_provider") == "voyage":
            voyage_models.append(key)
+        elif value.get("litellm_provider") == "infinity":
+            infinity_models.append(key)
        elif value.get("litellm_provider") == "databricks":
            databricks_models.append(key)
        elif value.get("litellm_provider") == "cloudflare":
@ -644,6 +648,7 @@ model_list = (
    + deepseek_models
    + azure_ai_models
    + voyage_models
+    + infinity_models
    + databricks_models
    + cloudflare_models
    + codestral_models
@ -699,6 +704,7 @@ models_by_provider: dict = {
    "mistral": mistral_chat_models,
    "azure_ai": azure_ai_models,
    "voyage": voyage_models,
+    "infinity": infinity_models,
    "databricks": databricks_models,
    "cloudflare": cloudflare_models,
    "codestral": codestral_models,
@ -946,6 +952,7 @@ from .llms.topaz.image_variations.transformation import TopazImageVariationConfi
 from litellm.llms.openai.completion.transformation import OpenAITextCompletionConfig
 from .llms.groq.chat.transformation import GroqChatConfig
 from .llms.voyage.embedding.transformation import VoyageEmbeddingConfig
+from .llms.infinity.embedding.transformation import InfinityEmbeddingConfig
 from .llms.azure_ai.chat.transformation import AzureAIStudioConfig
 from .llms.mistral.mistral_chat_transformation import MistralConfig
 from .llms.openai.responses.transformation import OpenAIResponsesAPIConfig
--- a/litellm/cost_calculator.py
+++ b/litellm/cost_calculator.py
@ -57,6 +57,7 @@ from litellm.llms.vertex_ai.image_generation.cost_calculator import (
 from litellm.responses.utils import ResponseAPILoggingUtils
 from litellm.types.llms.openai import (
    HttpxBinaryResponseContent,
+    ImageGenerationRequestQuality,
    OpenAIRealtimeStreamList,
    OpenAIRealtimeStreamResponseBaseObject,
    OpenAIRealtimeStreamSessionEvents,
@ -642,9 +643,9 @@ def completion_cost(  # noqa: PLR0915
                    or isinstance(completion_response, dict)
                ):  # tts returns a custom class
                    if isinstance(completion_response, dict):
-                        usage_obj: Optional[
-                            Union[dict, Usage]
-                        ] = completion_response.get("usage", {})
+                        usage_obj: Optional[Union[dict, Usage]] = (
+                            completion_response.get("usage", {})
+                        )
                    else:
                        usage_obj = getattr(completion_response, "usage", {})
                    if isinstance(usage_obj, BaseModel) and not _is_known_usage_objects(
@ -913,7 +914,7 @@ def completion_cost(  # noqa: PLR0915


 def get_response_cost_from_hidden_params(
-    hidden_params: Union[dict, BaseModel]
+    hidden_params: Union[dict, BaseModel],
 ) -> Optional[float]:
    if isinstance(hidden_params, BaseModel):
        _hidden_params_dict = hidden_params.model_dump()
@ -1101,29 +1102,36 @@ def default_image_cost_calculator(
        f"{quality}/{base_model_name}" if quality else base_model_name
    )

+    # gpt-image-1 models use low, medium, high quality. If the user did not specify quality, use medium for the gpt-image-1 model family
+    model_name_with_v2_quality = (
+        f"{ImageGenerationRequestQuality.MEDIUM.value}/{base_model_name}"
+    )
+
    verbose_logger.debug(
        f"Looking up cost for models: {model_name_with_quality}, {base_model_name}"
    )

-    # Try model with quality first, fall back to base model name
-    if model_name_with_quality in litellm.model_cost:
-        cost_info = litellm.model_cost[model_name_with_quality]
-    elif base_model_name in litellm.model_cost:
-        cost_info = litellm.model_cost[base_model_name]
-    else:
-        # Try without provider prefix
    model_without_provider = f"{size_str}/{model.split('/')[-1]}"
    model_with_quality_without_provider = (
        f"{quality}/{model_without_provider}" if quality else model_without_provider
    )

-        if model_with_quality_without_provider in litellm.model_cost:
-            cost_info = litellm.model_cost[model_with_quality_without_provider]
-        elif model_without_provider in litellm.model_cost:
-            cost_info = litellm.model_cost[model_without_provider]
-        else:
+    # Try model with quality first, fall back to base model name
+    cost_info: Optional[dict] = None
+    models_to_check = [
+        model_name_with_quality,
+        base_model_name,
+        model_name_with_v2_quality,
+        model_with_quality_without_provider,
+        model_without_provider,
+    ]
+    for model in models_to_check:
+        if model in litellm.model_cost:
+            cost_info = litellm.model_cost[model]
+            break
+    if cost_info is None:
        raise Exception(
-                f"Model not found in cost map. Tried {model_name_with_quality}, {base_model_name}, {model_with_quality_without_provider}, and {model_without_provider}"
+            f"Model not found in cost map. Tried checking {models_to_check}"
        )

    return cost_info["input_cost_per_pixel"] * height * width * n
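Note on the hunk above: the nested if/elif chain is replaced by an ordered list of candidate keys, and the first key present in the cost map wins. The following is a minimal standalone sketch of that lookup order, not the library code. `MODEL_COST`, its keys and prices, and the helper names `candidate_keys` and `first_known_cost` are invented for illustration, and the `f"{size_str}/{model}"` form of the base name is an assumption, since that line is outside the hunk.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for litellm.model_cost; keys and prices are invented.
MODEL_COST: Dict[str, dict] = {
    "medium/1024-x-1024/gpt-image-1": {"input_cost_per_pixel": 4e-08},
    "256-x-256/dall-e-2": {"input_cost_per_pixel": 2e-07},
}


def candidate_keys(model: str, size_str: str, quality: Optional[str] = None) -> List[str]:
    """Build the lookup keys in the same order as `models_to_check` in the hunk."""
    base = f"{size_str}/{model}"  # assumed shape of base_model_name
    with_quality = f"{quality}/{base}" if quality else base
    with_default_quality = f"medium/{base}"  # the new gpt-image-1 fallback
    without_provider = f"{size_str}/{model.split('/')[-1]}"
    with_quality_without_provider = (
        f"{quality}/{without_provider}" if quality else without_provider
    )
    return [
        with_quality,
        base,
        with_default_quality,
        with_quality_without_provider,
        without_provider,
    ]


def first_known_cost(candidates: List[str], cost_map: Dict[str, dict]) -> dict:
    """Return the cost entry for the first candidate key found in the map."""
    for name in candidates:
        if name in cost_map:
            return cost_map[name]
    raise KeyError(f"Model not found in cost map. Tried checking {candidates}")


# No quality given: the "medium/..." fallback key is the one that matches.
image_1_cost = first_known_cost(candidate_keys("gpt-image-1", "1024-x-1024"), MODEL_COST)

# Provider prefix and an unknown quality: the provider-less base key matches last.
dalle_cost = first_known_cost(
    candidate_keys("openai/dall-e-2", "256-x-256", "standard"), MODEL_COST
)
```

One consequence of this order, at least in the sketch: the medium-quality fallback is only tried with the provider prefix intact, so a key such as `medium/1024-x-1024/gpt-image-1` is not reached for a model passed as `openai/gpt-image-1`.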
--- a/litellm/integrations/_types/open_inference.py
+++ b/litellm/integrations/_types/open_inference.py
@ -45,6 +45,14 @@ class SpanAttributes:
    """
    The name of the model being used.
    """
+    LLM_PROVIDER = "llm.provider"
+    """
+    The provider of the model, such as OpenAI, Azure, Google, etc.
+    """
+    LLM_SYSTEM = "llm.system"
+    """
+    The AI product as identified by the client or server
+    """
    LLM_PROMPTS = "llm.prompts"
    """
    Prompts provided to a completions API.
@ -65,15 +73,40 @@ class SpanAttributes:
    """
    Number of tokens in the prompt.
    """
+    LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_WRITE = "llm.token_count.prompt_details.cache_write"
+    """
+    Number of tokens in the prompt that were written to cache.
+    """
+    LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_READ = "llm.token_count.prompt_details.cache_read"
+    """
+    Number of tokens in the prompt that were read from cache.
+    """
+    LLM_TOKEN_COUNT_PROMPT_DETAILS_AUDIO = "llm.token_count.prompt_details.audio"
+    """
+    The number of audio input tokens present in the prompt
+    """
    LLM_TOKEN_COUNT_COMPLETION = "llm.token_count.completion"
    """
    Number of tokens in the completion.
    """
+    LLM_TOKEN_COUNT_COMPLETION_DETAILS_REASONING = "llm.token_count.completion_details.reasoning"
+    """
+    Number of tokens used for reasoning steps in the completion.
+    """
+    LLM_TOKEN_COUNT_COMPLETION_DETAILS_AUDIO = "llm.token_count.completion_details.audio"
+    """
+    The number of audio tokens generated by the model
+    """
    LLM_TOKEN_COUNT_TOTAL = "llm.token_count.total"
    """
    Total number of tokens, including both prompt and completion.
    """

+    LLM_TOOLS = "llm.tools"
+    """
+    List of tools that are advertised to the LLM to be able to call
+    """
+
    TOOL_NAME = "tool.name"
    """
    Name of the tool being used.
@ -112,6 +145,19 @@ class SpanAttributes:
    The id of the user
    """

+    PROMPT_VENDOR = "prompt.vendor"
+    """
+    The vendor or origin of the prompt, e.g. a prompt library, a specialized service, etc.
+    """
+    PROMPT_ID = "prompt.id"
+    """
+    A vendor-specific id used to locate the prompt.
+    """
+    PROMPT_URL = "prompt.url"
+    """
+    A vendor-specific url used to locate the prompt.
+    """
+

 class MessageAttributes:
    """
@ -151,6 +197,10 @@ class MessageAttributes:
    The JSON string representing the arguments passed to the function
    during a function call.
    """
+    MESSAGE_TOOL_CALL_ID = "message.tool_call_id"
+    """
+    The id of the tool call.
+    """


 class MessageContentAttributes:
@ -186,6 +236,25 @@ class ImageAttributes:
    """


+class AudioAttributes:
+    """
+    Attributes for audio
+    """
+
+    AUDIO_URL = "audio.url"
+    """
+    The url to an audio file
+    """
+    AUDIO_MIME_TYPE = "audio.mime_type"
+    """
+    The mime type of the audio file
+    """
+    AUDIO_TRANSCRIPT = "audio.transcript"
+    """
+    The transcript of the audio file
+    """
+
+
 class DocumentAttributes:
    """
    Attributes for a document.
@ -257,6 +326,10 @@ class ToolCallAttributes:
    Attributes for a tool call
    """

+    TOOL_CALL_ID = "tool_call.id"
+    """
+    The id of the tool call.
+    """
    TOOL_CALL_FUNCTION_NAME = "tool_call.function.name"
    """
    The name of function that is being called during a tool call.
@ -268,6 +341,18 @@ class ToolCallAttributes:
    """


+class ToolAttributes:
+    """
+    Attributes for a tool
+    """
+
+    TOOL_JSON_SCHEMA = "tool.json_schema"
+    """
+    The JSON schema of a tool input. It is RECOMMENDED that this be in the
+    OpenAI tool calling format: https://platform.openai.com/docs/assistants/tools
+    """
+
+
 class OpenInferenceSpanKindValues(Enum):
    TOOL = "TOOL"
    CHAIN = "CHAIN"
@ -284,3 +369,21 @@ class OpenInferenceSpanKindValues(Enum):
 class OpenInferenceMimeTypeValues(Enum):
    TEXT = "text/plain"
    JSON = "application/json"
+
+
+class OpenInferenceLLMSystemValues(Enum):
+    OPENAI = "openai"
+    ANTHROPIC = "anthropic"
+    COHERE = "cohere"
+    MISTRALAI = "mistralai"
+    VERTEXAI = "vertexai"
+
+
+class OpenInferenceLLMProviderValues(Enum):
+    OPENAI = "openai"
+    ANTHROPIC = "anthropic"
+    COHERE = "cohere"
+    MISTRALAI = "mistralai"
+    GOOGLE = "google"
+    AZURE = "azure"
+    AWS = "aws"
--- a/litellm/integrations/agentops/__init__.py
+++ b/litellm/integrations/agentops/__init__.py
@ -0,0 +1,3 @@
+from .agentops import AgentOps
+
+__all__ = ["AgentOps"] 
--- a/litellm/integrations/agentops/agentops.py
+++ b/litellm/integrations/agentops/agentops.py
@ -0,0 +1,118 @@
+"""
+AgentOps integration for LiteLLM - Provides OpenTelemetry tracing for LLM calls
+"""
+import os
+from dataclasses import dataclass
+from typing import Optional, Dict, Any
+from litellm.integrations.opentelemetry import OpenTelemetry, OpenTelemetryConfig
+from litellm.llms.custom_httpx.http_handler import _get_httpx_client
+
+@dataclass
+class AgentOpsConfig:
+    endpoint: str = "https://otlp.agentops.cloud/v1/traces"
+    api_key: Optional[str] = None
+    service_name: Optional[str] = None
+    deployment_environment: Optional[str] = None
+    auth_endpoint: str = "https://api.agentops.ai/v3/auth/token"
+
+    @classmethod
+    def from_env(cls):
+        return cls(
+            endpoint="https://otlp.agentops.cloud/v1/traces",
+            api_key=os.getenv("AGENTOPS_API_KEY"),
+            service_name=os.getenv("AGENTOPS_SERVICE_NAME", "agentops"),
+            deployment_environment=os.getenv("AGENTOPS_ENVIRONMENT", "production"),
+            auth_endpoint="https://api.agentops.ai/v3/auth/token"
+        )
+
+class AgentOps(OpenTelemetry):
+    """
+    AgentOps integration - built on top of OpenTelemetry
+
+    Example usage:
+        ```python
+        import litellm
+        
+        litellm.success_callback = ["agentops"]
+
+        response = litellm.completion(
+            model="gpt-3.5-turbo",
+            messages=[{"role": "user", "content": "Hello, how are you?"}],
+        )
+        ```
+    """
+    def __init__(
+        self,
+        config: Optional[AgentOpsConfig] = None,
+    ):
+        if config is None:
+            config = AgentOpsConfig.from_env()
+
+        # Prefetch JWT token for authentication
+        jwt_token = None
+        project_id = None
+        if config.api_key:
+            try:
+                response = self._fetch_auth_token(config.api_key, config.auth_endpoint)
+                jwt_token = response.get("token")
+                project_id = response.get("project_id")
+            except Exception:
+                pass
+
+        headers = f"Authorization=Bearer {jwt_token}" if jwt_token else None
+        
+        otel_config = OpenTelemetryConfig(
+            exporter="otlp_http",
+            endpoint=config.endpoint,
+            headers=headers
+        )
+
+        # Initialize OpenTelemetry with our config
+        super().__init__(
+            config=otel_config,
+            callback_name="agentops"
+        )
+
+        # Set AgentOps-specific resource attributes
+        resource_attrs = {
+            "service.name": config.service_name or "litellm",
+            "deployment.environment": config.deployment_environment or "production",
+            "telemetry.sdk.name": "agentops",
+        }
+        
+        if project_id:
+            resource_attrs["project.id"] = project_id
+            
+        self.resource_attributes = resource_attrs
+
+    def _fetch_auth_token(self, api_key: str, auth_endpoint: str) -> Dict[str, Any]:
+        """
+        Fetch JWT authentication token from AgentOps API
+        
+        Args:
+            api_key: AgentOps API key
+            auth_endpoint: Authentication endpoint
+            
+        Returns:
+            Dict containing JWT token and project ID
+        """
+        headers = {
+            "Content-Type": "application/json",
+            "Connection": "keep-alive",
+        }
+        
+        client = _get_httpx_client()
+        try:
+            response = client.post(
+                url=auth_endpoint,
+                headers=headers,
+                json={"api_key": api_key},
+                timeout=10
+            )
+            
+            if response.status_code != 200:
+                raise Exception(f"Failed to fetch auth token: {response.text}")
+            
+            return response.json()
+        finally:
+            client.close() 
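Note on the new file above: `AgentOpsConfig.from_env` reads three environment variables and pins both endpoints to fixed URLs. The sketch below copies only the dataclass from the diff so the defaults can be inspected without importing litellm; the environment values set here are invented for illustration.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentOpsConfig:
    # Copied from the diff above so this sketch runs on its own.
    endpoint: str = "https://otlp.agentops.cloud/v1/traces"
    api_key: Optional[str] = None
    service_name: Optional[str] = None
    deployment_environment: Optional[str] = None
    auth_endpoint: str = "https://api.agentops.ai/v3/auth/token"

    @classmethod
    def from_env(cls):
        return cls(
            endpoint="https://otlp.agentops.cloud/v1/traces",
            api_key=os.getenv("AGENTOPS_API_KEY"),
            service_name=os.getenv("AGENTOPS_SERVICE_NAME", "agentops"),
            deployment_environment=os.getenv("AGENTOPS_ENVIRONMENT", "production"),
            auth_endpoint="https://api.agentops.ai/v3/auth/token",
        )


# Start from a clean environment so only the defaults apply.
for name in ("AGENTOPS_API_KEY", "AGENTOPS_SERVICE_NAME", "AGENTOPS_ENVIRONMENT"):
    os.environ.pop(name, None)
default_config = AgentOpsConfig.from_env()

# Invented values, only to show that the environment overrides the defaults.
os.environ["AGENTOPS_SERVICE_NAME"] = "billing-proxy"
os.environ["AGENTOPS_ENVIRONMENT"] = "staging"
custom_config = AgentOpsConfig.from_env()
```

Because `from_env` already defaults `service_name` to `"agentops"`, the `config.service_name or "litellm"` fallback in `AgentOps.__init__` appears to apply only when a config is constructed by hand without a service name, or when the variable is set to an empty string.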
--- a/litellm/integrations/arize/_utils.py
+++ b/litellm/integrations/arize/_utils.py
@ -1,3 +1,4 @@
+import json
 from typing import TYPE_CHECKING, Any, Optional, Union

 from litellm._logging import verbose_logger
@ -12,36 +13,141 @@ else:
    Span = Any


-def set_attributes(span: Span, kwargs, response_obj):
+def cast_as_primitive_value_type(value) -> Union[str, bool, int, float]:
+    """
+    Converts a value to an OTEL-supported primitive for Arize/Phoenix observability.
+    """
+    if value is None:
+        return ""
+    if isinstance(value, (str, bool, int, float)):
+        return value
+    try:
+        return str(value)
+    except Exception:
+        return ""
+
+
+def safe_set_attribute(span: Span, key: str, value: Any):
+    """
+    Sets a span attribute safely with OTEL-compliant primitive typing for Arize/Phoenix.
+    """
+    primitive_value = cast_as_primitive_value_type(value)
+    span.set_attribute(key, primitive_value)
+
+
+def set_attributes(span: Span, kwargs, response_obj):  # noqa: PLR0915
+    """
+    Populates span with OpenInference-compliant LLM attributes for Arize and Phoenix tracing.
+    """
    from litellm.integrations._types.open_inference import (
        MessageAttributes,
        OpenInferenceSpanKindValues,
        SpanAttributes,
+        ToolCallAttributes,
    )

    try:
+        optional_params = kwargs.get("optional_params", {})
+        litellm_params = kwargs.get("litellm_params", {})
        standard_logging_payload: Optional[StandardLoggingPayload] = kwargs.get(
            "standard_logging_object"
        )
+        if standard_logging_payload is None:
+            raise ValueError("standard_logging_object not found in kwargs")

        #############################################
        ############ LLM CALL METADATA ##############
        #############################################

-        if standard_logging_payload and (
-            metadata := standard_logging_payload["metadata"]
-        ):
-            span.set_attribute(SpanAttributes.METADATA, safe_dumps(metadata))
+        # Set custom metadata for observability and trace enrichment.
+        metadata = (
+            standard_logging_payload.get("metadata")
+            if standard_logging_payload
+            else None
+        )
+        if metadata is not None:
+            safe_set_attribute(span, SpanAttributes.METADATA, safe_dumps(metadata))

        #############################################
        ########## LLM Request Attributes ###########
        #############################################

-        # The name of the LLM a request is being made to
+        # The name of the LLM a request is being made to.
        if kwargs.get("model"):
-            span.set_attribute(SpanAttributes.LLM_MODEL_NAME, kwargs.get("model"))
+            safe_set_attribute(
+                span,
+                SpanAttributes.LLM_MODEL_NAME,
+                kwargs.get("model"),
+            )

-        span.set_attribute(
+        # The LLM request type.
+        safe_set_attribute(
+            span,
+            "llm.request.type",
+            standard_logging_payload["call_type"],
+        )
+
+        # The Generative AI Provider: Azure, OpenAI, etc.
+        safe_set_attribute(
+            span,
+            SpanAttributes.LLM_PROVIDER,
+            litellm_params.get("custom_llm_provider", "Unknown"),
+        )
+
+        # The maximum number of tokens the LLM generates for a request.
+        if optional_params.get("max_tokens"):
+            safe_set_attribute(
+                span,
+                "llm.request.max_tokens",
+                optional_params.get("max_tokens"),
+            )
+
+        # The temperature setting for the LLM request.
+        if optional_params.get("temperature"):
+            safe_set_attribute(
+                span,
+                "llm.request.temperature",
+                optional_params.get("temperature"),
+            )
+
+        # The top_p sampling setting for the LLM request.
+        if optional_params.get("top_p"):
+            safe_set_attribute(
+                span,
+                "llm.request.top_p",
+                optional_params.get("top_p"),
+            )
+
+        # Indicates whether response is streamed.
+        safe_set_attribute(
+            span,
+            "llm.is_streaming",
+            str(optional_params.get("stream", False)),
+        )
+
+        # Logs the user ID if present.
+        if optional_params.get("user"):
+            safe_set_attribute(
+                span,
+                "llm.user",
+                optional_params.get("user"),
+            )
+
+        # The unique identifier for the completion.
+        if response_obj and response_obj.get("id"):
+            safe_set_attribute(span, "llm.response.id", response_obj.get("id"))
+
+        # The model used to generate the response.
+        if response_obj and response_obj.get("model"):
+            safe_set_attribute(
+                span,
+                "llm.response.model",
+                response_obj.get("model"),
+            )
+
+        # Required by OpenInference to mark span as LLM kind.
+        safe_set_attribute(
+            span,
            SpanAttributes.OPENINFERENCE_SPAN_KIND,
            OpenInferenceSpanKindValues.LLM.value,
        )
@ -50,77 +156,132 @@ def set_attributes(span: Span, kwargs, response_obj):
        # for /chat/completions
        # https://docs.arize.com/arize/large-language-models/tracing/semantic-conventions
        if messages:
-            span.set_attribute(
+            last_message = messages[-1]
+            safe_set_attribute(
+                span,
                SpanAttributes.INPUT_VALUE,
-                messages[-1].get("content", ""),  # get the last message for input
+                last_message.get("content", ""),
            )

-            # LLM_INPUT_MESSAGES shows up under `input_messages` tab on the span page
+            # LLM_INPUT_MESSAGES shows up under `input_messages` tab on the span page.
            for idx, msg in enumerate(messages):
-                # Set the role per message
-                span.set_attribute(
-                    f"{SpanAttributes.LLM_INPUT_MESSAGES}.{idx}.{MessageAttributes.MESSAGE_ROLE}",
-                    msg["role"],
+                prefix = f"{SpanAttributes.LLM_INPUT_MESSAGES}.{idx}"
+                # Set the role per message.
+                safe_set_attribute(
+                    span, f"{prefix}.{MessageAttributes.MESSAGE_ROLE}", msg.get("role")
                )
-                # Set the content per message
-                span.set_attribute(
-                    f"{SpanAttributes.LLM_INPUT_MESSAGES}.{idx}.{MessageAttributes.MESSAGE_CONTENT}",
+                # Set the content per message.
+                safe_set_attribute(
+                    span,
+                    f"{prefix}.{MessageAttributes.MESSAGE_CONTENT}",
                    msg.get("content", ""),
                )

-        if standard_logging_payload and (
-            model_params := standard_logging_payload["model_parameters"]
-        ):
+        # Capture tools (function definitions) used in the LLM call.
+        tools = optional_params.get("tools")
+        if tools:
+            for idx, tool in enumerate(tools):
+                function = tool.get("function")
+                if not function:
+                    continue
+                prefix = f"{SpanAttributes.LLM_TOOLS}.{idx}"
+                safe_set_attribute(
+                    span, f"{prefix}.{SpanAttributes.TOOL_NAME}", function.get("name")
+                )
+                safe_set_attribute(
+                    span,
+                    f"{prefix}.{SpanAttributes.TOOL_DESCRIPTION}",
+                    function.get("description"),
+                )
+                safe_set_attribute(
+                    span,
+                    f"{prefix}.{SpanAttributes.TOOL_PARAMETERS}",
+                    json.dumps(function.get("parameters")),
+                )
+
+        # Capture tool calls made during function-calling LLM flows.
+        functions = optional_params.get("functions")
+        if functions:
+            for idx, function in enumerate(functions):
+                prefix = f"{MessageAttributes.MESSAGE_TOOL_CALLS}.{idx}"
+                safe_set_attribute(
+                    span,
+                    f"{prefix}.{ToolCallAttributes.TOOL_CALL_FUNCTION_NAME}",
+                    function.get("name"),
+                )
+
+        # Capture invocation parameters and user ID if available.
+        model_params = (
+            standard_logging_payload.get("model_parameters")
+            if standard_logging_payload
+            else None
+        )
+        if model_params:
            # The Generative AI Provider: Azure, OpenAI, etc.
-            span.set_attribute(
-                SpanAttributes.LLM_INVOCATION_PARAMETERS, safe_dumps(model_params)
+            safe_set_attribute(
+                span,
+                SpanAttributes.LLM_INVOCATION_PARAMETERS,
+                safe_dumps(model_params),
            )

            if model_params.get("user"):
                user_id = model_params.get("user")
                if user_id is not None:
-                    span.set_attribute(SpanAttributes.USER_ID, user_id)
+                    safe_set_attribute(span, SpanAttributes.USER_ID, user_id)

        #############################################
        ########## LLM Response Attributes ##########
-        # https://docs.arize.com/arize/large-language-models/tracing/semantic-conventions
        #############################################
-        if hasattr(response_obj, "get"):
-            for choice in response_obj.get("choices", []):
-                response_message = choice.get("message", {})
-                span.set_attribute(
-                    SpanAttributes.OUTPUT_VALUE, response_message.get("content", "")
-                )

-                # This shows up under `output_messages` tab on the span page
-                # This code assumes a single response
-                span.set_attribute(
-                    f"{SpanAttributes.LLM_OUTPUT_MESSAGES}.0.{MessageAttributes.MESSAGE_ROLE}",
-                    response_message.get("role"),
-                )
-                span.set_attribute(
-                    f"{SpanAttributes.LLM_OUTPUT_MESSAGES}.0.{MessageAttributes.MESSAGE_CONTENT}",
+        # Captures response tokens, message, and content.
+        if hasattr(response_obj, "get"):
+            for idx, choice in enumerate(response_obj.get("choices", [])):
+                response_message = choice.get("message", {})
+                safe_set_attribute(
+                    span,
+                    SpanAttributes.OUTPUT_VALUE,
                    response_message.get("content", ""),
                )

-            usage = response_obj.get("usage")
+                # This shows up under `output_messages` tab on the span page.
+                prefix = f"{SpanAttributes.LLM_OUTPUT_MESSAGES}.{idx}"
+                safe_set_attribute(
+                    span,
+                    f"{prefix}.{MessageAttributes.MESSAGE_ROLE}",
+                    response_message.get("role"),
+                )
+                safe_set_attribute(
+                    span,
+                    f"{prefix}.{MessageAttributes.MESSAGE_CONTENT}",
+                    response_message.get("content", ""),
+                )
+
+            # Token usage info.
+            usage = response_obj and response_obj.get("usage")
            if usage:
-                span.set_attribute(
+                safe_set_attribute(
+                    span,
                    SpanAttributes.LLM_TOKEN_COUNT_TOTAL,
                    usage.get("total_tokens"),
                )

                # The number of tokens used in the LLM response (completion).
-                span.set_attribute(
+                safe_set_attribute(
+                    span,
                    SpanAttributes.LLM_TOKEN_COUNT_COMPLETION,
                    usage.get("completion_tokens"),
                )

                # The number of tokens used in the LLM prompt.
-                span.set_attribute(
+                safe_set_attribute(
+                    span,
                    SpanAttributes.LLM_TOKEN_COUNT_PROMPT,
                    usage.get("prompt_tokens"),
                )
-        pass
+
    except Exception as e:
-        verbose_logger.error(f"Error setting arize attributes: {e}")
+        verbose_logger.error(
+            f"[Arize/Phoenix] Failed to set OpenInference span attributes: {e}"
+        )
+        if hasattr(span, "record_exception"):
+            span.record_exception(e)
--- a/litellm/integrations/datadog/datadog_llm_obs.py
+++ b/litellm/integrations/datadog/datadog_llm_obs.py
@ -13,10 +13,15 @@ import uuid
 from datetime import datetime
 from typing import Any, Dict, List, Optional, Union

+import httpx
+
 import litellm
 from litellm._logging import verbose_logger
 from litellm.integrations.custom_batch_logger import CustomBatchLogger
 from litellm.integrations.datadog.datadog import DataDogLogger
+from litellm.litellm_core_utils.prompt_templates.common_utils import (
+    handle_any_messages_to_chat_completion_str_messages_conversion,
+)
 from litellm.llms.custom_httpx.http_handler import (
    get_async_httpx_client,
    httpxSpecialProvider,
@ -106,7 +111,6 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
                },
            )

-            response.raise_for_status()
            if response.status_code != 202:
                raise Exception(
                    f"DataDogLLMObs: Unexpected response - status_code: {response.status_code}, text: {response.text}"
@ -116,6 +120,10 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
                f"DataDogLLMObs: Successfully sent batch - status_code: {response.status_code}"
            )
            self.log_queue.clear()
+        except httpx.HTTPStatusError as e:
+            verbose_logger.exception(
+                f"DataDogLLMObs: Error sending batch - {e.response.text}"
+            )
        except Exception as e:
            verbose_logger.exception(f"DataDogLLMObs: Error sending batch - {str(e)}")

@ -133,7 +141,11 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):

        metadata = kwargs.get("litellm_params", {}).get("metadata", {})

-        input_meta = InputMeta(messages=messages)  # type: ignore
+        input_meta = InputMeta(
+            messages=handle_any_messages_to_chat_completion_str_messages_conversion(
+                messages
+            )
+        )
        output_meta = OutputMeta(messages=self._get_response_messages(response_obj))

        meta = Meta(
--- a/litellm/litellm_core_utils/exception_mapping_utils.py
+++ b/litellm/litellm_core_utils/exception_mapping_utils.py
@ -311,6 +311,9 @@ def exception_type(  # type: ignore  # noqa: PLR0915
                elif (
                    "invalid_request_error" in error_str
                    and "content_policy_violation" in error_str
+                ) or (
+                    "Invalid prompt" in error_str
+                    and "violating our usage policy" in error_str
                ):
                    exception_mapping_worked = True
                    raise ContentPolicyViolationError(
--- a/litellm/litellm_core_utils/get_supported_openai_params.py
+++ b/litellm/litellm_core_utils/get_supported_openai_params.py
@ -221,6 +221,8 @@ def get_supported_openai_params(  # noqa: PLR0915
        return litellm.PredibaseConfig().get_supported_openai_params(model=model)
    elif custom_llm_provider == "voyage":
        return litellm.VoyageEmbeddingConfig().get_supported_openai_params(model=model)
+    elif custom_llm_provider == "infinity":
+        return litellm.InfinityEmbeddingConfig().get_supported_openai_params(model=model)
    elif custom_llm_provider == "triton":
        if request_type == "embeddings":
            return litellm.TritonEmbeddingConfig().get_supported_openai_params(
--- a/litellm/litellm_core_utils/litellm_logging.py
+++ b/litellm/litellm_core_utils/litellm_logging.py
@ -28,6 +28,7 @@ from litellm._logging import _is_debugging_on, verbose_logger
 from litellm.batches.batch_utils import _handle_completed_batch
 from litellm.caching.caching import DualCache, InMemoryCache
 from litellm.caching.caching_handler import LLMCachingHandler
+
 from litellm.constants import (
    DEFAULT_MOCK_RESPONSE_COMPLETION_TOKEN_COUNT,
    DEFAULT_MOCK_RESPONSE_PROMPT_TOKEN_COUNT,
@ -36,6 +37,7 @@ from litellm.cost_calculator import (
    RealtimeAPITokenUsageProcessor,
    _select_model_name_for_cost_calc,
 )
+from litellm.integrations.agentops import AgentOps
 from litellm.integrations.anthropic_cache_control_hook import AnthropicCacheControlHook
 from litellm.integrations.arize.arize import ArizeLogger
 from litellm.integrations.custom_guardrail import CustomGuardrail
@ -2685,7 +2687,15 @@ def _init_custom_logger_compatible_class(  # noqa: PLR0915
    """
    try:
        custom_logger_init_args = custom_logger_init_args or {}
-        if logging_integration == "lago":
+        if logging_integration == "agentops":  # Add AgentOps initialization
+            for callback in _in_memory_loggers:
+                if isinstance(callback, AgentOps):
+                    return callback  # type: ignore
+
+            agentops_logger = AgentOps()
+            _in_memory_loggers.append(agentops_logger)
+            return agentops_logger  # type: ignore
+        elif logging_integration == "lago":
            for callback in _in_memory_loggers:
                if isinstance(callback, LagoLogger):
                    return callback  # type: ignore
--- a/litellm/litellm_core_utils/model_param_helper.py
+++ b/litellm/litellm_core_utils/model_param_helper.py
@ -75,6 +75,10 @@ class ModelParamHelper:
        combined_kwargs = combined_kwargs.difference(exclude_kwargs)
        return combined_kwargs

+    @staticmethod
+    def get_litellm_provider_specific_params_for_chat_params() -> Set[str]:
+        return {"thinking"}
+
    @staticmethod
    def _get_litellm_supported_chat_completion_kwargs() -> Set[str]:
        """
@ -82,11 +86,18 @@ class ModelParamHelper:

        This follows the OpenAI API Spec
        """
-        all_chat_completion_kwargs = set(
+        non_streaming_params: Set[str] = set(
            getattr(CompletionCreateParamsNonStreaming, "__annotations__", {}).keys()
-        ).union(
-            set(getattr(CompletionCreateParamsStreaming, "__annotations__", {}).keys())
        )
+        streaming_params: Set[str] = set(
+            getattr(CompletionCreateParamsStreaming, "__annotations__", {}).keys()
+        )
+        litellm_provider_specific_params: Set[str] = (
+            ModelParamHelper.get_litellm_provider_specific_params_for_chat_params()
+        )
+        all_chat_completion_kwargs: Set[str] = non_streaming_params.union(
+            streaming_params
+        ).union(litellm_provider_specific_params)
        return all_chat_completion_kwargs

    @staticmethod
--- a/litellm/litellm_core_utils/prompt_templates/common_utils.py
+++ b/litellm/litellm_core_utils/prompt_templates/common_utils.py
@ -6,7 +6,7 @@ import io
 import mimetypes
 import re
 from os import PathLike
-from typing import Dict, List, Literal, Mapping, Optional, Union, cast
+from typing import Any, Dict, List, Literal, Mapping, Optional, Union, cast

 from litellm.types.llms.openai import (
    AllMessageValues,
@ -32,6 +32,35 @@ DEFAULT_ASSISTANT_CONTINUE_MESSAGE = ChatCompletionAssistantMessage(
 )


+def handle_any_messages_to_chat_completion_str_messages_conversion(
+    messages: Any,
+) -> List[Dict[str, str]]:
+    """
+    Convert messages of any type into a list of chat-completion-style messages with string content.
+
+    Relevant Issue: https://github.com/BerriAI/litellm/issues/9494
+    """
+    import json
+
+    if isinstance(messages, list):
+        try:
+            return cast(
+                List[Dict[str, str]],
+                handle_messages_with_content_list_to_str_conversion(messages),
+            )
+        except Exception:
+            return [{"input": json.dumps(message, default=str)} for message in messages]
+    elif isinstance(messages, dict):
+        try:
+            return [{"input": json.dumps(messages, default=str)}]
+        except Exception:
+            return [{"input": str(messages)}]
+    elif isinstance(messages, str):
+        return [{"input": messages}]
+    else:
+        return [{"input": str(messages)}]
+
+
 def handle_messages_with_content_list_to_str_conversion(
    messages: List[AllMessageValues],
 ) -> List[AllMessageValues]:
--- a/litellm/llms/anthropic/experimental_pass_through/messages/handler.py
+++ b/litellm/llms/anthropic/experimental_pass_through/messages/handler.py
@ -43,7 +43,9 @@ class AnthropicMessagesHandler:
        from litellm.proxy.pass_through_endpoints.success_handler import (
            PassThroughEndpointLogging,
        )
-        from litellm.proxy.pass_through_endpoints.types import EndpointType
+        from litellm.types.passthrough_endpoints.pass_through_endpoints import (
+            EndpointType,
+        )

        # Create success handler object
        passthrough_success_handler_obj = PassThroughEndpointLogging()
@ -98,12 +100,12 @@ async def anthropic_messages(
        api_base=optional_params.api_base,
        api_key=optional_params.api_key,
    )
-    anthropic_messages_provider_config: Optional[
-        BaseAnthropicMessagesConfig
-    ] = ProviderConfigManager.get_provider_anthropic_messages_config(
+    anthropic_messages_provider_config: Optional[BaseAnthropicMessagesConfig] = (
+        ProviderConfigManager.get_provider_anthropic_messages_config(
            model=model,
            provider=litellm.LlmProviders(_custom_llm_provider),
        )
+    )
    if anthropic_messages_provider_config is None:
        raise ValueError(
            f"Anthropic messages provider config not found for model: {model}"
--- a/litellm/llms/azure/responses/transformation.py
+++ b/litellm/llms/azure/responses/transformation.py
@ -1,11 +1,14 @@
-from typing import TYPE_CHECKING, Any, Optional, cast
+from typing import TYPE_CHECKING, Any, Dict, Optional, Tuple, cast

 import httpx

 import litellm
+from litellm._logging import verbose_logger
 from litellm.llms.openai.responses.transformation import OpenAIResponsesAPIConfig
 from litellm.secret_managers.main import get_secret_str
 from litellm.types.llms.openai import *
+from litellm.types.responses.main import *
+from litellm.types.router import GenericLiteLLMParams
 from litellm.utils import _add_path_to_api_base

 if TYPE_CHECKING:
@ -41,11 +44,7 @@ class AzureOpenAIResponsesAPIConfig(OpenAIResponsesAPIConfig):
    def get_complete_url(
        self,
        api_base: Optional[str],
-        api_key: Optional[str],
-        model: str,
-        optional_params: dict,
        litellm_params: dict,
-        stream: Optional[bool] = None,
    ) -> str:
        """
        Constructs a complete URL for the API request.
@ -92,3 +91,82 @@ class AzureOpenAIResponsesAPIConfig(OpenAIResponsesAPIConfig):
        final_url = httpx.URL(new_url).copy_with(params=query_params)

        return str(final_url)
+
+    #########################################################
+    ########## DELETE RESPONSE API TRANSFORMATION ##############
+    #########################################################
+    def _construct_url_for_response_id_in_path(
+        self, api_base: str, response_id: str
+    ) -> str:
+        """
+        Constructs a URL for the API request with the response_id in the path.
+        """
+        from urllib.parse import urlparse, urlunparse
+
+        # Parse the URL to separate its components
+        parsed_url = urlparse(api_base)
+
+        # Insert the response_id at the end of the path component
+        # Remove trailing slash if present to avoid double slashes
+        path = parsed_url.path.rstrip("/")
+        new_path = f"{path}/{response_id}"
+
+        # Reconstruct the URL with all original components but with the modified path
+        constructed_url = urlunparse(
+            (
+                parsed_url.scheme,  # http, https
+                parsed_url.netloc,  # domain name, port
+                new_path,  # path with response_id added
+                parsed_url.params,  # parameters
+                parsed_url.query,  # query string
+                parsed_url.fragment,  # fragment
+            )
+        )
+        return constructed_url
+
+    def transform_delete_response_api_request(
+        self,
+        response_id: str,
+        api_base: str,
+        litellm_params: GenericLiteLLMParams,
+        headers: dict,
+    ) -> Tuple[str, Dict]:
+        """
+        Transform the delete response API request into a URL and data
+
+        Azure OpenAI API expects the following request:
+        - DELETE /openai/responses/{response_id}?api-version=xxx
+
+        This function handles URLs with query parameters by inserting the response_id
+        at the correct location (before any query parameters).
+        """
+        delete_url = self._construct_url_for_response_id_in_path(
+            api_base=api_base, response_id=response_id
+        )
+
+        data: Dict = {}
+        verbose_logger.debug(f"delete response url={delete_url}")
+        return delete_url, data
+
+    #########################################################
+    ########## GET RESPONSE API TRANSFORMATION ###############
+    #########################################################
+    def transform_get_response_api_request(
+        self,
+        response_id: str,
+        api_base: str,
+        litellm_params: GenericLiteLLMParams,
+        headers: dict,
+    ) -> Tuple[str, Dict]:
+        """
+        Transform the get response API request into a URL and data
+
+        Azure OpenAI API expects the following request:
+        - GET /openai/responses/{response_id}?api-version=xxx
+        """
+        get_url = self._construct_url_for_response_id_in_path(
+            api_base=api_base, response_id=response_id
+        )
+        data: Dict = {}
+        verbose_logger.debug(f"get response url={get_url}")
+        return get_url, data
--- a/litellm/llms/azure_ai/chat/transformation.py
+++ b/litellm/llms/azure_ai/chat/transformation.py
@ -1,3 +1,4 @@
+import enum
 from typing import Any, List, Optional, Tuple, cast
 from urllib.parse import urlparse

@ -19,6 +20,10 @@ from litellm.types.utils import ModelResponse, ProviderField
 from litellm.utils import _add_path_to_api_base, supports_tool_choice


+class AzureFoundryErrorStrings(str, enum.Enum):
+    SET_EXTRA_PARAMETERS_TO_PASS_THROUGH = "Set extra-parameters to 'pass-through'"
+
+
 class AzureAIStudioConfig(OpenAIConfig):
    def get_supported_openai_params(self, model: str) -> List:
        model_supports_tool_choice = True  # azure ai supports this by default
@ -240,12 +245,18 @@ class AzureAIStudioConfig(OpenAIConfig):
    ) -> bool:
        should_drop_params = litellm_params.get("drop_params") or litellm.drop_params
        error_text = e.response.text
+
        if should_drop_params and "Extra inputs are not permitted" in error_text:
            return True
        elif (
            "unknown field: parameter index is not a valid field" in error_text
        ):  # remove index from tool calls
            return True
+        elif (
+            AzureFoundryErrorStrings.SET_EXTRA_PARAMETERS_TO_PASS_THROUGH.value
+            in error_text
+        ):  # retry after dropping the extra params named in the error
+            return True
        return super().should_retry_llm_api_inside_llm_translation_on_http_error(
            e=e, litellm_params=litellm_params
        )
@ -265,5 +276,46 @@ class AzureAIStudioConfig(OpenAIConfig):
            litellm.remove_index_from_tool_calls(
                messages=_messages,
            )
+        elif (
+            AzureFoundryErrorStrings.SET_EXTRA_PARAMETERS_TO_PASS_THROUGH.value
+            in e.response.text
+        ):
+            request_data = self._drop_extra_params_from_request_data(
+                request_data, e.response.text
+            )
        data = drop_params_from_unprocessable_entity_error(e=e, data=request_data)
        return data
+
+    def _drop_extra_params_from_request_data(
+        self, request_data: dict, error_text: str
+    ) -> dict:
+        params_to_drop = self._extract_params_to_drop_from_error_text(error_text)
+        if params_to_drop:
+            for param in params_to_drop:
+                if param in request_data:
+                    request_data.pop(param, None)
+        return request_data
+
+    def _extract_params_to_drop_from_error_text(
+        self, error_text: str
+    ) -> List[str]:
+        """
+        Error text looks like this:
+            "Extra parameters ['stream_options', 'extra-parameters'] are not allowed when extra-parameters is not set or set to be 'error'."
+        """
+        import re
+
+        # Extract parameters within square brackets
+        match = re.search(r"\[(.*?)\]", error_text)
+        if not match:
+            return []
+
+        # Parse the extracted string into a list of parameter names
+        params_str = match.group(1)
+        params = []
+        for param in params_str.split(","):
+            # Clean up the parameter name (remove quotes, spaces)
+            clean_param = param.strip().strip("'").strip('"')
+            if clean_param:
+                params.append(clean_param)
+        return params
--- a/litellm/llms/base_llm/responses/transformation.py
+++ b/litellm/llms/base_llm/responses/transformation.py
@ -1,6 +1,6 @@
 import types
 from abc import ABC, abstractmethod
-from typing import TYPE_CHECKING, Any, Dict, Optional, Union
+from typing import TYPE_CHECKING, Any, Dict, Optional, Tuple, Union

 import httpx

@ -10,6 +10,7 @@ from litellm.types.llms.openai import (
    ResponsesAPIResponse,
    ResponsesAPIStreamingResponse,
 )
+from litellm.types.responses.main import *
 from litellm.types.router import GenericLiteLLMParams

 if TYPE_CHECKING:
@ -73,11 +74,7 @@ class BaseResponsesAPIConfig(ABC):
    def get_complete_url(
        self,
        api_base: Optional[str],
-        api_key: Optional[str],
-        model: str,
-        optional_params: dict,
        litellm_params: dict,
-        stream: Optional[bool] = None,
    ) -> str:
        """
        OPTIONAL
@ -122,6 +119,56 @@ class BaseResponsesAPIConfig(ABC):
        """
        pass

+    #########################################################
+    ########## DELETE RESPONSE API TRANSFORMATION ##############
+    #########################################################
+    @abstractmethod
+    def transform_delete_response_api_request(
+        self,
+        response_id: str,
+        api_base: str,
+        litellm_params: GenericLiteLLMParams,
+        headers: dict,
+    ) -> Tuple[str, Dict]:
+        pass
+
+    @abstractmethod
+    def transform_delete_response_api_response(
+        self,
+        raw_response: httpx.Response,
+        logging_obj: LiteLLMLoggingObj,
+    ) -> DeleteResponseResult:
+        pass
+
+    #########################################################
+    ########## END DELETE RESPONSE API TRANSFORMATION #######
+    #########################################################
+
+    #########################################################
+    ########## GET RESPONSE API TRANSFORMATION ###############
+    #########################################################
+    @abstractmethod
+    def transform_get_response_api_request(
+        self,
+        response_id: str,
+        api_base: str,
+        litellm_params: GenericLiteLLMParams,
+        headers: dict,
+    ) -> Tuple[str, Dict]:
+        pass
+
+    @abstractmethod
+    def transform_get_response_api_response(
+        self,
+        raw_response: httpx.Response,
+        logging_obj: LiteLLMLoggingObj,
+    ) -> ResponsesAPIResponse:
+        pass
+
+    #########################################################
+    ########## END GET RESPONSE API TRANSFORMATION ##########
+    #########################################################
+
    def get_error_class(
        self, error_message: str, status_code: int, headers: Union[dict, httpx.Headers]
    ) -> BaseLLMException:
--- a/litellm/llms/bedrock/chat/converse_transformation.py
+++ b/litellm/llms/bedrock/chat/converse_transformation.py
@ -107,6 +107,15 @@ class AmazonConverseConfig(BaseConfig):
            "response_format",
        ]

+        if (
+            "arn" in model
+        ):  # we can't infer the model from the arn, so just add all params
+            supported_params.append("tools")
+            supported_params.append("tool_choice")
+            supported_params.append("thinking")
+            supported_params.append("reasoning_effort")
+            return supported_params
+
        ## Filter out 'cross-region' from model name
        base_model = BedrockModelInfo.get_base_model(model)

@ -376,25 +385,27 @@ class AmazonConverseConfig(BaseConfig):
        system_content_blocks: List[SystemContentBlock] = []
        for idx, message in enumerate(messages):
            if message["role"] == "system":
-                _system_content_block: Optional[SystemContentBlock] = None
-                _cache_point_block: Optional[SystemContentBlock] = None
-                if isinstance(message["content"], str) and len(message["content"]) > 0:
-                    _system_content_block = SystemContentBlock(text=message["content"])
-                    _cache_point_block = self._get_cache_point_block(
+                system_prompt_indices.append(idx)
+                if isinstance(message["content"], str) and message["content"]:
+                    system_content_blocks.append(
+                        SystemContentBlock(text=message["content"])
+                    )
+                    cache_block = self._get_cache_point_block(
                        message, block_type="system"
                    )
+                    if cache_block:
+                        system_content_blocks.append(cache_block)
                elif isinstance(message["content"], list):
                    for m in message["content"]:
-                        if m.get("type", "") == "text" and len(m["text"]) > 0:
-                            _system_content_block = SystemContentBlock(text=m["text"])
-                            _cache_point_block = self._get_cache_point_block(
+                        if m.get("type") == "text" and m.get("text"):
+                            system_content_blocks.append(
+                                SystemContentBlock(text=m["text"])
+                            )
+                            cache_block = self._get_cache_point_block(
                                m, block_type="system"
                            )
-                if _system_content_block is not None:
-                    system_content_blocks.append(_system_content_block)
-                if _cache_point_block is not None:
-                    system_content_blocks.append(_cache_point_block)
-                system_prompt_indices.append(idx)
+                            if cache_block:
+                                system_content_blocks.append(cache_block)
        if len(system_prompt_indices) > 0:
            for idx in reversed(system_prompt_indices):
                messages.pop(idx)
--- a/litellm/llms/custom_httpx/http_handler.py
+++ b/litellm/llms/custom_httpx/http_handler.py
@ -650,6 +650,49 @@ class HTTPHandler:
        except Exception as e:
            raise e

+    def delete(
+        self,
+        url: str,
+        data: Optional[Union[dict, str]] = None,  # type: ignore
+        json: Optional[dict] = None,
+        params: Optional[dict] = None,
+        headers: Optional[dict] = None,
+        timeout: Optional[Union[float, httpx.Timeout]] = None,
+        stream: bool = False,
+    ):
+        try:
+            if timeout is not None:
+                req = self.client.build_request(
+                    "DELETE", url, data=data, json=json, params=params, headers=headers, timeout=timeout  # type: ignore
+                )
+            else:
+                req = self.client.build_request(
+                    "DELETE", url, data=data, json=json, params=params, headers=headers  # type: ignore
+                )
+            response = self.client.send(req, stream=stream)
+            response.raise_for_status()
+            return response
+        except httpx.TimeoutException:
+            raise litellm.Timeout(
+                message=f"Connection timed out after {timeout} seconds.",
+                model="default-model-name",
+                llm_provider="litellm-httpx-handler",
+            )
+        except httpx.HTTPStatusError as e:
+            if stream is True:
+                setattr(e, "message", mask_sensitive_info(e.response.read()))
+                setattr(e, "text", mask_sensitive_info(e.response.read()))
+            else:
+                error_text = mask_sensitive_info(e.response.text)
+                setattr(e, "message", error_text)
+                setattr(e, "text", error_text)
+
+            setattr(e, "status_code", e.response.status_code)
+
+            raise e
+        except Exception as e:
+            raise e
+
    def __del__(self) -> None:
        try:
            self.close()
--- a/litellm/llms/custom_httpx/llm_http_handler.py
+++ b/litellm/llms/custom_httpx/llm_http_handler.py
@ -36,6 +36,7 @@ from litellm.types.llms.openai import (
    ResponsesAPIResponse,
 )
 from litellm.types.rerank import OptionalRerankParams, RerankResponse
+from litellm.types.responses.main import DeleteResponseResult
 from litellm.types.router import GenericLiteLLMParams
 from litellm.types.utils import EmbeddingResponse, FileTypes, TranscriptionResponse
 from litellm.utils import CustomStreamWrapper, ModelResponse, ProviderConfigManager
@ -1015,6 +1016,7 @@ class BaseLLMHTTPHandler:
        client: Optional[Union[HTTPHandler, AsyncHTTPHandler]] = None,
        _is_async: bool = False,
        fake_stream: bool = False,
+        litellm_metadata: Optional[Dict[str, Any]] = None,
    ) -> Union[
        ResponsesAPIResponse,
        BaseResponsesAPIStreamingIterator,
@ -1041,6 +1043,7 @@ class BaseLLMHTTPHandler:
                timeout=timeout,
                client=client if isinstance(client, AsyncHTTPHandler) else None,
                fake_stream=fake_stream,
+                litellm_metadata=litellm_metadata,
            )

        if client is None or not isinstance(client, HTTPHandler):
@ -1064,11 +1067,7 @@ class BaseLLMHTTPHandler:

        api_base = responses_api_provider_config.get_complete_url(
            api_base=litellm_params.api_base,
-            api_key=litellm_params.api_key,
-            model=model,
-            optional_params=response_api_optional_request_params,
            litellm_params=dict(litellm_params),
-            stream=stream,
        )

        data = responses_api_provider_config.transform_responses_api_request(
@ -1113,6 +1112,8 @@ class BaseLLMHTTPHandler:
                        model=model,
                        logging_obj=logging_obj,
                        responses_api_provider_config=responses_api_provider_config,
+                        litellm_metadata=litellm_metadata,
+                        custom_llm_provider=custom_llm_provider,
                    )

                return SyncResponsesAPIStreamingIterator(
@ -1120,6 +1121,8 @@ class BaseLLMHTTPHandler:
                    model=model,
                    logging_obj=logging_obj,
                    responses_api_provider_config=responses_api_provider_config,
+                    litellm_metadata=litellm_metadata,
+                    custom_llm_provider=custom_llm_provider,
                )
            else:
                # For non-streaming requests
@ -1156,6 +1159,7 @@ class BaseLLMHTTPHandler:
        timeout: Optional[Union[float, httpx.Timeout]] = None,
        client: Optional[Union[HTTPHandler, AsyncHTTPHandler]] = None,
        fake_stream: bool = False,
+        litellm_metadata: Optional[Dict[str, Any]] = None,
    ) -> Union[ResponsesAPIResponse, BaseResponsesAPIStreamingIterator]:
        """
        Async version of the responses API handler.
@ -1183,11 +1187,7 @@ class BaseLLMHTTPHandler:

        api_base = responses_api_provider_config.get_complete_url(
            api_base=litellm_params.api_base,
-            api_key=litellm_params.api_key,
-            model=model,
-            optional_params=response_api_optional_request_params,
            litellm_params=dict(litellm_params),
-            stream=stream,
        )

        data = responses_api_provider_config.transform_responses_api_request(
@ -1234,6 +1234,8 @@ class BaseLLMHTTPHandler:
                        model=model,
                        logging_obj=logging_obj,
                        responses_api_provider_config=responses_api_provider_config,
+                        litellm_metadata=litellm_metadata,
+                        custom_llm_provider=custom_llm_provider,
                    )

                # Return the streaming iterator
@ -1242,6 +1244,8 @@ class BaseLLMHTTPHandler:
                    model=model,
                    logging_obj=logging_obj,
                    responses_api_provider_config=responses_api_provider_config,
+                    litellm_metadata=litellm_metadata,
+                    custom_llm_provider=custom_llm_provider,
                )
            else:
                # For non-streaming, proceed as before
@ -1265,6 +1269,319 @@ class BaseLLMHTTPHandler:
            logging_obj=logging_obj,
        )

+    async def async_delete_response_api_handler(
+        self,
+        response_id: str,
+        responses_api_provider_config: BaseResponsesAPIConfig,
+        litellm_params: GenericLiteLLMParams,
+        logging_obj: LiteLLMLoggingObj,
+        custom_llm_provider: Optional[str],
+        extra_headers: Optional[Dict[str, Any]] = None,
+        extra_body: Optional[Dict[str, Any]] = None,
+        timeout: Optional[Union[float, httpx.Timeout]] = None,
+        client: Optional[Union[HTTPHandler, AsyncHTTPHandler]] = None,
+        _is_async: bool = False,
+    ) -> DeleteResponseResult:
+        """
+        Async version of the delete response API handler.
+        Uses async HTTP client to make requests.
+        """
+        if client is None or not isinstance(client, AsyncHTTPHandler):
+            async_httpx_client = get_async_httpx_client(
+                llm_provider=litellm.LlmProviders(custom_llm_provider),
+                params={"ssl_verify": litellm_params.get("ssl_verify", None)},
+            )
+        else:
+            async_httpx_client = client
+
+        headers = responses_api_provider_config.validate_environment(
+            api_key=litellm_params.api_key,
+            headers=extra_headers or {},
+            model="None",
+        )
+
+        if extra_headers:
+            headers.update(extra_headers)
+
+        api_base = responses_api_provider_config.get_complete_url(
+            api_base=litellm_params.api_base,
+            litellm_params=dict(litellm_params),
+        )
+
+        url, data = responses_api_provider_config.transform_delete_response_api_request(
+            response_id=response_id,
+            api_base=api_base,
+            litellm_params=litellm_params,
+            headers=headers,
+        )
+
+        ## LOGGING
+        logging_obj.pre_call(
+            input="",
+            api_key="",
+            additional_args={
+                "complete_input_dict": data,
+                "api_base": api_base,
+                "headers": headers,
+            },
+        )
+
+        try:
+            response = await async_httpx_client.delete(
+                url=url, headers=headers, data=json.dumps(data), timeout=timeout
+            )
+
+        except Exception as e:
+            raise self._handle_error(
+                e=e,
+                provider_config=responses_api_provider_config,
+            )
+
+        return responses_api_provider_config.transform_delete_response_api_response(
+            raw_response=response,
+            logging_obj=logging_obj,
+        )
+
+    def delete_response_api_handler(
+        self,
+        response_id: str,
+        responses_api_provider_config: BaseResponsesAPIConfig,
+        litellm_params: GenericLiteLLMParams,
+        logging_obj: LiteLLMLoggingObj,
+        custom_llm_provider: Optional[str],
+        extra_headers: Optional[Dict[str, Any]] = None,
+        extra_body: Optional[Dict[str, Any]] = None,
+        timeout: Optional[Union[float, httpx.Timeout]] = None,
+        client: Optional[Union[HTTPHandler, AsyncHTTPHandler]] = None,
+        _is_async: bool = False,
+    ) -> Union[DeleteResponseResult, Coroutine[Any, Any, DeleteResponseResult]]:
+        """
+        Sync entry point for the delete response API handler.
+        Returns a coroutine when `_is_async` is True; otherwise uses the sync HTTP client.
+        """
+        if _is_async:
+            return self.async_delete_response_api_handler(
+                response_id=response_id,
+                responses_api_provider_config=responses_api_provider_config,
+                litellm_params=litellm_params,
+                logging_obj=logging_obj,
+                custom_llm_provider=custom_llm_provider,
+                extra_headers=extra_headers,
+                extra_body=extra_body,
+                timeout=timeout,
+                client=client,
+            )
+        if client is None or not isinstance(client, HTTPHandler):
+            sync_httpx_client = _get_httpx_client(
+                params={"ssl_verify": litellm_params.get("ssl_verify", None)}
+            )
+        else:
+            sync_httpx_client = client
+
+        headers = responses_api_provider_config.validate_environment(
+            api_key=litellm_params.api_key,
+            headers=extra_headers or {},
+            model="None",
+        )
+
+        if extra_headers:
+            headers.update(extra_headers)
+
+        api_base = responses_api_provider_config.get_complete_url(
+            api_base=litellm_params.api_base,
+            litellm_params=dict(litellm_params),
+        )
+
+        url, data = responses_api_provider_config.transform_delete_response_api_request(
+            response_id=response_id,
+            api_base=api_base,
+            litellm_params=litellm_params,
+            headers=headers,
+        )
+
+        ## LOGGING
+        logging_obj.pre_call(
+            input="",
+            api_key="",
+            additional_args={
+                "complete_input_dict": data,
+                "api_base": api_base,
+                "headers": headers,
+            },
+        )
+
+        try:
+            response = sync_httpx_client.delete(
+                url=url, headers=headers, data=json.dumps(data), timeout=timeout
+            )
+
+        except Exception as e:
+            raise self._handle_error(
+                e=e,
+                provider_config=responses_api_provider_config,
+            )
+
+        return responses_api_provider_config.transform_delete_response_api_response(
+            raw_response=response,
+            logging_obj=logging_obj,
+        )
+
+    def get_responses(
+        self,
+        response_id: str,
+        responses_api_provider_config: BaseResponsesAPIConfig,
+        litellm_params: GenericLiteLLMParams,
+        logging_obj: LiteLLMLoggingObj,
+        custom_llm_provider: Optional[str] = None,
+        extra_headers: Optional[Dict[str, Any]] = None,
+        extra_body: Optional[Dict[str, Any]] = None,
+        timeout: Optional[Union[float, httpx.Timeout]] = None,
+        client: Optional[Union[HTTPHandler, AsyncHTTPHandler]] = None,
+        _is_async: bool = False,
+    ) -> Union[ResponsesAPIResponse, Coroutine[Any, Any, ResponsesAPIResponse]]:
+        """
+        Get a response by ID
+        Uses GET /v1/responses/{response_id} endpoint in the responses API
+        """
+        if _is_async:
+            return self.async_get_responses(
+                response_id=response_id,
+                responses_api_provider_config=responses_api_provider_config,
+                litellm_params=litellm_params,
+                logging_obj=logging_obj,
+                custom_llm_provider=custom_llm_provider,
+                extra_headers=extra_headers,
+                extra_body=extra_body,
+                timeout=timeout,
+                client=client,
+            )
+
+        if client is None or not isinstance(client, HTTPHandler):
+            sync_httpx_client = _get_httpx_client(
+                params={"ssl_verify": litellm_params.get("ssl_verify", None)}
+            )
+        else:
+            sync_httpx_client = client
+
+        headers = responses_api_provider_config.validate_environment(
+            api_key=litellm_params.api_key,
+            headers=extra_headers or {},
+            model="None",
+        )
+
+        if extra_headers:
+            headers.update(extra_headers)
+
+        api_base = responses_api_provider_config.get_complete_url(
+            api_base=litellm_params.api_base,
+            litellm_params=dict(litellm_params),
+        )
+
+        url, data = responses_api_provider_config.transform_get_response_api_request(
+            response_id=response_id,
+            api_base=api_base,
+            litellm_params=litellm_params,
+            headers=headers,
+        )
+
+        ## LOGGING
+        logging_obj.pre_call(
+            input="",
+            api_key="",
+            additional_args={
+                "complete_input_dict": data,
+                "api_base": api_base,
+                "headers": headers,
+            },
+        )
+
+        try:
+            response = sync_httpx_client.get(
+                url=url, headers=headers, params=data
+            )
+        except Exception as e:
+            raise self._handle_error(
+                e=e,
+                provider_config=responses_api_provider_config,
+            )
+
+        return responses_api_provider_config.transform_get_response_api_response(
+            raw_response=response,
+            logging_obj=logging_obj,
+        )
+
+    async def async_get_responses(
+        self,
+        response_id: str,
+        responses_api_provider_config: BaseResponsesAPIConfig,
+        litellm_params: GenericLiteLLMParams,
+        logging_obj: LiteLLMLoggingObj,
+        custom_llm_provider: Optional[str] = None,
+        extra_headers: Optional[Dict[str, Any]] = None,
+        extra_body: Optional[Dict[str, Any]] = None,
+        timeout: Optional[Union[float, httpx.Timeout]] = None,
+        client: Optional[Union[HTTPHandler, AsyncHTTPHandler]] = None,
+    ) -> ResponsesAPIResponse:
+        """
+        Async version of get_responses
+        """
+        if client is None or not isinstance(client, AsyncHTTPHandler):
+            async_httpx_client = get_async_httpx_client(
+                llm_provider=litellm.LlmProviders(custom_llm_provider),
+                params={"ssl_verify": litellm_params.get("ssl_verify", None)},
+            )
+        else:
+            async_httpx_client = client
+
+        headers = responses_api_provider_config.validate_environment(
+            api_key=litellm_params.api_key,
+            headers=extra_headers or {},
+            model="None",
+        )
+
+        if extra_headers:
+            headers.update(extra_headers)
+
+        api_base = responses_api_provider_config.get_complete_url(
+            api_base=litellm_params.api_base,
+            litellm_params=dict(litellm_params),
+        )
+
+        url, data = responses_api_provider_config.transform_get_response_api_request(
+            response_id=response_id,
+            api_base=api_base,
+            litellm_params=litellm_params,
+            headers=headers,
+        )
+
+        ## LOGGING
+        logging_obj.pre_call(
+            input="",
+            api_key="",
+            additional_args={
+                "complete_input_dict": data,
+                "api_base": api_base,
+                "headers": headers,
+            },
+        )
+
+        try:
+            response = await async_httpx_client.get(
+                url=url, headers=headers, params=data
+            )
+
+        except Exception as e:
+            verbose_logger.exception(f"Error retrieving response: {e}")
+            raise self._handle_error(
+                e=e,
+                provider_config=responses_api_provider_config,
+            )
+
+        return responses_api_provider_config.transform_get_response_api_response(
+            raw_response=response,
+            logging_obj=logging_obj,
+        )
+
    def create_file(
        self,
        create_file_data: CreateFileRequest,
--- a/litellm/llms/infinity/rerank/common_utils.py
+++ b/litellm/llms/infinity/rerank/common_utils.py
@ -1,10 +1,16 @@
+from typing import Union
 import httpx

 from litellm.llms.base_llm.chat.transformation import BaseLLMException


 class InfinityError(BaseLLMException):
-    def __init__(self, status_code, message):
+    def __init__(
+        self,
+        status_code: int,
+        message: str,
+        headers: Union[dict, httpx.Headers] = {},
+    ):
        self.status_code = status_code
        self.message = message
        self.request = httpx.Request(
@ -16,4 +22,5 @@ class InfinityError(BaseLLMException):
            message=message,
            request=self.request,
            response=self.response,
+            headers=headers,
        )  # Call the base class constructor with the parameters it needs
--- a/litellm/llms/infinity/embedding/handler.py
+++ b/litellm/llms/infinity/embedding/handler.py
@ -0,0 +1,5 @@
+"""
+Infinity Embedding - uses `llm_http_handler.py` to make httpx requests
+
+Request/Response transformation is handled in `transformation.py`
+"""
--- a/litellm/llms/infinity/embedding/transformation.py
+++ b/litellm/llms/infinity/embedding/transformation.py
@ -0,0 +1,141 @@
+from typing import List, Optional, Union
+
+import httpx
+
+from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
+from litellm.llms.base_llm.chat.transformation import BaseLLMException
+from litellm.llms.base_llm.embedding.transformation import BaseEmbeddingConfig
+from litellm.secret_managers.main import get_secret_str
+from litellm.types.llms.openai import AllEmbeddingInputValues, AllMessageValues
+from litellm.types.utils import EmbeddingResponse, Usage
+
+from ..common_utils import InfinityError
+
+
+class InfinityEmbeddingConfig(BaseEmbeddingConfig):
+    """
+    Reference: https://infinity.modal.michaelfeil.eu/docs
+    """
+
+    def __init__(self) -> None:
+        pass
+
+    def get_complete_url(
+        self,
+        api_base: Optional[str],
+        api_key: Optional[str],
+        model: str,
+        optional_params: dict,
+        litellm_params: dict,
+        stream: Optional[bool] = None,
+    ) -> str:
+        if api_base is None:
+            raise ValueError("api_base is required for Infinity embeddings")
+        # Remove trailing slashes and ensure clean base URL
+        api_base = api_base.rstrip("/")
+        if not api_base.endswith("/embeddings"):
+            api_base = f"{api_base}/embeddings"
+        return api_base
+
+    def validate_environment(
+        self,
+        headers: dict,
+        model: str,
+        messages: List[AllMessageValues],
+        optional_params: dict,
+        litellm_params: dict,
+        api_key: Optional[str] = None,
+        api_base: Optional[str] = None,
+    ) -> dict:
+        if api_key is None:
+            api_key = get_secret_str("INFINITY_API_KEY")
+
+        default_headers = {
+            "Authorization": f"Bearer {api_key}",
+            "accept": "application/json",
+            "Content-Type": "application/json",
+        }
+
+        # If 'Authorization' is provided in headers, it overrides the default.
+        if "Authorization" in headers:
+            default_headers["Authorization"] = headers["Authorization"]
+
+        # Merge caller-supplied headers over the defaults (caller values win on conflict)
+        return {**default_headers, **headers}
+
+    def get_supported_openai_params(self, model: str) -> list:
+        return [
+            "encoding_format",
+            "modality",
+            "dimensions",
+        ]
+
+    def map_openai_params(
+        self,
+        non_default_params: dict,
+        optional_params: dict,
+        model: str,
+        drop_params: bool,
+    ) -> dict:
+        """
+        Map OpenAI params to Infinity params
+
+        Reference: https://infinity.modal.michaelfeil.eu/docs
+        """
+        if "encoding_format" in non_default_params:
+            optional_params["encoding_format"] = non_default_params["encoding_format"]
+        if "modality" in non_default_params:
+            optional_params["modality"] = non_default_params["modality"]
+        if "dimensions" in non_default_params:
+            optional_params["output_dimension"] = non_default_params["dimensions"]
+        return optional_params
+
+    def transform_embedding_request(
+        self,
+        model: str,
+        input: AllEmbeddingInputValues,
+        optional_params: dict,
+        headers: dict,
+    ) -> dict:
+        return {
+            "input": input,
+            "model": model,
+            **optional_params,
+        }
+
+    def transform_embedding_response(
+        self,
+        model: str,
+        raw_response: httpx.Response,
+        model_response: EmbeddingResponse,
+        logging_obj: LiteLLMLoggingObj,
+        api_key: Optional[str] = None,
+        request_data: dict = {},
+        optional_params: dict = {},
+        litellm_params: dict = {},
+    ) -> EmbeddingResponse:
+        try:
+            raw_response_json = raw_response.json()
+        except Exception:
+            raise InfinityError(
+                message=raw_response.text, status_code=raw_response.status_code
+            )
+
+        # model_response.usage
+        model_response.model = raw_response_json.get("model")
+        model_response.data = raw_response_json.get("data")
+        model_response.object = raw_response_json.get("object")
+
+        usage = Usage(
+            prompt_tokens=raw_response_json.get("usage", {}).get("prompt_tokens", 0),
+            total_tokens=raw_response_json.get("usage", {}).get("total_tokens", 0),
+        )
+        model_response.usage = usage
+        return model_response
+
+    def get_error_class(
+        self, error_message: str, status_code: int, headers: Union[dict, httpx.Headers]
+    ) -> BaseLLMException:
+        return InfinityError(
+            message=error_message, status_code=status_code, headers=headers
+        )
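The URL normalization, header merge, and parameter mapping in `InfinityEmbeddingConfig` can be exercised in isolation. The sketch below reproduces that logic as free functions. The function names and the `http://localhost:7997` base URL are assumptions made for illustration and do not appear in the diff.

```python
from typing import Dict, Optional


def build_embeddings_url(api_base: Optional[str]) -> str:
    """Mirror of get_complete_url: strip trailing slashes, append /embeddings once."""
    if api_base is None:
        raise ValueError("api_base is required for Infinity embeddings")
    api_base = api_base.rstrip("/")
    if not api_base.endswith("/embeddings"):
        api_base = f"{api_base}/embeddings"
    return api_base


def merge_headers(api_key: Optional[str], headers: Dict[str, str]) -> Dict[str, str]:
    """Mirror of validate_environment: caller headers override the defaults."""
    default_headers = {
        "Authorization": f"Bearer {api_key}",
        "accept": "application/json",
        "Content-Type": "application/json",
    }
    return {**default_headers, **headers}


def map_params(non_default_params: dict) -> dict:
    """Mirror of map_openai_params: `dimensions` is renamed to `output_dimension`."""
    mapped: dict = {}
    for key in ("encoding_format", "modality"):
        if key in non_default_params:
            mapped[key] = non_default_params[key]
    if "dimensions" in non_default_params:
        mapped["output_dimension"] = non_default_params["dimensions"]
    return mapped


url = build_embeddings_url("http://localhost:7997/")  # hypothetical base URL
merged = merge_headers("sk-test", {"Authorization": "Bearer override"})
params = map_params({"dimensions": 256, "encoding_format": "float"})
```

Because the final dict merge already lets caller headers win, an `Authorization` header passed in by the caller replaces the one built from the API key.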
--- a/litellm/llms/infinity/rerank/transformation.py
+++ b/litellm/llms/infinity/rerank/transformation.py
@ -22,7 +22,7 @@ from litellm.types.rerank import (
    RerankTokens,
 )

-from .common_utils import InfinityError
+from ..common_utils import InfinityError


 class InfinityRerankConfig(CohereRerankConfig):
--- a/litellm/llms/openai/responses/transformation.py
+++ b/litellm/llms/openai/responses/transformation.py
@ -7,6 +7,7 @@ from litellm._logging import verbose_logger
 from litellm.llms.base_llm.responses.transformation import BaseResponsesAPIConfig
 from litellm.secret_managers.main import get_secret_str
 from litellm.types.llms.openai import *
+from litellm.types.responses.main import *
 from litellm.types.router import GenericLiteLLMParams

 from ..common_utils import OpenAIError
@ -110,11 +111,7 @@ class OpenAIResponsesAPIConfig(BaseResponsesAPIConfig):
    def get_complete_url(
        self,
        api_base: Optional[str],
-        api_key: Optional[str],
-        model: str,
-        optional_params: dict,
        litellm_params: dict,
-        stream: Optional[bool] = None,
    ) -> str:
        """
        Get the endpoint for OpenAI responses API
@ -190,7 +187,7 @@ class OpenAIResponsesAPIConfig(BaseResponsesAPIConfig):

        model_class = event_models.get(cast(ResponsesAPIStreamEvents, event_type))
        if not model_class:
-            raise ValueError(f"Unknown event type: {event_type}")
+            return GenericEvent

        return model_class

@ -217,3 +214,75 @@ class OpenAIResponsesAPIConfig(BaseResponsesAPIConfig):
                    f"Error getting model info in OpenAIResponsesAPIConfig: {e}"
                )
        return False
+
+    #########################################################
+    ########## DELETE RESPONSE API TRANSFORMATION ##############
+    #########################################################
+    def transform_delete_response_api_request(
+        self,
+        response_id: str,
+        api_base: str,
+        litellm_params: GenericLiteLLMParams,
+        headers: dict,
+    ) -> Tuple[str, Dict]:
+        """
+        Transform the delete response API request into a URL and data
+
+        OpenAI API expects the following request
+        - DELETE /v1/responses/{response_id}
+        """
+        url = f"{api_base}/{response_id}"
+        data: Dict = {}
+        return url, data
+
+    def transform_delete_response_api_response(
+        self,
+        raw_response: httpx.Response,
+        logging_obj: LiteLLMLoggingObj,
+    ) -> DeleteResponseResult:
+        """
+        Transform the delete response API response into a DeleteResponseResult
+        """
+        try:
+            raw_response_json = raw_response.json()
+        except Exception:
+            raise OpenAIError(
+                message=raw_response.text, status_code=raw_response.status_code
+            )
+        return DeleteResponseResult(**raw_response_json)
+
+    #########################################################
+    ########## GET RESPONSE API TRANSFORMATION ###############
+    #########################################################
+    def transform_get_response_api_request(
+        self,
+        response_id: str,
+        api_base: str,
+        litellm_params: GenericLiteLLMParams,
+        headers: dict,
+    ) -> Tuple[str, Dict]:
+        """
+        Transform the get response API request into a URL and data
+
+        OpenAI API expects the following request
+        - GET /v1/responses/{response_id}
+        """
+        url = f"{api_base}/{response_id}"
+        data: Dict = {}
+        return url, data
+
+    def transform_get_response_api_response(
+        self,
+        raw_response: httpx.Response,
+        logging_obj: LiteLLMLoggingObj,
+    ) -> ResponsesAPIResponse:
+        """
+        Transform the get response API response into a ResponsesAPIResponse
+        """
+        try:
+            raw_response_json = raw_response.json()
+        except Exception:
+            raise OpenAIError(
+                message=raw_response.text, status_code=raw_response.status_code
+            )
+        return ResponsesAPIResponse(**raw_response_json)
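Both the delete and get transformations build the request the same way (append the response ID to the base URL, send an empty body) and parse the reply the same way (decode JSON, raise a provider error carrying the status code if decoding fails). A minimal sketch of that shape, using only the standard library. `ResponseParseError`, `build_request`, `parse_body`, and the example URL are hypothetical names chosen for illustration.

```python
import json
from typing import Dict, Tuple


class ResponseParseError(Exception):
    """Hypothetical stand-in for a provider error such as OpenAIError."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.message = message


def build_request(api_base: str, response_id: str) -> Tuple[str, Dict]:
    # DELETE and GET both target {api_base}/{response_id} with no body.
    return f"{api_base}/{response_id}", {}


def parse_body(text: str, status_code: int) -> dict:
    try:
        return json.loads(text)
    except ValueError:
        # Non-JSON bodies (for example an HTML error page) surface the raw text.
        raise ResponseParseError(status_code=status_code, message=text)


url, data = build_request("https://api.openai.com/v1/responses", "resp_abc")
parsed = parse_body('{"id": "resp_abc", "deleted": true}', 200)
```

Note that the URL is built by plain string formatting, so a base URL with a trailing slash would produce a double slash in the path.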
--- a/litellm/llms/vertex_ai/gemini/transformation.py
+++ b/litellm/llms/vertex_ai/gemini/transformation.py
@ -216,6 +216,11 @@ def _gemini_convert_messages_with_history(  # noqa: PLR0915
                    msg_dict = messages[msg_i]  # type: ignore
                assistant_msg = ChatCompletionAssistantMessage(**msg_dict)  # type: ignore
                _message_content = assistant_msg.get("content", None)
+                reasoning_content = assistant_msg.get("reasoning_content", None)
+                if reasoning_content is not None:
+                    assistant_content.append(
+                        PartType(thought=True, text=reasoning_content)
+                    )
                if _message_content is not None and isinstance(_message_content, list):
                    _parts = []
                    for element in _message_content:
@ -223,6 +228,7 @@ def _gemini_convert_messages_with_history(  # noqa: PLR0915
                            if element["type"] == "text":
                                _part = PartType(text=element["text"])
                                _parts.append(_part)
+
                    assistant_content.extend(_parts)
                elif (
                    _message_content is not None
--- a/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py
+++ b/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py
@ -57,6 +57,7 @@ from litellm.types.llms.vertex_ai import (
    LogprobsResult,
    ToolConfig,
    Tools,
+    UsageMetadata,
 )
 from litellm.types.utils import (
    ChatCompletionTokenLogprob,
@ -390,7 +391,7 @@ class VertexGeminiConfig(VertexAIBaseConfig, BaseConfig):
        params: GeminiThinkingConfig = {}
        if thinking_enabled:
            params["includeThoughts"] = True
-        if thinking_budget:
+        if thinking_budget is not None and isinstance(thinking_budget, int):
            params["thinkingBudget"] = thinking_budget

        return params
@ -740,6 +741,23 @@ class VertexGeminiConfig(VertexAIBaseConfig, BaseConfig):

        return model_response

+    def is_candidate_token_count_inclusive(self, usage_metadata: UsageMetadata) -> bool:
+        """
+        Check if the candidate token count is inclusive of the thinking token count
+
+        If promptTokenCount + candidatesTokenCount == totalTokenCount, then the candidate token count is inclusive of the thinking token count
+
+        else the candidate token count is exclusive of the thinking token count
+
+        Addresses - https://github.com/BerriAI/litellm/pull/10141#discussion_r2052272035
+        """
+        if usage_metadata.get("promptTokenCount", 0) + usage_metadata.get(
+            "candidatesTokenCount", 0
+        ) == usage_metadata.get("totalTokenCount", 0):
+            return True
+        else:
+            return False
+
    def _calculate_usage(
        self,
        completion_response: GenerateContentResponseBody,
@ -768,14 +786,23 @@ class VertexGeminiConfig(VertexAIBaseConfig, BaseConfig):
            audio_tokens=audio_tokens,
            text_tokens=text_tokens,
        )
+
+        completion_tokens = completion_response["usageMetadata"].get(
+            "candidatesTokenCount", 0
+        )
+        if (
+            not self.is_candidate_token_count_inclusive(
+                completion_response["usageMetadata"]
+            )
+            and reasoning_tokens
+        ):
+            completion_tokens = reasoning_tokens + completion_tokens
        ## GET USAGE ##
        usage = Usage(
            prompt_tokens=completion_response["usageMetadata"].get(
                "promptTokenCount", 0
            ),
-            completion_tokens=completion_response["usageMetadata"].get(
-                "candidatesTokenCount", 0
-            ),
+            completion_tokens=completion_tokens,
            total_tokens=completion_response["usageMetadata"].get("totalTokenCount", 0),
            prompt_tokens_details=prompt_tokens_details,
            reasoning_tokens=reasoning_tokens,
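The usage change above adds reasoning tokens to `completion_tokens` only when the candidate count does not already include them. A minimal sketch of that arithmetic on plain dicts follows. The free function `completion_tokens` and the token counts are made-up illustrations, not values from the Gemini API.

```python
from typing import Dict, Optional


def is_candidate_token_count_inclusive(usage_metadata: Dict[str, int]) -> bool:
    """True when prompt + candidates already equals the reported total."""
    return usage_metadata.get("promptTokenCount", 0) + usage_metadata.get(
        "candidatesTokenCount", 0
    ) == usage_metadata.get("totalTokenCount", 0)


def completion_tokens(
    usage_metadata: Dict[str, int], reasoning_tokens: Optional[int]
) -> int:
    tokens = usage_metadata.get("candidatesTokenCount", 0)
    if not is_candidate_token_count_inclusive(usage_metadata) and reasoning_tokens:
        # Candidates exclude thinking tokens, so add them back in.
        tokens = reasoning_tokens + tokens
    return tokens


# Illustrative numbers: 10 prompt tokens, 20 visible output tokens, 5 thinking tokens.
exclusive = {"promptTokenCount": 10, "candidatesTokenCount": 20, "totalTokenCount": 35}
inclusive = {"promptTokenCount": 10, "candidatesTokenCount": 25, "totalTokenCount": 35}

exclusive_total = completion_tokens(exclusive, reasoning_tokens=5)
inclusive_total = completion_tokens(inclusive, reasoning_tokens=5)
```

Both payloads end up with the same completion count, which is the point of the check: the reported figure no longer depends on which convention the backend used.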
--- a/litellm/main.py
+++ b/litellm/main.py
@ -182,6 +182,7 @@ from .types.llms.openai import (
    ChatCompletionPredictionContentParam,
    ChatCompletionUserMessage,
    HttpxBinaryResponseContent,
+    ImageGenerationRequestQuality,
 )
 from .types.utils import (
    LITELLM_IMAGE_VARIATION_PROVIDERS,
@ -2688,9 +2689,9 @@ def completion(  # type: ignore # noqa: PLR0915
                    "aws_region_name" not in optional_params
                    or optional_params["aws_region_name"] is None
                ):
-                    optional_params[
-                        "aws_region_name"
-                    ] = aws_bedrock_client.meta.region_name
+                    optional_params["aws_region_name"] = (
+                        aws_bedrock_client.meta.region_name
+                    )

            bedrock_route = BedrockModelInfo.get_bedrock_route(model)
            if bedrock_route == "converse":
@ -3884,6 +3885,21 @@ def embedding(  # noqa: PLR0915
                aembedding=aembedding,
                litellm_params={},
            )
+        elif custom_llm_provider == "infinity":
+            response = base_llm_http_handler.embedding(
+                model=model,
+                input=input,
+                custom_llm_provider=custom_llm_provider,
+                api_base=api_base,
+                api_key=api_key,
+                logging_obj=logging,
+                timeout=timeout,
+                model_response=EmbeddingResponse(),
+                optional_params=optional_params,
+                client=client,
+                aembedding=aembedding,
+                litellm_params={},
+            )
        elif custom_llm_provider == "watsonx":
            credentials = IBMWatsonXMixin.get_watsonx_credentials(
                optional_params=optional_params, api_key=api_key, api_base=api_base
@ -4397,9 +4413,9 @@ def adapter_completion(
    new_kwargs = translation_obj.translate_completion_input_params(kwargs=kwargs)

    response: Union[ModelResponse, CustomStreamWrapper] = completion(**new_kwargs)  # type: ignore
-    translated_response: Optional[
-        Union[BaseModel, AdapterCompletionStreamWrapper]
-    ] = None
+    translated_response: Optional[Union[BaseModel, AdapterCompletionStreamWrapper]] = (
+        None
+    )
    if isinstance(response, ModelResponse):
        translated_response = translation_obj.translate_completion_output_params(
            response=response
@ -4552,7 +4568,7 @@ def image_generation(  # noqa: PLR0915
    prompt: str,
    model: Optional[str] = None,
    n: Optional[int] = None,
-    quality: Optional[str] = None,
+    quality: Optional[Union[str, ImageGenerationRequestQuality]] = None,
    response_format: Optional[str] = None,
    size: Optional[str] = None,
    style: Optional[str] = None,
@ -5819,9 +5835,9 @@ def stream_chunk_builder(  # noqa: PLR0915
        ]

        if len(content_chunks) > 0:
-            response["choices"][0]["message"][
-                "content"
-            ] = processor.get_combined_content(content_chunks)
+            response["choices"][0]["message"]["content"] = (
+                processor.get_combined_content(content_chunks)
+            )

        reasoning_chunks = [
            chunk
@ -5832,9 +5848,9 @@ def stream_chunk_builder(  # noqa: PLR0915
        ]

        if len(reasoning_chunks) > 0:
-            response["choices"][0]["message"][
-                "reasoning_content"
-            ] = processor.get_combined_reasoning_content(reasoning_chunks)
+            response["choices"][0]["message"]["reasoning_content"] = (
+                processor.get_combined_reasoning_content(reasoning_chunks)
+            )

        audio_chunks = [
            chunk
--- a/litellm/model_prices_and_context_window_backup.json
+++ b/litellm/model_prices_and_context_window_backup.json
@ -1437,6 +1437,76 @@
        "output_cost_per_pixel": 0.0,
        "litellm_provider": "openai"
    },
+    "gpt-image-1": {
+        "mode": "image_generation",
+        "input_cost_per_pixel": 4.0054321e-8,
+        "output_cost_per_pixel": 0.0,
+        "litellm_provider": "openai",
+        "supported_endpoints": ["/v1/images/generations"]
+    },
+    "low/1024-x-1024/gpt-image-1": {
+        "mode": "image_generation",
+        "input_cost_per_pixel": 1.0490417e-8,
+        "output_cost_per_pixel": 0.0,
+        "litellm_provider": "openai",
+        "supported_endpoints": ["/v1/images/generations"]
+    },
+    "medium/1024-x-1024/gpt-image-1": {
+        "mode": "image_generation",
+        "input_cost_per_pixel": 4.0054321e-8,
+        "output_cost_per_pixel": 0.0,
+        "litellm_provider": "openai",
+        "supported_endpoints": ["/v1/images/generations"]
+    },
+    "high/1024-x-1024/gpt-image-1": {
+        "mode": "image_generation",
+        "input_cost_per_pixel": 1.59263611e-7,
+        "output_cost_per_pixel": 0.0,
+        "litellm_provider": "openai",
+        "supported_endpoints": ["/v1/images/generations"]
+    },
+    "low/1024-x-1536/gpt-image-1": {
+        "mode": "image_generation",
+        "input_cost_per_pixel": 1.0172526e-8,
+        "output_cost_per_pixel": 0.0,
+        "litellm_provider": "openai",
+        "supported_endpoints": ["/v1/images/generations"]
+    },
+    "medium/1024-x-1536/gpt-image-1": {
+        "mode": "image_generation",
+        "input_cost_per_pixel": 4.0054321e-8,
+        "output_cost_per_pixel": 0.0,
+        "litellm_provider": "openai",
+        "supported_endpoints": ["/v1/images/generations"]
+    },
+    "high/1024-x-1536/gpt-image-1": {
+        "mode": "image_generation",
+        "input_cost_per_pixel": 1.58945719e-7,
+        "output_cost_per_pixel": 0.0,
+        "litellm_provider": "openai",
+        "supported_endpoints": ["/v1/images/generations"]
+    },
+    "low/1536-x-1024/gpt-image-1": {
+        "mode": "image_generation",
+        "input_cost_per_pixel": 1.0172526e-8,
+        "output_cost_per_pixel": 0.0,
+        "litellm_provider": "openai",
+        "supported_endpoints": ["/v1/images/generations"]
+    },
+    "medium/1536-x-1024/gpt-image-1": {
+        "mode": "image_generation",
+        "input_cost_per_pixel": 4.0054321e-8,
+        "output_cost_per_pixel": 0.0,
+        "litellm_provider": "openai",
+        "supported_endpoints": ["/v1/images/generations"]
+    },
+    "high/1536-x-1024/gpt-image-1": {
+        "mode": "image_generation",
+        "input_cost_per_pixel": 1.58945719e-7,
+        "output_cost_per_pixel": 0.0,
+        "litellm_provider": "openai",
+        "supported_endpoints": ["/v1/images/generations"]
+    },
    "gpt-4o-transcribe": {
        "mode": "audio_transcription",
        "input_cost_per_token": 0.0000025,
@ -1472,6 +1542,72 @@
        "litellm_provider": "openai",
        "supported_endpoints": ["/v1/audio/speech"]
    },
+    "azure/computer-use-preview": {
+        "max_tokens": 1024,
+        "max_input_tokens": 8192,
+        "max_output_tokens": 1024,
+        "input_cost_per_token": 0.000003,
+        "output_cost_per_token": 0.000012,
+        "litellm_provider": "azure",
+        "mode": "chat",
+        "supported_endpoints": ["/v1/responses"],
+        "supported_modalities": ["text", "image"],
+        "supported_output_modalities": ["text"],
+        "supports_function_calling": true,
+        "supports_parallel_function_calling": true,
+        "supports_response_schema": true,
+        "supports_vision": true,
+        "supports_prompt_caching": false,
+        "supports_system_messages": true,
+        "supports_tool_choice": true,
+        "supports_reasoning": true
+    },
+    "azure/gpt-4o-audio-preview-2024-12-17": {
+        "max_tokens": 16384,
+        "max_input_tokens": 128000,
+        "max_output_tokens": 16384,
+        "input_cost_per_token": 0.0000025,
+        "input_cost_per_audio_token": 0.00004,
+        "output_cost_per_token": 0.00001,
+        "output_cost_per_audio_token": 0.00008,
+        "litellm_provider": "azure",
+        "mode": "chat",
+        "supported_endpoints": ["/v1/chat/completions"],
+        "supported_modalities": ["text", "audio"],
+        "supported_output_modalities": ["text", "audio"],
+        "supports_function_calling": true,
+        "supports_parallel_function_calling": true,
+        "supports_response_schema": false,
+        "supports_vision": false,
+        "supports_prompt_caching": false,
+        "supports_system_messages": true,
+        "supports_tool_choice": true,
+        "supports_native_streaming": true,
+        "supports_reasoning": false
+    },
+    "azure/gpt-4o-mini-audio-preview-2024-12-17": {
+        "max_tokens": 16384,
+        "max_input_tokens": 128000,
+        "max_output_tokens": 16384,
+        "input_cost_per_token": 0.0000025,
+        "input_cost_per_audio_token": 0.00004,
+        "output_cost_per_token": 0.00001,
+        "output_cost_per_audio_token": 0.00008,
+        "litellm_provider": "azure",
+        "mode": "chat",
+        "supported_endpoints": ["/v1/chat/completions"],
+        "supported_modalities": ["text", "audio"],
+        "supported_output_modalities": ["text", "audio"],
+        "supports_function_calling": true,
+        "supports_parallel_function_calling": true,
+        "supports_response_schema": false,
+        "supports_vision": false,
+        "supports_prompt_caching": false,
+        "supports_system_messages": true,
+        "supports_tool_choice": true,
+        "supports_native_streaming": true,
+        "supports_reasoning": false
+    },
    "azure/gpt-4.1": {
        "max_tokens": 32768,
        "max_input_tokens": 1047576,
@ -1530,6 +1666,170 @@
            "search_context_size_high": 50e-3
        }
    },
+    "azure/gpt-4.1-mini": {
+        "max_tokens": 32768,
+        "max_input_tokens": 1047576,
+        "max_output_tokens": 32768,
+        "input_cost_per_token": 0.4e-6,
+        "output_cost_per_token": 1.6e-6,
+        "input_cost_per_token_batches": 0.2e-6,
+        "output_cost_per_token_batches": 0.8e-6,
+        "cache_read_input_token_cost": 0.1e-6,
+        "litellm_provider": "azure",
+        "mode": "chat",
+        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
+        "supported_modalities": ["text", "image"],
+        "supported_output_modalities": ["text"],
+        "supports_function_calling": true,
+        "supports_parallel_function_calling": true,
+        "supports_response_schema": true,
+        "supports_vision": true,
+        "supports_prompt_caching": true,
+        "supports_system_messages": true,
+        "supports_tool_choice": true,
+        "supports_native_streaming": true,
+        "supports_web_search": true,
+        "search_context_cost_per_query": {
+            "search_context_size_low": 25e-3,
+            "search_context_size_medium": 27.5e-3,
+            "search_context_size_high": 30e-3
+        }
+    },
+    "azure/gpt-4.1-mini-2025-04-14": {
+        "max_tokens": 32768,
+        "max_input_tokens": 1047576,
+        "max_output_tokens": 32768,
+        "input_cost_per_token": 0.4e-6,
+        "output_cost_per_token": 1.6e-6,
+        "input_cost_per_token_batches": 0.2e-6,
+        "output_cost_per_token_batches": 0.8e-6,
+        "cache_read_input_token_cost": 0.1e-6,
+        "litellm_provider": "azure",
+        "mode": "chat",
+        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
+        "supported_modalities": ["text", "image"],
+        "supported_output_modalities": ["text"],
+        "supports_function_calling": true,
+        "supports_parallel_function_calling": true,
+        "supports_response_schema": true,
+        "supports_vision": true,
+        "supports_prompt_caching": true,
+        "supports_system_messages": true,
+        "supports_tool_choice": true,
+        "supports_native_streaming": true,
+        "supports_web_search": true,
+        "search_context_cost_per_query": {
+            "search_context_size_low": 25e-3,
+            "search_context_size_medium": 27.5e-3,
+            "search_context_size_high": 30e-3
+        }
+    },
+    "azure/gpt-4.1-nano": {
+        "max_tokens": 32768,
+        "max_input_tokens": 1047576,
+        "max_output_tokens": 32768,
+        "input_cost_per_token": 0.1e-6,
+        "output_cost_per_token": 0.4e-6,
+        "input_cost_per_token_batches": 0.05e-6,
+        "output_cost_per_token_batches": 0.2e-6,
+        "cache_read_input_token_cost": 0.025e-6,
+        "litellm_provider": "azure",
+        "mode": "chat",
+        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
+        "supported_modalities": ["text", "image"],
+        "supported_output_modalities": ["text"],
+        "supports_function_calling": true,
+        "supports_parallel_function_calling": true,
+        "supports_response_schema": true,
+        "supports_vision": true,
+        "supports_prompt_caching": true,
+        "supports_system_messages": true,
+        "supports_tool_choice": true,
+        "supports_native_streaming": true
+    },
+    "azure/gpt-4.1-nano-2025-04-14": {
+        "max_tokens": 32768,
+        "max_input_tokens": 1047576,
+        "max_output_tokens": 32768,
+        "input_cost_per_token": 0.1e-6,
+        "output_cost_per_token": 0.4e-6,
+        "input_cost_per_token_batches": 0.05e-6,
+        "output_cost_per_token_batches": 0.2e-6,
+        "cache_read_input_token_cost": 0.025e-6,
+        "litellm_provider": "azure",
+        "mode": "chat",
+        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
+        "supported_modalities": ["text", "image"],
+        "supported_output_modalities": ["text"],
+        "supports_function_calling": true,
+        "supports_parallel_function_calling": true,
+        "supports_response_schema": true,
+        "supports_vision": true,
+        "supports_prompt_caching": true,
+        "supports_system_messages": true,
+        "supports_tool_choice": true,
+        "supports_native_streaming": true
+    },
+    "azure/o3": {
+        "max_tokens": 100000,
+        "max_input_tokens": 200000,
+        "max_output_tokens": 100000,
+        "input_cost_per_token": 1e-5,
+        "output_cost_per_token": 4e-5,
+        "cache_read_input_token_cost": 2.5e-6,
+        "litellm_provider": "azure",
+        "mode": "chat",
+        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
+        "supported_modalities": ["text", "image"],
+        "supported_output_modalities": ["text"],
+        "supports_function_calling": true,
+        "supports_parallel_function_calling": false,
+        "supports_vision": true,
+        "supports_prompt_caching": true,
+        "supports_response_schema": true,
+        "supports_reasoning": true,
+        "supports_tool_choice": true
+    },
+    "azure/o3-2025-04-16": {
+        "max_tokens": 100000,
+        "max_input_tokens": 200000,
+        "max_output_tokens": 100000,
+        "input_cost_per_token": 1e-5,
+        "output_cost_per_token": 4e-5,
+        "cache_read_input_token_cost": 2.5e-6,
+        "litellm_provider": "azure",
+        "mode": "chat",
+        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
+        "supported_modalities": ["text", "image"],
+        "supported_output_modalities": ["text"],
+        "supports_function_calling": true,
+        "supports_parallel_function_calling": false,
+        "supports_vision": true,
+        "supports_prompt_caching": true,
+        "supports_response_schema": true,
+        "supports_reasoning": true,
+        "supports_tool_choice": true
+    },
+    "azure/o4-mini": {
+        "max_tokens": 100000,
+        "max_input_tokens": 200000,
+        "max_output_tokens": 100000,
+        "input_cost_per_token": 1.1e-6,
+        "output_cost_per_token": 4.4e-6,
+        "cache_read_input_token_cost": 2.75e-7,
+        "litellm_provider": "azure",
+        "mode": "chat",
+        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
+        "supported_modalities": ["text", "image"],
+        "supported_output_modalities": ["text"],
+        "supports_function_calling": true,
+        "supports_parallel_function_calling": false,
+        "supports_vision": true,
+        "supports_prompt_caching": true,
+        "supports_response_schema": true,
+        "supports_reasoning": true,
+        "supports_tool_choice": true
+    },
    "azure/gpt-4o-mini-realtime-preview-2024-12-17": {
        "max_tokens": 4096,
        "max_input_tokens": 128000,
@ -5301,6 +5601,35 @@
        "source": "https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-2.0-flash",
        "supports_tool_choice": true
    },
+    "gemini-2.5-pro-preview-03-25": {
+        "max_tokens": 65536,
+        "max_input_tokens": 1048576,
+        "max_output_tokens": 65536,
+        "max_images_per_prompt": 3000,
+        "max_videos_per_prompt": 10,
+        "max_video_length": 1,
+        "max_audio_length_hours": 8.4,
+        "max_audio_per_prompt": 1,
+        "max_pdf_size_mb": 30,
+        "input_cost_per_audio_token": 0.00000125,
+        "input_cost_per_token": 0.00000125,
+        "input_cost_per_token_above_200k_tokens": 0.0000025,
+        "output_cost_per_token": 0.00001,
+        "output_cost_per_token_above_200k_tokens": 0.000015,
+        "litellm_provider": "vertex_ai-language-models",
+        "mode": "chat",
+        "supports_reasoning": true,
+        "supports_system_messages": true,
+        "supports_function_calling": true,
+        "supports_vision": true,
+        "supports_response_schema": true,
+        "supports_audio_output": false,
+        "supports_tool_choice": true,
+        "supported_endpoints": ["/v1/chat/completions", "/v1/completions", "/v1/batch"],
+        "supported_modalities": ["text", "image", "audio", "video"],
+        "supported_output_modalities": ["text"],
+        "source": "https://ai.google.dev/gemini-api/docs/models#gemini-2.5-flash-preview"
+    },
    "gemini/gemini-2.0-pro-exp-02-05": {
        "max_tokens": 8192,
        "max_input_tokens": 2097152,
--- a/litellm/proxy/_experimental/out/_next/static/chunks/117-1c5bfc45bfc4237d.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/117-1c5bfc45bfc4237d.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/250-a927a558002d8fb9.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/250-a927a558002d8fb9.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/250-e4cc2ceb9ff1c37a.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/250-e4cc2ceb9ff1c37a.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/261-92d8946249b3296e.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/261-92d8946249b3296e.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/3014691f-b7b79b78e27792f3.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/3014691f-b7b79b78e27792f3.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/42-014374badc35fe9b.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/42-014374badc35fe9b.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/42-6810261f4d6c8bbf.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/42-6810261f4d6c8bbf.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/699-f4066c747670f979.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/699-f4066c747670f979.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/860-ad0c91f6f8261026.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/860-ad0c91f6f8261026.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/899-54ea329f41297bf0.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/899-54ea329f41297bf0.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/978-3e0bd2034b623309.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/978-3e0bd2034b623309.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/_not-found/page-3b0daafcbe368586.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/_not-found/page-3b0daafcbe368586.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/layout-429ad74a94df7643.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/layout-429ad74a94df7643.js
@ -0,0 +1 @@
+(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[185],{96443:function(n,e,t){Promise.resolve().then(t.t.bind(t,39974,23)),Promise.resolve().then(t.t.bind(t,2778,23))},2778:function(){},39974:function(n){n.exports={style:{fontFamily:"'__Inter_cf7686', '__Inter_Fallback_cf7686'",fontStyle:"normal"},className:"__className_cf7686"}}},function(n){n.O(0,[919,986,971,117,744],function(){return n(n.s=96443)}),_N_E=n.O()}]);
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/layout-af8319e6c59a08da.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/layout-af8319e6c59a08da.js
@ -1 +0,0 @@
-(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[185],{6580:function(n,e,t){Promise.resolve().then(t.t.bind(t,39974,23)),Promise.resolve().then(t.t.bind(t,2778,23))},2778:function(){},39974:function(n){n.exports={style:{fontFamily:"'__Inter_cf7686', '__Inter_Fallback_cf7686'",fontStyle:"normal"},className:"__className_cf7686"}}},function(n){n.O(0,[919,986,971,117,744],function(){return n(n.s=6580)}),_N_E=n.O()}]);
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/model_hub/page-3d2c374ee41b38e5.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/model_hub/page-3d2c374ee41b38e5.js
@ -1 +1 @@
-(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[418],{11790:function(e,n,u){Promise.resolve().then(u.bind(u,52829))},52829:function(e,n,u){"use strict";u.r(n),u.d(n,{default:function(){return f}});var t=u(57437),s=u(2265),r=u(99376),c=u(92699);function f(){let e=(0,r.useSearchParams)().get("key"),[n,u]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&u(e)},[e]),(0,t.jsx)(c.Z,{accessToken:n,publicPage:!0,premiumUser:!1})}}},function(e){e.O(0,[42,261,250,699,971,117,744],function(){return e(e.s=11790)}),_N_E=e.O()}]);
+(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[418],{21024:function(e,n,u){Promise.resolve().then(u.bind(u,52829))},52829:function(e,n,u){"use strict";u.r(n),u.d(n,{default:function(){return f}});var t=u(57437),s=u(2265),r=u(99376),c=u(92699);function f(){let e=(0,r.useSearchParams)().get("key"),[n,u]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&u(e)},[e]),(0,t.jsx)(c.Z,{accessToken:n,publicPage:!0,premiumUser:!1})}}},function(e){e.O(0,[42,261,250,699,971,117,744],function(){return e(e.s=21024)}),_N_E=e.O()}]);
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/onboarding/page-10ed8b988e962631.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/onboarding/page-10ed8b988e962631.js
@ -1 +0,0 @@
-(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[461],{32922:function(e,t,n){Promise.resolve().then(n.bind(n,12011))},12011:function(e,t,n){"use strict";n.r(t),n.d(t,{default:function(){return S}});var s=n(57437),o=n(2265),a=n(99376),i=n(20831),c=n(94789),l=n(12514),r=n(49804),u=n(67101),d=n(84264),m=n(49566),h=n(96761),x=n(84566),p=n(19250),f=n(14474),k=n(13634),j=n(73002),g=n(3914);function S(){let[e]=k.Z.useForm(),t=(0,a.useSearchParams)();(0,g.e)("token");let n=t.get("invitation_id"),[S,_]=(0,o.useState)(null),[w,Z]=(0,o.useState)(""),[N,b]=(0,o.useState)(""),[T,v]=(0,o.useState)(null),[y,E]=(0,o.useState)(""),[C,U]=(0,o.useState)("");return(0,o.useEffect)(()=>{n&&(0,p.W_)(n).then(e=>{let t=e.login_url;console.log("login_url:",t),E(t);let n=e.token,s=(0,f.o)(n);U(n),console.log("decoded:",s),_(s.key),console.log("decoded user email:",s.user_email),b(s.user_email),v(s.user_id)})},[n]),(0,s.jsx)("div",{className:"mx-auto w-full max-w-md mt-10",children:(0,s.jsxs)(l.Z,{children:[(0,s.jsx)(h.Z,{className:"text-sm mb-5 text-center",children:"\uD83D\uDE85 LiteLLM"}),(0,s.jsx)(h.Z,{className:"text-xl",children:"Sign up"}),(0,s.jsx)(d.Z,{children:"Claim your user account to login to Admin UI."}),(0,s.jsx)(c.Z,{className:"mt-4",title:"SSO",icon:x.GH$,color:"sky",children:(0,s.jsxs)(u.Z,{numItems:2,className:"flex justify-between items-center",children:[(0,s.jsx)(r.Z,{children:"SSO is under the Enterprise Tier."}),(0,s.jsx)(r.Z,{children:(0,s.jsx)(i.Z,{variant:"primary",className:"mb-2",children:(0,s.jsx)("a",{href:"https://forms.gle/W3U4PZpJGFHWtHyA9",target:"_blank",children:"Get Free Trial"})})})]})}),(0,s.jsxs)(k.Z,{className:"mt-10 mb-5 mx-auto",layout:"vertical",onFinish:e=>{console.log("in handle submit. 
accessToken:",S,"token:",C,"formValues:",e),S&&C&&(e.user_email=N,T&&n&&(0,p.m_)(S,n,T,e.password).then(e=>{var t;let n="/ui/";n+="?userID="+((null===(t=e.data)||void 0===t?void 0:t.user_id)||e.user_id),document.cookie="token="+C,console.log("redirecting to:",n),window.location.href=n}))},children:[(0,s.jsxs)(s.Fragment,{children:[(0,s.jsx)(k.Z.Item,{label:"Email Address",name:"user_email",children:(0,s.jsx)(m.Z,{type:"email",disabled:!0,value:N,defaultValue:N,className:"max-w-md"})}),(0,s.jsx)(k.Z.Item,{label:"Password",name:"password",rules:[{required:!0,message:"password required to sign up"}],help:"Create a password for your account",children:(0,s.jsx)(m.Z,{placeholder:"",type:"password",className:"max-w-md"})})]}),(0,s.jsx)("div",{className:"mt-10",children:(0,s.jsx)(j.ZP,{htmlType:"submit",children:"Sign Up"})})]})]})})}},3914:function(e,t,n){"use strict";function s(){let e=window.location.hostname,t=["Lax","Strict","None"];["/","/ui"].forEach(n=>{document.cookie="token=; expires=Thu, 01 Jan 1970 00:00:00 UTC; path=".concat(n,";"),document.cookie="token=; expires=Thu, 01 Jan 1970 00:00:00 UTC; path=".concat(n,"; domain=").concat(e,";"),t.forEach(t=>{let s="None"===t?" Secure;":"";document.cookie="token=; expires=Thu, 01 Jan 1970 00:00:00 UTC; path=".concat(n,"; SameSite=").concat(t,";").concat(s),document.cookie="token=; expires=Thu, 01 Jan 1970 00:00:00 UTC; path=".concat(n,"; domain=").concat(e,"; SameSite=").concat(t,";").concat(s)})}),console.log("After clearing cookies:",document.cookie)}function o(e){let t=document.cookie.split("; ").find(t=>t.startsWith(e+"="));return t?t.split("=")[1]:null}n.d(t,{b:function(){return s},e:function(){return o}})}},function(e){e.O(0,[665,42,899,250,971,117,744],function(){return e(e.s=32922)}),_N_E=e.O()}]);
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/onboarding/page-4809c2f644098f19.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/onboarding/page-4809c2f644098f19.js
@ -0,0 +1 @@
+(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[461],{8672:function(e,t,n){Promise.resolve().then(n.bind(n,12011))},12011:function(e,t,n){"use strict";n.r(t),n.d(t,{default:function(){return S}});var s=n(57437),o=n(2265),a=n(99376),c=n(20831),i=n(94789),l=n(12514),r=n(49804),u=n(67101),m=n(84264),d=n(49566),h=n(96761),x=n(84566),p=n(19250),f=n(14474),k=n(13634),g=n(73002),j=n(3914);function S(){let[e]=k.Z.useForm(),t=(0,a.useSearchParams)();(0,j.e)("token");let n=t.get("invitation_id"),[S,w]=(0,o.useState)(null),[Z,_]=(0,o.useState)(""),[N,b]=(0,o.useState)(""),[T,y]=(0,o.useState)(null),[E,v]=(0,o.useState)(""),[C,U]=(0,o.useState)("");return(0,o.useEffect)(()=>{n&&(0,p.W_)(n).then(e=>{let t=e.login_url;console.log("login_url:",t),v(t);let n=e.token,s=(0,f.o)(n);U(n),console.log("decoded:",s),w(s.key),console.log("decoded user email:",s.user_email),b(s.user_email),y(s.user_id)})},[n]),(0,s.jsx)("div",{className:"mx-auto w-full max-w-md mt-10",children:(0,s.jsxs)(l.Z,{children:[(0,s.jsx)(h.Z,{className:"text-sm mb-5 text-center",children:"\uD83D\uDE85 LiteLLM"}),(0,s.jsx)(h.Z,{className:"text-xl",children:"Sign up"}),(0,s.jsx)(m.Z,{children:"Claim your user account to login to Admin UI."}),(0,s.jsx)(i.Z,{className:"mt-4",title:"SSO",icon:x.GH$,color:"sky",children:(0,s.jsxs)(u.Z,{numItems:2,className:"flex justify-between items-center",children:[(0,s.jsx)(r.Z,{children:"SSO is under the Enterprise Tier."}),(0,s.jsx)(r.Z,{children:(0,s.jsx)(c.Z,{variant:"primary",className:"mb-2",children:(0,s.jsx)("a",{href:"https://forms.gle/W3U4PZpJGFHWtHyA9",target:"_blank",children:"Get Free Trial"})})})]})}),(0,s.jsxs)(k.Z,{className:"mt-10 mb-5 mx-auto",layout:"vertical",onFinish:e=>{console.log("in handle submit. 
accessToken:",S,"token:",C,"formValues:",e),S&&C&&(e.user_email=N,T&&n&&(0,p.m_)(S,n,T,e.password).then(e=>{let t="/ui/";t+="?login=success",document.cookie="token="+C,console.log("redirecting to:",t),window.location.href=t}))},children:[(0,s.jsxs)(s.Fragment,{children:[(0,s.jsx)(k.Z.Item,{label:"Email Address",name:"user_email",children:(0,s.jsx)(d.Z,{type:"email",disabled:!0,value:N,defaultValue:N,className:"max-w-md"})}),(0,s.jsx)(k.Z.Item,{label:"Password",name:"password",rules:[{required:!0,message:"password required to sign up"}],help:"Create a password for your account",children:(0,s.jsx)(d.Z,{placeholder:"",type:"password",className:"max-w-md"})})]}),(0,s.jsx)("div",{className:"mt-10",children:(0,s.jsx)(g.ZP,{htmlType:"submit",children:"Sign Up"})})]})]})})}},3914:function(e,t,n){"use strict";function s(){let e=window.location.hostname,t=["Lax","Strict","None"];["/","/ui"].forEach(n=>{document.cookie="token=; expires=Thu, 01 Jan 1970 00:00:00 UTC; path=".concat(n,";"),document.cookie="token=; expires=Thu, 01 Jan 1970 00:00:00 UTC; path=".concat(n,"; domain=").concat(e,";"),t.forEach(t=>{let s="None"===t?" Secure;":"";document.cookie="token=; expires=Thu, 01 Jan 1970 00:00:00 UTC; path=".concat(n,"; SameSite=").concat(t,";").concat(s),document.cookie="token=; expires=Thu, 01 Jan 1970 00:00:00 UTC; path=".concat(n,"; domain=").concat(e,"; SameSite=").concat(t,";").concat(s)})}),console.log("After clearing cookies:",document.cookie)}function o(e){let t=document.cookie.split("; ").find(t=>t.startsWith(e+"="));return t?t.split("=")[1]:null}n.d(t,{b:function(){return s},e:function(){return o}})}},function(e){e.O(0,[665,42,899,250,971,117,744],function(){return e(e.s=8672)}),_N_E=e.O()}]);
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/page-23f86140208820d6.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/page-23f86140208820d6.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/page-9bd76bfe1ce0a80a.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/page-9bd76bfe1ce0a80a.js
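The pricing fields added to the model cost map earlier in this diff can be read as plain multipliers: `input_cost_per_pixel` for the `gpt-image-1` entries, and the `*_above_200k_tokens` rates for `gemini-2.5-pro-preview-03-25`. The following is a minimal sketch of that arithmetic, not LiteLLM's actual cost calculator. The helper names `image_cost` and `token_cost` are hypothetical, and two behaviours are assumptions not confirmed by the diff: that image cost is the pixel count times the per-pixel rate, and that the higher-tier rates apply to the whole request once the prompt exceeds 200,000 tokens.

```python
# Values copied from the entries added in this diff.
IMAGE_ENTRY = {  # "high/1536-x-1024/gpt-image-1"
    "input_cost_per_pixel": 1.58945719e-7,
    "output_cost_per_pixel": 0.0,
}
GEMINI_ENTRY = {  # "gemini-2.5-pro-preview-03-25"
    "input_cost_per_token": 0.00000125,
    "input_cost_per_token_above_200k_tokens": 0.0000025,
    "output_cost_per_token": 0.00001,
    "output_cost_per_token_above_200k_tokens": 0.000015,
}

TIER_THRESHOLD = 200_000  # assumed from the "_above_200k_tokens" suffix


def image_cost(entry: dict, width: int, height: int, n: int = 1) -> float:
    """Hypothetical helper: per-pixel rate times pixel count times image count."""
    per_pixel = entry.get("input_cost_per_pixel", 0.0) + entry.get(
        "output_cost_per_pixel", 0.0
    )
    return per_pixel * width * height * n


def token_cost(entry: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Hypothetical helper: base rates, or the above-200k rates for long prompts.

    Assumption: once the prompt exceeds the threshold, the higher rates apply
    to every input and output token in the request.
    """
    in_rate = entry["input_cost_per_token"]
    out_rate = entry["output_cost_per_token"]
    if prompt_tokens > TIER_THRESHOLD:
        in_rate = entry.get("input_cost_per_token_above_200k_tokens", in_rate)
        out_rate = entry.get("output_cost_per_token_above_200k_tokens", out_rate)
    return prompt_tokens * in_rate + completion_tokens * out_rate


if __name__ == "__main__":
    print(f"{image_cost(IMAGE_ENTRY, 1536, 1024):.4f}")       # 0.2500
    print(f"{token_cost(GEMINI_ENTRY, 100_000, 1_000):.4f}")  # 0.1350
    print(f"{token_cost(GEMINI_ENTRY, 300_000, 1_000):.4f}")  # 0.7650
```

Under these assumptions the per-pixel rate works out to about $0.25 for one high-quality 1536x1024 image, which suggests the odd-looking constant was derived from a flat per-image price.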
Author	SHA1	Message	Date
Christian Owusu	b82af5b826	Fix UI Flicker in Dashboard (#10261)	2025-04-23 23:27:44 -07:00
Krrish Dholakia	2adb2fc6a5	test: handle service unavailable error	2025-04-23 22:10:46 -07:00
Krrish Dholakia	620a0f4805	bump: version 1.67.2 → 1.67.3	2025-04-23 22:09:25 -07:00
Krrish Dholakia	05d617bea0	build(ui/): new ui build	2025-04-23 22:09:14 -07:00
Krish Dholakia	ab68af4ff5	Litellm multi admin fixes (#10259) * fix(create_user_button.tsx): do not set 'no-default-models' when user is a proxy admin * fix(user_info_view.tsx): show all user personal models * feat(user_info_view.tsx): allow giving users more personal models * feat(user_edit_view.tsx): allow proxy admin to edit user role, available models, etc.	2025-04-23 22:02:02 -07:00
Krish Dholakia	430cc60e62	Litellm fix UI login (#10260) * fix(user_dashboard.tsx): add token expiry logic to user dashboard if token expired redirect to `/sso/key/generate` for login * fix(user_dashboard.tsx): check key health on login - if invalid -> redirect to login handles invalid / expired key scenario * fix(user_dashboard.tsx): fix linting error * fix(page.tsx): fix invitation link flow	2025-04-23 22:01:38 -07:00
Krish Dholakia	be4152c8d5	UI - fix edit azure public model name + fix editing model name post create * test(test_router.py): add unit test confirming fallbacks with tag based routing works as expected * test: update testing * test: update test to not use gemini-pro google removed it * fix(conditional_public_model_name.tsx): edit azure public model name Fixes https://github.com/BerriAI/litellm/issues/10093 * fix(model_info_view.tsx): migrate to patch model updates Enables changing model name easily	2025-04-23 21:56:56 -07:00
Krish Dholakia	acd2c1783c	fix(converse_transformation.py): support all bedrock - openai params for arn models (#10256) Fixes https://github.com/BerriAI/litellm/issues/10207	2025-04-23 21:56:05 -07:00
Krrish Dholakia	6023427dae	docs(gemini.md): cleanup	2025-04-23 21:54:12 -07:00
Krrish Dholakia	2486a106f4	test: mark flaky tests	2025-04-23 21:50:25 -07:00
Christian Owusu	a260afb74d	Reset key alias value when resetting filters (#10099)	2025-04-23 21:04:40 -07:00
Dimitri Papadopoulos Orfanos	5e2fd49dd3	Fix typos (#10232)	2025-04-23 20:59:25 -07:00
Ishaan Jaff	f3291bde4d	fix for serviceAccountName on migration job (#10258)	2025-04-23 20:56:31 -07:00
Krrish Dholakia	1014529ed6	build(ui/): update ui build	2025-04-23 16:55:35 -07:00
Krish Dholakia	edd15b0905	fix(user_dashboard.tsx): add token expiry logic to user dashboard (#10250) * fix(user_dashboard.tsx): add token expiry logic to user dashboard if token expired redirect to `/sso/key/generate` for login * fix(user_dashboard.tsx): check key health on login - if invalid -> redirect to login handles invalid / expired key scenario * fix(user_dashboard.tsx): fix linting error * fix(page.tsx): fix invitation link flow	2025-04-23 16:51:27 -07:00
Ishaan Jaff	dc9b058dbd	[Feat] Add support for GET Responses Endpoint - OpenAI, Azure OpenAI (#10235) * Added get responses API (#10234) * test_basic_openai_responses_get_endpoint * transform_get_response_api_request * test_basic_openai_responses_get_endpoint --------- Co-authored-by: Prathamesh Saraf <pratamesh1867@gmail.com>	2025-04-23 15:19:29 -07:00
Ishaan Jaff	2e58e47b43	[Bug Fix] Add Cost Tracking for gpt-image-1 when quality is unspecified (#10247) * TestOpenAIGPTImage1 * fixes for cost calc * fix ImageGenerationRequestQuality.MEDIUM	2025-04-23 15:16:40 -07:00
Ishaan Jaff	baa5564f95	cleanup remove stale dir	2025-04-23 14:07:43 -07:00
Ishaan Jaff	36ee132514	[Feat] Add gpt-image-1 cost tracking (#10241) * add gpt-image-1 * add gpt-image-1 example to docs	2025-04-23 12:20:55 -07:00
Krrish Dholakia	a649f10e63	test: update test to not use gemini-pro google removed it	2025-04-23 11:31:09 -07:00
Krrish Dholakia	8184124217	test: update testing	2025-04-23 11:21:50 -07:00
Krrish Dholakia	174a1aa007	test: update test	2025-04-23 10:51:18 -07:00
Christian Owusu	47420d8d68	Require auth for all dashboard pages (#10229) * Require authentication for all Dashboard pages * Add test * Add test	2025-04-23 07:08:25 -07:00
Dimitri Papadopoulos Orfanos	34be7ffceb	Discard duplicate sentence (#10231)	2025-04-23 07:05:29 -07:00
Krrish Dholakia	f5996b2f6b	test: update test to skip 'gemini-pro' - model deprecated	2025-04-23 00:01:02 -07:00
Krish Dholakia	217681eb5e	Litellm dev 04 22 2025 p1 (#10206 ) * fix(openai.py): initial commit adding generic event type for openai responses api streaming Ensures handling for undocumented event types - e.g. "response.reasoning_summary_part.added" * fix(transformation.py): handle unknown openai response type * fix(datadog_llm_observability.py): handle dict[str, any] -> dict[str, str] conversion Fixes https://github.com/BerriAI/litellm/issues/9494 * test: add more unit testing * test: add unit test * fix(common_utils.py): fix message with content list * test: update testing	2025-04-22 23:58:43 -07:00
Krish Dholakia	f670ebeb2f	Users page - new user info pane (#10213) * feat(user_info_view.tsx): be able to click in and see all teams user is part of makes it easy to see which teams a user belongs to * test(ui/): add unit testing for user info view * fix(user_info_view.tsx): fix linting errors * fix(login.ts): fix login * fix: fix linting error	2025-04-22 21:55:47 -07:00
Krrish Dholakia	31f704a370	fix(internal_user_endpoints.py): add check on sortby value	2025-04-22 21:41:13 -07:00
Ishaan Jaff	1e3a1cba23	bump: version 1.67.1 → 1.67.2	2025-04-22 21:35:23 -07:00
Ishaan Jaff	96e31d205c	feat: Added Missing Attributes For Arize & Phoenix Integration (#10043 ) (#10215 ) * feat: Added Missing Attributes For Arize & Phoenix Integration * chore: Added noqa for PLR0915 to suppress warning * chore: Moved Contributor Test to Correct Location * chore: Removed Redundant Fallback Co-authored-by: Ali Saleh <saleh.a@turing.com>	2025-04-22 21:34:51 -07:00
Krish Dholakia	5f98d4d7de	UI - Users page - Enable global sorting (allows finding users with highest spend) (#10211 ) * fix(view_users.tsx): add time tracking logic to debounce search - prevent new queries from being overwritten by previous ones * fix(internal_user_endpoints.py): add sort functionality to user list endpoint * feat(internal_user_endpoints.py): support sort by on `/user/list` * fix(view_users.tsx): enable global sorting allows finding user with highest spend * feat(view_users.tsx): support filtering by sso user id * test(search_users.spec.ts): add tests to ensure filtering works * test: add more unit testing	2025-04-22 19:59:53 -07:00
Ishaan Jaff	0dba2886f0	fix test	2025-04-22 18:37:56 -07:00
Ishaan Jaff	6e4fed59b6	docs agent ops logging	2025-04-22 18:32:28 -07:00
Ishaan Jaff	b96d2ea422	Bug Fix - Address deprecation of open_text (#10208) * Update utils.py (#10201) * fixes importlib --------- Co-authored-by: Nathan Brake <33383515+njbrake@users.noreply.github.com>	2025-04-22 18:29:56 -07:00
Ishaan Jaff	868cdd0226	[Feat] Add Support for DELETE /v1/responses/{response_id} on OpenAI, Azure OpenAI (#10205 ) * add transform_delete_response_api_request to base responses config * add transform_delete_response_api_request * add delete_response_api_handler * fixes for deleting responses, response API * add adelete_responses * add async test_basic_openai_responses_delete_endpoint * test_basic_openai_responses_delete_endpoint * working delete for streaming on responses API * fixes azure transformation * TestAnthropicResponsesAPITest * fix code check * fix linting * fixes for get_complete_url * test_basic_openai_responses_streaming_delete_endpoint * streaming fixes	2025-04-22 18:27:03 -07:00
Ishaan Jaff	2bb51866b1	fix azure/computer-use-preview native streaming	2025-04-22 18:21:06 -07:00
Ishaan Jaff	44264ab6d6	fix failing agent ops test	2025-04-22 14:39:50 -07:00
Krish Dholakia	66680c421d	Add global filtering to Users tab (#10195 ) * style(internal_user_endpoints.py): add response model to `/user/list` endpoint make sure we maintain consistent response spec * fix(key_management_endpoints.py): return 'created_at' and 'updated_at' on `/key/generate` Show 'created_at' on UI when key created * test(test_keys.py): add e2e test to ensure created at is always returned * fix(view_users.tsx): support global search by user email allows easier search * test(search_users.spec.ts): add e2e test ensure user search works on admin ui * fix(view_users.tsx): support filtering user by role and user id More powerful filtering on internal users table * fix(view_users.tsx): allow filtering users by team * style(view_users.tsx): cleanup ui to show filters in consistent style * refactor(view_users.tsx): cleanup to just use 1 variable for the data * fix(view_users.tsx): cleanup use effect hooks * fix(internal_user_endpoints.py): fix check to pass testing * test: update tests * test: update tests * Revert "test: update tests" This reverts commit `6553eeb232`. * fix(view_userts.tsx): add back in 'previous' and 'next' tabs for pagination	2025-04-22 13:59:43 -07:00
Dwij	b2955a2bdd	Add AgentOps Integration to LiteLLM (#9685 ) * feat(sidebars): add new item for agentops integration in Logging & Observability category * Update agentops_integration.md to enhance title formatting and remove redundant section * Enhance AgentOps integration in documentation and codebase by removing LiteLLMCallbackHandler references, adding environment variable configurations, and updating logging initialization for AgentOps support. * Update AgentOps integration documentation to include instructions for obtaining API keys and clarify environment variable setup. * Add unit tests for AgentOps integration and improve error handling in token fetching * Add unit tests for AgentOps configuration and token fetching functionality * Corrected agentops test directory * Linting fix * chore: add OpenTelemetry dependencies to pyproject.toml * chore: update OpenTelemetry dependencies and add new packages in pyproject.toml and poetry.lock	2025-04-22 10:29:01 -07:00
Ishaan Jaff	ebfff975d4	docs responses routing	2025-04-21 23:05:53 -07:00
Krish Dholakia	a7db0df043	Gemini-2.5-flash improvements (#10198 ) * fix(vertex_and_google_ai_studio_gemini.py): allow thinking budget = 0 Fixes https://github.com/BerriAI/litellm/issues/10121 * fix(vertex_and_google_ai_studio_gemini.py): handle nuance in counting exclusive vs. inclusive tokens Addresses https://github.com/BerriAI/litellm/pull/10141#discussion_r2052272035	2025-04-21 22:48:00 -07:00
Ishaan Jaff	d1fb051d25	bump: version 1.67.0 → 1.67.1	2025-04-21 22:43:13 -07:00
Ishaan Jaff	7cb95bcc96	[Bug Fix] caching does not account for thinking or reasoning_effort config (#10140 ) * _get_litellm_supported_chat_completion_kwargs * test caching with thinking	2025-04-21 22:39:40 -07:00
Ishaan Jaff	104e4cb1bc	[Feat] Add infinity embedding support (contributor pr) (#10196 ) * Feature - infinity support for #8764 (#10009) * Added support for infinity embeddings * Added test cases * Fixed tests and api base * Updated docs and tests * Removed unused import * Updated signature * Added support for infinity embeddings * Added test cases * Fixed tests and api base * Updated docs and tests * Removed unused import * Updated signature * Updated validate params --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * fix InfinityEmbeddingConfig --------- Co-authored-by: Prathamesh Saraf <pratamesh1867@gmail.com>	2025-04-21 20:01:29 -07:00
Ishaan Jaff	0c2f705417	[Feat] Add Responses API - Routing Affinity logic for sessions (#10193 ) * test for test_responses_api_routing_with_previous_response_id * test_responses_api_routing_with_previous_response_id * add ResponsesApiDeploymentCheck * ResponsesApiDeploymentCheck * ResponsesApiDeploymentCheck * fix ResponsesApiDeploymentCheck * test_responses_api_routing_with_previous_response_id * ResponsesApiDeploymentCheck * test_responses_api_deployment_check.py * docs routing affinity * simplify ResponsesApiDeploymentCheck * test response id * fix code quality check	2025-04-21 20:00:27 -07:00
Ishaan Jaff	4eac0f64f3	[Feat] Pass through endpoints - ensure `PassthroughStandardLoggingPayload` is logged and contains method, url, request/response body (#10194 ) * ensure passthrough_logging_payload is filled in kwargs * test_assistants_passthrough_logging * test_assistants_passthrough_logging * test_assistants_passthrough_logging * test_threads_passthrough_logging * test _init_kwargs_for_pass_through_endpoint * _init_kwargs_for_pass_through_endpoint	2025-04-21 19:46:22 -07:00
Krrish Dholakia	4a50cf10fb	build: update ui build	2025-04-21 16:26:36 -07:00
Krish Dholakia	89131d8ed3	Remove user_id from url (#10192 ) * fix(user_dashboard.tsx): initial commit using user id from jwt instead of url * fix(proxy_server.py): remove user id from url fixes security issue around sharing url's * fix(user_dashboard.tsx): handle user id being null	2025-04-21 16:22:57 -07:00
Krrish Dholakia	a34778dda6	build(ui/): update ui build supports new non-user id in url flow	2025-04-21 16:22:28 -07:00
Krish Dholakia	0c3b7bb37d	fix(router.py): handle edge case where user sets 'model_group' inside… (#10191 ) * fix(router.py): handle edge case where user sets 'model_group' inside 'model_info' * fix(key_management_endpoints.py): security fix - return hashed token in 'token' field Ensures when creating a key on UI - only hashed token shown * test(test_key_management_endpoints.py): add unit test * test: update test	2025-04-21 16:17:45 -07:00
Nilanjan De	03245c732a	Fix: Potential SQLi in spend_management_endpoints.py (#9878 ) * fix: Potential SQLi in spend_management_endpoints.py * fix tests * test: add tests for global spend keys endpoint * chore: update error message * chore: lint * chore: rename test	2025-04-21 14:29:38 -07:00
Li Yang	10257426a2	fix(bedrock): wrong system prompt transformation (#10120 ) * fix(bedrock): wrong system transformation * chore: add one more test case --------- Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>	2025-04-21 08:48:14 -07:00
Marty Sullivan	0b63c7a2eb	Model pricing updates for Azure & VertexAI (#10178 )	2025-04-20 11:33:45 -07:00
Krrish Dholakia	1ff7625984	docs: cleanup	2025-04-20 09:26:05 -07:00
Krrish Dholakia	aa55103486	docs: cleanup doc	2025-04-20 09:20:47 -07:00
Krrish Dholakia	1d9b58688b	docs(sidebars.js): place scim doc in correct place	2025-04-20 09:20:10 -07:00
Krish Dholakia	ce828408da	fix(proxy_server.py): pass llm router to get complete model list (#10176 ) allows model auth to work	2025-04-19 22:27:49 -07:00
Krish Dholakia	e0a613f88a	fix(common_daily_activity.py): support empty entity id field (#10175 ) * fix(common_daily_activity.py): support empty entity id field allows returning empty response when user is not admin and does not belong to any team * test(test_common_daily_activity.py): add unit testing	2025-04-19 22:20:28 -07:00
Ishaan Jaff	72f6bd3972	fix azure foundry phi error	2025-04-19 22:10:18 -07:00
Ishaan Jaff	36bcb3de4e	fix models appearing under test key page	2025-04-19 21:37:08 -07:00
Krrish Dholakia	bb13ac45c8	docs(index.md): cleanup	2025-04-19 19:16:10 -07:00
Ishaan Jaff	1be36be72e	Litellm docs SCIM (#10174 ) * docs scim * docs SCIM stash * docs litellm SCIM * docs fix * docs scim with LiteLLM	2025-04-19 18:29:09 -07:00
Krish Dholakia	55a17730fb	fix(transformation.py): pass back in gemini thinking content to api (#10173 ) Ensures thinking content always returned	2025-04-19 18:03:05 -07:00
Krish Dholakia	bbfcb1ac7e	Litellm release notes 04 19 2025 (#10169 ) * docs(index.md): initial draft release notes * docs: note all pending docs * build(model_prices_and_context_window.json): add o3, gpt-4.1, o4-mini pricing * docs(vllm.md): update vllm doc to show file message type support * docs(mistral.md): add mistral passthrough route doc * docs(gemini.md): add gemini thinking to docs * docs(vertex.md): add thinking/reasoning content for gemini models to docs * docs(index.md): more links * docs(index.md): add more links, images * docs(index.md): cleanup highlights	2025-04-19 17:26:30 -07:00
Ishaan Jaff	daf024bad1	Supported Responses API Parameters	2025-04-19 17:14:53 -07:00
Ishaan Jaff	f39d917886	[Docs] Responses API (#10172 ) * docs litellm responses api * doc fix * docs responses API * add get_supported_openai_params for LiteLLMCompletionResponsesConfig * add Supported Responses API Parameters	2025-04-19 17:10:45 -07:00
Ishaan Jaff	7c3df984da	can_user_call_model (#10170 )	2025-04-19 16:46:00 -07:00
Ishaan Jaff	431b230f07	[UI] Bug Fix, team model selector (#10171 ) * fix tooltip * bug fix fix team model selector	2025-04-19 16:31:38 -07:00
				`@ -0,0 +1 @@`
				`(self.webpackChunk_N_E=self.webpackChunk_N_E\|\|[]).push([[185],{96443:function(n,e,t){Promise.resolve().then(t.t.bind(t,39974,23)),Promise.resolve().then(t.t.bind(t,2778,23))},2778:function(){},39974:function(n){n.exports={style:{fontFamily:"'__Inter_cf7686', '__Inter_Fallback_cf7686'",fontStyle:"normal"},className:"__className_cf7686"}}},function(n){n.O(0,[919,986,971,117,744],function(){return n(n.s=96443)}),_N_E=n.O()}]);`
				`@ -1 +0,0 @@`
				`(self.webpackChunk_N_E=self.webpackChunk_N_E\|\|[]).push([[185],{6580:function(n,e,t){Promise.resolve().then(t.t.bind(t,39974,23)),Promise.resolve().then(t.t.bind(t,2778,23))},2778:function(){},39974:function(n){n.exports={style:{fontFamily:"'__Inter_cf7686', '__Inter_Fallback_cf7686'",fontStyle:"normal"},className:"__className_cf7686"}}},function(n){n.O(0,[919,986,971,117,744],function(){return n(n.s=6580)}),_N_E=n.O()}]);`
				`@ -1 +0,0 @@`
				(self.webpackChunk_N_E=self.webpackChunk_N_E\|\|[]).push([[461],{32922:function(e,t,n){Promise.resolve().then(n.bind(n,12011))},12011:function(e,t,n){"use strict";n.r(t),n.d(t,{default:function(){return S}});var s=n(57437),o=n(2265),a=n(99376),i=n(20831),c=n(94789),l=n(12514),r=n(49804),u=n(67101),d=n(84264),m=n(49566),h=n(96761),x=n(84566),p=n(19250),f=n(14474),k=n(13634),j=n(73002),g=n(3914);function S(){let[e]=k.Z.useForm(),t=(0,a.useSearchParams)();(0,g.e)("token");let n=t.get("invitation_id"),[S,_]=(0,o.useState)(null),[w,Z]=(0,o.useState)(""),[N,b]=(0,o.useState)(""),[T,v]=(0,o.useState)(null),[y,E]=(0,o.useState)(""),[C,U]=(0,o.useState)("");return(0,o.useEffect)(()=>{n&&(0,p.W_)(n).then(e=>{let t=e.login_url;console.log("login_url:",t),E(t);let n=e.token,s=(0,f.o)(n);U(n),console.log("decoded:",s),_(s.key),console.log("decoded user email:",s.user_email),b(s.user_email),v(s.user_id)})},[n]),(0,s.jsx)("div",{className:"mx-auto w-full max-w-md mt-10",children:(0,s.jsxs)(l.Z,{children:[(0,s.jsx)(h.Z,{className:"text-sm mb-5 text-center",children:"\uD83D\uDE85 LiteLLM"}),(0,s.jsx)(h.Z,{className:"text-xl",children:"Sign up"}),(0,s.jsx)(d.Z,{children:"Claim your user account to login to Admin UI."}),(0,s.jsx)(c.Z,{className:"mt-4",title:"SSO",icon:x.GH$,color:"sky",children:(0,s.jsxs)(u.Z,{numItems:2,className:"flex justify-between items-center",children:[(0,s.jsx)(r.Z,{children:"SSO is under the Enterprise Tier."}),(0,s.jsx)(r.Z,{children:(0,s.jsx)(i.Z,{variant:"primary",className:"mb-2",children:(0,s.jsx)("a",{href:"https://forms.gle/W3U4PZpJGFHWtHyA9",target:"_blank",children:"Get Free Trial"})})})]})}),(0,s.jsxs)(k.Z,{className:"mt-10 mb-5 mx-auto",layout:"vertical",onFinish:e=>{console.log("in handle submit. 
accessToken:",S,"token:",C,"formValues:",e),S&&C&&(e.user_email=N,T&&n&&(0,p.m_)(S,n,T,e.password).then(e=>{var t;let n="/ui/";n+="?userID="+((null===(t=e.data)\|\|void 0===t?void 0:t.user_id)\|\|e.user_id),document.cookie="token="+C,console.log("redirecting to:",n),window.location.href=n}))},children:[(0,s.jsxs)(s.Fragment,{children:[(0,s.jsx)(k.Z.Item,{label:"Email Address",name:"user_email",children:(0,s.jsx)(m.Z,{type:"email",disabled:!0,value:N,defaultValue:N,className:"max-w-md"})}),(0,s.jsx)(k.Z.Item,{label:"Password",name:"password",rules:[{required:!0,message:"password required to sign up"}],help:"Create a password for your account",children:(0,s.jsx)(m.Z,{placeholder:"",type:"password",className:"max-w-md"})})]}),(0,s.jsx)("div",{className:"mt-10",children:(0,s.jsx)(j.ZP,{htmlType:"submit",children:"Sign Up"})})]})]})})}},3914:function(e,t,n){"use strict";function s(){let e=window.location.hostname,t=["Lax","Strict","None"];["/","/ui"].forEach(n=>{document.cookie="token=; expires=Thu, 01 Jan 1970 00:00:00 UTC; path=".concat(n,";"),document.cookie="token=; expires=Thu, 01 Jan 1970 00:00:00 UTC; path=".concat(n,"; domain=").concat(e,";"),t.forEach(t=>{let s="None"===t?" Secure;":"";document.cookie="token=; expires=Thu, 01 Jan 1970 00:00:00 UTC; path=".concat(n,"; SameSite=").concat(t,";").concat(s),document.cookie="token=; expires=Thu, 01 Jan 1970 00:00:00 UTC; path=".concat(n,"; domain=").concat(e,"; SameSite=").concat(t,";").concat(s)})}),console.log("After clearing cookies:",document.cookie)}function o(e){let t=document.cookie.split("; ").find(t=>t.startsWith(e+"="));return t?t.split("=")[1]:null}n.d(t,{b:function(){return s},e:function(){return o}})}},function(e){e.O(0,[665,42,899,250,971,117,744],function(){return e(e.s=32922)}),_N_E=e.O()}]);