Merge branch 'main' into litellm_anthropic_responses_api_support

Ishaan Jaff 2025-04-18 18:46:03 -07:00
commit f9d9b70538
10 changed files with 483 additions and 2089 deletions

View file

@@ -1002,9 +1002,127 @@ Expected Response:
```
## **Azure Responses API**
| Property | Details |
|-------|-------|
| Description | Azure OpenAI Responses API |
| `custom_llm_provider` on LiteLLM | `azure/` |
| Supported Operations | `/v1/responses`|
| Azure OpenAI Responses API | [Azure OpenAI Responses API ↗](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses?tabs=python-secure) |
| Cost Tracking, Logging Support | ✅ LiteLLM logs and tracks cost for Responses API requests |
## Usage
## Create a model response
<Tabs>
<TabItem value="litellm-sdk" label="LiteLLM SDK">
#### Non-streaming
```python showLineNumbers title="Azure Responses API"
import os
import litellm

# Non-streaming response
response = litellm.responses(
    model="azure/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com/",
    api_version="2023-03-15-preview",
)

print(response)
```
#### Streaming
```python showLineNumbers title="Azure Responses API"
import os
import litellm

# Streaming response
response = litellm.responses(
    model="azure/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com/",
    api_version="2023-03-15-preview",
)

for event in response:
    print(event)
```
</TabItem>
<TabItem value="proxy" label="OpenAI SDK with LiteLLM Proxy">
First, add this to your litellm proxy config.yaml:
```yaml showLineNumbers title="Azure Responses API"
model_list:
  - model_name: o1-pro
    litellm_params:
      model: azure/o1-pro
      api_key: os.environ/AZURE_RESPONSES_OPENAI_API_KEY
      api_base: https://litellm8397336933.openai.azure.com/
      api_version: 2023-03-15-preview
```
Start your LiteLLM proxy:
```bash
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
```
Then use the OpenAI SDK pointed to your proxy:
#### Non-streaming
```python showLineNumbers
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Non-streaming response
response = client.responses.create(
    model="o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)
```
#### Streaming
```python showLineNumbers
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Streaming response
response = client.responses.create(
    model="o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)
```
</TabItem>
</Tabs>
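
The proxy tab above exposes the `/v1/responses` route listed under "Supported Operations" in the table at the top of this section. A minimal sketch of calling that route directly with `requests`, assuming the proxy config above is running on localhost:4000 (the API key and model name are placeholders):

```python
import requests

# Hedged sketch: direct POST to the LiteLLM proxy's Responses route.
response = requests.post(
    "http://localhost:4000/v1/responses",
    headers={"Authorization": "Bearer your-api-key"},  # proxy API key (placeholder)
    json={
        "model": "o1-pro",
        "input": "Tell me a three sentence bedtime story about a unicorn.",
    },
    timeout=60,
)
print(response.json())
```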
## Advanced

View file

@@ -24,7 +24,7 @@ LiteLLM provides a BETA endpoint in the spec of [OpenAI's `/responses` API](http
<TabItem value="litellm-sdk" label="LiteLLM SDK">
#### Non-streaming
```python
```python showLineNumbers
import litellm
# Non-streaming response
@@ -38,7 +38,7 @@ print(response)
```
#### Streaming
```python
```python showLineNumbers
import litellm
# Streaming response
@@ -56,7 +56,7 @@ for event in response:
<TabItem value="proxy" label="OpenAI SDK with LiteLLM Proxy">
First, add this to your litellm proxy config.yaml:
```yaml
```yaml showLineNumbers
model_list:
  - model_name: o1-pro
    litellm_params:
@@ -74,7 +74,7 @@ litellm --config /path/to/config.yaml
Then use the OpenAI SDK pointed to your proxy:
#### Non-streaming
```python
```python showLineNumbers
from openai import OpenAI
# Initialize client with your proxy URL
@@ -93,7 +93,7 @@ print(response)
```
#### Streaming
```python
```python showLineNumbers
from openai import OpenAI
# Initialize client with your proxy URL
@@ -115,3 +115,11 @@ for event in response:
</TabItem>
</Tabs>
## **Supported Providers**
| Provider | Link to Usage |
|-------------|--------------------|
| OpenAI| [Usage](#usage) |
| Azure OpenAI| [Usage](../docs/providers/azure#responses-api) |

View file

@@ -79,7 +79,7 @@ class AzureOpenAIO1Config(OpenAIOSeriesConfig):
        return True

    def is_o_series_model(self, model: str) -> bool:
        return "o1" in model or "o3" in model or "o_series/" in model
        return "o1" in model or "o3" in model or "o4" in model or "o_series/" in model

    def transform_request(
        self,
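
The `is_o_series_model` change above just adds an `"o4"` substring check, so o4-family deployments route through the o-series transformation. A hedged illustration of what now matches (a standalone copy of the predicate; the model names are hypothetical):

```python
# Illustrative only: a standalone copy of the updated predicate above.
def is_o_series_model(model: str) -> bool:
    return "o1" in model or "o3" in model or "o4" in model or "o_series/" in model

assert is_o_series_model("azure/o4-mini-2025-04-16")   # newly matched via "o4"
assert is_o_series_model("o3-mini")                    # matched before and after
assert is_o_series_model("o_series/my-deployment")     # explicit o-series prefix
assert not is_o_series_model("azure/gpt-4.1")          # no o1/o3/o4 substring
```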

View file

@@ -1471,6 +1471,64 @@
"litellm_provider": "openai",
"supported_endpoints": ["/v1/audio/speech"]
},
"azure/gpt-4.1": {
"max_tokens": 32768,
"max_input_tokens": 1047576,
"max_output_tokens": 32768,
"input_cost_per_token": 2e-6,
"output_cost_per_token": 8e-6,
"input_cost_per_token_batches": 1e-6,
"output_cost_per_token_batches": 4e-6,
"cache_read_input_token_cost": 0.5e-6,
"litellm_provider": "azure",
"mode": "chat",
"supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
"supported_modalities": ["text", "image"],
"supported_output_modalities": ["text"],
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_vision": true,
"supports_prompt_caching": true,
"supports_system_messages": true,
"supports_tool_choice": true,
"supports_native_streaming": true,
"supports_web_search": true,
"search_context_cost_per_query": {
"search_context_size_low": 30e-3,
"search_context_size_medium": 35e-3,
"search_context_size_high": 50e-3
}
},
"azure/gpt-4.1-2025-04-14": {
"max_tokens": 32768,
"max_input_tokens": 1047576,
"max_output_tokens": 32768,
"input_cost_per_token": 2e-6,
"output_cost_per_token": 8e-6,
"input_cost_per_token_batches": 1e-6,
"output_cost_per_token_batches": 4e-6,
"cache_read_input_token_cost": 0.5e-6,
"litellm_provider": "azure",
"mode": "chat",
"supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
"supported_modalities": ["text", "image"],
"supported_output_modalities": ["text"],
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_vision": true,
"supports_prompt_caching": true,
"supports_system_messages": true,
"supports_tool_choice": true,
"supports_native_streaming": true,
"supports_web_search": true,
"search_context_cost_per_query": {
"search_context_size_low": 30e-3,
"search_context_size_medium": 35e-3,
"search_context_size_high": 50e-3
}
},
"azure/gpt-4o-mini-realtime-preview-2024-12-17": {
"max_tokens": 4096,
"max_input_tokens": 128000,
@@ -1647,6 +1705,23 @@
"supports_system_messages": true,
"supports_tool_choice": true
},
"azure/o4-mini-2025-04-16": {
"max_tokens": 100000,
"max_input_tokens": 200000,
"max_output_tokens": 100000,
"input_cost_per_token": 1.1e-6,
"output_cost_per_token": 4.4e-6,
"cache_read_input_token_cost": 2.75e-7,
"litellm_provider": "azure",
"mode": "chat",
"supports_function_calling": true,
"supports_parallel_function_calling": false,
"supports_vision": true,
"supports_prompt_caching": true,
"supports_response_schema": true,
"supports_reasoning": true,
"supports_tool_choice": true
},
"azure/o3-mini-2025-01-31": {
"max_tokens": 100000,
"max_input_tokens": 200000,

View file

@@ -21,13 +21,12 @@ model_list:
      model: databricks/databricks-claude-3-7-sonnet
      api_key: os.environ/DATABRICKS_API_KEY
      api_base: os.environ/DATABRICKS_API_BASE
  - model_name: "gpt-4o-realtime-preview"
  - model_name: "gpt-4.1"
    litellm_params:
      model: azure/gpt-4o-realtime-preview-2
      model: azure/gpt-4.1
      api_key: os.environ/AZURE_API_KEY_REALTIME
      api_base: https://krris-m2f9a9i7-eastus2.openai.azure.com/
    model_info:
      base_model: azure/gpt-4o-realtime-preview-2024-10-01

litellm_settings:

View file

@@ -1471,6 +1471,64 @@
"litellm_provider": "openai",
"supported_endpoints": ["/v1/audio/speech"]
},
"azure/gpt-4.1": {
"max_tokens": 32768,
"max_input_tokens": 1047576,
"max_output_tokens": 32768,
"input_cost_per_token": 2e-6,
"output_cost_per_token": 8e-6,
"input_cost_per_token_batches": 1e-6,
"output_cost_per_token_batches": 4e-6,
"cache_read_input_token_cost": 0.5e-6,
"litellm_provider": "azure",
"mode": "chat",
"supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
"supported_modalities": ["text", "image"],
"supported_output_modalities": ["text"],
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_vision": true,
"supports_prompt_caching": true,
"supports_system_messages": true,
"supports_tool_choice": true,
"supports_native_streaming": true,
"supports_web_search": true,
"search_context_cost_per_query": {
"search_context_size_low": 30e-3,
"search_context_size_medium": 35e-3,
"search_context_size_high": 50e-3
}
},
"azure/gpt-4.1-2025-04-14": {
"max_tokens": 32768,
"max_input_tokens": 1047576,
"max_output_tokens": 32768,
"input_cost_per_token": 2e-6,
"output_cost_per_token": 8e-6,
"input_cost_per_token_batches": 1e-6,
"output_cost_per_token_batches": 4e-6,
"cache_read_input_token_cost": 0.5e-6,
"litellm_provider": "azure",
"mode": "chat",
"supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
"supported_modalities": ["text", "image"],
"supported_output_modalities": ["text"],
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_vision": true,
"supports_prompt_caching": true,
"supports_system_messages": true,
"supports_tool_choice": true,
"supports_native_streaming": true,
"supports_web_search": true,
"search_context_cost_per_query": {
"search_context_size_low": 30e-3,
"search_context_size_medium": 35e-3,
"search_context_size_high": 50e-3
}
},
"azure/gpt-4o-mini-realtime-preview-2024-12-17": {
"max_tokens": 4096,
"max_input_tokens": 128000,
@@ -1647,6 +1705,23 @@
"supports_system_messages": true,
"supports_tool_choice": true
},
"azure/o4-mini-2025-04-16": {
"max_tokens": 100000,
"max_input_tokens": 200000,
"max_output_tokens": 100000,
"input_cost_per_token": 1.1e-6,
"output_cost_per_token": 4.4e-6,
"cache_read_input_token_cost": 2.75e-7,
"litellm_provider": "azure",
"mode": "chat",
"supports_function_calling": true,
"supports_parallel_function_calling": false,
"supports_vision": true,
"supports_prompt_caching": true,
"supports_response_schema": true,
"supports_reasoning": true,
"supports_tool_choice": true
},
"azure/o3-mini-2025-01-31": {
"max_tokens": 100000,
"max_input_tokens": 200000,

View file

@@ -23,18 +23,14 @@ export const columns = (
    cell: ({ row }) => {
      const model = row.original;
      return (
        <div className="overflow-hidden">
          <Tooltip title={model.model_info.id}>
            <Button
              size="xs"
              variant="light"
              className="font-mono text-blue-500 bg-blue-50 hover:bg-blue-100 text-xs font-normal px-2 py-0.5 text-left overflow-hidden truncate max-w-[200px]"
              onClick={() => setSelectedModelId(model.model_info.id)}
            >
              {model.model_info.id.slice(0, 7)}...
            </Button>
          </Tooltip>
        </div>
        <Tooltip title={model.model_info.id}>
          <div
            className="font-mono text-blue-500 bg-blue-50 hover:bg-blue-100 text-xs font-normal px-2 py-0.5 text-left w-full truncate whitespace-nowrap cursor-pointer"
            onClick={() => setSelectedModelId(model.model_info.id)}
          >
            {model.model_info.id}
          </div>
        </Tooltip>
      );
    },
  },
@@ -45,9 +41,9 @@ export const columns = (
      const displayName = getDisplayModelName(row.original) || "-";
      return (
        <Tooltip title={displayName}>
          <p className="text-xs">
            {displayName.length > 20 ? displayName.slice(0, 20) + "..." : displayName}
          </p>
          <div className="text-xs truncate whitespace-nowrap">
            {displayName}
          </div>
        </Tooltip>
      );
    },
@@ -88,11 +84,9 @@ export const columns = (
      const model = row.original;
      return (
        <Tooltip title={model.litellm_model_name}>
          <pre className="text-xs">
            {model.litellm_model_name
              ? model.litellm_model_name.slice(0, 20) + (model.litellm_model_name.length > 20 ? "..." : "")
              : "-"}
          </pre>
          <div className="text-xs truncate whitespace-nowrap">
            {model.litellm_model_name || "-"}
          </div>
        </Tooltip>
      );
    },

View file

@@ -6,6 +6,8 @@ import {
  getSortedRowModel,
  SortingState,
  useReactTable,
  ColumnResizeMode,
  VisibilityState,
} from "@tanstack/react-table";
import React from "react";
import {
@@ -16,7 +18,7 @@ import {
  TableRow,
  TableCell,
} from "@tremor/react";
import { SwitchVerticalIcon, ChevronUpIcon, ChevronDownIcon } from "@heroicons/react/outline";
import { SwitchVerticalIcon, ChevronUpIcon, ChevronDownIcon, TableIcon } from "@heroicons/react/outline";

interface ModelDataTableProps<TData, TValue> {
  data: TData[];
@@ -32,100 +34,205 @@ export function ModelDataTable<TData, TValue>({
  const [sorting, setSorting] = React.useState<SortingState>([
    { id: "model_info.created_at", desc: true }
  ]);
  const [columnResizeMode] = React.useState<ColumnResizeMode>("onChange");
  const [columnSizing, setColumnSizing] = React.useState({});
  const [columnVisibility, setColumnVisibility] = React.useState<VisibilityState>({});
  const [isDropdownOpen, setIsDropdownOpen] = React.useState(false);
  const dropdownRef = React.useRef<HTMLDivElement>(null);

  React.useEffect(() => {
    const handleClickOutside = (event: MouseEvent) => {
      if (dropdownRef.current && !dropdownRef.current.contains(event.target as Node)) {
        setIsDropdownOpen(false);
      }
    };
    document.addEventListener('mousedown', handleClickOutside);
    return () => document.removeEventListener('mousedown', handleClickOutside);
  }, []);

  const table = useReactTable({
    data,
    columns,
    state: {
      sorting,
      columnSizing,
      columnVisibility,
    },
    columnResizeMode,
    onSortingChange: setSorting,
    onColumnSizingChange: setColumnSizing,
    onColumnVisibilityChange: setColumnVisibility,
    getCoreRowModel: getCoreRowModel(),
    getSortedRowModel: getSortedRowModel(),
    enableSorting: true,
    enableColumnResizing: true,
    defaultColumn: {
      minSize: 40,
      maxSize: 500,
    },
  });

  const getHeaderText = (header: any): string => {
    if (typeof header === 'string') {
      return header;
    }
    if (typeof header === 'function') {
      const headerElement = header();
      if (headerElement && headerElement.props && headerElement.props.children) {
        const children = headerElement.props.children;
        if (typeof children === 'string') {
          return children;
        }
        if (children.props && children.props.children) {
          return children.props.children;
        }
      }
    }
    return '';
  };
return (
<div className="rounded-lg custom-border relative">
<div className="overflow-x-auto">
<Table className="[&_td]:py-0.5 [&_th]:py-1">
<TableHead>
{table.getHeaderGroups().map((headerGroup) => (
<TableRow key={headerGroup.id}>
{headerGroup.headers.map((header) => (
<TableHeaderCell
key={header.id}
className={`py-1 h-8 ${
header.id === 'actions'
? 'sticky right-0 bg-white shadow-[-4px_0_8px_-6px_rgba(0,0,0,0.1)]'
: ''
}`}
onClick={header.column.getToggleSortingHandler()}
>
<div className="flex items-center justify-between gap-2">
<div className="flex items-center">
{header.isPlaceholder ? null : (
flexRender(
header.column.columnDef.header,
header.getContext()
)
)}
</div>
{header.id !== 'actions' && (
<div className="w-4">
{header.column.getIsSorted() ? (
{
asc: <ChevronUpIcon className="h-4 w-4 text-blue-500" />,
desc: <ChevronDownIcon className="h-4 w-4 text-blue-500" />
}[header.column.getIsSorted() as string]
) : (
<SwitchVerticalIcon className="h-4 w-4 text-gray-400" />
<div className="space-y-4">
<div className="flex justify-end">
<div className="relative" ref={dropdownRef}>
<button
onClick={() => setIsDropdownOpen(!isDropdownOpen)}
className="flex items-center gap-2 px-3 py-2 text-sm font-medium text-gray-700 bg-white border border-gray-300 rounded-md hover:bg-gray-50 focus:outline-none focus:ring-2 focus:ring-offset-2 focus:ring-blue-500"
>
<TableIcon className="h-4 w-4" />
Columns
</button>
{isDropdownOpen && (
<div className="absolute right-0 mt-2 w-56 bg-white rounded-md shadow-lg ring-1 ring-black ring-opacity-5 z-50">
<div className="py-1">
{table.getAllLeafColumns().map((column) => {
if (column.id === 'actions') return null;
return (
<div
key={column.id}
className="flex items-center px-4 py-2 text-sm text-gray-700 hover:bg-gray-100 cursor-pointer"
onClick={() => column.toggleVisibility()}
>
<input
type="checkbox"
checked={column.getIsVisible()}
onChange={() => column.toggleVisibility()}
className="h-4 w-4 rounded border-gray-300 text-blue-600 focus:ring-blue-500"
/>
<span className="ml-2">{getHeaderText(column.columnDef.header)}</span>
</div>
);
})}
</div>
</div>
)}
</div>
</div>
<div className="rounded-lg custom-border relative">
<div className="overflow-x-auto">
<div className="relative min-w-full">
<Table className="[&_td]:py-0.5 [&_th]:py-1 w-full">
<TableHead>
{table.getHeaderGroups().map((headerGroup) => (
<TableRow key={headerGroup.id}>
{headerGroup.headers.map((header) => (
<TableHeaderCell
key={header.id}
className={`py-1 h-8 relative ${
header.id === 'actions'
? 'sticky right-0 bg-white shadow-[-4px_0_8px_-6px_rgba(0,0,0,0.1)] z-10 w-[120px] ml-8'
: ''
}`}
style={{
width: header.id === 'actions' ? 120 : header.getSize(),
position: header.id === 'actions' ? 'sticky' : 'relative',
right: header.id === 'actions' ? 0 : 'auto',
}}
onClick={header.column.getToggleSortingHandler()}
>
<div className="flex items-center justify-between gap-2">
<div className="flex items-center">
{header.isPlaceholder ? null : (
flexRender(
header.column.columnDef.header,
header.getContext()
)
)}
</div>
{header.id !== 'actions' && (
<div className="w-4">
{header.column.getIsSorted() ? (
{
asc: <ChevronUpIcon className="h-4 w-4 text-blue-500" />,
desc: <ChevronDownIcon className="h-4 w-4 text-blue-500" />
}[header.column.getIsSorted() as string]
) : (
<SwitchVerticalIcon className="h-4 w-4 text-gray-400" />
)}
</div>
)}
</div>
)}
</div>
</TableHeaderCell>
{header.column.getCanResize() && (
<div
onMouseDown={header.getResizeHandler()}
onTouchStart={header.getResizeHandler()}
className={`absolute right-0 top-0 h-full w-2 cursor-col-resize select-none touch-none ${
header.column.getIsResizing() ? 'bg-blue-500' : 'hover:bg-blue-200'
}`}
/>
)}
</TableHeaderCell>
))}
</TableRow>
))}
</TableRow>
))}
</TableHead>
<TableBody>
{isLoading ? (
<TableRow>
<TableCell colSpan={columns.length} className="h-8 text-center">
<div className="text-center text-gray-500">
<p>🚅 Loading models...</p>
</div>
</TableCell>
</TableRow>
) : table.getRowModel().rows.length > 0 ? (
table.getRowModel().rows.map((row) => (
<TableRow key={row.id} className="h-8">
{row.getVisibleCells().map((cell) => (
<TableCell
key={cell.id}
className={`py-0.5 max-h-8 overflow-hidden text-ellipsis whitespace-nowrap ${
cell.column.id === 'actions'
? 'sticky right-0 bg-white shadow-[-4px_0_8px_-6px_rgba(0,0,0,0.1)]'
: ''
}`}
>
{flexRender(cell.column.columnDef.cell, cell.getContext())}
</TableHead>
<TableBody>
{isLoading ? (
<TableRow>
<TableCell colSpan={columns.length} className="h-8 text-center">
<div className="text-center text-gray-500">
<p>🚅 Loading models...</p>
</div>
</TableCell>
))}
</TableRow>
))
) : (
<TableRow>
<TableCell colSpan={columns.length} className="h-8 text-center">
<div className="text-center text-gray-500">
<p>No models found</p>
</div>
</TableCell>
</TableRow>
)}
</TableBody>
</Table>
</TableRow>
) : table.getRowModel().rows.length > 0 ? (
table.getRowModel().rows.map((row) => (
<TableRow key={row.id} className="h-8">
{row.getVisibleCells().map((cell) => (
<TableCell
key={cell.id}
className={`py-0.5 max-h-8 overflow-hidden text-ellipsis whitespace-nowrap ${
cell.column.id === 'actions'
? 'sticky right-0 bg-white shadow-[-4px_0_8px_-6px_rgba(0,0,0,0.1)] z-10 w-[120px] ml-8'
: ''
}`}
style={{
width: cell.column.id === 'actions' ? 120 : cell.column.getSize(),
minWidth: cell.column.id === 'actions' ? 120 : cell.column.getSize(),
maxWidth: cell.column.id === 'actions' ? 120 : cell.column.getSize(),
position: cell.column.id === 'actions' ? 'sticky' : 'relative',
right: cell.column.id === 'actions' ? 0 : 'auto',
}}
>
{flexRender(cell.column.columnDef.cell, cell.getContext())}
</TableCell>
))}
</TableRow>
))
) : (
<TableRow>
<TableCell colSpan={columns.length} className="h-8 text-center">
<div className="text-center text-gray-500">
<p>No models found</p>
</div>
</TableCell>
</TableRow>
)}
</TableBody>
</Table>
</div>
</div>
</div>
</div>
);

ui/package-lock.json (generated, 1972 changed lines)

File diff suppressed because it is too large

View file

@@ -1,10 +0,0 @@
{
  "dependencies": {
    "@headlessui/react": "^1.7.18",
    "@headlessui/tailwindcss": "^0.2.0",
    "@tremor/react": "^3.13.3"
  },
  "devDependencies": {
    "@tailwindcss/forms": "^0.5.7"
  }
}