Litellm release notes 04 19 2025 (#10169)

* docs(index.md): initial draft release notes
* docs: note all pending docs
* build(model_prices_and_context_window.json): add o3, gpt-4.1, o4-mini pricing
* docs(vllm.md): update vllm doc to show file message type support
* docs(mistral.md): add mistral passthrough route doc
* docs(gemini.md): add gemini thinking to docs
* docs(vertex.md): add thinking/reasoning content for gemini models to docs
* docs(index.md): more links
* docs(index.md): add more links, images
* docs(index.md): cleanup highlights

parent daf024bad1
commit bbfcb1ac7e

19 changed files with 1409 additions and 10 deletions
@@ -4,7 +4,7 @@ Pass-through endpoints for Cohere - call provider-specific endpoint, in native f

| Feature | Supported | Notes |
|-------|-------|-------|
-| Cost Tracking | ✅ | works across all integrations |
+| Cost Tracking | ✅ | Supported for `/v1/chat`, and `/v2/chat` |
| Logging | ✅ | works across all integrations |
| End-user Tracking | ❌ | [Tell us if you need this](https://github.com/BerriAI/litellm/issues/new) |
| Streaming | ✅ | |
217	docs/my-website/docs/pass_through/mistral.md	Normal file

@@ -0,0 +1,217 @@
# Mistral

Pass-through endpoints for Mistral - call provider-specific endpoint, in native format (no translation).

| Feature | Supported | Notes |
|-------|-------|-------|
| Cost Tracking | ❌ | Not supported |
| Logging | ✅ | works across all integrations |
| End-user Tracking | ❌ | [Tell us if you need this](https://github.com/BerriAI/litellm/issues/new) |
| Streaming | ✅ | |

Just replace `https://api.mistral.ai/v1` with `LITELLM_PROXY_BASE_URL/mistral` 🚀

#### **Example Usage**

```bash
curl -L -X POST 'http://0.0.0.0:4000/mistral/v1/ocr' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
    "model": "mistral-ocr-latest",
    "document": {
        "type": "image_url",
        "image_url": "https://raw.githubusercontent.com/mistralai/cookbook/refs/heads/main/mistral/ocr/receipt.png"
    }
}'
```

Supports **ALL** Mistral Endpoints (including streaming).

## Quick Start

Let's call the Mistral [`/chat/completions` endpoint](https://docs.mistral.ai/api/#tag/chat/operation/chat_completion_v1_chat_completions_post)

1. Add MISTRAL_API_KEY to your environment

```bash
export MISTRAL_API_KEY="sk-1234"
```

2. Start LiteLLM Proxy

```bash
litellm

# RUNNING on http://0.0.0.0:4000
```

3. Test it!

Let's call the Mistral `/ocr` endpoint

```bash
curl -L -X POST 'http://0.0.0.0:4000/mistral/v1/ocr' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
    "model": "mistral-ocr-latest",
    "document": {
        "type": "image_url",
        "image_url": "https://raw.githubusercontent.com/mistralai/cookbook/refs/heads/main/mistral/ocr/receipt.png"
    }
}'
```

## Examples

Anything after `http://0.0.0.0:4000/mistral` is treated as a provider-specific route, and handled accordingly.

Key Changes:

| **Original Endpoint** | **Replace With** |
|------------------------------------------------------|-----------------------------------|
| `https://api.mistral.ai/v1` | `http://0.0.0.0:4000/mistral` (LITELLM_PROXY_BASE_URL="http://0.0.0.0:4000") |
| `bearer $MISTRAL_API_KEY` | `bearer anything` (use `bearer LITELLM_VIRTUAL_KEY` if Virtual Keys are set up on the proxy) |
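For Python callers, the same OCR route can be hit with plain `requests`. A minimal sketch, assuming the proxy is running locally on port 4000 and `sk-1234` (or your virtual key) is accepted:

```python
import requests

# Call the Mistral OCR endpoint through the LiteLLM proxy passthrough route
response = requests.post(
    "http://0.0.0.0:4000/mistral/v1/ocr",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer sk-1234",  # or a LiteLLM virtual key
    },
    json={
        "model": "mistral-ocr-latest",
        "document": {
            "type": "image_url",
            "image_url": "https://raw.githubusercontent.com/mistralai/cookbook/refs/heads/main/mistral/ocr/receipt.png",
        },
    },
)
print(response.json())
```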
### **Example 1: OCR endpoint**

#### LiteLLM Proxy Call

```bash
curl -L -X POST 'http://0.0.0.0:4000/mistral/v1/ocr' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $LITELLM_API_KEY' \
-d '{
    "model": "mistral-ocr-latest",
    "document": {
        "type": "image_url",
        "image_url": "https://raw.githubusercontent.com/mistralai/cookbook/refs/heads/main/mistral/ocr/receipt.png"
    }
}'
```

#### Direct Mistral API Call

```bash
curl https://api.mistral.ai/v1/ocr \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${MISTRAL_API_KEY}" \
-d '{
    "model": "mistral-ocr-latest",
    "document": {
        "type": "document_url",
        "document_url": "https://arxiv.org/pdf/2201.04234"
    },
    "include_image_base64": true
}'
```

### **Example 2: Chat API**

#### LiteLLM Proxy Call

```bash
curl -L -X POST 'http://0.0.0.0:4000/mistral/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $LITELLM_VIRTUAL_KEY' \
-d '{
    "messages": [
        {
            "role": "user",
            "content": "I am going to Paris, what should I see?"
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.8,
    "top_p": 0.1,
    "model": "mistral-large-latest"
}'
```

#### Direct Mistral API Call

```bash
curl -L -X POST 'https://api.mistral.ai/v1/chat/completions' \
-H 'Content-Type: application/json' \
-d '{
    "messages": [
        {
            "role": "user",
            "content": "I am going to Paris, what should I see?"
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.8,
    "top_p": 0.1,
    "model": "mistral-large-latest"
}'
```

## Advanced - Use with Virtual Keys

Pre-requisites
- [Setup proxy with DB](../proxy/virtual_keys.md#setup)

Use this to avoid giving developers the raw Mistral API key, while still letting them use Mistral endpoints.

### Usage

1. Setup environment

```bash
export DATABASE_URL=""
export LITELLM_MASTER_KEY=""
export MISTRAL_API_BASE=""
```

```bash
litellm

# RUNNING on http://0.0.0.0:4000
```

2. Generate virtual key

```bash
curl -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{}'
```

Expected Response

```bash
{
    ...
    "key": "sk-1234ewknldferwedojwojw"
}
```

3. Test it!

```bash
curl -L -X POST 'http://0.0.0.0:4000/mistral/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234ewknldferwedojwojw' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "I am going to Paris, what should I see?"
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.8,
    "top_p": 0.1,
    "model": "mistral-large-latest"
}'
```
185	docs/my-website/docs/pass_through/vllm.md	Normal file

@@ -0,0 +1,185 @@
# VLLM

Pass-through endpoints for VLLM - call provider-specific endpoint, in native format (no translation).

| Feature | Supported | Notes |
|-------|-------|-------|
| Cost Tracking | ❌ | Not supported |
| Logging | ✅ | works across all integrations |
| End-user Tracking | ❌ | [Tell us if you need this](https://github.com/BerriAI/litellm/issues/new) |
| Streaming | ✅ | |

Just replace `https://my-vllm-server.com` with `LITELLM_PROXY_BASE_URL/vllm` 🚀

#### **Example Usage**

```bash
curl -L -X GET 'http://0.0.0.0:4000/vllm/metrics' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234'
```

Supports **ALL** VLLM Endpoints (including streaming).

## Quick Start

Let's call the VLLM [`/metrics` endpoint](https://vllm.readthedocs.io/en/latest/api_reference/api_reference.html)

1. Add `HOSTED_VLLM_API_BASE` to your environment

```bash
export HOSTED_VLLM_API_BASE="https://my-vllm-server.com"
```

2. Start LiteLLM Proxy

```bash
litellm

# RUNNING on http://0.0.0.0:4000
```

3. Test it!

Let's call the VLLM `/metrics` endpoint

```bash
curl -L -X GET 'http://0.0.0.0:4000/vllm/metrics' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234'
```

## Examples

Anything after `http://0.0.0.0:4000/vllm` is treated as a provider-specific route, and handled accordingly.

Key Changes:

| **Original Endpoint** | **Replace With** |
|------------------------------------------------------|-----------------------------------|
| `https://my-vllm-server.com` | `http://0.0.0.0:4000/vllm` (LITELLM_PROXY_BASE_URL="http://0.0.0.0:4000") |
| `bearer $VLLM_API_KEY` | `bearer anything` (use `bearer LITELLM_VIRTUAL_KEY` if Virtual Keys are set up on the proxy) |
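The same metrics scrape can be done from Python. A minimal sketch, assuming the proxy is at `http://0.0.0.0:4000` and `sk-1234` (or a virtual key) is accepted:

```python
import requests

# Fetch the VLLM Prometheus metrics through the LiteLLM proxy passthrough route
resp = requests.get(
    "http://0.0.0.0:4000/vllm/metrics",
    headers={"Authorization": "Bearer sk-1234"},  # or a LiteLLM virtual key
)
print(resp.text)  # raw Prometheus exposition format
```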
### **Example 1: Metrics endpoint**

#### LiteLLM Proxy Call

```bash
curl -L -X GET 'http://0.0.0.0:4000/vllm/metrics' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $LITELLM_VIRTUAL_KEY'
```

#### Direct VLLM API Call

```bash
curl -L -X GET 'https://my-vllm-server.com/metrics' \
-H 'Content-Type: application/json'
```

### **Example 2: Chat API**

#### LiteLLM Proxy Call

```bash
curl -L -X POST 'http://0.0.0.0:4000/vllm/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $LITELLM_VIRTUAL_KEY' \
-d '{
    "messages": [
        {
            "role": "user",
            "content": "I am going to Paris, what should I see?"
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.8,
    "top_p": 0.1,
    "model": "qwen2.5-7b-instruct"
}'
```

#### Direct VLLM API Call

```bash
curl -L -X POST 'https://my-vllm-server.com/chat/completions' \
-H 'Content-Type: application/json' \
-d '{
    "messages": [
        {
            "role": "user",
            "content": "I am going to Paris, what should I see?"
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.8,
    "top_p": 0.1,
    "model": "qwen2.5-7b-instruct"
}'
```

## Advanced - Use with Virtual Keys

Pre-requisites
- [Setup proxy with DB](../proxy/virtual_keys.md#setup)

Use this to avoid giving developers direct access to your VLLM server, while still letting them use VLLM endpoints.

### Usage

1. Setup environment

```bash
export DATABASE_URL=""
export LITELLM_MASTER_KEY=""
export HOSTED_VLLM_API_BASE=""
```

```bash
litellm

# RUNNING on http://0.0.0.0:4000
```

2. Generate virtual key

```bash
curl -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{}'
```

Expected Response

```bash
{
    ...
    "key": "sk-1234ewknldferwedojwojw"
}
```

3. Test it!

```bash
curl -L -X POST 'http://0.0.0.0:4000/vllm/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234ewknldferwedojwojw' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "I am going to Paris, what should I see?"
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.8,
    "top_p": 0.1,
    "model": "qwen2.5-7b-instruct"
}'
```
@@ -1011,8 +1011,7 @@ Expected Response:

| Supported Operations | `/v1/responses`|
| Azure OpenAI Responses API | [Azure OpenAI Responses API ↗](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses?tabs=python-secure) |
| Cost Tracking, Logging Support | ✅ LiteLLM will log, track cost for Responses API Requests |

| Supported OpenAI Params | ✅ All OpenAI params are supported, [See here](https://github.com/BerriAI/litellm/blob/0717369ae6969882d149933da48eeb8ab0e691bd/litellm/llms/openai/responses/transformation.py#L23) |

## Usage
@@ -39,14 +39,164 @@ response = completion(

- temperature
- top_p
- max_tokens
- max_completion_tokens
- stream
- tools
- tool_choice
- functions
- response_format
- n
- stop
- logprobs
- frequency_penalty
- modalities
- reasoning_content

**Anthropic Params**
- thinking (used to set max budget tokens across anthropic/gemini models)

[**See Updated List**](https://github.com/BerriAI/litellm/blob/main/litellm/llms/gemini/chat/transformation.py#L70)
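To make the list above concrete, here is a minimal sketch of a Gemini call that exercises a few of these parameters (the model name and values are illustrative, not prescriptive):

```python
from litellm import completion

# Illustrative call using several of the supported OpenAI params listed above
response = completion(
    model="gemini/gemini-2.0-flash",
    messages=[{"role": "user", "content": "List three facts about Paris."}],
    temperature=0.7,
    max_tokens=256,
    top_p=0.9,
    stream=False,
)
print(response.choices[0].message.content)
```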
## Usage - Thinking / `reasoning_content`

LiteLLM translates OpenAI's `reasoning_effort` to Gemini's `thinking` parameter. [Code](https://github.com/BerriAI/litellm/blob/620664921902d7a9bfb29897a7b27c1a7ef4ddfb/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py#L362)

**Mapping**

| reasoning_effort | thinking |
| ---------------- | -------- |
| "low" | "budget_tokens": 1024 |
| "medium" | "budget_tokens": 2048 |
| "high" | "budget_tokens": 4096 |
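In other words, a `reasoning_effort` string is turned into a token budget before the request is sent to Gemini. A rough sketch of that translation (the real logic lives in the linked code; this is only illustrative):

```python
# Illustrative only - mirrors the mapping table above, not LiteLLM's actual internals
REASONING_EFFORT_TO_BUDGET = {
    "low": 1024,
    "medium": 2048,
    "high": 4096,
}

def reasoning_effort_to_thinking(reasoning_effort: str) -> dict:
    """Map an OpenAI-style reasoning_effort to a thinking dict of the form used in this doc."""
    return {
        "type": "enabled",
        "budget_tokens": REASONING_EFFORT_TO_BUDGET[reasoning_effort],
    }

print(reasoning_effort_to_thinking("low"))  # {'type': 'enabled', 'budget_tokens': 1024}
```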
<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import completion

resp = completion(
    model="gemini/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
)
```

</TabItem>

<TabItem value="proxy" label="PROXY">

1. Setup config.yaml

```yaml
model_list:
  - model_name: gemini-2.5-flash
    litellm_params:
      model: gemini/gemini-2.5-flash-preview-04-17
      api_key: os.environ/GEMINI_API_KEY
```

2. Start proxy

```bash
litellm --config /path/to/config.yaml
```

3. Test it!

```bash
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "reasoning_effort": "low"
  }'
```

</TabItem>
</Tabs>

**Expected Response**

```python
ModelResponse(
    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
    created=1740470510,
    model='claude-3-7-sonnet-20250219',
    object='chat.completion',
    system_fingerprint=None,
    choices=[
        Choices(
            finish_reason='stop',
            index=0,
            message=Message(
                content="The capital of France is Paris.",
                role='assistant',
                tool_calls=None,
                function_call=None,
                reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
            ),
        )
    ],
    usage=Usage(
        completion_tokens=68,
        prompt_tokens=42,
        total_tokens=110,
        completion_tokens_details=None,
        prompt_tokens_details=PromptTokensDetailsWrapper(
            audio_tokens=None,
            cached_tokens=0,
            text_tokens=None,
            image_tokens=None
        ),
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0
    )
)
```

### Pass `thinking` to Gemini models

You can also pass the `thinking` parameter to Gemini models.

This is translated to Gemini's [`thinkingConfig` parameter](https://ai.google.dev/gemini-api/docs/thinking#set-budget).

<Tabs>
<TabItem value="sdk" label="SDK">

```python
import litellm

response = litellm.completion(
    model="gemini/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
)
```

</TabItem>
<TabItem value="proxy" label="PROXY">

```bash
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -d '{
    "model": "gemini/gemini-2.5-flash-preview-04-17",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "thinking": {"type": "enabled", "budget_tokens": 1024}
  }'
```

</TabItem>
</Tabs>

[**See Updated List**](https://github.com/BerriAI/litellm/blob/1c747f3ad372399c5b95cc5696b06a5fbe53186b/litellm/llms/vertex_httpx.py#L122)

## Passing Gemini Specific Params
### Response schema
@@ -163,6 +163,12 @@ os.environ["OPENAI_API_BASE"] = "openaiai-api-base" # OPTIONAL

| Model Name | Function Call |
|-----------------------|-----------------------------------------------------------------|
| gpt-4.1 | `response = completion(model="gpt-4.1", messages=messages)` |
| gpt-4.1-mini | `response = completion(model="gpt-4.1-mini", messages=messages)` |
| gpt-4.1-nano | `response = completion(model="gpt-4.1-nano", messages=messages)` |
| o4-mini | `response = completion(model="o4-mini", messages=messages)` |
| o3-mini | `response = completion(model="o3-mini", messages=messages)` |
| o3 | `response = completion(model="o3", messages=messages)` |
| o1-mini | `response = completion(model="o1-mini", messages=messages)` |
| o1-preview | `response = completion(model="o1-preview", messages=messages)` |
| gpt-4o-mini | `response = completion(model="gpt-4o-mini", messages=messages)` |
@@ -542,6 +542,154 @@ print(resp)

### **Thinking / `reasoning_content`**

LiteLLM translates OpenAI's `reasoning_effort` to Gemini's `thinking` parameter. [Code](https://github.com/BerriAI/litellm/blob/620664921902d7a9bfb29897a7b27c1a7ef4ddfb/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py#L362)

**Mapping**

| reasoning_effort | thinking |
| ---------------- | -------- |
| "low" | "budget_tokens": 1024 |
| "medium" | "budget_tokens": 2048 |
| "high" | "budget_tokens": 4096 |

<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import completion

# !gcloud auth application-default login - run this to add vertex credentials to your env

resp = completion(
    model="vertex_ai/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
    vertex_project="project-id",
    vertex_location="us-central1"
)
```

</TabItem>

<TabItem value="proxy" label="PROXY">

1. Setup config.yaml

```yaml
model_list:
  - model_name: gemini-2.5-flash
    litellm_params:
      model: vertex_ai/gemini-2.5-flash-preview-04-17
      vertex_credentials: {"project_id": "project-id", "location": "us-central1", "project_key": "project-key"}
      vertex_project: "project-id"
      vertex_location: "us-central1"
```

2. Start proxy

```bash
litellm --config /path/to/config.yaml
```

3. Test it!

```bash
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "reasoning_effort": "low"
  }'
```

</TabItem>
</Tabs>

**Expected Response**

```python
ModelResponse(
    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
    created=1740470510,
    model='claude-3-7-sonnet-20250219',
    object='chat.completion',
    system_fingerprint=None,
    choices=[
        Choices(
            finish_reason='stop',
            index=0,
            message=Message(
                content="The capital of France is Paris.",
                role='assistant',
                tool_calls=None,
                function_call=None,
                reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
            ),
        )
    ],
    usage=Usage(
        completion_tokens=68,
        prompt_tokens=42,
        total_tokens=110,
        completion_tokens_details=None,
        prompt_tokens_details=PromptTokensDetailsWrapper(
            audio_tokens=None,
            cached_tokens=0,
            text_tokens=None,
            image_tokens=None
        ),
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0
    )
)
```

#### Pass `thinking` to Gemini models

You can also pass the `thinking` parameter to Gemini models.

This is translated to Gemini's [`thinkingConfig` parameter](https://ai.google.dev/gemini-api/docs/thinking#set-budget).

<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import completion

# !gcloud auth application-default login - run this to add vertex credentials to your env

response = completion(
    model="vertex_ai/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
    vertex_project="project-id",
    vertex_location="us-central1"
)
```

</TabItem>
<TabItem value="proxy" label="PROXY">

```bash
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -d '{
    "model": "vertex_ai/gemini-2.5-flash-preview-04-17",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "thinking": {"type": "enabled", "budget_tokens": 1024}
  }'
```

</TabItem>
</Tabs>

### **Context Caching**

Vertex AI context caching is supported by calling the provider API directly. (Unified endpoint support coming soon.)
@@ -161,6 +161,120 @@ curl -L -X POST 'http://0.0.0.0:4000/embeddings' \

Example Implementation from VLLM [here](https://github.com/vllm-project/vllm/pull/10020)

<Tabs>
<TabItem value="files_message" label="(Unified) Files Message">

Use this to send a video url to VLLM + Gemini in the same format, using OpenAI's `files` message type.

There are two ways to send a video url to VLLM:

1. Pass the video url directly

```
{"type": "file", "file": {"file_id": video_url}},
```

2. Pass the video data as base64

```
{"type": "file", "file": {"file_data": f"data:video/mp4;base64,{video_data_base64}"}}
```
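If you go the base64 route, the data URI can be built with the standard library. A small sketch, assuming a local `video.mp4` (the file name is illustrative):

```python
import base64

# Read a local video and build the data URI expected by the `file_data` field
with open("video.mp4", "rb") as f:  # illustrative local file
    video_data_base64 = base64.b64encode(f.read()).decode("utf-8")

file_content = {
    "type": "file",
    "file": {"file_data": f"data:video/mp4;base64,{video_data_base64}"},
}
```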
<Tabs>
<TabItem value="sdk" label="SDK">

```python
import os
from litellm import completion

messages=[
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Summarize the following video"
            },
            {
                "type": "file",
                "file": {
                    "file_id": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
                }
            }
        ]
    }
]

# call vllm
os.environ["HOSTED_VLLM_API_BASE"] = "https://hosted-vllm-api.co"
os.environ["HOSTED_VLLM_API_KEY"] = "" # [optional], if your VLLM server requires an API key
response = completion(
    model="hosted_vllm/qwen",  # pass the vllm model name
    messages=messages,
)

# call gemini
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"
response = completion(
    model="gemini/gemini-1.5-flash",  # pass the gemini model name
    messages=messages,
)

print(response)
```

</TabItem>
<TabItem value="proxy" label="PROXY">

1. Setup config.yaml

```yaml
model_list:
  - model_name: my-model
    litellm_params:
      model: hosted_vllm/qwen  # add hosted_vllm/ prefix to route as OpenAI provider
      api_base: https://hosted-vllm-api.co  # add api base for OpenAI compatible provider
  - model_name: my-gemini-model
    litellm_params:
      model: gemini/gemini-1.5-flash  # add gemini/ prefix to route as Google AI Studio provider
      api_key: os.environ/GEMINI_API_KEY
```

2. Start the proxy

```bash
$ litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```

3. Test it!

```bash
curl -X POST http://0.0.0.0:4000/chat/completions \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
    "model": "my-model",
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "Summarize the following video"},
            {"type": "file", "file": {"file_id": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}}
        ]}
    ]
}'
```

</TabItem>
</Tabs>

</TabItem>
<TabItem value="video_url" label="(VLLM-specific) Video Message">

Use this to send a video url to VLLM in its native message format (`video_url`).

There are two ways to send a video url to VLLM:

1. Pass the video url directly
@@ -249,6 +363,10 @@ curl -X POST http://0.0.0.0:4000/chat/completions \

</Tabs>

</TabItem>
</Tabs>

## (Deprecated) for `vllm pip package`
### Using - `litellm.completion`
108	docs/my-website/docs/proxy/model_discovery.md	Normal file

@@ -0,0 +1,108 @@
# Model Discovery

Use this to give users an accurate list of the models available behind a provider endpoint when calling `/v1/models` for wildcard models.

## Supported Models

- Fireworks AI
- OpenAI
- Gemini
- LiteLLM Proxy
- Topaz
- Anthropic
- XAI
- VLLM
- Vertex AI

### Usage

**1. Setup config.yaml**

```yaml
model_list:
  - model_name: xai/*
    litellm_params:
      model: xai/*
      api_key: os.environ/XAI_API_KEY

litellm_settings:
  check_provider_endpoint: true # 👈 Enable checking provider endpoint for wildcard models
```

**2. Start proxy**

```bash
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```

**3. Call `/v1/models`**

```bash
curl -X GET "http://localhost:4000/v1/models" -H "Authorization: Bearer $LITELLM_KEY"
```

Expected response

```json
{
    "data": [
        {
            "id": "xai/grok-2-1212",
            "object": "model",
            "created": 1677610602,
            "owned_by": "openai"
        },
        {
            "id": "xai/grok-2-vision-1212",
            "object": "model",
            "created": 1677610602,
            "owned_by": "openai"
        },
        {
            "id": "xai/grok-3-beta",
            "object": "model",
            "created": 1677610602,
            "owned_by": "openai"
        },
        {
            "id": "xai/grok-3-fast-beta",
            "object": "model",
            "created": 1677610602,
            "owned_by": "openai"
        },
        {
            "id": "xai/grok-3-mini-beta",
            "object": "model",
            "created": 1677610602,
            "owned_by": "openai"
        },
        {
            "id": "xai/grok-3-mini-fast-beta",
            "object": "model",
            "created": 1677610602,
            "owned_by": "openai"
        },
        {
            "id": "xai/grok-beta",
            "object": "model",
            "created": 1677610602,
            "owned_by": "openai"
        },
        {
            "id": "xai/grok-vision-beta",
            "object": "model",
            "created": 1677610602,
            "owned_by": "openai"
        },
        {
            "id": "xai/grok-2-image-1212",
            "object": "model",
            "created": 1677610602,
            "owned_by": "openai"
        }
    ],
    "object": "list"
}
```
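Since the proxy exposes an OpenAI-compatible `/v1/models`, any OpenAI client can consume the discovered list. A minimal sketch with the `openai` Python SDK, assuming the proxy config above and a valid LiteLLM key:

```python
from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy (assumed at localhost:4000)
client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000")

for model in client.models.list():
    print(model.id)  # e.g. "xai/grok-3-beta"
```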
@@ -16,6 +16,8 @@ Supported Providers:

- Vertex AI (Anthropic) (`vertexai/`)
- OpenRouter (`openrouter/`)
- XAI (`xai/`)
- Google AI Studio (`google/`)
- Vertex AI (`vertex_ai/`)

LiteLLM will standardize the `reasoning_content` in the response and `thinking_blocks` in the assistant message.

@@ -23,7 +25,7 @@ LiteLLM will standardize the `reasoning_content` in the response and `thinking_b

  "message": {
    ...
    "reasoning_content": "The capital of France is Paris.",
-   "thinking_blocks": [
+   "thinking_blocks": [ # only returned for Anthropic models
      {
        "type": "thinking",
        "thinking": "The capital of France is Paris.",
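For reference, a minimal sketch of reading these standardized fields from the SDK response (it assumes a reasoning-capable model, e.g. an Anthropic Claude thinking model, is configured; the model name is illustrative):

```python
from litellm import completion

# Any supported reasoning provider works here; the model name is illustrative
resp = completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
)

message = resp.choices[0].message
print(message.reasoning_content)                   # standardized across providers
print(getattr(message, "thinking_blocks", None))   # only returned for Anthropic models
```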
@@ -520,6 +520,3 @@ for event in response:

| `azure_ai` | [See supported parameters here](https://github.com/BerriAI/litellm/blob/f39d9178868662746f159d5ef642c7f34f9bfe5f/litellm/responses/litellm_completion_transformation/transformation.py#L57) |
| All other llm api providers | [See supported parameters here](https://github.com/BerriAI/litellm/blob/f39d9178868662746f159d5ef642c7f34f9bfe5f/litellm/responses/litellm_completion_transformation/transformation.py#L57) |
BIN	docs/my-website/img/release_notes/new_tag_usage.png	Normal file (binary, 207 KiB)
BIN	docs/my-website/img/release_notes/new_team_usage.png	Normal file (binary, 268 KiB)
BIN	docs/my-website/img/release_notes/new_team_usage_highlight.jpg	Normal file (binary, 999 KiB)
136	docs/my-website/release_notes/v1.67.0-stable/index.md	Normal file

@@ -0,0 +1,136 @@
---
title: v1.67.0-stable - Unified Responses API
slug: v1.67.0-stable
date: 2025-04-19T10:00:00
authors:
  - name: Krrish Dholakia
    title: CEO, LiteLLM
    url: https://www.linkedin.com/in/krish-d/
    image_url: https://media.licdn.com/dms/image/v2/D4D03AQGrlsJ3aqpHmQ/profile-displayphoto-shrink_400_400/B4DZSAzgP7HYAg-/0/1737327772964?e=1749686400&v=beta&t=Hkl3U8Ps0VtvNxX0BNNq24b4dtX5wQaPFp6oiKCIHD8
  - name: Ishaan Jaffer
    title: CTO, LiteLLM
    url: https://www.linkedin.com/in/reffajnaahsi/
    image_url: https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg

tags: ["sso", "unified_file_id", "cost_tracking", "security"]
hide_table_of_contents: false
---

import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Key Highlights

- **Team and Tag based usage tracking**: You can now see usage and spend by team and tag at 1M+ spend logs.
- **SCIM Integration**: Enables identity providers (Okta, Azure AD, OneLogin, etc.) to automate user and team (group) provisioning, updates, and deprovisioning.
- **Unified Responses API**: Support for calling Anthropic, Gemini, Groq, etc. via OpenAI's new Responses API.

Let's dive in.

## Team and Tag based usage tracking

<Image img={require('../../img/release_notes/new_team_usage_highlight.jpg')}/>

This release improves team and tag based usage tracking at 1M+ spend logs, making it easy to monitor your LLM API spend in production. This covers:

- **Admins** can view spend across all teams + tags
- **Admins** can now see spend across multiple tags
- **Admins** can now check the activity by key, within teams
- **Internal Users** can now view spend of teams they're a member of

[Read more](#management-endpoints--ui)

## New Models / Updated Models

- **OpenAI**
    1. gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3, o3-mini, o4-mini pricing - [Get Started](../../docs/providers/openai#usage), [PR](https://github.com/BerriAI/litellm/pull/9990)
    2. o4 - correctly map o4 to openai o_series model
- **Azure AI**
    1. Phi-4 output cost per token fix - [PR](https://github.com/BerriAI/litellm/pull/9880)
    2. Responses API support [Get Started](../../docs/providers/azure#azure-responses-api), [PR](https://github.com/BerriAI/litellm/pull/10116)
- **Anthropic**
    1. Redacted message thinking support - [Get Started](../../docs/providers/anthropic#usage---thinking--reasoning_content), [PR](https://github.com/BerriAI/litellm/pull/10129)
- **Cohere**
    1. `/v2/chat` Passthrough endpoint support w/ cost tracking - [Get Started](../../docs/pass_through/cohere), [PR](https://github.com/BerriAI/litellm/pull/9997)
- **Azure**
    1. Support azure tenant_id/client_id env vars - [Get Started](../../docs/providers/azure#entra-id---use-tenant_id-client_id-client_secret), [PR](https://github.com/BerriAI/litellm/pull/9993)
    2. Fix response_format check for 2025+ api versions - [PR](https://github.com/BerriAI/litellm/pull/9993)
    3. Add gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3, o3-mini, o4-mini pricing
- **VLLM**
    1. Files - Support 'file' message type for VLLM video urls - [Get Started](../../docs/providers/vllm#send-video-url-to-vllm), [PR](https://github.com/BerriAI/litellm/pull/10129)
    2. Passthrough - new `/vllm/` passthrough endpoint support [Get Started](../../docs/pass_through/vllm), [PR](https://github.com/BerriAI/litellm/pull/10002)
- **Mistral**
    1. New `/mistral` passthrough endpoint support [Get Started](../../docs/pass_through/mistral), [PR](https://github.com/BerriAI/litellm/pull/10002)
- **AWS**
    1. New mapped bedrock regions - [PR](https://github.com/BerriAI/litellm/pull/9430)
- **VertexAI / Google AI Studio**
    1. Gemini - Response format - Retain schema field ordering for google gemini and vertex by specifying propertyOrdering - [Get Started](../../docs/providers/vertex#json-schema), [PR](https://github.com/BerriAI/litellm/pull/9828)
    2. Gemini-2.5-flash - return reasoning content [Google AI Studio](../../docs/providers/gemini#usage---thinking--reasoning_content), [Vertex AI](../../docs/providers/vertex#thinking--reasoning_content)
    3. Gemini-2.5-flash - pricing + model information [PR](https://github.com/BerriAI/litellm/pull/10125)
    4. Passthrough - new `/vertex_ai/discovery` route - enables calling AgentBuilder API routes [Get Started](../../docs/pass_through/vertex_ai#supported-api-endpoints), [PR](https://github.com/BerriAI/litellm/pull/10084)
- **Fireworks AI**
    1. Return tool calling responses in `tool_calls` field (fireworks incorrectly returns this as a json str in content) [PR](https://github.com/BerriAI/litellm/pull/10130)
- **Triton**
    1. Remove fixed bad_words / stop words from `/generate` call - [Get Started](../../docs/providers/triton-inference-server#triton-generate---chat-completion), [PR](https://github.com/BerriAI/litellm/pull/10163)
- **Other**
    1. Support for all litellm providers on Responses API (works with Codex) - [Get Started](../../docs/tutorials/openai_codex), [PR](https://github.com/BerriAI/litellm/pull/10132)
    2. Fix combining multiple tool calls in streaming response - [Get Started](../../docs/completion/stream#helper-function), [PR](https://github.com/BerriAI/litellm/pull/10040)

## Spend Tracking Improvements

- **Cost Control** - inject cache control points in prompt for cost reduction [Get Started](../../docs/tutorials/prompt_caching), [PR](https://github.com/BerriAI/litellm/pull/10000)
- **Spend Tags** - spend tags in headers - support x-litellm-tags even if tag based routing not enabled [Get Started](../../docs/proxy/request_headers#litellm-headers), [PR](https://github.com/BerriAI/litellm/pull/10000)
- **Gemini-2.5-flash** - support cost calculation for reasoning tokens [PR](https://github.com/BerriAI/litellm/pull/10141)

## Management Endpoints / UI

- **Users**
    1. Show created_at and updated_at on users page - [PR](https://github.com/BerriAI/litellm/pull/10033)
- **Virtual Keys**
    1. Filter by key alias - https://github.com/BerriAI/litellm/pull/10085
- **Usage Tab**

    1. Team based usage

        - New `LiteLLM_DailyTeamSpend` Table for aggregate team based usage logging - [PR](https://github.com/BerriAI/litellm/pull/10039)
        - New Team based usage dashboard + new `/team/daily/activity` API - [PR](https://github.com/BerriAI/litellm/pull/10081)
        - Return team alias on /team/daily/activity API - [PR](https://github.com/BerriAI/litellm/pull/10157)
        - Allow internal user to view spend for teams they belong to - [PR](https://github.com/BerriAI/litellm/pull/10157)
        - Allow viewing top keys by team - [PR](https://github.com/BerriAI/litellm/pull/10157)

        <Image img={require('../../img/release_notes/new_team_usage.png')}/>

    2. Tag Based Usage
        - New `LiteLLM_DailyTagSpend` Table for aggregate tag based usage logging - [PR](https://github.com/BerriAI/litellm/pull/10071)
        - Restrict to only Proxy Admins - [PR](https://github.com/BerriAI/litellm/pull/10157)
        - Allow viewing top keys by tag
        - Return tags passed in request (i.e. dynamic tags) on `/tag/list` API - [PR](https://github.com/BerriAI/litellm/pull/10157)

        <Image img={require('../../img/release_notes/new_tag_usage.png')}/>

    3. Track prompt caching metrics in daily user, team, tag tables - [PR](https://github.com/BerriAI/litellm/pull/10029)
    4. Show usage by key (on all up, team, and tag usage dashboards) - [PR](https://github.com/BerriAI/litellm/pull/10157)
    5. Swap old usage with new usage tab
- **Models**
    1. Make columns resizable/hideable - [PR](https://github.com/BerriAI/litellm/pull/10119)
- **API Playground**
    1. Allow internal user to call api playground - [PR](https://github.com/BerriAI/litellm/pull/10157)
- **SCIM**
    1. Add LiteLLM SCIM Integration for Team and User management - [Get Started](ADD DOCS HERE), [PR](https://github.com/BerriAI/litellm/pull/10072)

## Logging / Guardrail Integrations

- **GCS**
    1. Fix gcs pub sub logging with env var GCS_PROJECT_ID - [Get Started](../../docs/observability/gcs_bucket_integration#usage), [PR](https://github.com/BerriAI/litellm/pull/10042)
- **AIM**
    1. Add litellm call id passing to Aim guardrails on pre and post-hooks calls - [Get Started](../../docs/proxy/guardrails/aim_security), [PR](https://github.com/BerriAI/litellm/pull/10021)
- **Azure blob storage**
    1. Ensure logging works in high throughput scenarios - [Get Started](../../docs/proxy/logging#azure-blob-storage), [PR](https://github.com/BerriAI/litellm/pull/9962)

## General Proxy Improvements

- **Support setting `litellm.modify_params` via env var** [PR](https://github.com/BerriAI/litellm/pull/9964)
- **Model Discovery** - Check provider's `/models` endpoints when calling proxy's `/v1/models` endpoint - [Get Started](../../docs/proxy/model_discovery), [PR](https://github.com/BerriAI/litellm/pull/9958)
- **`/utils/token_counter`** - fix retrieving custom tokenizer for db models - [Get Started](../../docs/proxy/configs#set-custom-tokenizer), [PR](https://github.com/BerriAI/litellm/pull/10047)
- **Prisma migrate** - handle existing columns in db table - [PR](https://github.com/BerriAI/litellm/pull/10138)
@@ -69,6 +69,7 @@ const sidebars = {
        "proxy/clientside_auth",
        "proxy/request_headers",
        "proxy/response_headers",
        "proxy/model_discovery",
      ],
    },
    {

@@ -330,6 +331,8 @@ const sidebars = {
        "pass_through/vertex_ai",
        "pass_through/google_ai_studio",
        "pass_through/cohere",
        "pass_through/vllm",
        "pass_through/mistral",
        "pass_through/openai_passthrough",
        "pass_through/anthropic_completion",
        "pass_through/bedrock",
@@ -1530,6 +1530,170 @@
            "search_context_size_high": 50e-3
        }
    },
    "azure/gpt-4.1-mini": {
        "max_tokens": 32768,
        "max_input_tokens": 1047576,
        "max_output_tokens": 32768,
        "input_cost_per_token": 0.4e-6,
        "output_cost_per_token": 1.6e-6,
        "input_cost_per_token_batches": 0.2e-6,
        "output_cost_per_token_batches": 0.8e-6,
        "cache_read_input_token_cost": 0.1e-6,
        "litellm_provider": "azure",
        "mode": "chat",
        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
        "supported_modalities": ["text", "image"],
        "supported_output_modalities": ["text"],
        "supports_function_calling": true,
        "supports_parallel_function_calling": true,
        "supports_response_schema": true,
        "supports_vision": true,
        "supports_prompt_caching": true,
        "supports_system_messages": true,
        "supports_tool_choice": true,
        "supports_native_streaming": true,
        "supports_web_search": true,
        "search_context_cost_per_query": {
            "search_context_size_low": 25e-3,
            "search_context_size_medium": 27.5e-3,
            "search_context_size_high": 30e-3
        }
    },
    "azure/gpt-4.1-mini-2025-04-14": {
        "max_tokens": 32768,
        "max_input_tokens": 1047576,
        "max_output_tokens": 32768,
        "input_cost_per_token": 0.4e-6,
        "output_cost_per_token": 1.6e-6,
        "input_cost_per_token_batches": 0.2e-6,
        "output_cost_per_token_batches": 0.8e-6,
        "cache_read_input_token_cost": 0.1e-6,
        "litellm_provider": "azure",
        "mode": "chat",
        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
        "supported_modalities": ["text", "image"],
        "supported_output_modalities": ["text"],
        "supports_function_calling": true,
        "supports_parallel_function_calling": true,
        "supports_response_schema": true,
        "supports_vision": true,
        "supports_prompt_caching": true,
        "supports_system_messages": true,
        "supports_tool_choice": true,
        "supports_native_streaming": true,
        "supports_web_search": true,
        "search_context_cost_per_query": {
            "search_context_size_low": 25e-3,
            "search_context_size_medium": 27.5e-3,
            "search_context_size_high": 30e-3
        }
    },
    "azure/gpt-4.1-nano": {
        "max_tokens": 32768,
        "max_input_tokens": 1047576,
        "max_output_tokens": 32768,
        "input_cost_per_token": 0.1e-6,
        "output_cost_per_token": 0.4e-6,
        "input_cost_per_token_batches": 0.05e-6,
        "output_cost_per_token_batches": 0.2e-6,
        "cache_read_input_token_cost": 0.025e-6,
        "litellm_provider": "azure",
        "mode": "chat",
        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
        "supported_modalities": ["text", "image"],
        "supported_output_modalities": ["text"],
        "supports_function_calling": true,
        "supports_parallel_function_calling": true,
        "supports_response_schema": true,
        "supports_vision": true,
        "supports_prompt_caching": true,
        "supports_system_messages": true,
        "supports_tool_choice": true,
        "supports_native_streaming": true
    },
    "azure/gpt-4.1-nano-2025-04-14": {
        "max_tokens": 32768,
        "max_input_tokens": 1047576,
        "max_output_tokens": 32768,
        "input_cost_per_token": 0.1e-6,
        "output_cost_per_token": 0.4e-6,
        "input_cost_per_token_batches": 0.05e-6,
        "output_cost_per_token_batches": 0.2e-6,
        "cache_read_input_token_cost": 0.025e-6,
        "litellm_provider": "azure",
        "mode": "chat",
        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
        "supported_modalities": ["text", "image"],
        "supported_output_modalities": ["text"],
        "supports_function_calling": true,
        "supports_parallel_function_calling": true,
        "supports_response_schema": true,
        "supports_vision": true,
        "supports_prompt_caching": true,
        "supports_system_messages": true,
        "supports_tool_choice": true,
        "supports_native_streaming": true
    },
    "azure/o3": {
        "max_tokens": 100000,
        "max_input_tokens": 200000,
        "max_output_tokens": 100000,
        "input_cost_per_token": 1e-5,
        "output_cost_per_token": 4e-5,
        "cache_read_input_token_cost": 2.5e-6,
        "litellm_provider": "azure",
        "mode": "chat",
        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
        "supported_modalities": ["text", "image"],
        "supported_output_modalities": ["text"],
        "supports_function_calling": true,
        "supports_parallel_function_calling": false,
        "supports_vision": true,
        "supports_prompt_caching": true,
        "supports_response_schema": true,
        "supports_reasoning": true,
        "supports_tool_choice": true
    },
    "azure/o3-2025-04-16": {
        "max_tokens": 100000,
        "max_input_tokens": 200000,
        "max_output_tokens": 100000,
        "input_cost_per_token": 1e-5,
        "output_cost_per_token": 4e-5,
        "cache_read_input_token_cost": 2.5e-6,
        "litellm_provider": "azure",
        "mode": "chat",
        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
        "supported_modalities": ["text", "image"],
        "supported_output_modalities": ["text"],
        "supports_function_calling": true,
        "supports_parallel_function_calling": false,
        "supports_vision": true,
        "supports_prompt_caching": true,
        "supports_response_schema": true,
        "supports_reasoning": true,
        "supports_tool_choice": true
    },
    "azure/o4-mini": {
        "max_tokens": 100000,
        "max_input_tokens": 200000,
        "max_output_tokens": 100000,
        "input_cost_per_token": 1.1e-6,
        "output_cost_per_token": 4.4e-6,
        "cache_read_input_token_cost": 2.75e-7,
        "litellm_provider": "azure",
        "mode": "chat",
        "supported_endpoints": ["/v1/chat/completions", "/v1/batch", "/v1/responses"],
        "supported_modalities": ["text", "image"],
        "supported_output_modalities": ["text"],
        "supports_function_calling": true,
        "supports_parallel_function_calling": false,
        "supports_vision": true,
        "supports_prompt_caching": true,
        "supports_response_schema": true,
        "supports_reasoning": true,
        "supports_tool_choice": true
    },
    "azure/gpt-4o-mini-realtime-preview-2024-12-17": {
        "max_tokens": 4096,
        "max_input_tokens": 128000,
@@ -26,8 +26,10 @@ model_list:
      model: azure/gpt-4.1
      api_key: os.environ/AZURE_API_KEY_REALTIME
      api_base: https://krris-m2f9a9i7-eastus2.openai.azure.com/

  - model_name: "xai/*"
    litellm_params:
      model: xai/*
      api_key: os.environ/XAI_API_KEY

litellm_settings:
  num_retries: 0