Mirror of https://github.com/BerriAI/litellm.git (synced 2025-04-26 11:14:04 +00:00)
docs(vertex.md): add thinking/reasoning content for gemini models to docs
This commit is contained in:
parent e1ef20d4b9
commit a63e5ff597
2 changed files with 149 additions and 1 deletion
@@ -542,6 +542,154 @@ print(resp)
### **Thinking / `reasoning_content`**
LiteLLM translates OpenAI's `reasoning_effort` to Gemini's `thinking` parameter. [Code](https://github.com/BerriAI/litellm/blob/620664921902d7a9bfb29897a7b27c1a7ef4ddfb/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py#L362)
**Mapping**

| reasoning_effort | thinking |
| ---------------- | -------- |
| "low" | "budget_tokens": 1024 |
| "medium" | "budget_tokens": 2048 |
| "high" | "budget_tokens": 4096 |
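Illustratively, the table above amounts to a small lookup. The sketch below is not LiteLLM's internal code (see the source link above for that); it just restates the documented mapping in Python:

```python
# Sketch only: restates the documented reasoning_effort -> thinking mapping.
# LiteLLM's actual translation lives in the source file linked above.
REASONING_EFFORT_TO_THINKING = {
    "low": {"type": "enabled", "budget_tokens": 1024},
    "medium": {"type": "enabled", "budget_tokens": 2048},
    "high": {"type": "enabled", "budget_tokens": 4096},
}
```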
<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import completion

# !gcloud auth application-default login - run this to add vertex credentials to your env

resp = completion(
    model="vertex_ai/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
    vertex_project="project-id",
    vertex_location="us-central1"
)
```
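
The reasoning text is returned on the message object alongside the regular content (as shown in the expected response below), so you can read both directly:

```python
print(resp.choices[0].message.reasoning_content)  # the model's reasoning text
print(resp.choices[0].message.content)            # the final answer
```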
</TabItem>
<TabItem value="proxy" label="PROXY">
1. Setup config.yaml

```yaml
model_list:
  - model_name: gemini-2.5-flash
    litellm_params:
      model: vertex_ai/gemini-2.5-flash-preview-04-17
      vertex_credentials: {"project_id": "project-id", "location": "us-central1", "project_key": "project-key"}
      vertex_project: "project-id"
      vertex_location: "us-central1"
```
2. Start proxy

```bash
litellm --config /path/to/config.yaml
```
3. Test it!

```bash
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "reasoning_effort": "low"
  }'
```
</TabItem>
</Tabs>
**Expected Response**

```python
ModelResponse(
    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
    created=1740470510,
    model='gemini-2.5-flash-preview-04-17',
    object='chat.completion',
    system_fingerprint=None,
    choices=[
        Choices(
            finish_reason='stop',
            index=0,
            message=Message(
                content="The capital of France is Paris.",
                role='assistant',
                tool_calls=None,
                function_call=None,
                reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
            ),
        )
    ],
    usage=Usage(
        completion_tokens=68,
        prompt_tokens=42,
        total_tokens=110,
        completion_tokens_details=None,
        prompt_tokens_details=PromptTokensDetailsWrapper(
            audio_tokens=None,
            cached_tokens=0,
            text_tokens=None,
            image_tokens=None
        ),
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0
    )
)
```
#### Pass `thinking` to Gemini models
You can also pass the `thinking` parameter to Gemini models.
This is translated to Gemini's [`thinkingConfig` parameter](https://ai.google.dev/gemini-api/docs/thinking#set-budget).
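
For example, `{"type": "enabled", "budget_tokens": 1024}` becomes a thinking budget of 1024 tokens on the Gemini side. A rough sketch of that translation, assuming the `thinkingBudget` field from the linked Gemini docs (this is illustrative, not LiteLLM's literal request code):

```python
# Sketch: OpenAI-style `thinking` block -> Gemini generation config.
# `thinkingBudget` is the field documented in the linked Gemini thinking docs;
# the surrounding request shape here is illustrative only.
thinking = {"type": "enabled", "budget_tokens": 1024}

generation_config = {
    "thinkingConfig": {
        "thinkingBudget": thinking["budget_tokens"],
    }
}
```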
<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import completion

# !gcloud auth application-default login - run this to add vertex credentials to your env

response = completion(
    model="vertex_ai/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
    vertex_project="project-id",
    vertex_location="us-central1"
)
```
</TabItem>
<TabItem value="proxy" label="PROXY">

```bash
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -d '{
    "model": "vertex_ai/gemini-2.5-flash-preview-04-17",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "thinking": {"type": "enabled", "budget_tokens": 1024}
  }'
```
</TabItem>
</Tabs>
### **Context Caching**

Vertex AI context caching is supported by calling the provider API directly. (Unified endpoint support coming soon.)
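
Until then, a rough sketch of what "calling the provider API directly" can look like, assuming Vertex AI's `v1beta1` `cachedContents` REST endpoint (check Google's context-caching docs for the current request shape; the project, location, and model values are placeholders):

```python
# Sketch only: create a cached-content resource against Vertex AI directly.
# Endpoint and fields assume the v1beta1 cachedContents API; verify before use.
import subprocess

import requests

PROJECT = "project-id"      # placeholder
LOCATION = "us-central1"    # placeholder
MODEL = (
    f"projects/{PROJECT}/locations/{LOCATION}"
    "/publishers/google/models/gemini-2.5-flash-preview-04-17"
)

# Reuse local gcloud credentials for a bearer token
token = subprocess.check_output(
    ["gcloud", "auth", "print-access-token"], text=True
).strip()

resp = requests.post(
    f"https://{LOCATION}-aiplatform.googleapis.com/v1beta1"
    f"/projects/{PROJECT}/locations/{LOCATION}/cachedContents",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "model": MODEL,
        "contents": [
            {"role": "user", "parts": [{"text": "<large context to cache>"}]}
        ],
        "ttl": "3600s",
    },
)
print(resp.json())  # includes the cachedContent resource name for later requests
```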

@@ -41,7 +41,7 @@ hide_table_of_contents: false

1. New mapped bedrock regions - [PR](https://github.com/BerriAI/litellm/pull/9430)
- **VertexAI / Google AI Studio**
    1. Gemini - Response format - Retain schema field ordering for google gemini and vertex by specifying propertyOrdering - [Get Started](../../docs/providers/vertex#json-schema), [PR](https://github.com/BerriAI/litellm/pull/9828)
    2. Gemini-2.5-flash - return reasoning content [Google AI Studio](../../docs/providers/gemini#usage---thinking--reasoning_content), [Vertex AI](../../docs/providers/vertex#thinking--reasoning_content)
    3. Gemini-2.5-flash - pricing + model information [PR](https://github.com/BerriAI/litellm/pull/10125)
    4. Passthrough - new `/vertex_ai/discovery` route - enables calling AgentBuilder API routes [ADD DOCS HERE], [PR](https://github.com/BerriAI/litellm/pull/10084)
- **Fireworks AI**