mirror of
https://github.com/BerriAI/litellm.git
synced 2025-04-25 02:34:29 +00:00
docs(vertex.md): add thinking/reasoning content for gemini models to docs
This commit is contained in:
parent
e1ef20d4b9
commit
a63e5ff597
2 changed files with 149 additions and 1 deletions
@ -542,6 +542,154 @@ print(resp)
```
### **Thinking / `reasoning_content`**

LiteLLM translates OpenAI's `reasoning_effort` to Gemini's `thinking` parameter. [Code](https://github.com/BerriAI/litellm/blob/620664921902d7a9bfb29897a7b27c1a7ef4ddfb/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py#L362)

**Mapping**

| reasoning_effort | thinking |
| ---------------- | -------- |
| "low" | "budget_tokens": 1024 |
| "medium" | "budget_tokens": 2048 |
| "high" | "budget_tokens": 4096 |

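The mapping table can be read as a simple lookup. The sketch below only mirrors the table; `REASONING_EFFORT_TO_BUDGET` and `reasoning_effort_to_thinking` are hypothetical names used for illustration, not LiteLLM's internal implementation (see the linked code for that). The `{"type": "enabled", ...}` shape is assumed from the `thinking` parameter used elsewhere on this page.

```python
# Illustrative only: mirrors the mapping table, not LiteLLM's internal code.
REASONING_EFFORT_TO_BUDGET = {"low": 1024, "medium": 2048, "high": 4096}


def reasoning_effort_to_thinking(reasoning_effort: str) -> dict:
    """Return a `thinking`-style dict for a `reasoning_effort` value (hypothetical helper)."""
    try:
        budget = REASONING_EFFORT_TO_BUDGET[reasoning_effort]
    except KeyError:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}") from None
    return {"type": "enabled", "budget_tokens": budget}


print(reasoning_effort_to_thinking("low"))
# {'type': 'enabled', 'budget_tokens': 1024}
```
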
<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import completion

# !gcloud auth application-default login - run this to add vertex credentials to your env

resp = completion(
    model="vertex_ai/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
    vertex_project="project-id",
    vertex_location="us-central1"
)
```

</TabItem>

<TabItem value="proxy" label="PROXY">

1. Setup config.yaml

```yaml
model_list:
  - model_name: gemini-2.5-flash
    litellm_params:
      model: vertex_ai/gemini-2.5-flash-preview-04-17
      vertex_credentials: {"project_id": "project-id", "location": "us-central1", "project_key": "project-key"}
      vertex_project: "project-id"
      vertex_location: "us-central1"
```

2. Start proxy

```bash
litellm --config /path/to/config.yaml
```

3. Test it!

```bash
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "reasoning_effort": "low"
  }'
```

</TabItem>
</Tabs>

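For callers who prefer Python over curl, the proxy request can be sketched with the standard library alone. This is a minimal sketch that builds the request without sending it; `build_chat_request` is a hypothetical helper, and the URL, key placeholder, and model alias are copied from the curl example.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, reasoning_effort: str = "low"
) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "model": "gemini-2.5-flash",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "reasoning_effort": reasoning_effort,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request("http://0.0.0.0:4000", "<YOUR-LITELLM-KEY>")

# To actually send it (requires the proxy from step 2 to be running):
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.
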
**Expected Response**

```python
ModelResponse(
    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
    created=1740470510,
    model='gemini-2.5-flash-preview-04-17',
    object='chat.completion',
    system_fingerprint=None,
    choices=[
        Choices(
            finish_reason='stop',
            index=0,
            message=Message(
                content="The capital of France is Paris.",
                role='assistant',
                tool_calls=None,
                function_call=None,
                reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
            ),
        )
    ],
    usage=Usage(
        completion_tokens=68,
        prompt_tokens=42,
        total_tokens=110,
        completion_tokens_details=None,
        prompt_tokens_details=PromptTokensDetailsWrapper(
            audio_tokens=None,
            cached_tokens=0,
            text_tokens=None,
            image_tokens=None
        ),
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0
    )
)
```

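The reasoning text sits next to `content` on the first choice's message. With the SDK it should be readable as `resp.choices[0].message.reasoning_content`; the sketch below does the equivalent for the JSON body returned by the proxy. `get_reasoning_content` is a hypothetical helper, and `sample` is a hand-written stand-in that reuses the values from the example response, not real model output.

```python
from typing import Optional


def get_reasoning_content(response: dict) -> Optional[str]:
    """Return `reasoning_content` of the first choice, or None if it is absent."""
    choices = response.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("reasoning_content")


# Stand-in for a proxy JSON response (values copied from the example response).
sample = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The capital of France is Paris.",
                "reasoning_content": (
                    "The capital of France is Paris. "
                    "This is a very straightforward factual question."
                ),
            },
        }
    ]
}

print(get_reasoning_content(sample))
```

Returning `None` rather than raising keeps the helper usable with responses that carry no reasoning text.
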
#### Pass `thinking` to Gemini models

You can also pass the `thinking` parameter to Gemini models.

This is translated to Gemini's [`thinkingConfig` parameter](https://ai.google.dev/gemini-api/docs/thinking#set-budget).

<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import completion

# !gcloud auth application-default login - run this to add vertex credentials to your env

# `completion` is imported directly above, so call it without the `litellm.` prefix
response = completion(
    model="vertex_ai/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
    vertex_project="project-id",
    vertex_location="us-central1"
)
```

</TabItem>
<TabItem value="proxy" label="PROXY">

```bash
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -d '{
    "model": "vertex_ai/gemini-2.5-flash-preview-04-17",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "thinking": {"type": "enabled", "budget_tokens": 1024}
  }'
```

</TabItem>
</Tabs>

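The translation from `thinking` to `thinkingConfig` can be pictured roughly as follows. This is a sketch under assumptions: the field names `thinkingBudget` and `includeThoughts` are taken from Gemini's `thinkingConfig` documentation, `thinking_to_thinking_config` is a hypothetical name, and LiteLLM's actual translation may differ in detail.

```python
def thinking_to_thinking_config(thinking: dict) -> dict:
    """Sketch: map a `thinking` dict to a Gemini-style `thinkingConfig` dict."""
    config = {}
    # Assumption: enabling thinking also asks Gemini to return its thoughts.
    if thinking.get("type") == "enabled":
        config["includeThoughts"] = True
    # Assumption: `budget_tokens` corresponds to Gemini's `thinkingBudget`.
    budget = thinking.get("budget_tokens")
    if budget is not None:
        config["thinkingBudget"] = budget
    return config


print(thinking_to_thinking_config({"type": "enabled", "budget_tokens": 1024}))
# {'includeThoughts': True, 'thinkingBudget': 1024}
```
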
### **Context Caching**

Vertex AI context caching is supported by calling the provider API directly. (Unified endpoint support coming soon.)

@ -41,7 +41,7 @@ hide_table_of_contents: false
    1. New mapped bedrock regions - [PR](https://github.com/BerriAI/litellm/pull/9430)
- **VertexAI / Google AI Studio**
    1. Gemini - Response format - Retain schema field ordering for google gemini and vertex by specifying propertyOrdering - [Get Started](../../docs/providers/vertex#json-schema), [PR](https://github.com/BerriAI/litellm/pull/9828)
    2. Gemini-2.5-flash - return reasoning content [ADD DOCS HERE], [PR](https://github.com/BerriAI/litellm/pull/10125)
    2. Gemini-2.5-flash - return reasoning content [Google AI Studio](../../docs/providers/gemini#usage---thinking--reasoning_content), [Vertex AI](../../docs/providers/vertex#thinking--reasoning_content)
    3. Gemini-2.5-flash - pricing + model information [PR](https://github.com/BerriAI/litellm/pull/10125)
    4. Passthrough - new `/vertex_ai/discovery` route - enables calling AgentBuilder API routes [ADD DOCS HERE], [PR](https://github.com/BerriAI/litellm/pull/10084)
- **Fireworks AI**