diff --git a/docs/my-website/docs/providers/gemini.md b/docs/my-website/docs/providers/gemini.md
index db63d33d8d..434df6a7c9 100644
--- a/docs/my-website/docs/providers/gemini.md
+++ b/docs/my-website/docs/providers/gemini.md
@@ -39,14 +39,164 @@ response = completion(
 - temperature
 - top_p
 - max_tokens
+- max_completion_tokens
 - stream
 - tools
 - tool_choice
+- functions
 - response_format
 - n
 - stop
+- logprobs
+- frequency_penalty
+- modalities
+- reasoning_content
+
+**Anthropic Params**
+- thinking (used to set max budget tokens across Anthropic/Gemini models)
+
+[**See Updated List**](https://github.com/BerriAI/litellm/blob/main/litellm/llms/gemini/chat/transformation.py#L70)
+
+## Usage - Thinking / `reasoning_content`
+
+LiteLLM translates OpenAI's `reasoning_effort` to Gemini's `thinking` parameter. [Code](https://github.com/BerriAI/litellm/blob/620664921902d7a9bfb29897a7b27c1a7ef4ddfb/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py#L362)
+
+**Mapping**
+
+| reasoning_effort | thinking              |
+| ---------------- | --------------------- |
+| "low"            | "budget_tokens": 1024 |
+| "medium"         | "budget_tokens": 2048 |
+| "high"           | "budget_tokens": 4096 |
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+from litellm import completion
+
+resp = completion(
+    model="gemini/gemini-2.5-flash-preview-04-17",
+    messages=[{"role": "user", "content": "What is the capital of France?"}],
+    reasoning_effort="low",
+)
+```
+
+</TabItem>
+<TabItem value="proxy" label="PROXY">
+
+1. Setup config.yaml
+
+```yaml
+model_list:
+  - model_name: gemini-2.5-flash
+    litellm_params:
+      model: gemini/gemini-2.5-flash-preview-04-17
+      api_key: os.environ/GEMINI_API_KEY
+```
+
+2. Start proxy
+
+```bash
+litellm --config /path/to/config.yaml
+```
+
+3. Test it!
+
+```bash
+curl http://0.0.0.0:4000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $LITELLM_KEY" \
+  -d '{
+    "model": "gemini-2.5-flash",
+    "messages": [{"role": "user", "content": "What is the capital of France?"}],
+    "reasoning_effort": "low"
+  }'
+```
+
+</TabItem>
+</Tabs>
+
+**Expected Response**
+
+```python
+ModelResponse(
+    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
+    created=1740470510,
+    model='gemini/gemini-2.5-flash-preview-04-17',
+    object='chat.completion',
+    system_fingerprint=None,
+    choices=[
+        Choices(
+            finish_reason='stop',
+            index=0,
+            message=Message(
+                content="The capital of France is Paris.",
+                role='assistant',
+                tool_calls=None,
+                function_call=None,
+                reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
+            ),
+        )
+    ],
+    usage=Usage(
+        completion_tokens=68,
+        prompt_tokens=42,
+        total_tokens=110,
+        completion_tokens_details=None,
+        prompt_tokens_details=PromptTokensDetailsWrapper(
+            audio_tokens=None,
+            cached_tokens=0,
+            text_tokens=None,
+            image_tokens=None
+        ),
+        cache_creation_input_tokens=0,
+        cache_read_input_tokens=0
+    )
+)
+```
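+
+A minimal sketch of reading these fields back out of the SDK response (assumes a valid `GEMINI_API_KEY`; attribute names follow the `Expected Response` shape above):
+
+```python
+import os
+from litellm import completion
+
+os.environ["GEMINI_API_KEY"] = "your-api-key"  # placeholder
+
+resp = completion(
+    model="gemini/gemini-2.5-flash-preview-04-17",
+    messages=[{"role": "user", "content": "What is the capital of France?"}],
+    reasoning_effort="low",
+)
+
+message = resp.choices[0].message
+print(message.content)            # final answer text
+print(message.reasoning_content)  # standardized reasoning text
+```
+
+Note that `thinking_blocks` is only returned for Anthropic models, so for Gemini expect `reasoning_content` on its own.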
+
+### Pass `thinking` to Gemini models
+
+You can also pass the `thinking` parameter to Gemini models.
+
+This is translated to Gemini's [`thinkingConfig` parameter](https://ai.google.dev/gemini-api/docs/thinking#set-budget).
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+response = litellm.completion(
+    model="gemini/gemini-2.5-flash-preview-04-17",
+    messages=[{"role": "user", "content": "What is the capital of France?"}],
+    thinking={"type": "enabled", "budget_tokens": 1024},
+)
+```
+
+</TabItem>
+<TabItem value="proxy" label="PROXY">
+
+```bash
+curl http://0.0.0.0:4000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $LITELLM_KEY" \
+  -d '{
+    "model": "gemini/gemini-2.5-flash-preview-04-17",
+    "messages": [{"role": "user", "content": "What is the capital of France?"}],
+    "thinking": {"type": "enabled", "budget_tokens": 1024}
+  }'
+```
+
+</TabItem>
+</Tabs>
+
-[**See Updated List**](https://github.com/BerriAI/litellm/blob/1c747f3ad372399c5b95cc5696b06a5fbe53186b/litellm/llms/vertex_httpx.py#L122)
 
 ## Passing Gemini Specific Params
 ### Response schema
diff --git a/docs/my-website/docs/reasoning_content.md b/docs/my-website/docs/reasoning_content.md
index 86ef58bd68..12a0f17ba0 100644
--- a/docs/my-website/docs/reasoning_content.md
+++ b/docs/my-website/docs/reasoning_content.md
@@ -16,6 +16,8 @@ Supported Providers:
 - Vertex AI (Anthropic) (`vertexai/`)
 - OpenRouter (`openrouter/`)
 - XAI (`xai/`)
+- Google AI Studio (`gemini/`)
+- Vertex AI (`vertex_ai/`)
 
 LiteLLM will standardize the `reasoning_content` in the response and `thinking_blocks` in the assistant message.
 
@@ -23,7 +25,7 @@ LiteLLM will standardize the `reasoning_content` in the response and `thinking_b
 "message": {
     ...
     "reasoning_content": "The capital of France is Paris.",
-    "thinking_blocks": [
+    "thinking_blocks": [ # only returned for Anthropic models
     {
         "type": "thinking",
         "thinking": "The capital of France is Paris.",