Mirror of https://github.com/meta-llama/llama-stack.git
fix: Avoid BadRequestError due to invalid max_tokens (#3667)
This patch ensures that if max_tokens is not defined, it is set to None instead of 0 when calling openai_chat_completion. This way, providers (like Gemini) that cannot handle `max_tokens = 0` will not fail.

Issue: #3666
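For context, a minimal sketch of the failure mode this fixes, assuming an OpenAI-compatible client pointed at a provider that rejects `max_tokens = 0`. The endpoint, model name, and call here are illustrative, not taken from the patch:

```python
# Illustrative sketch of the reported failure (not llama-stack code):
# sending the old default max_tokens=0 to a strict provider raises
# BadRequestError, which the patch avoids by defaulting to None.
from openai import BadRequestError, OpenAI

# Assumed: Gemini's OpenAI-compatible endpoint; the model name is an example.
client = OpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

try:
    client.chat.completions.create(
        model="gemini-2.0-flash",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=0,  # the old SamplingParams default leaking through
    )
except BadRequestError as exc:
    print(f"Provider rejected max_tokens=0: {exc}")
```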
Parent commit: 00d8414597
This commit: f18b5eb537
171 changed files with 12728 additions and 8 deletions
@@ -97,7 +97,7 @@ class SamplingParams(BaseModel):
     strategy: SamplingStrategy = Field(default_factory=GreedySamplingStrategy)

-    max_tokens: int | None = 0
+    max_tokens: int | None = None
     repetition_penalty: float | None = 1.0
     stop: list[str] | None = None
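To make the effect of this one-line change concrete, here is a hedged sketch. It is simplified, not the actual llama-stack model or call path, and `build_request_kwargs` is a hypothetical helper. It shows how a `None` default lets the parameter be omitted from the provider request rather than sent as 0:

```python
# Minimal sketch (not the llama-stack source) of why the new default
# avoids the error: with max_tokens=None, the parameter can be dropped
# from the provider request instead of being forwarded as 0.
from pydantic import BaseModel


class SamplingParams(BaseModel):
    # Simplified: the real model also has a `strategy` field.
    max_tokens: int | None = None  # was `int | None = 0` before this fix
    repetition_penalty: float | None = 1.0
    stop: list[str] | None = None


def build_request_kwargs(params: SamplingParams) -> dict:
    """Hypothetical helper: translate sampling params into kwargs for an
    OpenAI-compatible chat completion call, skipping unset fields (only
    the fields relevant to this fix are mapped)."""
    kwargs = {}
    if params.max_tokens is not None:
        kwargs["max_tokens"] = params.max_tokens
    if params.stop is not None:
        kwargs["stop"] = params.stop
    return kwargs


# With the old default, max_tokens=0 was always forwarded, and providers
# such as Gemini rejected it; with None it is simply omitted.
print(build_request_kwargs(SamplingParams()))  # {} -> provider default applies
```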