feat: Structured output for Responses API

This adds the missing `text` parameter to the Responses API, which is how users control structured outputs. The parameter is simply mapped to the corresponding chat completion `response_format`. The unit tests exercise the permutations allowed for this property, and a couple of new verification tests run it against real models to check that the outputs follow the requested format.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-29 03:14:19 +00:00 · 2025-05-31 13:44:20 -04:00 · badf8594d1
commit badf8594d1
parent 4540c9b3e5
8 changed files with 323 additions and 2 deletions
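
The mapping described in the commit message can be sketched roughly as follows. This is a minimal illustration over plain dicts, not the code from this commit: the helper name `text_to_response_format` and the fallback schema name `"response"` are assumptions, and the dict shapes follow the public OpenAI Responses (`text.format`) and chat completion (`response_format`) conventions.

```python
from __future__ import annotations

from typing import Any


def text_to_response_format(text: dict[str, Any] | None) -> dict[str, Any] | None:
    """Hypothetical helper: map a Responses API `text` value to a
    chat completion `response_format` value."""
    if not text or not text.get("format"):
        # No format requested: leave response_format unset.
        return None

    fmt = text["format"]
    fmt_type = fmt.get("type", "text")

    if fmt_type in ("text", "json_object"):
        # These two variants carry no extra fields.
        return {"type": fmt_type}

    if fmt_type == "json_schema":
        # Responses keeps name/schema/strict flat under `format`;
        # chat completions nests them under a `json_schema` key.
        json_schema: dict[str, Any] = {
            "name": fmt.get("name", "response"),  # fallback name is an assumption
            "schema": fmt["schema"],
        }
        for key in ("description", "strict"):
            if fmt.get(key) is not None:
                json_schema[key] = fmt[key]
        return {"type": "json_schema", "json_schema": json_schema}

    raise ValueError(f"Unsupported text format type: {fmt_type!r}")
```

The main structural difference the sketch highlights is the nesting: the Responses API keeps the schema fields directly on `text.format`, while chat completions expects them wrapped in `json_schema`.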
--- a/llama_stack/apis/agents/agents.py
+++ b/llama_stack/apis/agents/agents.py
@@ -37,6 +37,7 @@ from .openai_responses import (
    OpenAIResponseInputTool,
    OpenAIResponseObject,
    OpenAIResponseObjectStream,
+    OpenAIResponseText,
 )

 # TODO: use enum.StrEnum when we drop support for python 3.10
@@ -603,6 +604,7 @@ class Agents(Protocol):
        store: bool | None = True,
        stream: bool | None = False,
        temperature: float | None = None,
+        text: OpenAIResponseText | None = None,
        tools: list[OpenAIResponseInputTool] | None = None,
        max_infer_iters: int | None = 10,  # this is an extension to the OpenAI API
    ) -> OpenAIResponseObject | AsyncIterator[OpenAIResponseObjectStream]: