Fixes for multi-turn tool calls in Responses API

Testing with Codex locally, I found another issue in how we were plumbing through tool calls in multi-turn scenarios and the way tool call inputs and outputs from previous turns were passed back into future turns. This led me to realize we were missing the function tool call output type in the Responses API, so this adds that and plumbs handling of it through the responses API to chat completion conversion code. Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-12-28 03:42:00 +00:00 · 2025-05-08 16:21:15 -04:00 · 2025-05-08 16:21:15 -04:00 · 4df8caab41
commit 4df8caab41
parent 65c56d0ee8
4 changed files with 187 additions and 69 deletions
--- a/llama_stack/apis/agents/openai_responses.py
+++ b/llama_stack/apis/agents/openai_responses.py
@ -130,9 +130,24 @@ OpenAIResponseObjectStream = Annotated[
 register_schema(OpenAIResponseObjectStream, name="OpenAIResponseObjectStream")


+@json_schema_type
+class OpenAIResponseInputFunctionToolCallOutput(BaseModel):
+    """
+    This represents the output of a function call that gets passed back to the model.
+    """
+
+    call_id: str
+    output: str
+    type: Literal["function_call_output"] = "function_call_output"
+    id: str | None = None
+    status: str | None = None
+
+
 OpenAIResponseInput = Annotated[
    # Responses API allows output messages to be passed in as input
    OpenAIResponseOutputMessageWebSearchToolCall
+    | OpenAIResponseOutputMessageFunctionToolCall
+    | OpenAIResponseInputFunctionToolCallOutput
    |
    # Fallback to the generic message type as a last resort
    OpenAIResponseMessage,