fix: Fix max_tool_calls for openai provider and add integration tests for the max_tool_calls feat (#4190)

# Problem

OpenAI gpt-4 returned an error when built-in and MCP tool calls were skipped because of the max_tool_calls parameter. The following is from the server log:

```
RuntimeError: OpenAI response failed: Error code: 400 - {'error': {'message': "An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_Yi9V1QNpN73dJCAgP2Arcjej", 'type': 'invalid_request_error', 'param': 'messages', 'code': None}}
```

# What does this PR do?

- Fixes the error returned by openai/gpt when calls were skipped due to max_tool_calls. We now return a tool message that explicitly states that the call was skipped.
- Adds integration tests as a follow-up to PR #[4062](https://github.com/llamastack/llama-stack/pull/4062)

Part 2 for issue #[3563](https://github.com/llamastack/llama-stack/issues/3563)

## Test Plan

- Added integration tests
- Added new recordings

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
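
The invariant behind the fix can be sketched in isolation. This is a minimal illustration using plain dicts rather than the project's `OpenAIToolMessageParam` type; the helper names `build_tool_messages` and `unanswered_ids`, and the `run_tool` callback, are hypothetical and do not appear in the PR. It assumes only what the error message states: every `tool_call_id` in an assistant message needs a matching tool message, including calls that were skipped.

```python
from __future__ import annotations

from typing import Callable, Optional


def build_tool_messages(
    tool_calls: list[dict],
    max_tool_calls: Optional[int],
    run_tool: Callable[[dict], str],
) -> list[dict]:
    """Return one tool message per tool call, executed or skipped (illustrative only)."""
    messages: list[dict] = []
    executed = 0
    for call in tool_calls:
        if max_tool_calls is not None and executed >= max_tool_calls:
            # Limit reached: still answer the call so its id is not left dangling.
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": call["id"],
                    "content": f"Tool call skipped: maximum tool calls limit ({max_tool_calls}) reached.",
                }
            )
            continue
        messages.append({"role": "tool", "tool_call_id": call["id"], "content": run_tool(call)})
        executed += 1
    return messages


def unanswered_ids(tool_calls: list[dict], tool_messages: list[dict]) -> set[str]:
    """Ids of tool calls that have no matching tool message."""
    answered = {m["tool_call_id"] for m in tool_messages}
    return {c["id"] for c in tool_calls} - answered


calls = [{"id": "call_a"}, {"id": "call_b"}, {"id": "call_c"}]
msgs = build_tool_messages(calls, max_tool_calls=1, run_tool=lambda c: "ok")
```

With a `break` in place of the skip message and `continue`, `call_b` and `call_c` would have no tool message, which is the condition the 400 error above reports.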
2025-12-03 09:53:45 +00:00 · 2025-11-19 13:27:56 -05:00 · 2025-11-19 13:27:56 -05:00 · 72ea95e2e0
commit 72ea95e2e0
parent f18870a221
11 changed files with 8386 additions and 168 deletions
--- a/src/llama_stack/providers/inline/agents/meta_reference/responses/streaming.py
+++ b/src/llama_stack/providers/inline/agents/meta_reference/responses/streaming.py
@@ -66,6 +66,7 @@ from llama_stack_api import (
    OpenAIResponseUsage,
    OpenAIResponseUsageInputTokensDetails,
    OpenAIResponseUsageOutputTokensDetails,
+    OpenAIToolMessageParam,
    Safety,
    WebSearchToolTypes,
 )
@@ -906,10 +907,16 @@ class StreamingResponseOrchestrator:
        """Coordinate execution of both function and non-function tool calls."""
        # Execute non-function tool calls
        for tool_call in non_function_tool_calls:
-            # Check if total calls made to built-in and mcp tools exceed max_tool_calls
+            # if total calls made to built-in and mcp tools exceed max_tool_calls
+            # then create a tool response message indicating the call was skipped
            if self.max_tool_calls is not None and self.accumulated_builtin_tool_calls >= self.max_tool_calls:
                logger.info(f"Ignoring built-in and mcp tool call since reached the limit of {self.max_tool_calls=}.")
-                break
+                skipped_call_message = OpenAIToolMessageParam(
+                    content=f"Tool call skipped: maximum tool calls limit ({self.max_tool_calls}) reached.",
+                    tool_call_id=tool_call.id,
+                )
+                next_turn_messages.append(skipped_call_message)
+                continue

            # Find the item_id for this tool call
            matching_item_id = None