fix: Fix max_tool_calls for openai provider and add integration tests for the max_tool_calls feat (#4190)

# Problem

OpenAI gpt-4 returned an error when built-in and mcp calls were skipped
due to max_tool_calls parameter. Following is from the server log:
```
RuntimeError: OpenAI response failed: Error code: 400 - {'error': {'message': "An assistant message with       
'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids  
did not have response messages: call_Yi9V1QNpN73dJCAgP2Arcjej", 'type': 'invalid_request_error', 'param':      
'messages', 'code': None}}
```

# What does this PR do?

- Fixes error returned by openai/gpt when calls were skipped due to
max_tool_calls. We now return a tool message that explicitly mentions
that the call is skipped.
- Adds integration tests as a follow-up to
PR#[4062](https://github.com/llamastack/llama-stack/pull/4062)

<!-- If resolving an issue, uncomment and update the line below -->
Part 2 for issue
#[3563](https://github.com/llamastack/llama-stack/issues/3563)

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

- Added integration tests
- Added new recordings

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
This commit is contained in:
Shabana Baig 2025-11-19 13:27:56 -05:00 committed by GitHub
parent f18870a221
commit 72ea95e2e0
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
11 changed files with 8386 additions and 168 deletions

View file

@ -66,6 +66,7 @@ from llama_stack_api import (
OpenAIResponseUsage,
OpenAIResponseUsageInputTokensDetails,
OpenAIResponseUsageOutputTokensDetails,
OpenAIToolMessageParam,
Safety,
WebSearchToolTypes,
)
@ -906,10 +907,16 @@ class StreamingResponseOrchestrator:
"""Coordinate execution of both function and non-function tool calls."""
# Execute non-function tool calls
for tool_call in non_function_tool_calls:
# Check if total calls made to built-in and mcp tools exceed max_tool_calls
# if total calls made to built-in and mcp tools exceed max_tool_calls
# then create a tool response message indicating the call was skipped
if self.max_tool_calls is not None and self.accumulated_builtin_tool_calls >= self.max_tool_calls:
logger.info(f"Ignoring built-in and mcp tool call since reached the limit of {self.max_tool_calls=}.")
break
skipped_call_message = OpenAIToolMessageParam(
content=f"Tool call skipped: maximum tool calls limit ({self.max_tool_calls}) reached.",
tool_call_id=tool_call.id,
)
next_turn_messages.append(skipped_call_message)
continue
# Find the item_id for this tool call
matching_item_id = None