fix: resume responses with tool call output (#2524)

# What does this PR do?
closes #2522 

## Test Plan
Added an integration test. Run it with:

LLAMA_STACK_CONFIG=http://localhost:8321 pytest -v tests/integration/agents/test_openai_responses.py --text-model "accounts/fireworks/models/llama-v3p3-70b-instruct" -vv -k 'function_call'

ehhuang 2025-06-25 14:43:37 -07:00 committed by GitHub
parent 82f13fe83e
commit 1d3f27fe5b
2 changed files with 56 additions and 1 deletions

@@ -221,3 +221,56 @@ def test_list_response_input_items_with_limit_and_order(openai_client, client_wi
assert hasattr(item, "type")
assert item.type == "message"
assert item.role in ["user", "assistant"]
@pytest.mark.skip(reason="Tool calling is not reliable.")
def test_function_call_output_response(openai_client, client_with_models, text_model_id):
    """Test handling of function call outputs in responses."""
    if isinstance(client_with_models, LlamaStackAsLibraryClient):
        pytest.skip("OpenAI responses are not supported when testing with library client yet.")

    client = openai_client

    # First create a response that triggers a function call
    response = client.responses.create(
        model=text_model_id,
        input=[
            {
                "role": "user",
                "content": "what's the weather in tokyo? You MUST call the `get_weather` function to find out.",
            }
        ],
        tools=[
            {
                "type": "function",
                "name": "get_weather",
                "description": "Get the weather in a given city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "The city to get the weather for"},
                    },
                },
            }
        ],
        stream=False,
    )

    # Verify we got a function call
    assert response.output[0].type == "function_call"
    call_id = response.output[0].call_id

    # Now send the function call output as a follow-up
    response2 = client.responses.create(
        model=text_model_id,
        input=[{"type": "function_call_output", "call_id": call_id, "output": "sunny and warm"}],
        previous_response_id=response.id,
        stream=False,
    )

    # Verify the second response processed successfully
    assert response2.id is not None
    assert response2.output[0].type == "message"
    assert (
        "sunny" in response2.output[0].content[0].text.lower() or "warm" in response2.output[0].content[0].text.lower()
    )
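
The test above exercises a two-step flow: a first response emits a `function_call`, and a follow-up request supplies the matching `function_call_output` along with `previous_response_id`. As a rough illustration of what resuming such a conversation involves, here is a minimal sketch. It is not the implementation from this commit: `build_resumed_input` is a hypothetical helper, and the plain-dict item shapes are assumptions modeled on the test's input.

```python
from typing import Any


def build_resumed_input(
    previous_items: list[dict[str, Any]],
    new_items: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Hypothetical helper: merge stored conversation items with follow-up input.

    Raises ValueError if a function_call_output refers to a call_id that no
    earlier function_call item produced.
    """
    known_call_ids = {
        item["call_id"] for item in previous_items if item.get("type") == "function_call"
    }
    for item in new_items:
        if item.get("type") == "function_call_output" and item["call_id"] not in known_call_ids:
            raise ValueError(f"unknown call_id: {item['call_id']}")
    # Earlier items come first so the model sees the call before its output.
    return [*previous_items, *new_items]


# Assumed item shapes, mirroring the test's request and a function_call output.
previous = [
    {"role": "user", "content": "what's the weather in tokyo?"},
    {"type": "function_call", "call_id": "call_1", "name": "get_weather", "arguments": '{"city": "Tokyo"}'},
]
follow_up = [{"type": "function_call_output", "call_id": "call_1", "output": "sunny and warm"}]
merged = build_resumed_input(previous, follow_up)
```

Checking `call_id` up front is a design choice made for this sketch only; it lets a mismatched tool output fail early instead of reaching the model as an orphaned item.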