llama-stack-mirror/llama_stack/apis
Ben Browning 8e316c9b1e
feat: function tools in OpenAI Responses (#2094)
# What does this PR do?

This is a combination of what was previously 3 separate PRs - #2069,
#2075, and #2083. It turns out all 3 of those are needed to land a
working function calling Responses implementation. The web search
builtin tool was already working, but this wires in support for custom
function calling.

I ended up combining all three into one PR because they all had lots of
merge conflicts, both with each other but also with #1806 that just
landed. And, because landing any of them individually would have only
left a partially working implementation merged.

The new things added here are:
* Storing of input items from previous responses and restoring of those
input items when adding previous responses to the conversation state
* Handling of multiple input item messages roles, not just "user"
messages.
* Support for custom tools passed into the Responses API to enable
function calling outside of just the builtin websearch tool.

Closes #2074
Closes #2080

## Test Plan

### Unit Tests

Several new unit tests were added, and they all pass. Ran via:

```
python -m pytest -s -v tests/unit/providers/agents/meta_reference/test_openai_responses.py
```

### Responses API Verification Tests

I ran our verification run.yaml against multiple providers to ensure we
were getting a decent pass rate. Specifically, I ensured the new custom
tool verification test passed across multiple providers and that the
multi-turn examples passed across at least some of the providers (some
providers struggle with the multi-turn workflows still).

Running the stack setup for verification testing:

```
llama stack run --image-type venv tests/verifications/openai-api-verification-run.yaml
```

Together, passing 100% as an example:

```
pytest -s -v 'tests/verifications/openai_api/test_responses.py' --provider=together-llama-stack
```

## Documentation

We will need to start documenting the OpenAI APIs, but for now the
Responses stuff is still rapidly evolving so delaying that.

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-05-13 11:29:15 -07:00
..
agents feat: function tools in OpenAI Responses (#2094) 2025-05-13 11:29:15 -07:00
batch_inference chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
benchmarks chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
common chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
datasetio chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
datasets chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
eval fix: Syntax error with missing stubs at the end of some function calls (#2116) 2025-05-12 17:05:40 +02:00
files chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
inference chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
inspect chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
models chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
post_training chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
providers chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
safety chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
scoring chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
scoring_functions chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
shields chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
synthetic_data_generation chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
telemetry feat: add metrics query API (#1394) 2025-05-07 10:11:26 -07:00
tools chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
vector_dbs chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
vector_io chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
__init__.py API Updates (#73) 2024-09-17 19:51:35 -07:00
datatypes.py chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
resource.py chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
version.py llama-stack version alpha -> v1 2025-01-15 05:58:09 -08:00