The bulk of the change here is making the naming and contents of the
conversion to/from Responses API inputs -> Chat Completion API
messages and Chat Completion API choices -> Responses API outputs more
clear with some code comments, method renaming, and slight
refactoring.
There are also some other minor changes, like moving a pydantic model
from the api/ to the implementation since it's not actually exposed
via the API, as well as making some if/else usage more clear.
Signed-off-by: Ben Browning <bbrownin@redhat.com>
This finishes the plumbing for function tool call and adds a basic
verification test (that passes for me locally against Llama 4 Scout in
vllm).
Signed-off-by: Ben Browning <bbrownin@redhat.com>
This adjusts the restoration of previous responses to prepend them to
the list of Responses API inputs instead of our converted list of Chat
Completion messages. This matches the expected behavior of the
Responses API, and I misinterpreted the nuances here in the initial implementation.
Signed-off-by: Ben Browning <bbrownin@redhat.com>
This adds storing of input items with previous responses and then
restores those input items to prepend to the user's messages list when
using conversation state.
I missed this in the initial implementation, but it makes sense that we
have to store the input items from previous responses so that we can
reconstruct the proper messages stack for multi-turn conversations -
just the output from previous responses isn't enough context for the
models to follow the turns and the original instructions.
Signed-off-by: Ben Browning <bbrownin@redhat.com>
# What does this PR do?
The goal of this PR is code base modernization.
Schema reflection code needed a minor adjustment to handle UnionTypes
and collections.abc.AsyncIterator. (Both are preferred for latest Python
releases.)
Note to reviewers: almost all changes here are automatically generated
by pyupgrade. Some additional unused imports were cleaned up. The only
change worth of note can be found under `docs/openapi_generator` and
`llama_stack/strong_typing/schema.py` where reflection code was updated
to deal with "newer" types.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
This provides an initial [OpenAI Responses
API](https://platform.openai.com/docs/api-reference/responses)
implementation. The API is not yet complete, and this is more a
proof-of-concept to show how we can store responses in our key-value
stores and use them to support the Responses API concepts like
`previous_response_id`.
## Test Plan
I've added a new
`tests/integration/openai_responses/test_openai_responses.py` as part of
a test-driven development for this new API. I'm only testing this
locally with the remote-vllm provider for now, but it should work with
any of our inference providers since the only API it requires out of the
inference provider is the `openai_chat_completion` endpoint.
```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack build --template remote-vllm --image-type venv --run
```
```
LLAMA_STACK_CONFIG="http://localhost:8321" \
python -m pytest -v \
tests/integration/openai_responses/test_openai_responses.py \
--text-model "meta-llama/Llama-3.2-3B-Instruct"
```
---------
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>