Commit graph

5 commits

Author SHA1 Message Date
Ashwin Bharambe
1990df2c50 feat: add function tools to openai responses 2025-05-08 07:03:47 -04:00
Ben Browning
b90bb66f28 fix: Restore previous responses to input list, not messages
This adjusts the restoration of previous responses to prepend them to
the list of Responses API inputs instead of our converted list of Chat
Completion messages. This matches the expected behavior of the
Responses API, and I misinterpreted the nuances here in the initial implementation.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-05-08 07:03:47 -04:00
Ben Browning
5b2e850754 fix: Responses API previous_response input items
This adds storing of input items with previous responses and then
restores those input items to prepend to the user's messages list when
using conversation state.

I missed this in the initial implementation, but it makes sense that we
have to store the input items from previous responses so that we can
reconstruct the proper messages stack for multi-turn conversations -
just the output from previous responses isn't enough context for the
models to follow the turns and the original instructions.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-05-08 06:58:43 -04:00
Ihar Hrachyshka
9e6561a1ec
chore: enable pyupgrade fixes (#1806)
# What does this PR do?

The goal of this PR is code base modernization.

Schema reflection code needed a minor adjustment to handle UnionTypes
and collections.abc.AsyncIterator. (Both are preferred for latest Python
releases.)

Note to reviewers: almost all changes here are automatically generated
by pyupgrade. Some additional unused imports were cleaned up. The only
change worth of note can be found under `docs/openapi_generator` and
`llama_stack/strong_typing/schema.py` where reflection code was updated
to deal with "newer" types.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-05-01 14:23:50 -07:00
Ben Browning
8dfce2f596
feat: OpenAI Responses API (#1989)
# What does this PR do?

This provides an initial [OpenAI Responses
API](https://platform.openai.com/docs/api-reference/responses)
implementation. The API is not yet complete, and this is more a
proof-of-concept to show how we can store responses in our key-value
stores and use them to support the Responses API concepts like
`previous_response_id`.

## Test Plan

I've added a new
`tests/integration/openai_responses/test_openai_responses.py` as part of
a test-driven development for this new API. I'm only testing this
locally with the remote-vllm provider for now, but it should work with
any of our inference providers since the only API it requires out of the
inference provider is the `openai_chat_completion` endpoint.

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack build --template remote-vllm --image-type venv --run
```

```
LLAMA_STACK_CONFIG="http://localhost:8321" \
python -m pytest -v \
  tests/integration/openai_responses/test_openai_responses.py \
  --text-model "meta-llama/Llama-3.2-3B-Instruct"
 ```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-04-28 14:06:00 -07:00