Commit graph

10 commits

Author SHA1 Message Date
grs
da73f1a180
fix: ensure assistant message is followed by tool call message as expected by openai (#3224)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Pre-commit / pre-commit (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 8s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 12s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 15s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 17s
Test Llama Stack Build / generate-matrix (push) Failing after 21s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 23s
Test Llama Stack Build / build (push) Has been skipped
Update ReadTheDocs / update-readthedocs (push) Failing after 20s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 24s
# What does this PR do?

As described in #3134 a langchain example works against openai's
responses impl, but not against llama stack's. This turned out to be due
to the order of the inputs. The langchain example has the two function
call outputs first, followed by each call result in turn. This seems to
be valid as it is accepted by openai's impl. However in llama stack,
these inputs are converted to chat completion inputs and the resulting
order for that api is not accpeted by openai.

This PR fixes the issue by ensuring that the converted chat completions
inputs are in the expected order.

Closes #3134 

## Test Plan
Added unit and integration tests. Verified this fixes original issue as
reported.

---------

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-08-22 10:42:03 -07:00
Mustafa Elbehery
1790fc0f25
feat: Remove initialize() Method from LlamaStackAsLibrary (#2979)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR removes `init()` from `LlamaStackAsLibrary` 

Currently client.initialize() had to be invoked by user.
To improve dev experience and to avoid runtime errors, this PR init
LlamaStackAsLibrary implicitly upon using the client.
It prevents also multiple init of the same client, while maintaining
backward ccompatibility.

This PR does the following 

- Automatic Initialization: Constructor calls initialize_impl()
automatically.
-  Client is fully initialized after __init__ completes.
- Prevents consecutive initialization after the client has been
successfully initialized.
-  initialize() method still exists but is now a no-op.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
fixes https://github.com/meta-llama/llama-stack/issues/2946

---------

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-08-21 15:59:04 -07:00
grs
14082b22af
fix: handle mcp tool calls in previous response correctly (#3155)
# What does this PR do?

Handles MCP tool calls in a previous response

Closes #3105

## Test Plan
Made call to create response with tool call, then made second call with
the first linked through previous_response_id. Did not get error.

Also added unit test.

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-08-20 14:12:15 -07:00
ashwinb
8ed69978f9
refactor(tests): make the responses tests nicer (#3161)
# What does this PR do?

A _bunch_ on cleanup for the Responses tests.

- Got rid of YAML test cases, moved them to just use simple pydantic models
- Splitting the large monolithic test file into multiple focused test files:
   - `test_basic_responses.py` for basic and image response tests
   - `test_tool_responses.py` for tool-related tests
   - `test_file_search.py` for file search specific tests
- Adding a `StreamingValidator` helper class to standardize streaming response validation

## Test Plan

Run the tests:

```
pytest -s -v tests/integration/non_ci/responses/ \
   --stack-config=starter \
   --text-model openai/gpt-4o \
   --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \
    -k "client_with_models"
```
2025-08-15 00:05:36 +00:00
ashwinb
ba664474de
feat(responses): add mcp list tool streaming event (#3159)
# What does this PR do?

Adds proper streaming events for MCP tool listing (`mcp_list_tools.in_progress` and `mcp_list_tools.completed`). Also refactors things a bit more.

## Test Plan

Verified existing integration tests pass with the refactored code. The test `test_response_streaming_multi_turn_tool_execution` has been updated to check for the new MCP list tools streaming events
2025-08-15 00:05:36 +00:00
Ashwin Bharambe
e1e161553c
feat(responses): add MCP argument streaming and content part events (#3136)
# What does this PR do?

Adds content part streaming events to the OpenAI-compatible Responses API to support more granular streaming of response content. This introduces:

1. New schema types for content parts: `OpenAIResponseContentPart` with variants for text output and refusals

2. New streaming event types:
   - `OpenAIResponseObjectStreamResponseContentPartAdded` for when content parts begin
   - `OpenAIResponseObjectStreamResponseContentPartDone` for when content parts complete

3. Implementation in the reference provider to emit these events during streaming responses. Also emits MCP arguments just like function call ones.


## Test Plan

Updated existing streaming tests to verify content part events are properly emitted
2025-08-13 16:34:26 -07:00
Ashwin Bharambe
8638537d14
feat(responses): stream progress of tool calls (#3135)
# What does this PR do?
Enhances tool execution streaming by adding support for real-time progress events during tool calls. This implementation adds streaming events for MCP and web search tools, including in-progress, searching, completed, and failed states. 

The refactored `_execute_tool_call` method now returns an async iterator that yields streaming events throughout the tool execution lifecycle.

## Test Plan
Updated the integration test `test_response_streaming_multi_turn_tool_execution` to verify the presence and structure of new streaming events, including:
- Checking for MCP in-progress and completed events
- Verifying that progress events contain required fields (item_id, output_index, sequence_number)
- Ensuring completed events have the necessary sequence_number field
2025-08-13 16:31:25 -07:00
Ashwin Bharambe
5b312a80b9
feat(responses): improve streaming for function calls (#3124)
Some checks failed
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 10s
Test Llama Stack Build / generate-matrix (push) Successful in 9s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 13s
Python Package Build Test / build (3.13) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 11s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 8s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 7s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 21s
Python Package Build Test / build (3.12) (push) Failing after 9s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 29s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Test External API and Providers / test-external (venv) (push) Failing after 13s
Update ReadTheDocs / update-readthedocs (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 22s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 17s
Pre-commit / pre-commit (push) Successful in 1m10s
Test Llama Stack Build / build (push) Failing after 12s
Emit streaming events for function calls

## Test Plan

Improved the test case
2025-08-13 11:23:27 -07:00
Ashwin Bharambe
3d90117891
chore(tests): fix responses and vector_io tests (#3119)
Some fixes to MCP tests. And a bunch of fixes for Vector providers.

I also enabled a bunch of Vector IO tests to be used with
`LlamaStackLibraryClient`

## Test Plan

Run Responses tests with llama stack library client:
```
pytest -s -v tests/integration/non_ci/responses/ --stack-config=server:starter \
   --text-model openai/gpt-4o \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \
  -k "client_with_models"
```

Do the same with `-k openai_client`

The rest should be taken care of by CI.
2025-08-12 16:15:53 -07:00
Ashwin Bharambe
5f1ddd35e4
chore(tests): refactor and move responses tests away from verifications (#3068)
This PR kills the verifications infrastructure which is no longer used.
It was relocated to the `llama-stack-evals`
(https://github.com/meta-llama/llama-stack-evals) repository previously.

Responses tests used this infrastructure but that wasn't quite
necessary, just a little useful back when @bbrownin introduced the
tests. On Discord, we agreed that tests can be moved to our regular
integrations test infra.

## Test Plan

Some tests currently do fail (although they run!) I will send a
follow-up PR which makes them all pass.
2025-08-07 13:48:16 -07:00