llama-stack-mirror



Author	SHA1	Message	Date

Ashwin Bharambe

e9b4278a51

feat(responses)!: improve responses + conversations implementations (#3810)

This PR updates the Conversation item-related types and improves a
couple of critical parts of the implementation:

- It creates a streaming output item for the final assistant message
  output by the model. Until now, we only added content parts and
  included that message in the final response.

- It rewrites the conversation update code completely to account for
  items other than messages (tool calls, tool outputs, etc.).
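
The first change is easiest to see as an event sequence. The shell sketch below prints, in order, the events a client might receive for one streamed assistant message. It assumes the OpenAI Responses API streaming event names; whether llama-stack emits exactly this set is an assumption, and `emit_message_events` is a hypothetical helper. Per the commit message, earlier versions streamed the content parts without an enclosing output item; the sketch marks that added pair.

```shell
#!/usr/bin/env bash
# Sketch (assumption): streaming event order for one assistant message,
# using OpenAI Responses API event names. emit_message_events is hypothetical.
# Each argument is one text delta produced by the model.
emit_message_events() {
  echo "response.output_item.added"    # added by this change: opens the message item
  echo "response.content_part.added"
  local delta
  for delta in "$@"; do
    echo "response.output_text.delta $delta"
  done
  echo "response.output_text.done"
  echo "response.content_part.done"
  echo "response.output_item.done"     # added by this change: closes the message item
}
```

Calling `emit_message_events "Hel" "lo"` prints seven lines, starting with `response.output_item.added` and ending with `response.output_item.done`.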

## Test Plan

Used the test script from
https://github.com/llamastack/llama-stack-client-python/pull/281 for
this:

```
TEST_API_BASE_URL=http://localhost:8321/v1 \
  pytest tests/integration/test_agent_turn_step_events.py::test_client_side_function_tool -xvs
```

2025-10-15 09:36:11 -07:00

Ashwin Bharambe

f50ce11a3b

feat(tests): make inference_recorder into api_recorder (include tool_invoke) (#3403)

Renames `inference_recorder.py` to `api_recorder.py` and extends it to
support recording/replaying tool invocations in addition to inference
calls.

This allows us to record web-search (and other) tool calls and then
apply those recordings to `tests/integration/responses`.
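
To make the record/replay idea concrete, here is a minimal shell sketch of what a `record-if-missing` mode plausibly does: key each request by a hash, replay a saved response when one exists, and otherwise make the live call and save its output. This is not the logic of `api_recorder.py`; `recorded_call`, `RECORDINGS_DIR`, and the on-disk layout are hypothetical, and `sha256sum` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "record-if-missing" semantics; not api_recorder.py.
# One recording file per request, named by the SHA-256 of the request text.
RECORDINGS_DIR="${RECORDINGS_DIR:-./recordings}"

# Usage: recorded_call "<request text>" <command> [args...]
# Replays the saved output for a known request; otherwise runs the live
# command, saves its stdout as a new recording, and prints it.
recorded_call() {
  local request="$1"; shift
  local key file tmp
  key=$(printf '%s' "$request" | sha256sum | cut -d' ' -f1)
  file="$RECORDINGS_DIR/$key.out"

  if [[ -f "$file" ]]; then
    cat "$file"                        # replay: no live call
    return 0
  fi

  mkdir -p "$RECORDINGS_DIR"
  tmp=$(mktemp "$RECORDINGS_DIR/.tmp.XXXXXX")
  if "$@" > "$tmp"; then
    mv "$tmp" "$file"                  # record only when the live call succeeds
    cat "$file"
  else
    rm -f "$tmp"                       # discard partial output from a failed call
    return 1
  fi
}
```

Writing to a temporary file first means a failed live call leaves no partial recording behind, so the next run retries it instead of replaying a bad response.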

## Test Plan

```
export OPENAI_API_KEY=...
export TAVILY_SEARCH_API_KEY=...

./scripts/integration-tests.sh --stack-config ci-tests \
   --suite responses --inference-mode record-if-missing
```

2025-10-09 14:27:51 -07:00

Ashwin Bharambe

47b640370e

feat(tests): introduce a test "suite" concept to encompass dirs, options (#3339)


Our integration tests need to be grouped because each group often
needs a specific set of models to work with. We separated out the vision
tests for this reason, and we have a separate set of tests for the
"Responses" API.

This PR formalizes this system so that it is easy to target these
groups and to apply all of the testing infrastructure (for example,
record-replay) uniformly across every group.

There are three suites declared:
- base
- vision
- responses
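
A suite is essentially a lookup from a name to the directories (and, per the PR title, options) it covers. The shell sketch below shows only the directory half of that lookup. `suite_roots` is hypothetical, per-suite options are omitted, and only `tests/integration/responses` comes from these commits; the base and vision paths are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: resolve a declared suite name to its test roots.
# Only tests/integration/responses is named in these commits; the base and
# vision paths below are placeholders.
suite_roots() {
  case "$1" in
    base)      echo "tests/integration/inference tests/integration/agents" ;;  # placeholder
    vision)    echo "tests/integration/inference/vision" ;;                    # placeholder
    responses) echo "tests/integration/responses" ;;
    *)
      echo "unknown suite: $1 (expected base, vision, or responses)" >&2
      return 1
      ;;
  esac
}
```

Under these assumptions, `pytest -s -v $(suite_roots vision)` would target the vision roots, and a typo in the suite name fails fast instead of silently running nothing.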

Note that our CI currently runs the "base" and "vision" suites.

You can use the `--suite` option when running pytest (or any of the
testing scripts or workflows). For example:
```
OLLAMA_URL=http://localhost:11434 \
  pytest -s -v tests/integration/ --stack-config starter --suite vision
```

2025-09-05 13:58:49 -07:00

Renamed from tests/integration/non_ci/responses/fixtures/test_cases.py

3 commits