Commit graph

3 commits

Ashwin Bharambe
a8aa815b6a
feat(tests): migrate to global "setups" system for test configuration (#3390)
This PR refactors the integration test system to use global "setups"
which provides better separation of concerns:

**suites = what to test, setups = how to configure.**

NOTE: if you have naming suggestions, please provide feedback

Changes:
- New `tests/integration/setups.py` with global, reusable configurations
  (ollama, vllm, gpt, claude)
- Modified `scripts/integration-tests.sh` options to match the
  underlying pytest options
- Updated documentation to reflect the new global setup system

The main benefit is that setups can be reused across multiple suites
(e.g., use "gpt" with any suite), even though some setups may be
specifically tailored to a suite (vision <> ollama-vision). It is now
easier to add new configurations without modifying existing suites.

Usage examples:
- `pytest tests/integration --suite=responses --setup=gpt`
- `pytest tests/integration --suite=vision` # auto-selects the "ollama-vision" setup
- `pytest tests/integration --suite=base --setup=vllm`
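
For illustration, a global setup registry along these lines could back the
`--setup` option; the field names and model IDs below are assumptions, not
the actual contents of `tests/integration/setups.py`:

```python
# Sketch of a global setup registry; field names and model IDs are
# assumptions, not the actual contents of tests/integration/setups.py.
from dataclasses import dataclass, field


@dataclass
class Setup:
    """How to configure a test run: env vars plus default models."""

    name: str
    description: str
    env: dict[str, str] = field(default_factory=dict)
    defaults: dict[str, str] = field(default_factory=dict)


SETUPS: dict[str, Setup] = {
    "ollama": Setup(
        name="ollama",
        description="Local Ollama server",
        env={"OLLAMA_URL": "http://localhost:11434"},
        defaults={"text_model": "ollama/llama3.2:3b"},
    ),
    "gpt": Setup(
        name="gpt",
        description="OpenAI GPT models",
        defaults={"text_model": "openai/gpt-4o"},
    ),
}
```

A suite can then declare a default setup (e.g., vision auto-selecting
"ollama-vision") while still accepting any entry from the registry.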
2025-09-09 15:50:56 -07:00
Ashwin Bharambe
47b640370e
feat(tests): introduce a test "suite" concept to encompass dirs, options (#3339)
Our integration tests need to be 'grouped' because each group often
needs a specific set of models to work with. We separated the vision
tests for this reason, and we have a separate set of tests for the
"Responses" API.

This PR makes this system a bit more official so it is easy to target
these groups and apply all of the testing infrastructure (for example,
record-replay) to them uniformly.

There are three suites declared:
- base
- vision
- responses

Note that our CI currently runs the "base" and "vision" suites.

You can use the `--suite` option when running pytest (or any of the
testing scripts or workflows). For example:
```
OLLAMA_URL=http://localhost:11434 \
  pytest -s -v tests/integration/ --stack-config starter --suite vision
```
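
As a rough sketch of how the `--suite` option could be wired up in a
`conftest.py` (the directory mapping and prefix matching here are
illustrative assumptions, not the project's actual implementation):

```python
# conftest.py sketch: steer collection by suite. The directory mapping
# is illustrative; real suite definitions may differ.
from pathlib import Path

SUITES = {
    "base": ["tests/integration/inference", "tests/integration/agents"],
    "vision": ["tests/integration/inference"],
    "responses": ["tests/integration/responses"],
}


def pytest_addoption(parser):
    parser.addoption("--suite", default="base", choices=sorted(SUITES))


def pytest_ignore_collect(collection_path, config):
    suite = config.getoption("--suite")
    try:
        rel = str(Path(collection_path).relative_to(config.rootpath))
    except ValueError:
        return None  # outside the repo root; let pytest decide
    # Keep a path if it lies inside a suite root, or is an ancestor of
    # one (so collection can still descend into it).
    keep = any(
        rel.startswith(root) or root.startswith(rel) for root in SUITES[suite]
    )
    return None if keep else True
```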
2025-09-05 13:58:49 -07:00
Ashwin Bharambe
5e7c2250be
test(recording): add a script to schedule recording workflow (#3170)
See comment here:
https://github.com/llamastack/llama-stack/pull/3162#issuecomment-3192859097
-- TL;DR: it is quite complex for a developer writing tests to invoke
the recording workflow correctly. This script simplifies that work.

No more manual GitHub UI navigation!

## Script Functionality

  - Auto-detects your current branch and associated PR
  - Finds the right repository context (works from forks!)
  - Runs the workflow where it can actually commit back
  - Validates prerequisites and provides helpful error messages
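
A minimal Python rendering of that auto-detection flow (the real script
is shell; the workflow file name below is an assumption):

```python
# Sketch of the branch/PR auto-detection logic; the workflow file name
# is an assumption, not necessarily the one the script triggers.
import subprocess
import sys


def run(*args: str) -> str:
    return subprocess.check_output(args, text=True).strip()


branch = run("git", "branch", "--show-current")
pr_number = run(
    "gh", "pr", "list", "--head", branch, "--json", "number", "--jq", ".[0].number"
)
if pr_number in ("", "null"):
    sys.exit(f"No PR found for '{branch}'; push the branch and open a PR first.")

# Trigger the recording workflow on this branch so it can commit back.
run("gh", "workflow", "run", "record-integration-tests.yml", "--ref", branch)
print(f"Scheduled recording workflow for PR #{pr_number} on '{branch}'")
```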

## How to Use

First, ensure you are on the branch which introduced the new test you
want recorded. **Make sure you have pushed this branch remotely; the
easiest way is to create a PR.**

```
  # Record tests for current branch
  ./scripts/github/schedule-record-workflow.sh

  # Record specific test subdirectories
  ./scripts/github/schedule-record-workflow.sh --test-subdirs "agents,inference"

  # Record with vision tests enabled
  ./scripts/github/schedule-record-workflow.sh --run-vision-tests

  # Record tests matching a pattern
  ./scripts/github/schedule-record-workflow.sh --test-pattern "test_streaming"
```

## Test Plan

Ran `./scripts/github/schedule-record-workflow.sh -s inference -k
tool_choice`, which started workflow run
4820409329;
the run successfully committed the recorded outputs.
2025-08-15 16:54:34 -07:00