## Summary
- Link pre-commit bot comment to workflow run instead of PR for better
debugging
- Dump docker container logs before removal to ensure logs are actually
captured
## Changes
1. **Pre-commit bot**: Changed the initial bot comment to link
"pre-commit hooks" text to the actual workflow run URL instead of just
having the PR number auto-link
2. **Docker logs**: Moved docker container log dumping from GitHub
Actions to the integration-tests.sh script's stop_container() function,
ensuring logs are captured before container removal
## Test plan
- Pre-commit bot comment will now have a clickable link to the workflow
run
- Docker container logs will be successfully captured in CI runs
**This PR changes configurations in a backward incompatible way.**
Run configs today repeat full SQLite/Postgres snippets everywhere a
store is needed, which means duplicated credentials, extra connection
pools, and lots of drift between files. This PR introduces named storage
backends so the stack and providers can share a single catalog and
reference those backends by name.
## Key Changes
- Add `storage.backends` to `StackRunConfig`, register each KV/SQL
backend once at startup, and validate that references point to the right
family.
- Move server stores under `storage.stores` with lightweight references
(backend + namespace/table) instead of full configs.
- Update every provider/config/doc to use the new reference style;
docs/codegen now surface the simplified YAML.
## Migration
Before:
```yaml
metadata_store:
type: sqlite
db_path: ~/.llama/distributions/foo/registry.db
inference_store:
type: postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
conversations_store:
type: postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
```
After:
```yaml
storage:
backends:
kv_default:
type: kv_sqlite
db_path: ~/.llama/distributions/foo/kvstore.db
sql_default:
type: sql_postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
stores:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
max_write_queue_size: 10000
num_writers: 4
conversations:
backend: sql_default
table_name: openai_conversations
```
Provider configs follow the same pattern—for example, a Chroma vector
adapter switches from:
```yaml
providers:
vector_io:
- provider_id: chromadb
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL}
kvstore:
type: sqlite
db_path: ~/.llama/distributions/foo/chroma.db
```
to:
```yaml
providers:
vector_io:
- provider_id: chromadb
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL}
persistence:
backend: kv_default
namespace: vector_io::chroma_remote
```
Once the backends are declared, everything else just points at them, so
rotating credentials or swapping to Postgres happens in one place and
the stack reuses a single connection pool.
Fixed CI job to check the correct directory for file changes Artifacts
are now stored in multiple directories not just
./tests/integration/recordings
Signed-off-by: Derek Higgins <derekh@redhat.com>
This PR refactors the integration test system to use global "setups"
which provides better separation of concerns:
**suites = what to test, setups = how to configure.**
NOTE: if you naming suggestions, please provide feedback
Changes:
- New `tests/integration/setups.py` with global, reusable configurations
(ollama, vllm, gpt, claude)
- Modified `scripts/integration-tests.sh` options to match with the
underlying pytest options
- Updated documentation to reflect the new global setup system
The main benefit is that setups can be reused across multiple suites
(e.g., use "gpt" with any suite) even though sometimes they could
specifically tailored for a suite (vision <> ollama-vision). It is now
easier to add new configurations without modifying existing suites.
Usage examples:
- `pytest tests/integration --suite=responses --setup=gpt`
- `pytest tests/integration --suite=vision` # auto-selects
"ollama-vision" setup
- `pytest tests/integration --suite=base --setup=vllm`
Our integration tests need to be 'grouped' because each group often
needs a specific set of models it works with. We separated vision tests
due to this, and we have a separate set of tests which test "Responses"
API.
This PR makes this system a bit more official so it is very easy to
target these groups and apply all testing infrastructure towards all the
groups (for example, record-replay) uniformly.
There are three suites declared:
- base
- vision
- responses
Note that our CI currently runs the "base" and "vision" suites.
You can use the `--suite` option when running pytest (or any of the
testing scripts or workflows.) For example:
```
OLLAMA_URL=http://localhost:11434 \
pytest -s -v tests/integration/ --stack-config starter --suite vision
```
I started this PR trying to unbreak a newly broken test
`test_agent_name`. This test was broken all along but did not show up
because during testing we were pulling the "non-updated" llama stack
client. See this comment:
https://github.com/llamastack/llama-stack/pull/3119#discussion_r2270988205
While fixing this, I encountered a large amount of badness in our CI
workflow definitions.
- We weren't passing `LLAMA_STACK_DIR` or `LLAMA_STACK_CLIENT_DIR`
overrides to `llama stack build` at all in some cases.
- Even when we did, we used `uv run` liberally. The first thing `uv run`
does is "syncs" the project environment. This means, it is going to undo
any mutations we might have done ourselves. But we make many mutations
in our CI runners to these environments. The most important of which is
why `llama stack build` where we install distro dependencies. As a
result, when you tried to run the integration tests, you would see old,
strange versions.
## Test Plan
Re-record using:
```
sh scripts/integration-tests.sh --stack-config ci-tests \
--provider ollama --test-pattern test_agent_name --inference-mode record
```
Then re-run with `--inference-mode replay`. But:
Eventually, this test turned out to be quite flaky for telemetry
reasons. I haven't investigated it for now and just disabled it sadly
since we have a release to push out.
# What does this PR do?
Recording tests has become a nightmare. This is the first part of making
that process simpler by making it _less_ automatic. I tried to be too
clever earlier.
It simplifies the record-integration-tests workflow to use workflow
dispatch inputs instead of PR labels. No more opaque stuff. Just go to
the GitHub UI and run the workflow with inputs. I will soon add a helper
script for this also.
Other things to aid re-running just the small set of things you need to
re-record:
- Replaces the `test-types` JSON array parameter with a more intuitive
`test-subdirs` comma-separated list. The whole JSON array crap was for
matrix.
- Adds a new `test-pattern` parameter to allow filtering tests using
pytest's `-k` option
## Test Plan
Note that this PR is in a fork not the source repository.
- Replay tests on this PR are green
- Manually
[ran](1699856292)
the replay workflow with a test-subdir and test-pattern filter, worked
- Manually
[ran](4819508034)
the **record** workflow with a simple pattern, it has worked and updated
_this_ PR.
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
We are going to split record and replay workflows completely to simplify
the concurrency key design.
We can add vision tests by just adding to our matrix.
This PR significantly refactors the Integration Tests workflow. The main
goal behind the PR was to enable recording of vision tests which were
never run as part of our CI ever before. During debugging, I ended up
making several other changes refactoring and hopefully increasing the
robustness of the workflow.
After doing the experiments, I have updated the trigger event to be
`pull_request_target` so this workflow can get write permissions by
default but it will run with source code from the base (main) branch in
the source repository only. If you do change the workflow, you'd need to
experiment using the `workflow_dispatch` triggers. This should not be
news to anyone using Github Actions (except me!)
It is likely to be a little rocky though while I learn more about GitHub
Actions, etc. Please be patient :)
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>