One needed to specify record-replay related environment variables for
running integration tests. We could not use defaults because integration
tests could be run against Ollama instances which could be running
different models. For example, text vs vision tests needed separate
instances of Ollama because a single instance typically cannot serve
both of these models if you assume the standard CI worker configuration
on Github. As a result, `client.list()` as returned by the Ollama client
would be different between these runs and we'd end up overwriting
responses.
This PR "solves" it by adding a small amount of complexity -- we store
model list responses specially, keyed by the hashes of the models they
return. At replay time, we merge all of them and pretend that we have
the union of all models available.
## Test Plan
Re-recorded all the tests using `scripts/integration-tests.sh
--inference-mode record`, including the vision tests.
See comment here:
https://github.com/llamastack/llama-stack/pull/3162#issuecomment-3192859097
-- TL;DR it is quite complex to invoke the recording workflow correctly
for an end developer writing tests. This script simplifies the work.
No more manual GitHub UI navigation!
## Script Functionality
- Auto-detects your current branch and associated PR
- Finds the right repository context (works from forks!)
- Runs the workflow where it can actually commit back
- Validates prerequisites and provides helpful error messages
## How to Use
First ensure you are on the branch which introduced a new test and want
it recorded. **Make sure you have pushed this branch remotely, easiest
is to create a PR.**
```
# Record tests for current branch
./scripts/github/schedule-record-workflow.sh
# Record specific test subdirectories
./scripts/github/schedule-record-workflow.sh --test-subdirs "agents,inference"
# Record with vision tests enabled
./scripts/github/schedule-record-workflow.sh --run-vision-tests
# Record tests matching a pattern
./scripts/github/schedule-record-workflow.sh --test-pattern "test_streaming"
```
## Test Plan
Ran `./scripts/github/schedule-record-workflow.sh -s inference -k
tool_choice` which started
4820409329
which successfully committed recorded outputs.
# What does this PR do?
Creates a structured testing documentation section with multiple detailed pages:
- Testing overview explaining the record-replay architecture
- Integration testing guide with practical usage examples
- Record-replay system technical documentation
- Guide for writing effective tests
- Troubleshooting guide for common testing issues
Hopefully this makes things a bit easier.
# What does this PR do?
* Removes a bunch of distros
* Removed distros were added into the "starter" distribution
* Doc for "starter" has been added
* Partially reverts https://github.com/meta-llama/llama-stack/pull/2482
since inference providers are disabled by default and can be turned on
manually via env variable.
* Disables safety in starter distro
Closes: https://github.com/meta-llama/llama-stack/issues/2502.
~Needs: https://github.com/meta-llama/llama-stack/pull/2482 for Ollama
to work properly in the CI.~
TODO:
- [ ] We can only update `install.sh` when we get a new release.
- [x] Update providers documentation
- [ ] Update notebooks to reference starter instead of ollama
Signed-off-by: Sébastien Han <seb@redhat.com>
## Summary
Add support for `server:<config>` format in `--stack-config` option to
enable seamless one-step integration testing. This eliminates the need
to manually start servers in separate terminals before running tests.
## Key Features
- **Auto-start server**: Automatically launches `llama stack run
<config>` if target port is available
- **Smart reuse**: Reuses existing server if port is already occupied
- **Health check polling**: Waits up to 2 minutes for server readiness
via `/v1/health` endpoint
- **Custom port support**: Use `server:<config>:<port>` for non-default
ports
- **Clean output**: Server runs quietly in background without cluttering
test output
- **Backward compatibility**: All existing `--stack-config` formats
continue to work
## Usage Examples
```bash
# Auto-start server with default port 8321
pytest tests/integration/inference/ --stack-config=server:fireworks
# Use custom port
pytest tests/integration/safety/ --stack-config=server:together:8322
# Run multiple test suites seamlessly
pytest tests/integration/inference/ tests/integration/agents/ --stack-config=server:starter
```
## Implementation Details
- Enhanced `llama_stack_client` fixture with server management
- Updated documentation with cleaner organization and comprehensive
examples
- Added utility functions for port checking, server startup, and health
verification
## Test Plan
- Verified server auto-start when port 8321 is available
- Verified server reuse when port 8321 is occupied
- Tested health check polling via `/v1/health` endpoint
- Confirmed custom port configuration works correctly
- Verified backward compatibility with existing config formats
## Before/After Comparison
**Before (2 steps):**
```bash
# Terminal 1: Start server manually
llama stack run fireworks --port 8321
# Terminal 2: Wait for startup, then run tests
pytest tests/integration/inference/ --stack-config=http://localhost:8321
```
**After (1 step):**
```bash
# Single command handles everything
pytest tests/integration/inference/ --stack-config=server:fireworks
```
# What does this PR do?
reduces duplication and centralizes information to be easier to find for
contributors
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
* Added `--text-model` in example command.
* Added link to integration tests instruction and a note on specifying
models.
This is to avoid confusion when all tests are skipped because no model
is provided.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
It should use `export` for env var for api key.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
You now run the integration tests with these options:
```bash
Custom options:
--stack-config=STACK_CONFIG
a 'pointer' to the stack. this can be either be:
(a) a template name like `fireworks`, or
(b) a path to a run.yaml file, or
(c) an adhoc config spec, e.g.
`inference=fireworks,safety=llama-guard,agents=meta-
reference`
--env=ENV Set environment variables, e.g. --env KEY=value
--text-model=TEXT_MODEL
comma-separated list of text models. Fixture name:
text_model_id
--vision-model=VISION_MODEL
comma-separated list of vision models. Fixture name:
vision_model_id
--embedding-model=EMBEDDING_MODEL
comma-separated list of embedding models. Fixture name:
embedding_model_id
--safety-shield=SAFETY_SHIELD
comma-separated list of safety shields. Fixture name:
shield_id
--judge-model=JUDGE_MODEL
comma-separated list of judge models. Fixture name:
judge_model_id
--embedding-dimension=EMBEDDING_DIMENSION
Output dimensionality of the embedding model to use for
testing. Default: 384
--record-responses Record new API responses instead of using cached ones.
--report=REPORT Path where the test report should be written, e.g.
--report=/path/to/report.md
```
Importantly, if you don't specify any of the models (text-model,
vision-model, etc.) the relevant tests will get **skipped!**
This will make running tests somewhat more annoying since all options
will need to be specified. We will make this easier by adding some easy
wrapper yaml configs.
## Test Plan
Example:
```bash
ashwin@ashwin-mbp ~/local/llama-stack/tests/integration (unify_tests) $
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/test_text_inference.py \
--text-model meta-llama/Llama-3.2-3B-Instruct
```