mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-10-12 05:54:38 +00:00
feat(tests): auto-merge all model list responses and unify recordings (#3320)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 8s
Python Package Build Test / build (3.12) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
UI Tests / ui-tests (22) (push) Successful in 1m7s
Pre-commit / pre-commit (push) Successful in 2m34s
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 8s
Python Package Build Test / build (3.12) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
UI Tests / ui-tests (22) (push) Successful in 1m7s
Pre-commit / pre-commit (push) Successful in 2m34s
One needed to specify record-replay related environment variables for running integration tests. We could not use defaults because integration tests could be run against Ollama instances which could be running different models. For example, text vs vision tests needed separate instances of Ollama because a single instance typically cannot serve both of these models if you assume the standard CI worker configuration on Github. As a result, `client.list()` as returned by the Ollama client would be different between these runs and we'd end up overwriting responses. This PR "solves" it by adding a small amount of complexity -- we store model list responses specially, keyed by the hashes of the models they return. At replay time, we merge all of them and pretend that we have the union of all models available. ## Test Plan Re-recorded all the tests using `scripts/integration-tests.sh --inference-mode record`, including the vision tests.
This commit is contained in:
parent
d948e63340
commit
c3d3a0b833
122 changed files with 17460 additions and 16129 deletions
|
@ -38,26 +38,15 @@ For running integration tests, you must provide a few things:
|
|||
- a distribution name (e.g., `starter`) or a path to a `run.yaml` file
|
||||
- a comma-separated list of api=provider pairs, e.g. `inference=fireworks,safety=llama-guard,agents=meta-reference`. This is most useful for testing a single API surface.
|
||||
|
||||
- Whether you are using replay or live mode for inference. This is specified with the LLAMA_STACK_TEST_INFERENCE_MODE environment variable. The default mode currently is "live" -- that is certainly surprising, but we will fix this soon.
|
||||
|
||||
- Any API keys you need to use should be set in the environment, or can be passed in with the --env option.
|
||||
|
||||
You can run the integration tests in replay mode with:
|
||||
```bash
|
||||
# Run all tests with existing recordings
|
||||
LLAMA_STACK_TEST_INFERENCE_MODE=replay \
|
||||
LLAMA_STACK_TEST_RECORDING_DIR=tests/integration/recordings \
|
||||
uv run --group test \
|
||||
pytest -sv tests/integration/ --stack-config=starter
|
||||
```
|
||||
|
||||
If you don't specify LLAMA_STACK_TEST_INFERENCE_MODE, by default it will be in "live" mode -- that is, it will make real API calls.
|
||||
|
||||
```bash
|
||||
# Test against live APIs
|
||||
FIREWORKS_API_KEY=your_key pytest -sv tests/integration/inference --stack-config=starter
|
||||
```
|
||||
|
||||
### Re-recording tests
|
||||
|
||||
#### Local Re-recording (Manual Setup Required)
|
||||
|
@ -66,7 +55,6 @@ If you want to re-record tests locally, you can do so with:
|
|||
|
||||
```bash
|
||||
LLAMA_STACK_TEST_INFERENCE_MODE=record \
|
||||
LLAMA_STACK_TEST_RECORDING_DIR=tests/integration/recordings \
|
||||
uv run --group test \
|
||||
pytest -sv tests/integration/ --stack-config=starter -k "<appropriate test name>"
|
||||
```
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue