forked from phoenix-oss/llama-stack-mirror
refactor(test): introduce --stack-config and simplify options (#1404)
You now run the integration tests with these options:

```bash
Custom options:
  --stack-config=STACK_CONFIG
                        a 'pointer' to the stack. this can be either be:
                        (a) a template name like `fireworks`, or
                        (b) a path to a run.yaml file, or
                        (c) an adhoc config spec, e.g.
                        `inference=fireworks,safety=llama-guard,agents=meta-reference`
  --env=ENV             Set environment variables, e.g. --env KEY=value
  --text-model=TEXT_MODEL
                        comma-separated list of text models. Fixture name:
                        text_model_id
  --vision-model=VISION_MODEL
                        comma-separated list of vision models. Fixture name:
                        vision_model_id
  --embedding-model=EMBEDDING_MODEL
                        comma-separated list of embedding models. Fixture name:
                        embedding_model_id
  --safety-shield=SAFETY_SHIELD
                        comma-separated list of safety shields. Fixture name:
                        shield_id
  --judge-model=JUDGE_MODEL
                        comma-separated list of judge models. Fixture name:
                        judge_model_id
  --embedding-dimension=EMBEDDING_DIMENSION
                        Output dimensionality of the embedding model to use for
                        testing. Default: 384
  --record-responses    Record new API responses instead of using cached ones.
  --report=REPORT       Path where the test report should be written, e.g.
                        --report=/path/to/report.md
```

Importantly, if you don't specify any of the models (text-model, vision-model, etc.) the relevant tests will get **skipped!**

This will make running tests somewhat more annoying since all options will need to be specified. We will make this easier by adding some easy wrapper yaml configs.

## Test Plan

Example:

```bash
ashwin@ashwin-mbp ~/local/llama-stack/tests/integration (unify_tests) $ LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/test_text_inference.py \
   --text-model meta-llama/Llama-3.2-3B-Instruct
```
This commit is contained in:
parent
a0d6b165b0
commit
2fe976ed0a
15 changed files with 536 additions and 1144 deletions
@@ -1,31 +1,87 @@
# Llama Stack Integration Tests
You can run llama stack integration tests on either a Llama Stack Library or a Llama Stack endpoint.

To test on a Llama Stack library with certain configuration, run
We use `pytest` for parameterizing and running tests. You can see all options with:
```bash
LLAMA_STACK_CONFIG=./llama_stack/templates/cerebras/run.yaml pytest -s -v tests/api/inference/
```
or just the template name
```bash
LLAMA_STACK_CONFIG=together pytest -s -v tests/api/inference/
cd tests/integration

# this will show a long list of options, look for "Custom options:"
pytest --help
```

To test on a Llama Stack endpoint, run
Here are the most important options:
- `--stack-config`: specify the stack config to use. You have three ways to point to a stack:
  - a URL which points to a Llama Stack distribution server
  - a template (e.g., `fireworks`, `together`) or a path to a run.yaml file
  - a comma-separated list of api=provider pairs, e.g. `inference=fireworks,safety=llama-guard,agents=meta-reference`. This is most useful for testing a single API surface.
- `--env`: set environment variables, e.g. `--env KEY=value`. This is a utility option to set environment variables required by various providers.
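
To make the three `--stack-config` forms concrete, here is a minimal sketch of how such a value could be told apart. The helper name `classify_stack_config` and its return labels are hypothetical and are not taken from the test suite; the real option handling may resolve templates and paths differently.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_stack_config(value: str) -> tuple:
    """Hypothetical helper: guess which form a --stack-config value uses."""
    # Form 1: a URL pointing at a running Llama Stack distribution server.
    if urlparse(value).scheme in ("http", "https"):
        return ("url", value)
    # Form 3: a comma-separated list of api=provider pairs.
    if "=" in value:
        pairs = dict(item.split("=", 1) for item in value.split(","))
        return ("adhoc", pairs)
    # Form 2: a path to a run.yaml file, or otherwise a template name.
    # (Assumption: anything that looks like a path or YAML file is a run.yaml.)
    if value.endswith((".yaml", ".yml")) or "/" in value:
        return ("run_yaml", Path(value))
    return ("template", value)


if __name__ == "__main__":
    for example in (
        "http://localhost:8089",
        "together",
        "./llama_stack/templates/cerebras/run.yaml",
        "inference=fireworks,safety=llama-guard,agents=meta-reference",
    ):
        print(example, "->", classify_stack_config(example))
```

The ad hoc form is checked before the path form because a provider spec never needs a `/`, while a path never needs an `=`; that ordering is a design choice of this sketch only.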

Model parameters can be set with the following options:
- `--text-model`: comma-separated list of text models.
- `--vision-model`: comma-separated list of vision models.
- `--embedding-model`: comma-separated list of embedding models.
- `--safety-shield`: comma-separated list of safety shields.
- `--judge-model`: comma-separated list of judge models.
- `--embedding-dimension`: output dimensionality of the embedding model to use for testing. Default: 384

Each option except `--embedding-dimension` takes a comma-separated list, and these lists can be used to generate multiple parameter combinations.
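
As a rough illustration of what "multiple parameter combinations" can mean, the sketch below splits each comma-separated value and takes the cross product of the options that were provided. The function names are hypothetical, and the cross product is an assumption made for illustration; the suite's own parametrization may combine values differently.

```python
from itertools import product
from typing import Dict, List, Optional


def split_option(value: Optional[str]) -> List[str]:
    """Split a comma-separated option value; an unset option yields no values."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def combinations(options: Dict[str, Optional[str]]) -> List[Dict[str, str]]:
    """Cross product over every option that was actually provided (assumed behavior)."""
    provided = {name: split_option(raw) for name, raw in options.items()}
    provided = {name: values for name, values in provided.items() if values}
    if not provided:
        return []
    names = list(provided)
    value_lists = [provided[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


if __name__ == "__main__":
    combos = combinations(
        {
            "text_model_id": "meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-70B-Instruct",
            "vision_model_id": "meta-llama/Llama-3.2-11B-Vision-Instruct",
            "embedding_model_id": None,  # option not given on the command line
        }
    )
    for combo in combos:
        print(combo)
```

With two text models and one vision model, this sketch yields two combinations, and the unset embedding option contributes nothing.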

Experimental options (under development):
- `--record-responses`: record new API responses instead of using cached ones
- `--report`: path where the test report should be written, e.g. `--report=/path/to/report.md`

## Examples

Run all text inference tests with the `together` distribution:

```bash
LLAMA_STACK_BASE_URL=http://localhost:8089 pytest -s -v tests/api/inference
pytest -s -v tests/api/inference/test_text_inference.py \
   --stack-config=together \
   --text-model=meta-llama/Llama-3.1-8B-Instruct
```

## Report Generation
Run all text inference tests with the `together` distribution and `meta-llama/Llama-3.1-8B-Instruct`:

To generate a report, run with `--report` option
```bash
LLAMA_STACK_CONFIG=together pytest -s -v report.md tests/api/ --report
pytest -s -v tests/api/inference/test_text_inference.py \
   --stack-config=together \
   --text-model=meta-llama/Llama-3.1-8B-Instruct
```

## Common options
Depending on the API, there are custom options enabled
- For tests in `inference/` and `agents/`, we support `--inference-model` (to be used in text inference tests) and `--vision-inference-model` (only used in image inference tests) overrides
- For tests in `vector_io/`, we support `--embedding-model` override
- For tests in `safety/`, we support `--safety-shield` override
- The param can be `--report` or `--report <path>`
If path is not provided, we make a best effort to infer it based on the config / template name. For URL endpoints, path is required.

Running all inference tests for a number of models:

```bash
TEXT_MODELS=meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-70B-Instruct
VISION_MODELS=meta-llama/Llama-3.2-11B-Vision-Instruct
EMBEDDING_MODELS=all-MiniLM-L6-v2
# exported so that the pytest process can see it
export TOGETHER_API_KEY=...

pytest -s -v tests/api/inference/ \
   --stack-config=together \
   --text-model=$TEXT_MODELS \
   --vision-model=$VISION_MODELS \
   --embedding-model=$EMBEDDING_MODELS
```

Same thing but instead of using the distribution, use an adhoc stack with just one provider (`fireworks` for inference):

```bash
# exported so that the pytest process can see it
export FIREWORKS_API_KEY=...

pytest -s -v tests/api/inference/ \
   --stack-config=inference=fireworks \
   --text-model=$TEXT_MODELS \
   --vision-model=$VISION_MODELS \
   --embedding-model=$EMBEDDING_MODELS
```

Running Vector IO tests for a number of embedding models:

```bash
EMBEDDING_MODELS=all-MiniLM-L6-v2

pytest -s -v tests/api/vector_io/ \
   --stack-config=inference=sentence-transformers,vector_io=sqlite-vec \
   --embedding-model=$EMBEDDING_MODELS
```