Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-07 02:47:21 +00:00)
refactor(test): introduce --stack-config and simplify options (#1404)
You now run the integration tests with these options:
```bash
Custom options:
  --stack-config=STACK_CONFIG
                        a 'pointer' to the stack. this can be either be:
                        (a) a template name like `fireworks`, or
                        (b) a path to a run.yaml file, or
                        (c) an adhoc config spec, e.g.
                        `inference=fireworks,safety=llama-guard,agents=meta-reference`
  --env=ENV             Set environment variables, e.g. --env KEY=value
  --text-model=TEXT_MODEL
                        comma-separated list of text models. Fixture name:
                        text_model_id
  --vision-model=VISION_MODEL
                        comma-separated list of vision models. Fixture name:
                        vision_model_id
  --embedding-model=EMBEDDING_MODEL
                        comma-separated list of embedding models. Fixture name:
                        embedding_model_id
  --safety-shield=SAFETY_SHIELD
                        comma-separated list of safety shields. Fixture name:
                        shield_id
  --judge-model=JUDGE_MODEL
                        comma-separated list of judge models. Fixture name:
                        judge_model_id
  --embedding-dimension=EMBEDDING_DIMENSION
                        Output dimensionality of the embedding model to use for
                        testing. Default: 384
  --record-responses    Record new API responses instead of using cached ones.
  --report=REPORT       Path where the test report should be written, e.g.
                        --report=/path/to/report.md
```
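The `--stack-config` help text above describes several kinds of pointer, and the updated README in the diff adds a server URL as a further option. The shell sketch below is a rough illustration only. It guesses which kind a given value is and lists the `api=provider` pairs of an ad hoc spec. The function names `classify_stack_config` and `print_adhoc_pairs` are hypothetical. The classification heuristics are also assumptions, not the parsing the test suite itself performs:

- an `http://` or `https://` prefix means a URL
- any `=` means an ad hoc spec
- a `.yaml` suffix or a `/` means a run.yaml path
- anything else is a template name

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the --stack-config forms.
# These are NOT the test suite's own parsing logic; the rules below are assumed heuristics.

# Guess which kind of pointer a --stack-config value is.
classify_stack_config() {
  case "$1" in
    http://*|https://*) echo "url" ;;       # a running Llama Stack server
    *=*)                echo "adhoc" ;;     # comma-separated api=provider pairs
    *.yaml|*/*)         echo "run-yaml" ;;  # a path to a run.yaml file
    *)                  echo "template" ;;  # a template name such as fireworks
  esac
}

# Print each api=provider pair of an ad hoc spec as "api -> provider".
print_adhoc_pairs() {
  local pair
  local -a pairs
  IFS=',' read -r -a pairs <<< "$1"   # split the spec on commas
  for pair in "${pairs[@]}"; do
    printf '%s -> %s\n' "${pair%%=*}" "${pair#*=}"   # text before / after the first '='
  done
}

classify_stack_config fireworks
classify_stack_config inference=fireworks,safety=llama-guard,agents=meta-reference
print_adhoc_pairs inference=fireworks,safety=llama-guard,agents=meta-reference
```

Under these heuristics, `fireworks` is reported as a template. `inference=fireworks,safety=llama-guard,agents=meta-reference` is reported as an ad hoc spec with three pairs.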
Importantly, if you don't specify any of the models (text-model,
vision-model, etc.), the relevant tests will get **skipped!**
This will make running tests somewhat more annoying since all options
will need to be specified. We will make this easier by adding some easy
wrapper yaml configs.
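Until such wrapper configs exist, a small shell function can at least make sure every model option is passed, so that tests are not skipped for want of a model. This is a hypothetical stopgap. It is not the planned YAML configs and not part of the repository.

- `run_integration_tests`, `DRY_RUN`, and the `TEXT_MODELS`/`VISION_MODELS`/`EMBEDDING_MODELS` variables are illustrative names. The variable names mirror the README examples in the diff.
- The function refuses to run if any of the three model lists is unset.
- With `DRY_RUN=1`, it prints the pytest command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical stopgap wrapper (not part of llama-stack): always pass the main
# model options so that tests needing a model are not skipped.
run_integration_tests() {
  local stack_config="$1"   # e.g. together, or inference=fireworks
  shift                     # remaining arguments: test paths / extra pytest flags
  local var
  for var in TEXT_MODELS VISION_MODELS EMBEDDING_MODELS; do
    if [ -z "${!var:-}" ]; then   # indirect expansion: value of the variable named by $var
      echo "error: $var is not set" >&2
      return 1
    fi
  done
  local -a cmd=(pytest -s -v "$@"
    --stack-config="$stack_config"
    --text-model="$TEXT_MODELS"
    --vision-model="$VISION_MODELS"
    --embedding-model="$EMBEDDING_MODELS")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"        # show the command instead of running it
  else
    "${cmd[@]}"
  fi
}

# Example (dry run): prints the pytest command line without running it
TEXT_MODELS=meta-llama/Llama-3.1-8B-Instruct \
VISION_MODELS=meta-llama/Llama-3.2-11B-Vision-Instruct \
EMBEDDING_MODELS=all-MiniLM-L6-v2 \
DRY_RUN=1 run_integration_tests together tests/api/inference/
```

The same pattern could be extended to `--safety-shield` and `--judge-model` when the tests being run need them.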
## Test Plan
Example:
```bash
ashwin@ashwin-mbp ~/local/llama-stack/tests/integration (unify_tests) $ LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/test_text_inference.py \
    --text-model meta-llama/Llama-3.2-3B-Instruct
```
parent a0d6b165b0
commit 2fe976ed0a
15 changed files with 536 additions and 1144 deletions
````diff
@@ -1,31 +1,87 @@
 # Llama Stack Integration Tests
-You can run llama stack integration tests on either a Llama Stack Library or a Llama Stack endpoint.
 
-To test on a Llama Stack library with certain configuration, run
+We use `pytest` for parameterizing and running tests. You can see all options with:
 ```bash
-LLAMA_STACK_CONFIG=./llama_stack/templates/cerebras/run.yaml pytest -s -v tests/api/inference/
-```
-or just the template name
-```bash
-LLAMA_STACK_CONFIG=together pytest -s -v tests/api/inference/
+cd tests/integration
+
+# this will show a long list of options, look for "Custom options:"
+pytest --help
 ```
 
-To test on a Llama Stack endpoint, run
+Here are the most important options:
+- `--stack-config`: specify the stack config to use. You have three ways to point to a stack:
+  - a URL which points to a Llama Stack distribution server
+  - a template (e.g., `fireworks`, `together`) or a path to a run.yaml file
+  - a comma-separated list of api=provider pairs, e.g. `inference=fireworks,safety=llama-guard,agents=meta-reference`. This is most useful for testing a single API surface.
+- `--env`: set environment variables, e.g. --env KEY=value. this is a utility option to set environment variables required by various providers.
+
+Model parameters can be influenced by the following options:
+- `--text-model`: comma-separated list of text models.
+- `--vision-model`: comma-separated list of vision models.
+- `--embedding-model`: comma-separated list of embedding models.
+- `--safety-shield`: comma-separated list of safety shields.
+- `--judge-model`: comma-separated list of judge models.
+- `--embedding-dimension`: output dimensionality of the embedding model to use for testing. Default: 384
+
+Each of these are comma-separated lists and can be used to generate multiple parameter combinations.
+
+
+Experimental, under development, options:
+- `--record-responses`: record new API responses instead of using cached ones
+- `--report`: path where the test report should be written, e.g. --report=/path/to/report.md
+
+
+## Examples
+
+Run all text inference tests with the `together` distribution:
+
 ```bash
-LLAMA_STACK_BASE_URL=http://localhost:8089 pytest -s -v tests/api/inference
+pytest -s -v tests/api/inference/test_text_inference.py \
+   --stack-config=together \
+   --text-model=meta-llama/Llama-3.1-8B-Instruct
 ```
 
-## Report Generation
+Run all text inference tests with the `together` distribution and `meta-llama/Llama-3.1-8B-Instruct`:
 
-To generate a report, run with `--report` option
 ```bash
-LLAMA_STACK_CONFIG=together pytest -s -v report.md tests/api/ --report
+pytest -s -v tests/api/inference/test_text_inference.py \
+   --stack-config=together \
+   --text-model=meta-llama/Llama-3.1-8B-Instruct
 ```
 
-## Common options
-Depending on the API, there are custom options enabled
-- For tests in `inference/` and `agents/, we support `--inference-model` (to be used in text inference tests) and `--vision-inference-model` (only used in image inference tests) overrides
-- For tests in `vector_io/`, we support `--embedding-model` override
-- For tests in `safety/`, we support `--safety-shield` override
-- The param can be `--report` or `--report <path>`
-If path is not provided, we do a best effort to infer based on the config / template name. For url endpoints, path is required.
+Running all inference tests for a number of models:
+
+```bash
+TEXT_MODELS=meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-70B-Instruct
+VISION_MODELS=meta-llama/Llama-3.2-11B-Vision-Instruct
+EMBEDDING_MODELS=all-MiniLM-L6-v2
+TOGETHER_API_KEY=...
+
+pytest -s -v tests/api/inference/ \
+   --stack-config=together \
+   --text-model=$TEXT_MODELS \
+   --vision-model=$VISION_MODELS \
+   --embedding-model=$EMBEDDING_MODELS
+```
+
+Same thing but instead of using the distribution, use an adhoc stack with just one provider (`fireworks` for inference):
+
+```bash
+FIREWORKS_API_KEY=...
+
+pytest -s -v tests/api/inference/ \
+   --stack-config=inference=fireworks \
+   --text-model=$TEXT_MODELS \
+   --vision-model=$VISION_MODELS \
+   --embedding-model=$EMBEDDING_MODELS
+```
+
+Running Vector IO tests for a number of embedding models:
+
+```bash
+EMBEDDING_MODELS=all-MiniLM-L6-v2
+
+pytest -s -v tests/api/vector_io/ \
+   --stack-config=inference=sentence-transformers,vector_io=sqlite-vec \
+   --embedding-model=$EMBEDDING_MODELS
+```
````