You now run the integration tests with these options:

```bash
Custom options:
  --stack-config=STACK_CONFIG
                        a 'pointer' to the stack. this can be either be: (a) a
                        template name like `fireworks`, or (b) a path to a
                        run.yaml file, or (c) an adhoc config spec, e.g.
                        `inference=fireworks,safety=llama-guard,agents=meta-reference`
  --env=ENV             Set environment variables, e.g. --env KEY=value
  --text-model=TEXT_MODEL
                        comma-separated list of text models. Fixture name:
                        text_model_id
  --vision-model=VISION_MODEL
                        comma-separated list of vision models. Fixture name:
                        vision_model_id
  --embedding-model=EMBEDDING_MODEL
                        comma-separated list of embedding models. Fixture name:
                        embedding_model_id
  --safety-shield=SAFETY_SHIELD
                        comma-separated list of safety shields. Fixture name:
                        shield_id
  --judge-model=JUDGE_MODEL
                        comma-separated list of judge models. Fixture name:
                        judge_model_id
  --embedding-dimension=EMBEDDING_DIMENSION
                        Output dimensionality of the embedding model to use for
                        testing. Default: 384
  --record-responses    Record new API responses instead of using cached ones.
  --report=REPORT       Path where the test report should be written, e.g.
                        --report=/path/to/report.md
```

Importantly, if you don't specify any of the models (text-model, vision-model, etc.) the relevant tests will get **skipped!**

This will make running tests somewhat more annoying since all options will need to be specified. We will make this easier by adding some easy wrapper yaml configs.

## Test Plan

Example:

```bash
ashwin@ashwin-mbp ~/local/llama-stack/tests/integration (unify_tests) $
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/test_text_inference.py \
   --text-model meta-llama/Llama-3.2-3B-Instruct
```

# Llama Stack Integration Tests

We use `pytest` for parameterizing and running tests. You can see all options with:

```bash
cd tests/integration

# this will show a long list of options, look for "Custom options:"
pytest --help
```

Here are the most important options:

- `--stack-config`: specify the stack config to use. You have three ways to point to a stack:
  - a URL which points to a Llama Stack distribution server
  - a template (e.g., `fireworks`, `together`) or a path to a run.yaml file
  - a comma-separated list of api=provider pairs, e.g. `inference=fireworks,safety=llama-guard,agents=meta-reference`. This is most useful for testing a single API surface.
- `--env`: set environment variables, e.g. `--env KEY=value`. This is a utility option to set environment variables required by various providers.

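
To make the third form of `--stack-config` concrete, here is a minimal sketch of how an ad hoc spec breaks down into api=provider pairs. The helper name `split_stack_spec` is hypothetical and is not part of Llama Stack; the test suite's own parsing may differ in detail.

```bash
# Illustrative only: split an ad hoc stack spec into its api=provider pairs.
# `split_stack_spec` is a hypothetical helper, not part of Llama Stack.
split_stack_spec() (
  # The function body runs in a subshell, so changing IFS does not leak out.
  IFS=','
  for pair in $1; do
    # Text before the first '=' is the API, the remainder is the provider.
    printf '%s -> %s\n' "${pair%%=*}" "${pair#*=}"
  done
)

split_stack_spec "inference=fireworks,safety=llama-guard,agents=meta-reference"
# inference -> fireworks
# safety -> llama-guard
# agents -> meta-reference
```
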

Model parameters can be influenced by the following options:

- `--text-model`: comma-separated list of text models.
- `--vision-model`: comma-separated list of vision models.
- `--embedding-model`: comma-separated list of embedding models.
- `--safety-shield`: comma-separated list of safety shields.
- `--judge-model`: comma-separated list of judge models.
- `--embedding-dimension`: output dimensionality of the embedding model to use for testing. Default: 384

Each of these options except `--embedding-dimension` takes a comma-separated list, and the lists can be used to generate multiple parameter combinations.
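
As a rough illustration of what "multiple parameter combinations" can mean, the sketch below enumerates every (text model, vision model) pair from two comma-separated lists. It assumes a test that uses both values is run once per pair; how the suite actually parametrizes its fixtures may differ. `model_combinations` is a hypothetical helper.

```bash
# Illustrative only: enumerate (text, vision) model pairs from two
# comma-separated lists. `model_combinations` is a hypothetical helper.
model_combinations() (
  # Subshell body keeps the IFS change local to this function.
  IFS=','
  for text in $1; do
    for vision in $2; do
      printf '%s + %s\n' "$text" "$vision"
    done
  done
)

model_combinations \
  "meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-70B-Instruct" \
  "meta-llama/Llama-3.2-11B-Vision-Instruct"
# meta-llama/Llama-3.1-8B-Instruct + meta-llama/Llama-3.2-11B-Vision-Instruct
# meta-llama/Llama-3.1-70B-Instruct + meta-llama/Llama-3.2-11B-Vision-Instruct
```
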

Experimental options (under development):

- `--record-responses`: record new API responses instead of using cached ones
- `--report`: path where the test report should be written, e.g. `--report=/path/to/report.md`

## Examples

Run all text inference tests with the `together` distribution and `meta-llama/Llama-3.1-8B-Instruct`:

```bash
pytest -s -v tests/integration/inference/test_text_inference.py \
   --stack-config=together \
   --text-model=meta-llama/Llama-3.1-8B-Instruct
```

Run all inference tests for a number of models:

```bash
TEXT_MODELS=meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-70B-Instruct
VISION_MODELS=meta-llama/Llama-3.2-11B-Vision-Instruct
EMBEDDING_MODELS=all-MiniLM-L6-v2
# exported so that the pytest process can read the key
export TOGETHER_API_KEY=...

pytest -s -v tests/integration/inference/ \
   --stack-config=together \
   --text-model=$TEXT_MODELS \
   --vision-model=$VISION_MODELS \
   --embedding-model=$EMBEDDING_MODELS
```

Same thing, but instead of using the distribution, use an ad hoc stack with just one provider (`fireworks` for inference):

```bash
# exported so that the pytest process can read the key
export FIREWORKS_API_KEY=...

pytest -s -v tests/integration/inference/ \
   --stack-config=inference=fireworks \
   --text-model=$TEXT_MODELS \
   --vision-model=$VISION_MODELS \
   --embedding-model=$EMBEDDING_MODELS
```

Run Vector IO tests for a number of embedding models:

```bash
EMBEDDING_MODELS=all-MiniLM-L6-v2

pytest -s -v tests/integration/vector_io/ \
   --stack-config=inference=sentence-transformers,vector_io=sqlite-vec \
   --embedding-model=$EMBEDDING_MODELS
```
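
Since every model option has to be passed explicitly, a small wrapper can cut down on the typing. The sketch below is one possible shape, not something shipped with the repository: `build_pytest_cmd` is a hypothetical helper that assembles the command from whichever of `TEXT_MODELS`, `VISION_MODELS`, and `EMBEDDING_MODELS` are set, and prints it rather than running it. The test path is given relative to `tests/integration`.

```bash
# Illustrative only: assemble the pytest command from whichever model
# variables are set, and print it instead of running it.
# `build_pytest_cmd` is a hypothetical helper, not part of the repository.
build_pytest_cmd() {
  # $1 = test path, $2 = value for --stack-config
  cmd="pytest -s -v $1 --stack-config=$2"
  if [ -n "${TEXT_MODELS:-}" ]; then
    cmd="$cmd --text-model=$TEXT_MODELS"
  fi
  if [ -n "${VISION_MODELS:-}" ]; then
    cmd="$cmd --vision-model=$VISION_MODELS"
  fi
  if [ -n "${EMBEDDING_MODELS:-}" ]; then
    cmd="$cmd --embedding-model=$EMBEDDING_MODELS"
  fi
  printf '%s\n' "$cmd"
}

TEXT_MODELS=meta-llama/Llama-3.1-8B-Instruct
build_pytest_cmd inference/ together
```

Printing the command first makes it easy to check which options are actually being passed before running it; any option whose variable is unset is simply left off.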