# What does this PR do?
Add unit tests for the inspect endpoint.
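As a rough reference, here is a minimal sketch of what such a test could look like. It assumes a `llama_stack_client` fixture provided by the integration suite's conftest and a client-side `inspect` API exposing `health()` and `version()`; the names are inferred from the test IDs in the log below, not copied from this PR.

```
# Illustrative sketch only -- the real tests/integration/inspect/test_inspect.py may differ.
class TestInspect:
    def test_health(self, llama_stack_client):
        # The health endpoint is expected to report an "OK" status on a running stack.
        health = llama_stack_client.inspect.health()
        assert health is not None
        assert health.status == "OK"

    def test_version(self, llama_stack_client):
        # The version endpoint is expected to return a non-empty version string.
        version = llama_stack_client.inspect.version()
        assert version is not None
        assert version.version is not None
```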
## Test Plan
```
$ ollama run llama3.2:3b-instruct-fp16 --keepalive=60m &
$ LLAMA_STACK_CONFIG=./llama_stack/templates/ollama/run.yaml uv run pytest -v -s tests/integration/inspect/test_inspect.py
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================== test session starts ==============================================
platform darwin -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-15.3.1-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 2 items

tests/integration/inspect/test_inspect.py::TestInspect::test_health[txt=8B] PASSED
tests/integration/inspect/test_inspect.py::TestInspect::test_version[txt=8B] PASSED

========================================= 2 passed, 3 warnings in 2.26s =========================================
```
Signed-off-by: Sébastien Han <seb@redhat.com>
# Llama Stack Integration Tests
We use `pytest` for parameterizing and running tests. You can see all options with:

```
cd tests/integration

# this will show a long list of options, look for "Custom options:"
pytest --help
```
Here are the most important options:
- `--stack-config`: specify the stack config to use (see the example after this list). You have three ways to point to a stack:
  - a URL which points to a Llama Stack distribution server
  - a template (e.g., `fireworks`, `together`) or a path to a run.yaml file
  - a comma-separated list of api=provider pairs, e.g. `inference=fireworks,safety=llama-guard,agents=meta-reference`. This is most useful for testing a single API surface.
- `--env`: set environment variables, e.g. `--env KEY=value`. This is a utility option to set environment variables required by various providers.
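For instance, the same test directory can be pointed at each of these in turn; the server URL and API key below are placeholders, not values taken from this repository:

```
# a URL pointing at an already-running Llama Stack server (address is just an example)
pytest -s -v tests/integration/inspect --stack-config=http://localhost:8321

# a template, with a provider key passed through --env
pytest -s -v tests/integration/inspect --stack-config=fireworks --env FIREWORKS_API_KEY=...

# an ad-hoc api=provider stack
pytest -s -v tests/integration/inspect --stack-config=inference=fireworks --env FIREWORKS_API_KEY=...
```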
Model parameters can be influenced by the following options:

- `--text-model`: comma-separated list of text models.
- `--vision-model`: comma-separated list of vision models.
- `--embedding-model`: comma-separated list of embedding models.
- `--safety-shield`: comma-separated list of safety shields.
- `--judge-model`: comma-separated list of judge models.
- `--embedding-dimension`: output dimensionality of the embedding model to use for testing. Default: 384

Each of these accepts a comma-separated list and can be used to generate multiple parameter combinations.
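For example, passing two text models (the model IDs here are only illustrative) runs each text-inference test once per model:

```
pytest -s -v tests/api/inference/test_text_inference.py \
   --stack-config=together \
   --text-model=meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-70B-Instruct
```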
Experimental options, still under development:

- `--record-responses`: record new API responses instead of using cached ones
- `--report`: path where the test report should be written, e.g. `--report=/path/to/report.md`
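As a hedged illustration, these can simply be appended to an ordinary run (the report path below is arbitrary); `--record-responses` would be added the same way when refreshing cached responses:

```
pytest -s -v tests/api/inference/ \
   --stack-config=together \
   --text-model=meta-llama/Llama-3.1-8B-Instruct \
   --report=/path/to/report.md
```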
## Examples
Run all text inference tests with the `together` distribution:

```
pytest -s -v tests/api/inference/test_text_inference.py \
   --stack-config=together \
   --text-model=meta-llama/Llama-3.1-8B-Instruct
```
Run all text inference tests with the `together` distribution and `meta-llama/Llama-3.1-8B-Instruct`:

```
pytest -s -v tests/api/inference/test_text_inference.py \
   --stack-config=together \
   --text-model=meta-llama/Llama-3.1-8B-Instruct
```
Running all inference tests for a number of models:

```
TEXT_MODELS=meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-70B-Instruct
VISION_MODELS=meta-llama/Llama-3.2-11B-Vision-Instruct
EMBEDDING_MODELS=all-MiniLM-L6-v2
TOGETHER_API_KEY=...
pytest -s -v tests/api/inference/ \
   --stack-config=together \
   --text-model=$TEXT_MODELS \
   --vision-model=$VISION_MODELS \
   --embedding-model=$EMBEDDING_MODELS
```
The same thing, but instead of using the distribution, use an ad-hoc stack with just one provider (fireworks for inference):

```
FIREWORKS_API_KEY=...
pytest -s -v tests/api/inference/ \
   --stack-config=inference=fireworks \
   --text-model=$TEXT_MODELS \
   --vision-model=$VISION_MODELS \
   --embedding-model=$EMBEDDING_MODELS
```
Running Vector IO tests for a number of embedding models:

```
EMBEDDING_MODELS=all-MiniLM-L6-v2
pytest -s -v tests/api/vector_io/ \
   --stack-config=inference=sentence-transformers,vector_io=sqlite-vec \
   --embedding-model=$EMBEDDING_MODELS
```