llama-stack-mirror/tests/integration
Sébastien Han bad12ee21f
fix: remove ruff N999 (#1388)
# What does this PR do?

Since we moved the move tests/client-sdk to tests/api in
https://github.com/meta-llama/llama-stack/pull/1376. The N999 rule is
not needed anymore. And furthermore in

abfbaf3c1b

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-07 11:14:04 -08:00
..
agents fix: remove ruff N999 (#1388) 2025-03-07 11:14:04 -08:00
datasetio test: revamp eval related integration tests (#1433) 2025-03-06 10:51:35 -08:00
eval test: revamp eval related integration tests (#1433) 2025-03-06 10:51:35 -08:00
fixtures feat(agent): plain function as client tool (#1479) 2025-03-07 11:10:07 -08:00
inference fix: remove ruff N999 (#1388) 2025-03-07 11:14:04 -08:00
post_training refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401) 2025-03-04 14:53:47 -08:00
safety fix: remove ruff N999 (#1388) 2025-03-07 11:14:04 -08:00
scoring test: revamp eval related integration tests (#1433) 2025-03-06 10:51:35 -08:00
test_cases chore: Reduce flakes in test_text_inference on smaller models (#1428) 2025-03-05 13:05:30 -08:00
tool_runtime refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401) 2025-03-04 14:53:47 -08:00
vector_io fix: remove ruff N999 (#1388) 2025-03-07 11:14:04 -08:00
__init__.py fix: remove ruff N999 (#1388) 2025-03-07 11:14:04 -08:00
conftest.py test: revamp eval related integration tests (#1433) 2025-03-06 10:51:35 -08:00
metadata.py refactor: tests/unittests -> tests/unit; tests/api -> tests/integration 2025-03-04 09:57:00 -08:00
README.md refactor(test): introduce --stack-config and simplify options (#1404) 2025-03-05 17:02:02 -08:00
report.py refactor(test): introduce --stack-config and simplify options (#1404) 2025-03-05 17:02:02 -08:00

Llama Stack Integration Tests

We use pytest for parameterizing and running tests. You can see all options with:

cd tests/integration

# this will show a long list of options, look for "Custom options:"
pytest --help

Here are the most important options:

  • --stack-config: specify the stack config to use. You have three ways to point to a stack:
    • a URL which points to a Llama Stack distribution server
    • a template (e.g., fireworks, together) or a path to a run.yaml file
    • a comma-separated list of api=provider pairs, e.g. inference=fireworks,safety=llama-guard,agents=meta-reference. This is most useful for testing a single API surface.
  • --env: set environment variables, e.g. --env KEY=value. this is a utility option to set environment variables required by various providers.

Model parameters can be influenced by the following options:

  • --text-model: comma-separated list of text models.
  • --vision-model: comma-separated list of vision models.
  • --embedding-model: comma-separated list of embedding models.
  • --safety-shield: comma-separated list of safety shields.
  • --judge-model: comma-separated list of judge models.
  • --embedding-dimension: output dimensionality of the embedding model to use for testing. Default: 384

Each of these are comma-separated lists and can be used to generate multiple parameter combinations.

Experimental, under development, options:

  • --record-responses: record new API responses instead of using cached ones
  • --report: path where the test report should be written, e.g. --report=/path/to/report.md

Examples

Run all text inference tests with the together distribution:

pytest -s -v tests/api/inference/test_text_inference.py \
   --stack-config=together \
   --text-model=meta-llama/Llama-3.1-8B-Instruct

Run all text inference tests with the together distribution and meta-llama/Llama-3.1-8B-Instruct:

pytest -s -v tests/api/inference/test_text_inference.py \
   --stack-config=together \
   --text-model=meta-llama/Llama-3.1-8B-Instruct

Running all inference tests for a number of models:

TEXT_MODELS=meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-70B-Instruct
VISION_MODELS=meta-llama/Llama-3.2-11B-Vision-Instruct
EMBEDDING_MODELS=all-MiniLM-L6-v2
TOGETHER_API_KEY=...

pytest -s -v tests/api/inference/ \
   --stack-config=together \
   --text-model=$TEXT_MODELS \
   --vision-model=$VISION_MODELS \
   --embedding-model=$EMBEDDING_MODELS

Same thing but instead of using the distribution, use an adhoc stack with just one provider (fireworks for inference):

FIREWORKS_API_KEY=...

pytest -s -v tests/api/inference/ \
   --stack-config=inference=fireworks \
   --text-model=$TEXT_MODELS \
   --vision-model=$VISION_MODELS \
   --embedding-model=$EMBEDDING_MODELS

Running Vector IO tests for a number of embedding models:

EMBEDDING_MODELS=all-MiniLM-L6-v2

pytest -s -v tests/api/vector_io/ \
   --stack-config=inference=sentence-transformers,vector_io=sqlite-vec \
   --embedding-model=$EMBEDDING_MODELS