Llama Stack Integration Tests

We use pytest for parameterizing and running tests. You can see all options with:

cd tests/integration

# this will show a long list of options, look for "Custom options:"
pytest --help

Here are the most important options:

  • --stack-config: specify the stack config to use. There are several ways to point to a stack (see the example invocations after this list):
    • server:<config> - automatically start a server with the given config (e.g., server:fireworks). This provides one-step testing by auto-starting the server if the port is available, or reusing an existing server if already running.
    • server:<config>:<port> - same as above but with a custom port (e.g., server:together:8322)
    • a URL which points to a Llama Stack distribution server
    • a template (e.g., starter) or a path to a run.yaml file
    • a comma-separated list of api=provider pairs, e.g. inference=fireworks,safety=llama-guard,agents=meta-reference. This is most useful for testing a single API surface.
  • --env: set environment variables, e.g. --env KEY=value. This is a utility option to set environment variables required by various providers.
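
For example, the same tests can be pointed at a stack in any of these forms (the config names, URL, and run.yaml path below are illustrative placeholders; substitute your own):

# auto-start a server from a config, optionally on a custom port
pytest tests/integration/inference/ --stack-config=server:starter
pytest tests/integration/inference/ --stack-config=server:together:8322

# point at an already running Llama Stack distribution server
pytest tests/integration/inference/ --stack-config=http://localhost:8321

# use a template name or a path to a run.yaml file
pytest tests/integration/inference/ --stack-config=starter
pytest tests/integration/inference/ --stack-config=/path/to/run.yaml

# assemble an ad-hoc stack from api=provider pairs
pytest tests/integration/inference/ --stack-config=inference=fireworks,safety=llama-guard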

Model parameters can be influenced by the following options:

  • --text-model: comma-separated list of text models.
  • --vision-model: comma-separated list of vision models.
  • --embedding-model: comma-separated list of embedding models.
  • --safety-shield: comma-separated list of safety shields.
  • --judge-model: comma-separated list of judge models.
  • --embedding-dimension: output dimensionality of the embedding model to use for testing. Default: 384

Each of these is a comma-separated list and can be used to generate multiple parameter combinations. Note that tests will be skipped if no model is specified.
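
For instance, passing two text models parameterizes every text test once per model. pytest's standard --collect-only flag previews the generated combinations without running anything (the model IDs here are only examples):

pytest tests/integration/inference/ --collect-only -q \
   --stack-config=starter \
   --text-model=meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-70B-Instruct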

Examples

Testing against a Server

Run all text inference tests by auto-starting a server with the fireworks config:

pytest -s -v tests/integration/inference/test_text_inference.py \
   --stack-config=server:fireworks \
   --text-model=meta-llama/Llama-3.1-8B-Instruct

Run tests with auto-server startup on a custom port:

pytest -s -v tests/integration/inference/ \
   --stack-config=server:together:8322 \
   --text-model=meta-llama/Llama-3.1-8B-Instruct

Run multiple test suites with auto-server (eliminates manual server management):

# Auto-start server and run all integration tests
export FIREWORKS_API_KEY=<your_key>

pytest -s -v tests/integration/inference/ tests/integration/safety/ tests/integration/agents/ \
   --stack-config=server:fireworks \
   --text-model=meta-llama/Llama-3.1-8B-Instruct
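
To check that an auto-started (or already running) stack is reachable, a plain HTTP probe works; this sketch assumes the default port 8321 and the /v1/health route, so adjust for a custom port such as 8322 above:

curl http://localhost:8321/v1/health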

Testing with Library Client

Run all text inference tests with the starter distribution using the together provider:

ENABLE_TOGETHER=together pytest -s -v tests/integration/inference/test_text_inference.py \
   --stack-config=starter \
   --text-model=meta-llama/Llama-3.1-8B-Instruct
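
API keys can also be supplied per invocation with the --env utility option instead of exporting them first (a sketch; the variable name is whatever your provider expects):

ENABLE_TOGETHER=together pytest -s -v tests/integration/inference/test_text_inference.py \
   --stack-config=starter \
   --text-model=meta-llama/Llama-3.1-8B-Instruct \
   --env TOGETHER_API_KEY=<together_api_key>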

Running all inference tests for a number of models using the together provider:

TEXT_MODELS=meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-70B-Instruct
VISION_MODELS=meta-llama/Llama-3.2-11B-Vision-Instruct
EMBEDDING_MODELS=all-MiniLM-L6-v2
export ENABLE_TOGETHER=together
export TOGETHER_API_KEY=<together_api_key>

pytest -s -v tests/integration/inference/ \
   --stack-config=starter \
   --text-model=$TEXT_MODELS \
   --vision-model=$VISION_MODELS \
   --embedding-model=$EMBEDDING_MODELS

The same thing, but instead of using a distribution, use an ad-hoc stack with just one provider (fireworks for inference):

export FIREWORKS_API_KEY=<fireworks_api_key>

pytest -s -v tests/integration/inference/ \
   --stack-config=inference=fireworks \
   --text-model=$TEXT_MODELS \
   --vision-model=$VISION_MODELS \
   --embedding-model=$EMBEDDING_MODELS
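
The ad-hoc form composes across APIs as well. A sketch for running safety tests against a two-provider stack (the shield ID is a placeholder; substitute one your provider registers):

export FIREWORKS_API_KEY=<fireworks_api_key>

pytest -s -v tests/integration/safety/ \
   --stack-config=inference=fireworks,safety=llama-guard \
   --text-model=$TEXT_MODELS \
   --safety-shield=<shield_id>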

Running Vector IO tests for a number of embedding models:

EMBEDDING_MODELS=all-MiniLM-L6-v2

pytest -s -v tests/integration/vector_io/ \
   --stack-config=inference=sentence-transformers,vector_io=sqlite-vec \
   --embedding-model=$EMBEDDING_MODELS
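
If your embedding model emits vectors with a dimensionality other than the 384 default, pass it explicitly via --embedding-dimension (768 here is only an example value; use whatever your model actually produces):

pytest -s -v tests/integration/vector_io/ \
   --stack-config=inference=sentence-transformers,vector_io=sqlite-vec \
   --embedding-model=$EMBEDDING_MODELS \
   --embedding-dimension=768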