mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-24 07:38:04 +00:00

Ben Browning f0d56316a0 Use VectorStoreContent vs InterleavedContent in vector store files This extracts the existing logic to convert chunks to VectorStoreContent objects into a reusable method and uses that when returning our list of Vector Store File contents. It also adds an xfail test for deleting vector store files, as that's not implemented yet but parking the implementation of that for now. Signed-off-by: Ben Browning <bbrownin@redhat.com>		2025-06-19 10:58:29 -04:00
agents	fix: enable test_responses_store (#2290)	2025-05-27 15:37:28 -07:00
datasets	fix: test_datasets HF scenario in CI (#2090)	2025-05-06 14:09:15 +02:00
eval	fix: fix jobs api literal return type (#1757)	2025-03-21 14:04:21 -07:00
files	test: skip files integrations tests for library client (#2407)	2025-06-05 13:42:10 -07:00
fixtures	chore: remove recordable mock (#2088)	2025-05-05 10:08:55 -07:00
inference	feat: Add `suffix` to openai_completions (#2449)	2025-06-13 16:06:06 -07:00
inspect	test: add inspect unit test (#1417)	2025-03-10 15:36:18 -07:00
post_training	feat: add huggingface post_training impl (#2132)	2025-05-16 14:41:28 -07:00
providers	feat: Add NVIDIA NeMo datastore (#1852)	2025-04-28 09:41:59 -07:00
safety	fix: misc fixes for tests kill horrible warnings	2025-04-12 17:12:11 -07:00
scoring	feat(api): (1/n) datasets api clean up (#1573)	2025-03-17 16:55:45 -07:00
telemetry	fix: skip failing tests (#2243)	2025-05-24 07:31:08 -07:00
test_cases	feat: Add `suffix` to openai_completions (#2449)	2025-06-13 16:06:06 -07:00
tool_runtime	fix: allow running vector tests with embedding dimension (#2467)	2025-06-19 13:29:04 +05:30
tools	fix: toolgroups unregister (#1704)	2025-03-19 13:43:51 -07:00
vector_io	Use VectorStoreContent vs InterleavedContent in vector store files	2025-06-19 10:58:29 -04:00
__init__.py	fix: remove ruff N999 (#1388)	2025-03-07 11:14:04 -08:00
conftest.py	fix: allow running vector tests with embedding dimension (#2467)	2025-06-19 13:29:04 +05:30
README.md	chore: remove pytest reports (#2156)	2025-05-13 22:40:15 -07:00

README.md

Llama Stack Integration Tests

We use pytest for parameterizing and running tests. You can see all options with:

cd tests/integration

# This prints a long list of options; look for the "Custom options:" section.
pytest --help

Here are the most important options:

--stack-config: specify the stack config to use. You have three ways to point to a stack:
- a URL which points to a Llama Stack distribution server
- a template (e.g., fireworks, together) or a path to a run.yaml file
- a comma-separated list of api=provider pairs, e.g. inference=fireworks,safety=llama-guard,agents=meta-reference. This is most useful for testing a single API surface.
--env: set environment variables, e.g. --env KEY=value. This is a utility option for setting the environment variables that various providers require.
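
To make the three --stack-config forms concrete, here is a minimal Python sketch of how such a value could be told apart, plus a parser for --env KEY=value items. The function names (classify_stack_config, parse_api_provider_pairs, parse_env) are hypothetical and are not taken from the test suite's conftest.py; the real option handling may differ.

```python
from urllib.parse import urlparse


def classify_stack_config(value: str) -> str:
    """Guess which of the three documented forms a --stack-config value takes.

    Hypothetical helper for illustration only.
    """
    # Form 1: a URL pointing at a running Llama Stack distribution server.
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    # Form 3: comma-separated api=provider pairs, e.g. inference=fireworks.
    pairs = value.split(",")
    if all(p.count("=") == 1 and all(s.strip() for s in p.split("=")) for p in pairs):
        return "api_provider_pairs"
    # Form 2: anything else is treated as a template name or a run.yaml path.
    return "template_or_path"


def parse_api_provider_pairs(value: str) -> dict[str, str]:
    """Turn 'inference=fireworks,safety=llama-guard' into a dict."""
    return dict(pair.split("=", 1) for pair in value.split(","))


def parse_env(items: list[str]) -> dict[str, str]:
    """Parse repeated KEY=value items, as one might pass them via --env."""
    env: dict[str, str] = {}
    for item in items:
        key, sep, val = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=value, got {item!r}")
        env[key] = val
    return env


if __name__ == "__main__":
    example = "inference=fireworks,safety=llama-guard,agents=meta-reference"
    print(classify_stack_config(example))
    print(parse_api_provider_pairs(example))
    print(parse_env(["KEY=value"]))
```

Checking for a URL first keeps a server address with a query string (which may contain "=") from being mistaken for an api=provider list.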

Model parameters can be influenced by the following options:

--text-model: comma-separated list of text models.
--vision-model: comma-separated list of vision models.
--embedding-model: comma-separated list of embedding models.
--safety-shield: comma-separated list of safety shields.
--judge-model: comma-separated list of judge models.
--embedding-dimension: output dimensionality of the embedding model to use for testing. Default: 384

Each of these options except --embedding-dimension takes a comma-separated list, and the lists are used to generate multiple parameter combinations. Note that tests are skipped if no model is specified.
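
As an illustration of how comma-separated values can expand into multiple test parameter combinations, the sketch below takes a full cross product of the lists. This is an assumption made for the example: the helper names (split_csv, parameter_combinations) are hypothetical, and the suite's own conftest.py may combine the values differently.

```python
from itertools import product


def split_csv(value: str) -> list[str]:
    """Split a comma-separated option value, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parameter_combinations(options: dict[str, str]) -> list[dict[str, str]]:
    """Expand comma-separated option values into one dict per combination.

    If any option has no values, product() yields nothing and the result is
    empty, which mirrors the "skipped if no model is specified" behaviour.
    """
    names = list(options)
    value_lists = [split_csv(options[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


if __name__ == "__main__":
    combos = parameter_combinations(
        {
            "text_model": "meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-70B-Instruct",
            "embedding_model": "all-MiniLM-L6-v2",
        }
    )
    for combo in combos:
        print(combo)
```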

Experimental options (under development):

--record-responses: record new API responses instead of using cached ones

Examples

Run all text inference tests with the together distribution and meta-llama/Llama-3.1-8B-Instruct:

pytest -s -v tests/integration/inference/test_text_inference.py \
   --stack-config=together \
   --text-model=meta-llama/Llama-3.1-8B-Instruct

Run all inference tests for several models:

TEXT_MODELS=meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-70B-Instruct
VISION_MODELS=meta-llama/Llama-3.2-11B-Vision-Instruct
EMBEDDING_MODELS=all-MiniLM-L6-v2
export TOGETHER_API_KEY=<together_api_key>

pytest -s -v tests/integration/inference/ \
   --stack-config=together \
   --text-model=$TEXT_MODELS \
   --vision-model=$VISION_MODELS \
   --embedding-model=$EMBEDDING_MODELS

Run the same tests, but instead of a distribution use an ad hoc stack with a single provider (fireworks for inference):

export FIREWORKS_API_KEY=<fireworks_api_key>

pytest -s -v tests/integration/inference/ \
   --stack-config=inference=fireworks \
   --text-model=$TEXT_MODELS \
   --vision-model=$VISION_MODELS \
   --embedding-model=$EMBEDDING_MODELS

Run the Vector IO tests for one or more embedding models:

EMBEDDING_MODELS=all-MiniLM-L6-v2

pytest -s -v tests/integration/vector_io/ \
   --stack-config=inference=sentence-transformers,vector_io=sqlite-vec \
   --embedding-model=$EMBEDDING_MODELS
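
If you script these runs, for example to loop over several stack configs, the command line can be assembled in Python. The sketch below only builds the argument list from the options described in this README; build_pytest_command is a hypothetical helper, not part of the repository.

```python
import shlex


def build_pytest_command(test_path: str, stack_config: str, **model_options: list[str]) -> list[str]:
    """Build a pytest argument list using the options described above.

    Keyword names map to flags by replacing "_" with "-", e.g.
    embedding_model=[...] becomes --embedding-model=a,b.
    """
    args = ["pytest", "-s", "-v", test_path, f"--stack-config={stack_config}"]
    for name, models in model_options.items():
        if models:  # omit flags that have no values
            flag = "--" + name.replace("_", "-")
            args.append(f"{flag}={','.join(models)}")
    return args


if __name__ == "__main__":
    cmd = build_pytest_command(
        "tests/integration/vector_io/",
        "inference=sentence-transformers,vector_io=sqlite-vec",
        embedding_model=["all-MiniLM-L6-v2"],
    )
    # Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```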