mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-04 18:13:44 +00:00

History

grs da73f1a180 Some checks failed Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Pre-commit / pre-commit (push) Failing after 4s Details Python Package Build Test / build (3.13) (push) Failing after 3s Details Test Llama Stack Build / build-single-provider (push) Failing after 5s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s Details Python Package Build Test / build (3.12) (push) Failing after 5s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 6s Details Test External API and Providers / test-external (venv) (push) Failing after 8s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 12s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 15s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 17s Details Test Llama Stack Build / generate-matrix (push) Failing after 21s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 23s Details Test Llama Stack Build / build (push) Has been skipped Details Update ReadTheDocs / update-readthedocs (push) Failing after 20s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 24s Details fix: ensure assistant message is followed by tool call message as expected by openai (#3224 ) # What does this PR do? As described in #3134 a langchain example works against openai's responses impl, but not against llama stack's. This turned out to be due to the order of the inputs. The langchain example has the two function call outputs first, followed by each call result in turn. This seems to be valid as it is accepted by openai's impl. However in llama stack, these inputs are converted to chat completion inputs and the resulting order for that api is not accpeted by openai. This PR fixes the issue by ensuring that the converted chat completions inputs are in the expected order. Closes #3134 ## Test Plan Added unit and integration tests. Verified this fixes original issue as reported. --------- Signed-off-by: Gordon Sim <gsim@redhat.com>		2025-08-22 10:42:03 -07:00
..
agents	fix(ci, tests): ensure uv environments in CI are kosher, record tests (#3193 )	2025-08-18 17:02:24 -07:00
batches	feat: add batches API with OpenAI compatibility (with inference replay) (#3162 )	2025-08-15 15:34:15 -07:00
datasets	fix: test_datasets HF scenario in CI (#2090 )	2025-05-06 14:09:15 +02:00
eval	fix: fix jobs api literal return type (#1757 )	2025-03-21 14:04:21 -07:00
files	chore(files tests): update files integration tests and fix inline::localfs (#3195 )	2025-08-20 14:22:40 -04:00
fixtures	feat: Remove initialize() Method from LlamaStackAsLibrary (#2979 )	2025-08-21 15:59:04 -07:00
inference	fix: fix the error type in embedding test case (#3197 )	2025-08-21 16:19:51 -07:00
inspect	chore: default to pytest asyncio-mode=auto (#2730 )	2025-07-11 13:00:24 -07:00
non_ci/responses	fix: ensure assistant message is followed by tool call message as expected by openai (#3224 )	2025-08-22 10:42:03 -07:00
post_training	chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061 )	2025-08-20 07:15:35 -04:00
providers	fix(ci, nvidia): do not use module level pytest skip for now	2025-07-31 12:32:31 -07:00
recordings	fix(ci, tests): ensure uv environments in CI are kosher, record tests (#3193 )	2025-08-18 17:02:24 -07:00
safety	feat: Code scanner Provider impl for moderations api (#3100 )	2025-08-18 14:15:40 -07:00
scoring	feat(api): (1/n) datasets api clean up (#1573 )	2025-03-17 16:55:45 -07:00
telemetry	fix: telemetry fixes (inference and core telemetry) (#2733 )	2025-08-06 13:37:40 -07:00
test_cases	feat: switch to async completion in LiteLLM OpenAI mixin (#3029 )	2025-08-03 12:08:56 -07:00
tool_runtime	refactor: introduce common 'ResourceNotFoundError' exception (#3032 )	2025-08-06 10:22:55 -07:00
tools	fix: toolgroups unregister (#1704 )	2025-03-19 13:43:51 -07:00
vector_io	chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061 )	2025-08-20 07:15:35 -04:00
__init__.py	fix: remove ruff N999 (#1388 )	2025-03-07 11:14:04 -08:00
conftest.py	fix(tests): move llama stack client init back to fixture (#3071 )	2025-08-07 15:29:53 -07:00
README.md	test(recording): add a script to schedule recording workflow (#3170 )	2025-08-15 16:54:34 -07:00

README.md

Integration Testing Guide

Integration tests verify complete workflows across different providers using Llama Stack's record-replay system.

Quick Start

# Run all integration tests with existing recordings
LLAMA_STACK_TEST_INFERENCE_MODE=replay \
  LLAMA_STACK_TEST_RECORDING_DIR=tests/integration/recordings \
  uv run --group test \
  pytest -sv tests/integration/ --stack-config=starter

Configuration Options

You can see all options with:

cd tests/integration

# this will show a long list of options, look for "Custom options:"
pytest --help

Here are the most important options:

--stack-config: specify the stack config to use. You have four ways to point to a stack:
- server:<config> - automatically start a server with the given config (e.g., server:starter). This provides one-step testing by auto-starting the server if the port is available, or reusing an existing server if already running.
- server:<config>:<port> - same as above but with a custom port (e.g., server:starter:8322)
- a URL which points to a Llama Stack distribution server
- a distribution name (e.g., starter) or a path to a run.yaml file
- a comma-separated list of api=provider pairs, e.g. inference=ollama,safety=llama-guard,agents=meta-reference. This is most useful for testing a single API surface.
--env: set environment variables, e.g. --env KEY=value. this is a utility option to set environment variables required by various providers.

Model parameters can be influenced by the following options:

--text-model: comma-separated list of text models.
--vision-model: comma-separated list of vision models.
--embedding-model: comma-separated list of embedding models.
--safety-shield: comma-separated list of safety shields.
--judge-model: comma-separated list of judge models.
--embedding-dimension: output dimensionality of the embedding model to use for testing. Default: 384

Each of these are comma-separated lists and can be used to generate multiple parameter combinations. Note that tests will be skipped if no model is specified.

Examples

Testing against a Server

Run all text inference tests by auto-starting a server with the starter config:

OLLAMA_URL=http://localhost:11434 \
  pytest -s -v tests/integration/inference/test_text_inference.py \
   --stack-config=server:starter \
   --text-model=ollama/llama3.2:3b-instruct-fp16 \
   --embedding-model=sentence-transformers/all-MiniLM-L6-v2

Run tests with auto-server startup on a custom port:

OLLAMA_URL=http://localhost:11434 \
  pytest -s -v tests/integration/inference/ \
   --stack-config=server:starter:8322 \
   --text-model=ollama/llama3.2:3b-instruct-fp16 \
   --embedding-model=sentence-transformers/all-MiniLM-L6-v2

Testing with Library Client

The library client constructs the Stack "in-process" instead of using a server. This is useful during the iterative development process since you don't need to constantly start and stop servers.

You can do this by simply using --stack-config=starter instead of --stack-config=server:starter.

Using ad-hoc distributions

Sometimes, you may want to make up a distribution on the fly. This is useful for testing a single provider or a single API or a small combination of providers. You can do so by specifying a comma-separated list of api=provider pairs to the --stack-config option, e.g. inference=remote::ollama,safety=inline::llama-guard,agents=inline::meta-reference.

pytest -s -v tests/integration/inference/ \
   --stack-config=inference=remote::ollama,safety=inline::llama-guard,agents=inline::meta-reference \
   --text-model=$TEXT_MODELS \
   --vision-model=$VISION_MODELS \
   --embedding-model=$EMBEDDING_MODELS

Another example: Running Vector IO tests for embedding models:

pytest -s -v tests/integration/vector_io/ \
   --stack-config=inference=inline::sentence-transformers,vector_io=inline::sqlite-vec \
   --embedding-model=sentence-transformers/all-MiniLM-L6-v2

Recording Modes

The testing system supports three modes controlled by environment variables:

LIVE Mode (Default)

Tests make real API calls:

LLAMA_STACK_TEST_INFERENCE_MODE=live pytest tests/integration/

RECORD Mode

Captures API interactions for later replay:

LLAMA_STACK_TEST_INFERENCE_MODE=record \
LLAMA_STACK_TEST_RECORDING_DIR=tests/integration/recordings \
pytest tests/integration/inference/test_new_feature.py

REPLAY Mode

Uses cached responses instead of making API calls:

LLAMA_STACK_TEST_INFERENCE_MODE=replay \
LLAMA_STACK_TEST_RECORDING_DIR=tests/integration/recordings \
pytest tests/integration/

Note that right now you must specify the recording directory. This is because different tests use different recording directories and we don't (yet) have a fool-proof way to map a test to a recording directory. We are working on this.

Managing Recordings

Viewing Recordings

# See what's recorded
sqlite3 recordings/index.sqlite "SELECT endpoint, model, timestamp FROM recordings;"

# Inspect specific response
cat recordings/responses/abc123.json | jq '.'

Re-recording Tests

Remote Re-recording (Recommended)

Use the automated workflow script for easier re-recording:

./scripts/github/schedule-record-workflow.sh --test-subdirs "inference,agents"

See the main testing guide for full details.

Local Re-recording

# Re-record specific tests
LLAMA_STACK_TEST_INFERENCE_MODE=record \
LLAMA_STACK_TEST_RECORDING_DIR=tests/integration/recordings \
pytest -s -v --stack-config=server:starter tests/integration/inference/test_modified.py

Note that when re-recording tests, you must use a Stack pointing to a server (i.e., server:starter). This subtlety exists because the set of tests run in server are a superset of the set of tests run in the library client.

Writing Tests

Basic Test Pattern

def test_basic_completion(llama_stack_client, text_model_id):
    response = llama_stack_client.inference.completion(
        model_id=text_model_id,
        content=CompletionMessage(role="user", content="Hello"),
    )

    # Test structure, not AI output quality
    assert response.completion_message is not None
    assert isinstance(response.completion_message.content, str)
    assert len(response.completion_message.content) > 0

Provider-Specific Tests

def test_asymmetric_embeddings(llama_stack_client, embedding_model_id):
    if embedding_model_id not in MODELS_SUPPORTING_TASK_TYPE:
        pytest.skip(f"Model {embedding_model_id} doesn't support task types")

    query_response = llama_stack_client.inference.embeddings(
        model_id=embedding_model_id,
        contents=["What is machine learning?"],
        task_type="query",
    )

    assert query_response.embeddings is not None