IAN MILLER 007efa6eb5
refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack (#3183)
# What does this PR do?
The purpose of this PR is to replace Llama Stack's default embedding
model with nomic-embed-text-v1.5.

These are the key reasons why the Llama Stack community decided to switch
from all-MiniLM-L6-v2 to nomic-embed-text-v1.5:
1. The training data for
[all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data)
includes a lot of data sets with various licensing terms, so it is
tricky to know when/whether it is appropriate to use this model for
commercial applications.
2. The model is not particularly competitive on major benchmarks. For
example, if you look at the [MTEB
Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click
on Miscellaneous/BEIR to see English information retrieval accuracy, you
see that the top of the leaderboard is dominated by enormous models but
also that there are many, many models of relatively modest size with
much higher Retrieval scores. If you want to look closely at the data, I
recommend clicking "Download Table" because it is easier to browse that
way.

More discussion can be found
[here](https://github.com/llamastack/llama-stack/issues/2418)

Closes #2418 

## Test Plan
1. Run `./scripts/unit-tests.sh`
2. Integration tests via CI workflow

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-10-14 10:44:20 -04:00

Test Recording System

This directory contains recorded inference API responses used for deterministic testing without requiring live API access.

Structure

  • responses/ - JSON files containing request/response pairs for inference operations

Recording Format

Each JSON file contains:

  • request - The normalized request parameters (method, endpoint, body)
  • response - The response body (serialized from Pydantic models)
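A recording file of this shape can be sketched in Python as follows. The field values, endpoint, and hashing scheme shown here are illustrative assumptions, not the exact layout the recording system writes:

```python
import hashlib
import json

# Hypothetical recording entry; real recordings may carry additional fields.
request = {
    "method": "POST",
    "endpoint": "/v1/embeddings",
    "body": {"model": "nomic-embed-text-v1.5", "input": ["hello world"]},
}

# A deterministic hash of the normalized request can identify the recording.
request_hash = hashlib.sha256(
    json.dumps(request, sort_keys=True).encode()
).hexdigest()

recording = {
    "request": request,
    "response": {
        "id": f"rec-{request_hash[:12]}",  # normalized id (see below)
        "created": 0,                      # normalized timestamp
        "body": {"object": "list", "data": []},
    },
}

print(json.dumps(recording, indent=2))
```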

Normalization

To reduce noise in git diffs, the recording system automatically normalizes fields that vary between runs but don't affect test behavior:

OpenAI-style responses

  • id - Deterministic hash based on request: rec-{request_hash[:12]}
  • created - Normalized to epoch: 0

Ollama-style responses

  • created_at - Normalized to: "1970-01-01T00:00:00.000000Z"
  • total_duration - Normalized to: 0
  • load_duration - Normalized to: 0
  • prompt_eval_duration - Normalized to: 0
  • eval_duration - Normalized to: 0

These normalizations ensure that re-recording tests produces minimal git diffs, making it easier to review actual changes to test behavior.
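The rules above can be sketched as a single pass over a response dict. This is a minimal illustration of the documented normalizations; the function name and dispatch are assumptions, not the actual implementation:

```python
# Zeroed Ollama-style timing fields, per the rules above.
OLLAMA_ZEROED = ("total_duration", "load_duration",
                 "prompt_eval_duration", "eval_duration")

def normalize_response(response: dict, request_hash: str) -> dict:
    """Apply the documented normalizations to a recorded response."""
    out = dict(response)
    # OpenAI-style fields
    if "id" in out:
        out["id"] = f"rec-{request_hash[:12]}"
    if "created" in out:
        out["created"] = 0
    # Ollama-style fields
    if "created_at" in out:
        out["created_at"] = "1970-01-01T00:00:00.000000Z"
    for field in OLLAMA_ZEROED:
        if field in out:
            out[field] = 0
    return out
```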

Usage

Replay mode (default)

Responses are replayed from recordings:

LLAMA_STACK_TEST_INFERENCE_MODE=replay pytest tests/integration/

Record-if-missing mode

Records only when no recording exists, otherwise replays. Use this for iterative development:

LLAMA_STACK_TEST_INFERENCE_MODE=record-if-missing pytest tests/integration/

Recording mode

Force-records all API interactions, overwriting existing recordings. Use with caution:

LLAMA_STACK_TEST_INFERENCE_MODE=record pytest tests/integration/

Live mode

Skip recordings entirely and use live APIs:

LLAMA_STACK_TEST_INFERENCE_MODE=live pytest tests/integration/
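The four modes can be summarized with a small dispatch sketch. The function and store shown here are illustrative, not the real llama-stack test harness API:

```python
import os

def handle_request(request_key, recordings: dict, call_api):
    """Dispatch one request according to LLAMA_STACK_TEST_INFERENCE_MODE."""
    mode = os.environ.get("LLAMA_STACK_TEST_INFERENCE_MODE", "replay")
    if mode == "live":
        return call_api()                     # bypass recordings entirely
    if mode == "replay":
        return recordings[request_key]        # fails if no recording exists
    if mode == "record":
        recordings[request_key] = call_api()  # always overwrite
        return recordings[request_key]
    if mode == "record-if-missing":
        if request_key not in recordings:
            recordings[request_key] = call_api()
        return recordings[request_key]
    raise ValueError(f"unknown mode: {mode}")
```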

Re-normalizing Existing Recordings

If you need to apply normalization to existing recordings (e.g., after updating the normalization logic):

python scripts/normalize_recordings.py

Use --dry-run to preview changes without modifying files.
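What such a pass does can be sketched as a walk over the responses/ directory. This is an illustration only, with an injected normalize callable; the real logic lives in scripts/normalize_recordings.py:

```python
import json
from pathlib import Path

def renormalize(responses_dir: Path, normalize, dry_run: bool = False) -> int:
    """Re-apply normalization to every recording; return the count changed."""
    changed = 0
    for path in sorted(responses_dir.glob("*.json")):
        data = json.loads(path.read_text())
        normalized = normalize(data)
        if normalized != data:
            changed += 1
            if not dry_run:  # --dry-run previews without writing
                path.write_text(json.dumps(normalized, indent=2) + "\n")
    return changed
```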