llama-stack-mirror/llama_stack
Ashwin Bharambe 08b4a1deb3
feat(tests): introduce inference record/replay to increase test reliability (#2941)
Implements a comprehensive recording and replay system for inference API
calls that eliminates dependency on online inference providers during
testing. The system treats inference as deterministic by recording real
API responses and replaying them in subsequent test runs. Applies to
OpenAI clients (which should cover many inference requests) as well as
Ollama AsyncClient.

For storing, we use a hybrid system: Sqlite for fast lookups and JSON
files for easy greppability / debuggability.

As expected, tests become much much faster (more than 3x in just
inference testing.)

```bash
LLAMA_STACK_TEST_INFERENCE_MODE=record LLAMA_STACK_TEST_RECORDING_DIR=<...> \
  uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

```bash
LLAMA_STACK_TEST_INFERENCE_MODE=replay LLAMA_STACK_TEST_RECORDING_DIR=<...> \
  uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

- `LLAMA_STACK_TEST_INFERENCE_MODE`: `live` (default), `record`, or
`replay`
- `LLAMA_STACK_TEST_RECORDING_DIR`: Storage location (must be specified
for record or replay modes)
2025-07-29 12:41:31 -07:00
..
apis feat: add base64 encoded PDF support for OpenAI Chat Completions (#2881) 2025-07-29 06:23:41 -04:00
cli fix: separate build and run provider types (#2917) 2025-07-25 12:39:26 -07:00
distribution feat(tests): introduce inference record/replay to increase test reliability (#2941) 2025-07-29 12:41:31 -07:00
models chore(api): add mypy coverage to chat_format (#2654) 2025-07-18 11:56:53 +02:00
providers fix: Update SFTConfig parameter to fix CI and Post Training Workflow (#2948) 2025-07-29 11:14:04 -07:00
strong_typing chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
templates feat(openai): add configurable base_url support with OPENAI_BASE_URL env var (#2919) 2025-07-28 10:16:02 -07:00
testing feat(tests): introduce inference record/replay to increase test reliability (#2941) 2025-07-29 12:41:31 -07:00
ui fix: random breakage in llama_stack/ui/package.json 2025-07-29 12:31:29 -07:00
__init__.py export LibraryClient 2024-12-13 12:08:00 -08:00
env.py refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401) 2025-03-04 14:53:47 -08:00
log.py fix: use logger for console telemetry (#2844) 2025-07-24 16:26:59 -04:00
schema_utils.py feat(auth): API access control (#2822) 2025-07-24 15:30:48 -07:00