feat(tests): introduce inference record/replay to increase test reliability (#2941)

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-04 10:10:36 +00:00

Implements a recording and replay system for inference API calls that
removes the dependency on online inference providers during testing.
The system treats inference as deterministic: it records real API
responses and replays them in subsequent test runs. It applies to
OpenAI clients (which should cover many inference requests) as well as
the Ollama AsyncClient.
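
The record/replay idea can be sketched roughly as below. This is a minimal illustration, not the code in this commit: `request_key`, `call_inference`, the `live_call` callable, and the in-memory `recordings` dict are all hypothetical names, and hashing a canonical JSON form of the request is an assumed way to match a replayed request to its recording.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def request_key(endpoint: str, body: Dict[str, Any]) -> str:
    """Hash a request so identical requests map to the same recording.

    Hypothetical helper: sort_keys makes the hash independent of dict order.
    """
    canonical = json.dumps({"endpoint": endpoint, "body": body}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def call_inference(
    mode: str,
    endpoint: str,
    body: Dict[str, Any],
    live_call: Callable[[str, Dict[str, Any]], Any],
    recordings: Dict[str, Any],
) -> Any:
    """Dispatch one inference request according to the test mode."""
    if mode == "live":
        # No recording involved: always hit the real provider.
        return live_call(endpoint, body)

    key = request_key(endpoint, body)
    if mode == "record":
        # Call the real provider once and remember its response.
        response = live_call(endpoint, body)
        recordings[key] = response
        return response
    if mode == "replay":
        # Never touch the provider; fail loudly if nothing was recorded.
        if key not in recordings:
            raise LookupError(f"no recording for {endpoint} ({key[:12]})")
        return recordings[key]
    raise ValueError(f"unknown inference mode: {mode!r}")
```

Raising on a missing recording in replay mode, rather than silently falling back to a live call, is a design choice of this sketch; it keeps replay runs fully offline.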

For storage, we use a hybrid system: SQLite for fast lookups and JSON
files for easy greppability / debuggability.
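
One way such a hybrid store could look is sketched here, using only the standard library. The class name `RecordingStore`, the `index.sqlite` file, the `responses/` directory, and the table schema are assumptions made for illustration; the layout used by this commit may differ.

```python
import json
import sqlite3
from pathlib import Path
from typing import Any, Dict, Optional


class RecordingStore:
    """Hypothetical store: SQLite index for lookups, one JSON file per response."""

    def __init__(self, root: Path) -> None:
        self.responses_dir = root / "responses"
        self.responses_dir.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(root / "index.sqlite")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS recordings ("
            "request_hash TEXT PRIMARY KEY, response_file TEXT NOT NULL)"
        )
        self.db.commit()

    def save(self, request_hash: str, request: Dict[str, Any], response: Any) -> None:
        # The JSON file keeps the full request next to the response so that
        # recordings can be inspected with grep or a text editor.
        path = self.responses_dir / f"{request_hash}.json"
        payload = {"request": request, "response": response}
        path.write_text(json.dumps(payload, indent=2, sort_keys=True))
        self.db.execute(
            "INSERT OR REPLACE INTO recordings VALUES (?, ?)",
            (request_hash, path.name),
        )
        self.db.commit()

    def load(self, request_hash: str) -> Optional[Any]:
        # The index answers "is there a recording?" without scanning files.
        row = self.db.execute(
            "SELECT response_file FROM recordings WHERE request_hash = ?",
            (request_hash,),
        ).fetchone()
        if row is None:
            return None
        return json.loads((self.responses_dir / row[0]).read_text())["response"]
```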

As expected, tests become much faster (more than 3x in inference
testing alone).

```bash
LLAMA_STACK_TEST_INFERENCE_MODE=record LLAMA_STACK_TEST_RECORDING_DIR=<...> \
  uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

```bash
LLAMA_STACK_TEST_INFERENCE_MODE=replay LLAMA_STACK_TEST_RECORDING_DIR=<...> \
  uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

- `LLAMA_STACK_TEST_INFERENCE_MODE`: `live` (default), `record`, or
  `replay`
- `LLAMA_STACK_TEST_RECORDING_DIR`: storage location (required in
  `record` and `replay` modes)
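
The rules for these two variables could be checked along the following lines. The function name `resolve_test_config` and the choice to raise `ValueError` are hypothetical; only the variable names, the three modes, and the `live` default come from the description above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple

VALID_MODES = ("live", "record", "replay")


def resolve_test_config(
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, Optional[Path]]:
    """Read the inference test mode and recording directory from the environment."""
    mode = env.get("LLAMA_STACK_TEST_INFERENCE_MODE", "live")
    if mode not in VALID_MODES:
        raise ValueError(f"invalid inference mode {mode!r}; expected one of {VALID_MODES}")

    recording_dir = env.get("LLAMA_STACK_TEST_RECORDING_DIR")
    # A storage location is only mandatory when recordings are written or read.
    if mode in ("record", "replay") and not recording_dir:
        raise ValueError(f"LLAMA_STACK_TEST_RECORDING_DIR must be set in {mode} mode")

    return mode, Path(recording_dir) if recording_dir else None
```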

Author: Ashwin Bharambe (committed by GitHub)
Date: 2025-07-29 12:41:31 -07:00
Parent: abf1d6a703
Commit: 08b4a1deb3
GPG key ID: B5690EEEBB952194 (no known key found for this signature in database)

33 changed files with 9880 additions and 2 deletions

									
llama_stack/distribution/routers/inference.py (2 changes)

    @@ -79,11 +79,9 @@ class InferenceRouter(Inference):
        async def initialize(self) -> None:
            logger.debug("InferenceRouter.initialize")
            pass

        async def shutdown(self) -> None:
            logger.debug("InferenceRouter.shutdown")
            pass

        async def register_model(
            self,