phoenix-oss/llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-22 20:42:24 +00:00

Ashwin Bharambe e59c13f2b8 feat(tests): introduce inference record/replay to increase test reliability

Implements a comprehensive recording and replay system for inference API calls that eliminates dependency on online inference providers during testing. The system treats inference as deterministic by recording real API responses and replaying them in subsequent test runs.

Applies to OpenAI clients (which should cover many inference requests) as well as OpenAI AsyncClient.

For storing, we use a hybrid system: Sqlite for fast lookups and JSON files for easy greppability / debuggability.

As expected, tests become much much faster (more than 3x in just inference testing.)

```bash
LLAMA_STACK_INFERENCE_MODE=record uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

```bash
LLAMA_STACK_INFERENCE_MODE=replay LLAMA_STACK_TEST_ID=<test_id> uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

- `LLAMA_STACK_INFERENCE_MODE`: `live` (default), `record`, or `replay`
- `LLAMA_STACK_TEST_ID`: Specify recording directory (auto-generated if not set)
- `LLAMA_STACK_RECORDING_DIR`: Storage location (default: ~/.llama/recordings/)

2025-07-29 12:16:20 -07:00
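The record/replay idea described in the commit message above can be illustrated with a minimal Python sketch. This is not the code from the commit: `request_key`, `RecordingStore`, `call_inference`, and the on-disk layout (`index.sqlite` plus a `responses/` folder) are hypothetical names chosen for illustration. Only the environment variable `LLAMA_STACK_INFERENCE_MODE`, its three values, and the SQLite-plus-JSON storage split come from the commit message.

```python
import hashlib
import json
import os
import sqlite3
from pathlib import Path
from typing import Any, Callable, Optional

# Hypothetical type alias: a function that performs the real (online) request.
LiveCall = Callable[[str, dict], Any]


def mode_from_env() -> str:
    """Read the mode named in the commit message; `live` is the stated default."""
    return os.environ.get("LLAMA_STACK_INFERENCE_MODE", "live")


def request_key(endpoint: str, body: dict) -> str:
    """Hash the endpoint plus a canonical (key-sorted) JSON form of the body."""
    canonical = json.dumps({"endpoint": endpoint, "body": body}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class RecordingStore:
    """Assumed layout: a SQLite index for lookups, one JSON file per response."""

    def __init__(self, root: Path) -> None:
        self.root = root
        (root / "responses").mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(root / "index.sqlite")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS recordings "
            "(key TEXT PRIMARY KEY, filename TEXT NOT NULL)"
        )

    def save(self, key: str, endpoint: str, body: dict, response: Any) -> None:
        filename = f"{key[:16]}.json"
        payload = {"endpoint": endpoint, "request": body, "response": response}
        # Indented JSON keeps the recording easy to grep and diff.
        (self.root / "responses" / filename).write_text(json.dumps(payload, indent=2))
        self.db.execute(
            "INSERT OR REPLACE INTO recordings (key, filename) VALUES (?, ?)",
            (key, filename),
        )
        self.db.commit()

    def load(self, key: str) -> Optional[Any]:
        row = self.db.execute(
            "SELECT filename FROM recordings WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        text = (self.root / "responses" / row[0]).read_text()
        return json.loads(text)["response"]


def call_inference(
    mode: str, store: RecordingStore, endpoint: str, body: dict, live_call: LiveCall
) -> Any:
    """Dispatch on mode: pass through, record the live response, or replay it."""
    if mode == "live":
        return live_call(endpoint, body)
    key = request_key(endpoint, body)
    if mode == "record":
        response = live_call(endpoint, body)
        store.save(key, endpoint, body, response)
        return response
    if mode == "replay":
        recorded = store.load(key)
        if recorded is None:
            raise LookupError(f"no recording found for request to {endpoint}")
        return recorded
    raise ValueError(f"unknown inference mode: {mode!r}")
```

In this sketch, replay never calls `live_call`, which is why a replayed test run would not need an online provider. Raising on a missing recording, rather than silently falling back to a live call, is a design choice made here so that gaps in the recordings surface as test failures.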
client-sdk/post_training	feat: Add nemo customizer (#1448)	2025-03-25 11:01:10 -07:00
common	feat(responses): implement full multi-turn support (#2295)	2025-06-02 15:35:49 -07:00
external	fix: adjust provider type used in external provider test (#2921)	2025-07-28 10:14:16 -07:00
integration	feat(tests): introduce inference record/replay to increase test reliability	2025-07-29 12:16:20 -07:00
unit	fix(openai-compat): restrict developer/assistant/system/tool messages to text-only content (#2932)	2025-07-28 10:36:34 -07:00
verifications	fix(ollama): Download remote image URLs for Ollama (#2551)	2025-06-30 20:36:11 +05:30
__init__.py	refactor(test): introduce --stack-config and simplify options (#1404)	2025-03-05 17:02:02 -08:00
Containerfile	ci: test safety with starter (#2628)	2025-07-09 16:53:50 +02:00
README.md	docs: revamp testing documentation (#2155)	2025-05-13 11:28:29 -07:00

README.md

Llama Stack Tests

Llama Stack uses multiple layers of testing to ensure continued functionality and prevent regressions in the codebase.

Testing Type	Details
Unit	unit/README.md
Integration	integration/README.md
Verification	verifications/README.md