llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 09:53:45 +00:00

Ashwin Bharambe 08b4a1deb3 feat(tests): introduce inference record/replay to increase test reliability (#2941)	2025-07-29 12:41:31 -07:00

Implements a comprehensive recording and replay system for inference API calls that eliminates the dependency on online inference providers during testing. The system treats inference as deterministic by recording real API responses and replaying them in subsequent test runs.

Applies to OpenAI clients (which should cover many inference requests) as well as the Ollama AsyncClient.

For storage, we use a hybrid system: SQLite for fast lookups and JSON files for easy greppability and debuggability.

As expected, tests become much faster (more than 3x in inference testing alone).

```bash
LLAMA_STACK_TEST_INFERENCE_MODE=record LLAMA_STACK_TEST_RECORDING_DIR=<...> \
  uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

```bash
LLAMA_STACK_TEST_INFERENCE_MODE=replay LLAMA_STACK_TEST_RECORDING_DIR=<...> \
  uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

- `LLAMA_STACK_TEST_INFERENCE_MODE`: `live` (default), `record`, or `replay`
- `LLAMA_STACK_TEST_RECORDING_DIR`: Storage location (must be specified for `record` or `replay` modes)
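To make the record/replay idea in the commit message above concrete, here is a minimal sketch of how such a system could be structured. It is not the code from this commit: `read_settings`, `request_key`, `RecordingStore`, `infer`, the `index.sqlite` file name, the `responses/` directory, and the table schema are all hypothetical names chosen for illustration. Only the two environment variables, the three modes, and the SQLite-plus-JSON split come from the commit message.

```python
import hashlib
import json
import sqlite3
from pathlib import Path

VALID_MODES = ("live", "record", "replay")


def read_settings(env):
    """Read mode and recording dir from an environment mapping (hypothetical helper)."""
    mode = env.get("LLAMA_STACK_TEST_INFERENCE_MODE", "live")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown inference mode: {mode!r}")
    recording_dir = env.get("LLAMA_STACK_TEST_RECORDING_DIR")
    if mode != "live" and not recording_dir:
        raise ValueError("a recording dir is required for record and replay modes")
    return mode, recording_dir


def request_key(endpoint, body):
    """Hash a request canonically so identical requests map to the same key."""
    canonical = json.dumps({"endpoint": endpoint, "body": body}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class RecordingStore:
    """Assumed layout: a SQLite index for lookups, one JSON file per response."""

    def __init__(self, root):
        self.root = Path(root)
        (self.root / "responses").mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(self.root / "index.sqlite")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS recordings "
            "(key TEXT PRIMARY KEY, filename TEXT NOT NULL)"
        )

    def save(self, key, response):
        filename = f"{key}.json"
        # Pretty-printed JSON keeps the recordings easy to grep and diff.
        (self.root / "responses" / filename).write_text(
            json.dumps(response, indent=2, sort_keys=True)
        )
        self.db.execute(
            "INSERT OR REPLACE INTO recordings VALUES (?, ?)", (key, filename)
        )
        self.db.commit()

    def load(self, key):
        row = self.db.execute(
            "SELECT filename FROM recordings WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        return json.loads((self.root / "responses" / row[0]).read_text())


def infer(store, mode, endpoint, body, call_provider):
    """Dispatch one inference call according to the mode."""
    if mode == "live":
        return call_provider(endpoint, body)
    key = request_key(endpoint, body)
    if mode == "record":
        response = call_provider(endpoint, body)
        store.save(key, response)
        return response
    if mode == "replay":
        response = store.load(key)
        if response is None:
            raise LookupError(f"no recording for {endpoint} (key {key[:12]})")
        return response
    raise ValueError(f"unknown inference mode: {mode!r}")
```

In this sketch, a replay run never calls `call_provider`, which is why a test suite built this way would not need a live provider. Failing loudly on a missing recording, rather than silently falling back to a live call, is a design choice of the sketch and not something the commit message specifies.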
recordings	feat(tests): introduce inference record/replay to increase test reliability (#2941)	2025-07-29 12:41:31 -07:00
__init__.py	fix: remove ruff N999 (#1388)	2025-03-07 11:14:04 -08:00
dog.png	refactor: tests/unittests -> tests/unit; tests/api -> tests/integration	2025-03-04 09:57:00 -08:00
test_batch_inference.py	feat: add batch inference API to llama stack inference (#1945)	2025-04-12 11:41:12 -07:00
test_embedding.py	refactor: tests/unittests -> tests/unit; tests/api -> tests/integration	2025-03-04 09:57:00 -08:00
test_openai_completion.py	feat: add base64 encoded PDF support for OpenAI Chat Completions (#2881)	2025-07-29 06:23:41 -04:00
test_openai_embeddings.py	chore: Add OpenAI compatibility for Ollama embeddings (#2440)	2025-06-13 14:28:51 -04:00
test_text_inference.py	fix: llama4 tool use prompt fix (#2103)	2025-05-06 22:18:31 -07:00
test_vision_inference.py	test: verification on provider's OAI endpoints (#1893)	2025-04-07 23:06:28 -07:00
vision_test_1.jpg	feat: introduce llama4 support (#1877)	2025-04-05 11:53:35 -07:00
vision_test_2.jpg	feat: introduce llama4 support (#1877)	2025-04-05 11:53:35 -07:00
vision_test_3.jpg	feat: introduce llama4 support (#1877)	2025-04-05 11:53:35 -07:00