feat(tests): implement test isolation for inference recordings (#3681)

Uses test_id in request hashes and test-scoped subdirectories to prevent cross-test contamination. Model list endpoints exclude test_id to enable merging recordings from different servers. Additionally, this PR adds a `record-if-missing` mode (which we will use instead of `record` which records everything) which is very useful. 🤖 Co-authored with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude <noreply@anthropic.com>
2025-12-04 18:13:44 +00:00 · 2025-10-04 11:34:18 -07:00 · 2025-10-04 11:34:18 -07:00 · 045a0c1d57
commit 045a0c1d57
parent f176196fba
428 changed files with 85345 additions and 104330 deletions
--- a/tests/integration/recordings/README.md
+++ b/tests/integration/recordings/README.md
@ -31,18 +31,24 @@ These normalizations ensure that re-recording tests produces minimal git diffs,

 ## Usage

-### Recording mode
-Set `LLAMA_STACK_TEST_INFERENCE_MODE=record` to capture new responses:
-```bash
-LLAMA_STACK_TEST_INFERENCE_MODE=record pytest tests/integration/
-```
-
 ### Replay mode (default)
 Responses are replayed from recordings:
 ```bash
 LLAMA_STACK_TEST_INFERENCE_MODE=replay pytest tests/integration/
 ```

+### Record-if-missing mode (recommended for adding new tests)
+Records only when no recording exists, otherwise replays. Use this for iterative development:
+```bash
+LLAMA_STACK_TEST_INFERENCE_MODE=record-if-missing pytest tests/integration/
+```
+
+### Recording mode
+**Force-records all API interactions**, overwriting existing recordings. Use with caution:
+```bash
+LLAMA_STACK_TEST_INFERENCE_MODE=record pytest tests/integration/
+```
+
 ### Live mode
 Skip recordings entirely and use live APIs:
 ```bash