feat(tests): implement test isolation for inference recordings (#3681)

Uses test_id in request hashes and test-scoped subdirectories to prevent
cross-test contamination. Model list endpoints exclude test_id to enable
merging recordings from different servers.

Additionally, this PR adds a `record-if-missing` mode (which we will use
instead of `record` which records everything) which is very useful.

🤖 Co-authored with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
This commit is contained in:
Ashwin Bharambe 2025-10-04 11:34:18 -07:00 committed by GitHub
parent f176196fba
commit 045a0c1d57
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
428 changed files with 85345 additions and 104330 deletions

View file

@ -31,18 +31,24 @@ These normalizations ensure that re-recording tests produces minimal git diffs,
## Usage
### Recording mode
Set `LLAMA_STACK_TEST_INFERENCE_MODE=record` to capture new responses:
```bash
LLAMA_STACK_TEST_INFERENCE_MODE=record pytest tests/integration/
```
### Replay mode (default)
Responses are replayed from recordings:
```bash
LLAMA_STACK_TEST_INFERENCE_MODE=replay pytest tests/integration/
```
### Record-if-missing mode (recommended for adding new tests)
Records only when no recording exists, otherwise replays. Use this for iterative development:
```bash
LLAMA_STACK_TEST_INFERENCE_MODE=record-if-missing pytest tests/integration/
```
### Recording mode
**Force-records all API interactions**, overwriting existing recordings. Use with caution:
```bash
LLAMA_STACK_TEST_INFERENCE_MODE=record pytest tests/integration/
```
### Live mode
Skip recordings entirely and use live APIs:
```bash