Continue to build on top of
https://github.com/meta-llama/llama-stack/pull/2941
## Test Plan
Run server with `LLAMA_STACK_TEST_INFERENCE_MODE=record` and then run
the integration tests with `--stack-config=server:starter`. Then restart
the server with `LLAMA_STACK_TEST_INFERENCE_MODE=replay` and re-run the
tests. Verify that no request hit Ollama at any point.