Summary:
Agent tests shouldn't need to run inference and tools calls repeatedly.
This PR introduces a way to record inference/tool calls and reuse them
in subsequent test runs, which makes the tests more reliable and saves
costs.
Test Plan:
Run when there's no recorded calls created (fails):
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B
```
Run when `--record-responses` to record calls:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --record-responses
```
Run without `--record-responses` again (succeeds):
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B
```