Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-10-24 08:47:26 +00:00)
In replay mode, inference is instantaneous. We don't need to wait 15 seconds for the batch to be done. Fixing polling to do exponential backoff makes things work super fast.
Directory contents:

- recordings
- __init__.py
- conftest.py
- test_batches.py
- test_batches_errors.py
- test_batches_idempotency.py
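The commit message above describes replacing fixed-interval polling with exponential backoff so that replayed (instantaneous) batches are detected almost immediately while live batches still back off politely. Below is a minimal sketch of that polling pattern; the `client.batches.retrieve()` call, the status names, and the delay values are illustrative assumptions, not the actual llama-stack implementation.

```python
import time


def wait_for_batch(client, batch_id, initial_delay=0.1, max_delay=15.0, timeout=60.0):
    """Poll a batch until it reaches a terminal state, backing off exponentially.

    With a replayed (instantaneous) batch the first short poll usually
    succeeds; with a live batch the delay grows toward max_delay instead
    of sleeping a fixed 15 seconds every iteration.
    """
    delay = initial_delay
    deadline = time.time() + timeout
    while time.time() < deadline:
        batch = client.batches.retrieve(batch_id)  # hypothetical client call
        if batch.status in ("completed", "failed", "cancelled", "expired"):
            return batch
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # double the wait, capped at max_delay
    raise TimeoutError(f"batch {batch_id} did not finish within {timeout}s")
```

Starting with a sub-second delay is what makes replay-mode tests fast: the loop checks almost immediately instead of waiting out a long fixed interval, and the cap keeps live runs from polling too aggressively.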