Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-10-18 07:18:53 +00:00)
In replay mode, inference is instantaneous; we don't need to wait 15 seconds for the batch to be done. Changing the polling loop to use exponential backoff makes things complete much faster.
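The change described above amounts to replacing a fixed polling interval with exponential backoff while waiting for a batch to reach a terminal state. A minimal sketch of that pattern, assuming an OpenAI-style client that exposes `batches.retrieve(batch_id)` returning an object with a `status` field (the helper name and parameters here are illustrative, not the repo's actual code):

```python
import time


def wait_for_batch(client, batch_id, initial_delay=0.1, max_delay=15.0, timeout=300.0):
    """Poll a batch until it reaches a terminal state, backing off exponentially.

    Starting with a short delay means replayed (instantaneous) batches are
    detected almost immediately, while live batches still avoid hammering
    the server because the delay grows toward max_delay.
    """
    delay = initial_delay
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        batch = client.batches.retrieve(batch_id)
        if batch.status in ("completed", "failed", "cancelled", "expired"):
            return batch
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise TimeoutError(f"batch {batch_id} did not finish within {timeout}s")
```

With a fixed 15-second sleep, even a replayed batch takes at least one full interval; with the backoff above, the first check happens after 0.1 s, so replay-mode tests finish almost instantly.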
recordings
__init__.py
conftest.py
test_batches.py
test_batches_errors.py
test_batches_idempotency.py