Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-10-04 12:07:34 +00:00
Latest commit:

- Wrap model loading with asyncio.to_thread() to prevent blocking during model download/initialization
- Wrap encoding operations with asyncio.to_thread() to run in background thread
- Convert _load_sentence_transformer_model() to async method

This ensures the async event loop remains responsive during embedding operations.

Closes: #3332
Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>

| Name |
|---|
| bedrock |
| common |
| datasetio |
| inference |
| kvstore |
| memory |
| responses |
| scoring |
| sqlstore |
| telemetry |
| tools |
| vector_io |
| \_\_init\_\_.py |
| pagination.py |
| scheduler.py |
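
The commit message above describes moving blocking model loading and encoding calls off the event loop with asyncio.to_thread(). The following is a minimal sketch of that pattern, not the repository's actual code: `FakeEmbeddingModel`, `EmbeddingHelper`, `embed`, and `demo` are hypothetical names invented for illustration, and `time.sleep` stands in for the real download and encoding work. Only `asyncio.to_thread()` and the method name `_load_sentence_transformer_model()` come from the commit message.

```python
import asyncio
import time


class FakeEmbeddingModel:
    """Hypothetical stand-in for an embedding model with a blocking API."""

    def __init__(self, name: str) -> None:
        time.sleep(0.2)  # simulates a slow, blocking download/initialization
        self.name = name

    def encode(self, texts: list[str]) -> list[list[float]]:
        time.sleep(0.2)  # simulates blocking encoding work
        # Toy "embedding": one float per text, equal to its length.
        return [[float(len(t))] for t in texts]


class EmbeddingHelper:
    """Hypothetical helper; the real class in the repository may differ."""

    def __init__(self) -> None:
        self._models: dict[str, FakeEmbeddingModel] = {}

    async def _load_sentence_transformer_model(self, name: str) -> FakeEmbeddingModel:
        if name not in self._models:
            # The blocking constructor runs in a worker thread,
            # so the event loop keeps scheduling other tasks.
            self._models[name] = await asyncio.to_thread(FakeEmbeddingModel, name)
        return self._models[name]

    async def embed(self, name: str, texts: list[str]) -> list[list[float]]:
        model = await self._load_sentence_transformer_model(name)
        # The blocking encode call is also pushed to a worker thread.
        return await asyncio.to_thread(model.encode, texts)


async def demo() -> tuple[list[list[float]], int]:
    """Embed two strings while a heartbeat task counts event-loop ticks."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.05)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    vectors = await EmbeddingHelper().embed("demo-model", ["hi", "hello"])
    beat.cancel()
    await asyncio.gather(beat, return_exceptions=True)  # absorb the cancellation
    return vectors, ticks


if __name__ == "__main__":
    vectors, ticks = asyncio.run(demo())
    print(vectors, ticks)
```

If the two `asyncio.to_thread()` calls were replaced with direct calls, the heartbeat task would not get a chance to run while the model loads and encodes, and `ticks` would stay at zero. That is the responsiveness problem the commit message says it addresses.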