Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-10-04 20:14:13 +00:00)
- Wrap model loading with asyncio.to_thread() to prevent blocking during model download/initialization
- Wrap encoding operations with asyncio.to_thread() to run in background thread
- Convert _load_sentence_transformer_model() to async method

This ensures the async event loop remains responsive during embedding operations.

Closes: #3332
Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
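A minimal sketch of the pattern this commit describes, assuming a SentenceTransformer-backed embedding mixin. The class and helper names below are illustrative, not the actual ones in embedding_mixin.py; only `_load_sentence_transformer_model()` and the use of `asyncio.to_thread()` come from the commit message.

```python
import asyncio

from sentence_transformers import SentenceTransformer


class SentenceTransformerEmbeddingSketch:
    """Illustrative only: offload blocking model load and encode calls to a worker thread."""

    def __init__(self) -> None:
        self._model: SentenceTransformer | None = None

    async def _load_sentence_transformer_model(self, model_id: str) -> SentenceTransformer:
        if self._model is None:
            # Model download/initialization can block for a long time; run it in a
            # worker thread so the async event loop stays responsive.
            self._model = await asyncio.to_thread(SentenceTransformer, model_id)
        return self._model

    async def embed_texts(self, model_id: str, texts: list[str]) -> list[list[float]]:
        model = await self._load_sentence_transformer_model(model_id)
        # encode() is CPU-bound; offload it as well instead of blocking the loop.
        embeddings = await asyncio.to_thread(model.encode, texts)
        return embeddings.tolist()
```

With this arrangement, callers can await embedding requests concurrently with other work instead of stalling the event loop while weights download or encoding runs.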
__init__.py
embedding_mixin.py
inference_store.py
litellm_openai_mixin.py
model_registry.py
openai_compat.py
openai_mixin.py
prompt_adapter.py