llama-stack-mirror/llama_stack/providers
Derek Higgins 5bbca56cfc
fix: Make SentenceTransformer embedding operations non-blocking (#3335)
- Wrap model loading with asyncio.to_thread() to prevent blocking during
model download/initialization
- Wrap encoding operations with asyncio.to_thread() to run in background
thread
- Convert _load_sentence_transformer_model() to an async method

This ensures the async event loop remains responsive during embedding
operations.

Closes: #3332

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-09-04 13:58:41 -04:00
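
A minimal sketch of the pattern this commit describes, assuming Python 3.9+ (for asyncio.to_thread) and the sentence-transformers package. The SentenceTransformerEmbedder class and its embed() method are illustrative placeholders, not the provider's actual interface; only _load_sentence_transformer_model() mirrors a name from the commit message above.

```python
from __future__ import annotations

import asyncio

from sentence_transformers import SentenceTransformer


class SentenceTransformerEmbedder:
    """Illustrative wrapper; the real llama_stack provider class differs."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id
        self._model: SentenceTransformer | None = None

    async def _load_sentence_transformer_model(self) -> SentenceTransformer:
        # Model download/initialization can take many seconds; run it in a
        # worker thread so the event loop stays responsive.
        if self._model is None:
            self._model = await asyncio.to_thread(SentenceTransformer, self.model_id)
        return self._model

    async def embed(self, texts: list[str]) -> list[list[float]]:
        model = await self._load_sentence_transformer_model()
        # encode() is CPU-bound; offload it to a thread as well so other
        # coroutines can run while embeddings are computed.
        embeddings = await asyncio.to_thread(model.encode, texts)
        return [e.tolist() for e in embeddings]
```

asyncio.to_thread() hands the callable to the default thread-pool executor, so the blocking download and the native encode work happen off the event loop thread, which is the responsiveness improvement the commit targets.
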
inline/       refactor: use generic WeightedInMemoryAggregator for hybrid search in SQLiteVecIndex (#3303)  2025-09-02 10:38:35 -07:00
registry/     chore(python-deps): replace ibm_watson_machine_learning with ibm_watsonx_ai (#3302)  2025-09-03 11:33:35 +02:00
remote/       feat(tests): auto-merge all model list responses and unify recordings (#3320)  2025-09-03 11:33:03 -07:00
utils/        fix: Make SentenceTransformer embedding operations non-blocking (#3335)  2025-09-04 13:58:41 -04:00
__init__.py   API Updates (#73)  2024-09-17 19:51:35 -07:00
datatypes.py  feat: create unregister shield API endpoint in Llama Stack (#2853)  2025-08-05 07:33:46 -07:00