mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-06-27 18:50:41 +00:00
# What does this PR do?

Closes #1968.

The asynchronous client in `VLLMInferenceAdapter` is now initialized directly before first use and not in `VLLMInferenceAdapter.initialize`. This prevents issues arising due to accessing an expired event loop from a completed `asyncio.run`.

## Test Plan

Ran unit tests, including `test_remote_vllm.py`. Ran the code snippet mentioned in #1968.

---------

Co-authored-by: Sébastien Han <seb@redhat.com>

| Name |
|---|
| apis |
| cli |
| distribution |
| models |
| providers |
| strong_typing |
| templates |
| `__init__.py` |
| `env.py` |
| `log.py` |
| `schema_utils.py` |
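
The commit message above describes moving client creation out of `initialize` and into the point of first use. The following is a minimal, self-contained sketch of why that helps. It is not the llama-stack code: `FakeAsyncClient`, `EagerAdapter`, `LazyAdapter`, `ping`, and `demo` are hypothetical names invented for illustration, and the fake client only imitates a real async client by remembering the event loop it was created on.

```python
import asyncio


class FakeAsyncClient:
    """Hypothetical stand-in for an async client bound to its creation loop."""

    def __init__(self):
        # Must be constructed while an event loop is running.
        self.loop = asyncio.get_running_loop()

    async def ping(self):
        # Imitates the failure seen when a client outlives its event loop.
        if self.loop.is_closed():
            raise RuntimeError("Event loop is closed")
        return "pong"


class EagerAdapter:
    """Creates the client in initialize(), binding it to that call's loop."""

    def __init__(self):
        self.client = None

    async def initialize(self):
        self.client = FakeAsyncClient()

    async def ping(self):
        return await self.client.ping()


class LazyAdapter:
    """Defers client creation until the first request needs it."""

    def __init__(self):
        self.client = None

    async def initialize(self):
        # Intentionally does not create the client here.
        pass

    def _ensure_client(self):
        # Runs inside the loop that serves the first request.
        if self.client is None:
            self.client = FakeAsyncClient()
        return self.client

    async def ping(self):
        return await self._ensure_client().ping()


def demo(adapter):
    """Initialize in one asyncio.run call, then use the adapter in another."""
    asyncio.run(adapter.initialize())  # this loop is closed on return
    try:
        return asyncio.run(adapter.ping())  # a fresh loop
    except RuntimeError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    print(demo(EagerAdapter()))
    print(demo(LazyAdapter()))
```

In this sketch each `asyncio.run` call creates a new event loop and closes it on exit, so the eager adapter ends up holding a client tied to a closed loop, while the lazy adapter creates its client on the loop that actually serves the request. Note that a lazily created client is still bound to the loop of its first use, so the pattern only helps when requests after initialization share one loop.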