llama-stack/llama_stack/providers/remote/inference

Latest commit d86a893ead by Ben Browning:
fix: Swap to AsyncOpenAI client in remote vllm provider (#1459)
# What does this PR do?

This switches the remote vllm provider from the OpenAI client to the
AsyncOpenAI client. The main benefit is that client calls are now async
operations that yield to the server's event loop instead of blocking it,
as each synchronous call previously did.
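
For context, here is a minimal sketch of the shape of the change. It is
illustrative only: `VLLMAdapterSketch` and its constructor arguments are
hypothetical names, not the actual provider code.

```
# Minimal sketch: swap the blocking OpenAI client for AsyncOpenAI
# (illustrative names, not the actual provider code)
from openai import AsyncOpenAI


class VLLMAdapterSketch:
    def __init__(self, base_url: str, api_key: str = "fake"):
        # AsyncOpenAI issues awaitable requests instead of blocking calls
        self.client = AsyncOpenAI(base_url=base_url, api_key=api_key)

    async def chat_completion(self, model: str, messages: list[dict]):
        # `await` yields control to the event loop while the HTTP request
        # is in flight, so the server keeps handling other requests
        return await self.client.chat.completions.create(
            model=model, messages=messages
        )
```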

The actual fix is quite simple and straightforward. What was harder was
creating a reliable reproducer: a unit test that verifies we were
blocking the event loop before and are not blocking it any longer. Some
other inference providers have this same issue, so as they get fixed we
may want to make that simple delayed HTTP server a bit more generic and
pull it into a common place.
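
As a rough illustration of that reproducer idea, here is a stdlib-only
sketch (not the actual test code from this PR; the delay, threshold, and
helper names are all assumptions): a threaded HTTP server that delays
its response, plus a heartbeat task that detects whether the event loop
stalls while a request is in flight.

```
import asyncio
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class DelayedHandler(BaseHTTPRequestHandler):
    """Answer any GET after a fixed delay, simulating a slow inference server."""

    def do_GET(self):
        time.sleep(0.5)  # the delay that would stall a blocking client
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"{}")

    def log_message(self, *args):  # keep test output quiet
        pass


async def async_get(host: str, port: int) -> bytes:
    # Non-blocking raw HTTP GET over asyncio streams
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(b"GET / HTTP/1.0\r\nHost: test\r\n\r\n")
    await writer.drain()
    body = await reader.read()
    writer.close()
    await writer.wait_closed()
    return body


async def heartbeat(gaps: list, stop: asyncio.Event):
    # Record gaps between loop iterations; a large gap means the loop stalled
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(0.01)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def main():
    server = HTTPServer(("127.0.0.1", 0), DelayedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    gaps, stop = [], asyncio.Event()
    hb = asyncio.create_task(heartbeat(gaps, stop))

    # An async call lets the heartbeat keep ticking during the 0.5s delay;
    # a blocking call (e.g. urllib.request.urlopen) would freeze it instead
    await async_get("127.0.0.1", server.server_port)

    stop.set()
    await hb
    server.shutdown()
    assert max(gaps) < 0.1, f"event loop stalled for {max(gaps):.3f}s"


asyncio.run(main())
```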

(Closes #1457)

## Test Plan

I verified that the unit tests and the test_text_inference integration
tests pass with this change, as shown below:

```
python -m pytest -v tests/unit
```

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v -s \
tests/integration/inference/test_text_inference.py \
--text-model "meta-llama/Llama-3.2-3B-Instruct"
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-03-07 14:48:00 -05:00
| Name | Last commit | Last updated |
| --- | --- | --- |
| anthropic | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| bedrock | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| cerebras | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| databricks | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| fireworks | feat(logging): implement category-based logging (#1362) | 2025-03-07 11:34:30 -08:00 |
| gemini | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| groq | fix: register provider model name and HF alias in run.yaml (#1304) | 2025-02-27 16:39:23 -08:00 |
| nvidia | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| ollama | feat(logging): implement category-based logging (#1362) | 2025-03-07 11:34:30 -08:00 |
| openai | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| passthrough | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| runpod | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| sambanova | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| sample | build: format codebase imports using ruff linter (#1028) | 2025-02-13 10:06:21 -08:00 |
| tgi | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| together | feat(logging): implement category-based logging (#1362) | 2025-03-07 11:34:30 -08:00 |
| vllm | fix: Swap to AsyncOpenAI client in remote vllm provider (#1459) | 2025-03-07 14:48:00 -05:00 |
| __init__.py | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |