llama-stack/llama_stack/providers/remote/inference

Latest commit d86a893ead by Ben Browning:
fix: Swap to AsyncOpenAI client in remote vllm provider (#1459)
# What does this PR do?

This switches the remote vllm provider from the OpenAI client to the
AsyncOpenAI client. The main benefit is that client calls are now async
operations that yield to the server's event loop instead of blocking it,
as each synchronous call previously did.
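
For context, here is a minimal sketch of the shape of the change. It is
illustrative only: `VLLMAdapterSketch` and its constructor arguments are
hypothetical names, not the actual provider code.

```
# Minimal sketch: swap the blocking OpenAI client for AsyncOpenAI
# (illustrative names, not the actual provider code)
from openai import AsyncOpenAI


class VLLMAdapterSketch:
    def __init__(self, base_url: str, api_key: str = "fake"):
        # AsyncOpenAI issues awaitable requests instead of blocking calls
        self.client = AsyncOpenAI(base_url=base_url, api_key=api_key)

    async def chat_completion(self, model: str, messages: list[dict]):
        # `await` yields control to the event loop while the HTTP request
        # is in flight, so the server keeps handling other requests
        return await self.client.chat.completions.create(
            model=model, messages=messages
        )
```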

The actual fix is quite simple and straightforward. What was harder was
creating a reliable reproducer: a unit test that verifies we were
blocking the event loop before and are not blocking it any longer. Some
other inference providers have this same issue, so as they get fixed we
may want to make that simple delayed HTTP server a bit more generic and
pull it into a common place.
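
As a rough illustration of that reproducer idea, here is a stdlib-only
sketch (not the actual test code from this PR; the delay, threshold, and
helper names are all assumptions): a threaded HTTP server that delays
its response, plus a heartbeat task that detects whether the event loop
stalls while a request is in flight.

```
import asyncio
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class DelayedHandler(BaseHTTPRequestHandler):
    """Answer any GET after a fixed delay, simulating a slow inference server."""

    def do_GET(self):
        time.sleep(0.5)  # the delay that would stall a blocking client
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"{}")

    def log_message(self, *args):  # keep test output quiet
        pass


async def async_get(host: str, port: int) -> bytes:
    # Non-blocking raw HTTP GET over asyncio streams
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(b"GET / HTTP/1.0\r\nHost: test\r\n\r\n")
    await writer.drain()
    body = await reader.read()
    writer.close()
    await writer.wait_closed()
    return body


async def heartbeat(gaps: list, stop: asyncio.Event):
    # Record gaps between loop iterations; a large gap means the loop stalled
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(0.01)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def main():
    server = HTTPServer(("127.0.0.1", 0), DelayedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    gaps, stop = [], asyncio.Event()
    hb = asyncio.create_task(heartbeat(gaps, stop))

    # An async call lets the heartbeat keep ticking during the 0.5s delay;
    # a blocking call (e.g. urllib.request.urlopen) would freeze it instead
    await async_get("127.0.0.1", server.server_port)

    stop.set()
    await hb
    server.shutdown()
    assert max(gaps) < 0.1, f"event loop stalled for {max(gaps):.3f}s"


asyncio.run(main())
```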

(Closes #1457)

## Test Plan

I verified that the unit tests and the test_text_inference integration
tests pass with this change, as shown below:

```
python -m pytest -v tests/unit
```

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v -s \
tests/integration/inference/test_text_inference.py \
--text-model "meta-llama/Llama-3.2-3B-Instruct"
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-03-07 14:48:00 -05:00
| Name | Last commit | Last updated |
| --- | --- | --- |
| anthropic | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| bedrock | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| cerebras | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| databricks | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| fireworks | feat(logging): implement category-based logging (#1362) | 2025-03-07 11:34:30 -08:00 |
| gemini | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| groq | fix: register provider model name and HF alias in run.yaml (#1304) | 2025-02-27 16:39:23 -08:00 |
| nvidia | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| ollama | feat(logging): implement category-based logging (#1362) | 2025-03-07 11:34:30 -08:00 |
| openai | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| passthrough | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| runpod | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| sambanova | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| sample | build: format codebase imports using ruff linter (#1028) | 2025-02-13 10:06:21 -08:00 |
| tgi | fix: solve ruff B008 warnings (#1444) | 2025-03-06 16:48:35 -08:00 |
| together | feat(logging): implement category-based logging (#1362) | 2025-03-07 11:34:30 -08:00 |
| vllm | fix: Swap to AsyncOpenAI client in remote vllm provider (#1459) | 2025-03-07 14:48:00 -05:00 |
| __init__.py | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |