llama-stack

forked from phoenix-oss/llama-stack-mirror

Ben Browning 9c4074ed49 fix: Gracefully handle no choices in remote vLLM response (#1424)	2025-03-05 15:07:54 -05:00

# What does this PR do?

This gracefully handles the case where the vLLM server responded to a completion request with no choices, which can happen in certain vLLM error situations. Previously, we'd error out with a stack trace about a list index out of range. Now, we just log a warning to the user and move past any chunks with an empty choices list.

A specific example of the type of stack trace this fixes:

```
File "/app/llama-stack-source/llama_stack/providers/remote/inference/vllm/vllm.py", line 170, in _process_vllm_chat_completion_stream_response
    choice = chunk.choices[0]
             ~~~~~~~~~~~~~^^^
IndexError: list index out of range
```

Now, instead of erroring out with that stack trace, we log a warning that vLLM failed to generate any completions and alert the user to check the vLLM server logs for details.

This is related to #1277 and addresses the stack trace shown in that issue, although it does not in and of itself change the functional behavior of vLLM tool calling.

## Test Plan

As part of this fix, I added new unit tests to trigger this same error and verify it no longer happens. That is `test_process_vllm_chat_completion_stream_response_no_choices` in the new `tests/unit/providers/inference/test_remote_vllm.py`.

I also added a couple more tests to trigger and verify the last couple of remote vLLM provider bug fixes, specifically a test for #1236 (builtin tool calling) and #1325 (vLLM <= v0.6.3).

This required fixing the signature of `_process_vllm_chat_completion_stream_response` to accept the actual type of chunks it was getting passed, specifically changing from our openai_compat `OpenAICompatCompletionResponse` to `openai.types.chat.chat_completion_chunk.ChatCompletionChunk`. It was not actually getting passed `OpenAICompatCompletionResponse` objects before, and was using attributes that didn't exist on those objects. So, the signature now matches the type of object it's actually passed.

Run these new unit tests like this:

```
pytest tests/unit/providers/inference/test_remote_vllm.py
```

Additionally, I ensured the existing `test_text_inference.py` tests passed via:

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v tests/integration/inference/test_text_inference.py \
  --inference-model "meta-llama/Llama-3.2-3B-Instruct" \
  --vision-inference-model ""
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
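As a rough illustration of the skip-and-warn behavior described in this commit, here is a minimal, self-contained sketch. It is not the llama-stack provider code: the `Chunk`, `Choice`, and `Delta` dataclasses are hypothetical stand-ins for the OpenAI SDK's `ChatCompletionChunk` types (modeling only the fields read here), `iter_stream_text` is a hypothetical helper name, and the warning wording is illustrative.

```python
import logging
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional

logger = logging.getLogger("vllm_stream_sketch")  # hypothetical logger name


# Hypothetical stand-ins for openai's ChatCompletionChunk and its nested
# types; only the fields this sketch reads are modeled.
@dataclass
class Delta:
    content: Optional[str] = None


@dataclass
class Choice:
    delta: Delta
    finish_reason: Optional[str] = None


@dataclass
class Chunk:
    choices: List[Choice] = field(default_factory=list)


def iter_stream_text(chunks: Iterable[Chunk]) -> Iterator[str]:
    """Yield text deltas, skipping chunks that carry no choices.

    Indexing chunk.choices[0] unconditionally raises IndexError on an
    empty list; here such chunks are logged and skipped instead.
    """
    for chunk in chunks:
        if not chunk.choices:
            # Illustrative wording; not the provider's actual log message.
            logger.warning(
                "vLLM failed to generate any completions; "
                "check the vLLM server logs for details."
            )
            continue
        choice = chunk.choices[0]
        if choice.delta.content:
            yield choice.delta.content


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    stream = [
        Chunk([Choice(Delta("Hel"))]),
        Chunk([]),  # an empty-choices chunk, as vLLM may send on error
        Chunk([Choice(Delta("lo"), finish_reason="stop")]),
    ]
    print("".join(iter_stream_text(stream)))  # prints "Hello" after one warning
```

The `continue` is the key design choice in this sketch: a single empty chunk degrades to a logged warning instead of aborting the whole streamed response with an `IndexError`.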
anthropic	feat(providers): Groq now uses LiteLLM openai-compat (#1303)	2025-02-27 13:16:50 -08:00
bedrock	feat(api): Add options for supporting various embedding models (#1192)	2025-02-20 22:27:12 -08:00
cerebras	fix: register provider model name and HF alias in run.yaml (#1304)	2025-02-27 16:39:23 -08:00
databricks	fix: resolve type hint issues and import dependencies (#1176)	2025-02-25 11:06:47 -08:00
fireworks	feat: add a configurable category-based logger (#1352)	2025-03-02 18:51:14 -08:00
gemini	feat(providers): Groq now uses LiteLLM openai-compat (#1303)	2025-02-27 13:16:50 -08:00
groq	fix: register provider model name and HF alias in run.yaml (#1304)	2025-02-27 16:39:23 -08:00
nvidia	chore(lint): update Ruff ignores for project conventions and maintainability (#1184)	2025-02-28 09:36:49 -08:00
ollama	feat: add a configurable category-based logger (#1352)	2025-03-02 18:51:14 -08:00
openai	feat(providers): Groq now uses LiteLLM openai-compat (#1303)	2025-02-27 13:16:50 -08:00
passthrough	feat(api): Add options for supporting various embedding models (#1192)	2025-02-20 22:27:12 -08:00
runpod	feat(api): Add options for supporting various embedding models (#1192)	2025-02-20 22:27:12 -08:00
sambanova	feat(api): Add options for supporting various embedding models (#1192)	2025-02-20 22:27:12 -08:00
sample	build: format codebase imports using ruff linter (#1028)	2025-02-13 10:06:21 -08:00
tgi	feat(api): Add options for supporting various embedding models (#1192)	2025-02-20 22:27:12 -08:00
together	feat: add a configurable category-based logger (#1352)	2025-03-02 18:51:14 -08:00
vllm	fix: Gracefully handle no choices in remote vLLM response (#1424)	2025-03-05 15:07:54 -05:00
__init__.py	`impls` -> `inline`, `adapters` -> `remote` (#381)	2024-11-06 14:54:05 -08:00