fix: Propagate the runtime error message to user (#4150)

# What does this PR do?
When a runtime exception occurs, the underlying error message is not propagated to the user, so the response is opaque.
Before fix:
```
ERROR - Error processing message: Error code: 500 - {'detail': 'Internal server error: An unexpected error occurred.'}
```
After fix:
```
[ERROR] Error code: 404 - {'detail': "Model 'claude-sonnet-4-5-20250929' not found. Use 'client.models.list()' to list available Models."}
```

(Ran into this a few times while working with OCI + LLAMAStack and Sabre: agentic framework integrations with LLAMAStack.)

## Test Plan
CI
slekkala1 2025-11-14 13:14:49 -08:00 committed by GitHub
parent eb545034ab
commit f596f850bf


```diff
@@ -16,6 +16,7 @@ from llama_stack_api import (
     ApprovalFilter,
     Inference,
     MCPListToolsTool,
+    ModelNotFoundError,
     OpenAIAssistantMessageParam,
     OpenAIChatCompletion,
     OpenAIChatCompletionChunk,
@@ -323,6 +324,8 @@ class StreamingResponseOrchestrator:
             if last_completion_result and last_completion_result.finish_reason == "length":
                 final_status = "incomplete"
+        except ModelNotFoundError:
+            raise
         except Exception as exc:  # noqa: BLE001
             self.final_messages = messages.copy()
             self.sequence_number += 1
```
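The pattern in the diff can be sketched in isolation: a narrow `except` clause re-raises the specific, user-actionable error before the broad catch-all swallows it. This is a minimal sketch, not the actual orchestrator; `ModelNotFoundError`, `run_turn`, and `orchestrate` here are stand-ins defined for illustration.

```python
class ModelNotFoundError(Exception):
    """Stand-in for llama_stack_api.ModelNotFoundError."""


def run_turn(model: str) -> str:
    # Hypothetical inner call that may raise a descriptive error.
    if model != "known-model":
        raise ModelNotFoundError(
            f"Model '{model}' not found. "
            "Use 'client.models.list()' to list available Models."
        )
    return "ok"


def orchestrate(model: str) -> str:
    try:
        return run_turn(model)
    except ModelNotFoundError:
        # Re-raise so the caller sees the descriptive 404-style message
        # instead of a generic failure.
        raise
    except Exception:  # noqa: BLE001
        # Only truly unexpected errors reach the opaque fallback.
        return "Internal server error: An unexpected error occurred."
```

Ordering matters: because `ModelNotFoundError` is a subclass of `Exception`, the narrow clause must appear before the broad one or it is never reached.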