feat(responses)!: implement support for OpenAI compatible prompts in Responses API (#3965)

# What does this PR do?
This PR implements OpenAI-compatible prompts in the Responses API. It is
the follow-up to #3942 and contains the actual implementation.

The need for this functionality was first raised in #3514.

> Note: https://github.com/llamastack/llama-stack/pull/3514 was split
into three separate PRs. This PR is the third of the three.

Closes #3321

## Test Plan
Manual testing, CI workflow with added unit tests

Comprehensive manual testing with new implementation:

**Test prompts with images that contain text in the Responses API:**

I used this image for testing purposes: [iphone 17
image](https://github.com/user-attachments/assets/9e2ee821-e394-4bbd-b1c8-d48a3fa315de)

1. Upload an image:

```
curl -X POST http://localhost:8321/v1/files \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/Users/ianmiller/iphone.jpeg" \
  -F "purpose=assistants"
```


`{"object":"file","id":"file-d6d375f238e14f21952cc40246bc8504","bytes":556241,"created_at":1761750049,"expires_at":1793286049,"filename":"iphone.jpeg","purpose":"assistants"}`

2. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are a product analysis expert. Analyze the following product:\n\nProduct Name: {{product_name}}\nDescription: {{description}}\n\nImage: {{product_photo}}\n\nProvide a detailed analysis including quality assessment, target audience, and pricing recommendations.",
    "variables": ["product_name", "description", "product_photo"]
  }'
```

`{"prompt":"You are a product analysis expert. Analyze the following
product:\n\nProduct Name: {{product_name}}\nDescription:
{{description}}\n\nImage: {{product_photo}}\n\nProvide a detailed
analysis including quality assessment, target audience, and pricing
recommendations.","version":1,"prompt_id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":["product_name","description","product_photo"],"is_default":false}`


3. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Please analyze this product",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62",
      "version": "1",
      "variables": {
        "product_name": {
          "type": "input_text",
          "text": "iPhone 17 Pro Max"
        },
         "product_photo": {
          "type": "input_image",
          "file_id": "file-d6d375f238e14f21952cc40246bc8504",
          "detail": "high"
        }
      }
    }
  }'
```


`{"created_at":1761750427,"error":null,"id":"resp_f897f914-e3b8-4783-8223-3ed0d32fcbc6","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"###
Product Analysis: iPhone 17 Pro Max\n\n**Quality Assessment:**\n\n-
**Display & Design:**\n - The 6.9-inch display is large, ideal for
streaming and productivity.\n - Anti-reflective technology and 120Hz
refresh rate enhance viewing experience, providing smoother visuals and
reducing glare.\n - Titanium frame suggests a premium build, offering
durability and a sleek appearance.\n\n- **Performance:**\n - The Apple
A19 Pro chip promises significant performance improvements, likely
leading to faster processing and efficient multitasking.\n - 12GB RAM is
substantial for a smartphone, ensuring smooth operation for demanding
apps and games.\n\n- **Camera System:**\n - The triple 48MP camera setup
(wide, ultra-wide, telephoto) is designed for versatile photography
needs, capturing high-resolution photos and videos.\n - The 24MP front
camera will appeal to selfie enthusiasts and content creators needing
quality front-facing shots.\n\n- **Connectivity:**\n - Wi-Fi 7 support
indicates future-proof wireless capabilities, providing faster and more
reliable internet connectivity.\n\n**Target Audience:**\n\n- **Tech
Enthusiasts:** Individuals interested in cutting-edge technology and
performance.\n- **Content Creators:** Users who need a robust camera
system for photo and video production.\n- **Luxury Consumers:** Those
who prefer premium materials and top-of-the-line specs.\n-
**Professionals:** Users who require efficient multitasking and
productivity features.\n\n**Pricing Recommendations:**\n\n- Given the
premium specifications, a higher price point is expected. Consider
pricing competitively within the high-end smartphone market while
justifying cost through unique features like the titanium frame and
advanced connectivity options.\n- Positioning around the $1,200 to
$1,500 range would align with expectations for top-tier devices,
catering to its target audience while ensuring
profitability.\n\nOverall, the iPhone 17 Pro Max showcases a blend of
innovative features and premium design, aimed at users seeking high
performance and superior
aesthetics.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_66f4d844-4d9e-4102-80fc-eb75b34b6dbd","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":{"product_name":{"text":"iPhone
17 Pro
Max","type":"input_text"},"product_photo":{"detail":"high","type":"input_image","file_id":"file-d6d375f238e14f21952cc40246bc8504","image_url":null}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":830,"output_tokens":394,"total_tokens":1224,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}`
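
The image variable above is passed by `file_id`, so the server has to turn the stored file into something a chat model can ingest. The following is a minimal sketch of that conversion, assuming the three steps visible in this PR's diff (guess the MIME type from the filename, base64-encode the bytes, wrap both in a data URL). The function name `to_data_url` is hypothetical and merges the PR's separate helpers into one for illustration.

```python
from __future__ import annotations

import base64
import mimetypes


def to_data_url(raw_bytes: bytes, filename: str | None) -> str:
    """Hypothetical helper: build a base64 data URL from raw file bytes."""
    mime_type = None
    if filename:
        # guess_type returns (type, encoding); only the type is needed here.
        mime_type, _ = mimetypes.guess_type(filename)
    if not mime_type:
        # Same fallback as in the diff when the type cannot be determined.
        mime_type = "application/octet-stream"
    encoded = base64.b64encode(raw_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes stand in for the real image content.
print(to_data_url(b"Hello, World!", "iphone.jpeg"))
```

In the actual change the bytes come from the Files API rather than from a literal, and the resulting URL is placed in an image content part together with the requested `detail` level.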

**Test Prompts with PDF files in Responses API:**

I used this PDF file for testing purposes:
[invoicesample.pdf](https://github.com/user-attachments/files/22958943/invoicesample.pdf)

1. Upload PDF:

```
curl -X POST http://localhost:8321/v1/files \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/Users/ianmiller/invoicesample.pdf" \
  -F "purpose=assistants"
```


`{"object":"file","id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","bytes":149568,"created_at":1761750730,"expires_at":1793286730,"filename":"invoicesample.pdf","purpose":"assistants"}`


2. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are an accounting and financial analysis expert. Analyze the following invoice document:\n\nInvoice Document: {{invoice_doc}}\n\nProvide a comprehensive analysis",
    "variables": ["invoice_doc"]
  }'
```

`{"prompt":"You are an accounting and financial analysis expert. Analyze
the following invoice document:\n\nInvoice Document:
{{invoice_doc}}\n\nProvide a comprehensive
analysis","version":1,"prompt_id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":["invoice_doc"],"is_default":false}`


3. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Please provide a detailed analysis of this invoice",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc",
      "version": "1",
      "variables": {
        "invoice_doc": {
          "type": "input_file",
          "file_id": "file-7fbb1043a4bb468cab60ffe4b8631d8e",
          "filename": "invoicesample.pdf"
        }
      }
    }
  }'
```


`{"created_at":1761750881,"error":null,"id":"resp_da866913-db06-4702-8000-174daed9dbbb","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"Here's
a detailed analysis of the invoice provided:\n\n### Seller
Information\n- **Business Name:** The invoice features a logo with
\"Sunny Farm\" indicating the business identity.\n- **Address:** 123
Somewhere St, Melbourne VIC 3000\n- **Contact Information:** Phone
number (03) 1234 5678\n\n### Buyer Information\n- **Name:** Denny
Gunawan\n- **Address:** 221 Queen St, Melbourne VIC 3000\n\n###
Transaction Details\n- **Invoice Number:** #20130304\n- **Date of
Transaction:** Not explicitly mentioned, likely inferred from the
invoice number or needs clarification.\n\n### Items Purchased\n1.
**Apple**\n - Price: $5.00/kg\n - Quantity: 1 kg\n - Subtotal:
$5.00\n\n2. **Orange**\n - Price: $1.99/kg\n - Quantity: 2 kg\n -
Subtotal: $3.98\n\n3. **Watermelon**\n - Price: $1.69/kg\n - Quantity: 3
kg\n - Subtotal: $5.07\n\n4. **Mango**\n - Price: $9.56/kg\n - Quantity:
2 kg\n - Subtotal: $19.12\n\n5. **Peach**\n - Price: $2.99/kg\n -
Quantity: 1 kg\n - Subtotal: $2.99\n\n### Financial Summary\n-
**Subtotal for Items:** $36.00\n- **GST (Goods and Services Tax):** 10%
of $36.00, which amounts to $3.60\n- **Total Amount Due:** $39.60\n\n###
Notes\n- The invoice includes a placeholder text: \"Lorem ipsum dolor
sit amet...\" which is typically used as filler text. This might
indicate a section intended for terms, conditions, or additional notes
that haven’t been completed.\n\n### Visual and Design Elements\n- The
invoice uses a simple and clear layout, featuring the business logo
prominently and stating essential information such as contact and
transaction details in a structured manner.\n- There is a \"Thank You\"
note at the bottom, which adds a professional and courteous
touch.\n\n### Considerations\n- Ensure the date of the transaction is
clear if there are any future references needed.\n- Replace filler text
with relevant terms and conditions or any special instructions
pertaining to the transaction.\n\nThis invoice appears standard,
representing a small business transaction with clearly itemized products
and applicable
taxes.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_39f3b39e-4684-4444-8e4d-e7395f88c9dc","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":{"invoice_doc":{"type":"input_file","file_data":null,"file_id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","file_url":null,"filename":"invoicesample.pdf"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":529,"output_tokens":513,"total_tokens":1042,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}`
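
A file or image variable cannot be inlined into the system prompt text, so the diff in this PR replaces its `{{...}}` slot with a textual placeholder and sends the media itself in an extra user message. The sketch below illustrates that idea under stated assumptions: plain dicts stand in for the typed message classes, `media_placeholder` is a hypothetical name, and `"<data URL>"` is a stand-in for the converted file content.

```python
import re

# Same pattern as the one in the diff: {{ name }} with optional inner spaces.
VARIABLE_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def media_placeholder(var_name: str, content_type: str) -> str:
    """Turn e.g. ("invoice_doc", "input_file") into "[File: invoice_doc]"."""
    label = content_type.replace("input_", "").replace("_", " ").title()
    return f"[{label}: {var_name}]"


template = (
    "You are an accounting and financial analysis expert. "
    "Analyze the following invoice document:\n\n"
    "Invoice Document: {{invoice_doc}}\n\nProvide a comprehensive analysis"
)
substitutions = {"invoice_doc": media_placeholder("invoice_doc", "input_file")}
system_text = VARIABLE_PATTERN.sub(
    lambda m: substitutions.get(m.group(1), m.group(0)), template
)

# Plain dicts stand in for the typed message classes (an assumption of this sketch).
messages = [{"role": "user", "content": "Please provide a detailed analysis of this invoice"}]
# The resolved prompt goes first, as a system message.
messages.insert(0, {"role": "system", "content": system_text})
# The media goes last, in its own user message (hypothetical stand-in content part).
file_part = {"type": "file", "file": {"file_data": "<data URL>", "filename": "invoicesample.pdf"}}
messages.append({"role": "user", "content": [file_part]})

print(system_text)
```

The placeholder gives the model a textual hint about which media item fills which slot, while the bytes travel in a message role that accepts files and images.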

**Test simple text Prompt in Responses API:**

1. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.",
    "variables": ["name", "company", "role", "tone"]
  }'
```

`{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is
{{role}} at {{company}}. Remember, {{name}}, to be
{{tone}}.","version":1,"prompt_id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":["name","company","role","tone"],"is_default":false}`

2. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "What is the capital of Ireland?",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef",
      "version": "1",
      "variables": {
        "name": {
          "type": "input_text",
          "text": "Alice"
        },
        "company": {
          "type": "input_text",
          "text": "Dummy Company"
        },
        "role": {
          "type": "input_text",
          "text": "Geography expert"
        },
        "tone": {
          "type": "input_text",
          "text": "professional and helpful"
        }
      }
    }
  }'

```


`{"created_at":1761751097,"error":null,"id":"resp_1b037b95-d9ae-4ad0-8e76-d953897ecaef","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"The
capital of Ireland is
Dublin.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_8e7c72b6-2aa2-4da6-8e57-da4e12fa3ce2","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":{"name":{"text":"Alice","type":"input_text"},"company":{"text":"Dummy
Company","type":"input_text"},"role":{"text":"Geography
expert","type":"input_text"},"tone":{"text":"professional and
helpful","type":"input_text"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":47,"output_tokens":7,"total_tokens":54,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}`
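
For text-only variables, the substitution can be reproduced locally. This is a minimal sketch that reuses the regular expression from this PR's diff and the template from the test above; `render_prompt` is a hypothetical helper, not the function added by the PR, and it takes plain strings where the API takes `input_text` objects.

```python
import re

# Same pattern as the one in the diff: {{ name }} with optional inner spaces.
VARIABLE_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, declared: list[str], values: dict[str, str]) -> str:
    """Hypothetical helper: substitute {{variable}} placeholders in a template."""
    # Reject values for variables that the stored prompt does not declare.
    for name in values:
        if name not in declared:
            raise ValueError(f"Variable {name} not found in prompt")

    def replace_variable(match: re.Match[str]) -> str:
        # Variables without a supplied value keep their original {{...}} text.
        return values.get(match.group(1), match.group(0))

    return VARIABLE_PATTERN.sub(replace_variable, template)


template = (
    "Hello {{name}}! You are working at {{company}}. "
    "Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}."
)
declared = ["name", "company", "role", "tone"]
values = {
    "name": "Alice",
    "company": "Dummy Company",
    "role": "Geography expert",
    "tone": "professional and helpful",
}
rendered = render_prompt(template, declared, values)
print(rendered)
```

A variable that appears several times in the template, such as `{{name}}` here, is replaced at every occurrence, and a value for an undeclared variable raises `ValueError` before any substitution happens.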
Ian Miller 2025-11-19 19:48:11 +00:00 committed by GitHub
parent 8852666982
commit 0757d5a917
10 changed files with 770 additions and 17 deletions

View file

@ -27,8 +27,10 @@ async def get_provider_impl(
deps[Api.tool_runtime],
deps[Api.tool_groups],
deps[Api.conversations],
policy,
deps[Api.prompts],
deps[Api.files],
telemetry_enabled,
policy,
)
await impl.initialize()
return impl

View file

@ -12,6 +12,7 @@ from llama_stack.providers.utils.responses.responses_store import ResponsesStore
from llama_stack_api import (
Agents,
Conversations,
Files,
Inference,
ListOpenAIResponseInputItem,
ListOpenAIResponseObject,
@ -22,6 +23,7 @@ from llama_stack_api import (
OpenAIResponsePrompt,
OpenAIResponseText,
Order,
Prompts,
ResponseGuardrail,
Safety,
ToolGroups,
@ -45,6 +47,8 @@ class MetaReferenceAgentsImpl(Agents):
tool_runtime_api: ToolRuntime,
tool_groups_api: ToolGroups,
conversations_api: Conversations,
prompts_api: Prompts,
files_api: Files,
policy: list[AccessRule],
telemetry_enabled: bool = False,
):
@ -56,7 +60,8 @@ class MetaReferenceAgentsImpl(Agents):
self.tool_groups_api = tool_groups_api
self.conversations_api = conversations_api
self.telemetry_enabled = telemetry_enabled
self.prompts_api = prompts_api
self.files_api = files_api
self.in_memory_store = InmemoryKVStoreImpl()
self.openai_responses_impl: OpenAIResponsesImpl | None = None
self.policy = policy
@ -73,6 +78,8 @@ class MetaReferenceAgentsImpl(Agents):
vector_io_api=self.vector_io_api,
safety_api=self.safety_api,
conversations_api=self.conversations_api,
prompts_api=self.prompts_api,
files_api=self.files_api,
)
async def shutdown(self) -> None:

View file

@ -4,6 +4,7 @@
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
import re
import time
import uuid
from collections.abc import AsyncIterator
@ -18,13 +19,17 @@ from llama_stack.providers.utils.responses.responses_store import (
from llama_stack_api import (
ConversationItem,
Conversations,
Files,
Inference,
InvalidConversationIdError,
ListOpenAIResponseInputItem,
ListOpenAIResponseObject,
OpenAIChatCompletionContentPartParam,
OpenAIDeleteResponseObject,
OpenAIMessageParam,
OpenAIResponseInput,
OpenAIResponseInputMessageContentFile,
OpenAIResponseInputMessageContentImage,
OpenAIResponseInputMessageContentText,
OpenAIResponseInputTool,
OpenAIResponseMessage,
@ -34,7 +39,9 @@ from llama_stack_api import (
OpenAIResponseText,
OpenAIResponseTextFormat,
OpenAISystemMessageParam,
OpenAIUserMessageParam,
Order,
Prompts,
ResponseGuardrailSpec,
Safety,
ToolGroups,
@ -46,6 +53,7 @@ from .streaming import StreamingResponseOrchestrator
from .tool_executor import ToolExecutor
from .types import ChatCompletionContext, ToolContext
from .utils import (
convert_response_content_to_chat_content,
convert_response_input_to_chat_messages,
convert_response_text_to_chat_response_format,
extract_guardrail_ids,
@ -69,6 +77,8 @@ class OpenAIResponsesImpl:
vector_io_api: VectorIO, # VectorIO
safety_api: Safety | None,
conversations_api: Conversations,
prompts_api: Prompts,
files_api: Files,
):
self.inference_api = inference_api
self.tool_groups_api = tool_groups_api
@ -82,6 +92,8 @@ class OpenAIResponsesImpl:
tool_runtime_api=tool_runtime_api,
vector_io_api=vector_io_api,
)
self.prompts_api = prompts_api
self.files_api = files_api
async def _prepend_previous_response(
self,
@ -122,11 +134,13 @@ class OpenAIResponsesImpl:
# Use stored messages directly and convert only new input
message_adapter = TypeAdapter(list[OpenAIMessageParam])
messages = message_adapter.validate_python(previous_response.messages)
new_messages = await convert_response_input_to_chat_messages(input, previous_messages=messages)
new_messages = await convert_response_input_to_chat_messages(
input, previous_messages=messages, files_api=self.files_api
)
messages.extend(new_messages)
else:
# Backward compatibility: reconstruct from inputs
messages = await convert_response_input_to_chat_messages(all_input)
messages = await convert_response_input_to_chat_messages(all_input, files_api=self.files_api)
tool_context.recover_tools_from_previous_response(previous_response)
elif conversation is not None:
@ -138,7 +152,7 @@ class OpenAIResponsesImpl:
all_input = input
if not conversation_items.data:
# First turn - just convert the new input
messages = await convert_response_input_to_chat_messages(input)
messages = await convert_response_input_to_chat_messages(input, files_api=self.files_api)
else:
if not stored_messages:
all_input = conversation_items.data
@ -154,14 +168,82 @@ class OpenAIResponsesImpl:
all_input = input
messages = stored_messages or []
new_messages = await convert_response_input_to_chat_messages(all_input, previous_messages=messages)
new_messages = await convert_response_input_to_chat_messages(
all_input, previous_messages=messages, files_api=self.files_api
)
messages.extend(new_messages)
else:
all_input = input
messages = await convert_response_input_to_chat_messages(all_input)
messages = await convert_response_input_to_chat_messages(all_input, files_api=self.files_api)
return all_input, messages, tool_context
async def _prepend_prompt(
self,
messages: list[OpenAIMessageParam],
openai_response_prompt: OpenAIResponsePrompt | None,
) -> None:
"""Prepend prompt template to messages, resolving text/image/file variables.
:param messages: List of OpenAIMessageParam objects
:param openai_response_prompt: (Optional) OpenAIResponsePrompt object with variables
:returns: None; the messages list is modified in place
"""
if not openai_response_prompt or not openai_response_prompt.id:
return
prompt_version = int(openai_response_prompt.version) if openai_response_prompt.version else None
cur_prompt = await self.prompts_api.get_prompt(openai_response_prompt.id, prompt_version)
if not cur_prompt or not cur_prompt.prompt:
return
cur_prompt_text = cur_prompt.prompt
cur_prompt_variables = cur_prompt.variables
if not openai_response_prompt.variables:
messages.insert(0, OpenAISystemMessageParam(content=cur_prompt_text))
return
# Validate that all provided variables exist in the prompt
for name in openai_response_prompt.variables.keys():
if name not in cur_prompt_variables:
raise ValueError(f"Variable {name} not found in prompt {openai_response_prompt.id}")
# Separate text and media variables
text_substitutions = {}
media_content_parts: list[OpenAIChatCompletionContentPartParam] = []
for name, value in openai_response_prompt.variables.items():
# Text variable found
if isinstance(value, OpenAIResponseInputMessageContentText):
text_substitutions[name] = value.text
# Media variable found
elif isinstance(value, OpenAIResponseInputMessageContentImage | OpenAIResponseInputMessageContentFile):
converted_parts = await convert_response_content_to_chat_content([value], files_api=self.files_api)
if isinstance(converted_parts, list):
media_content_parts.extend(converted_parts)
# Eg: {{product_photo}} becomes "[Image: product_photo]"
# This gives the model textual context about what media exists in the prompt
var_type = value.type.replace("input_", "").replace("_", " ").title()
text_substitutions[name] = f"[{var_type}: {name}]"
def replace_variable(match: re.Match[str]) -> str:
var_name = match.group(1).strip()
return str(text_substitutions.get(var_name, match.group(0)))
pattern = r"\{\{\s*(\w+)\s*\}\}"
processed_prompt_text = re.sub(pattern, replace_variable, cur_prompt_text)
# Insert system message with resolved text
messages.insert(0, OpenAISystemMessageParam(content=processed_prompt_text))
# If there is media, append a new user message, since user messages can carry images and files
if media_content_parts:
messages.append(OpenAIUserMessageParam(content=media_content_parts))
async def get_openai_response(
self,
response_id: str,
@ -297,6 +379,7 @@ class OpenAIResponsesImpl:
input=input,
conversation=conversation,
model=model,
prompt=prompt,
instructions=instructions,
previous_response_id=previous_response_id,
store=store,
@ -350,6 +433,7 @@ class OpenAIResponsesImpl:
instructions: str | None = None,
previous_response_id: str | None = None,
conversation: str | None = None,
prompt: OpenAIResponsePrompt | None = None,
store: bool | None = True,
temperature: float | None = None,
text: OpenAIResponseText | None = None,
@ -372,6 +456,9 @@ class OpenAIResponsesImpl:
if instructions:
messages.insert(0, OpenAISystemMessageParam(content=instructions))
# Prepend reusable prompt (if provided)
await self._prepend_prompt(messages, prompt)
# Structured outputs
response_format = await convert_response_text_to_chat_response_format(text)
@ -394,6 +481,7 @@ class OpenAIResponsesImpl:
ctx=ctx,
response_id=response_id,
created_at=created_at,
prompt=prompt,
text=text,
max_infer_iters=max_infer_iters,
parallel_tool_calls=parallel_tool_calls,

View file

@ -5,11 +5,14 @@
# the root directory of this source tree.
import asyncio
import base64
import mimetypes
import re
import uuid
from collections.abc import Sequence
from llama_stack_api import (
Files,
OpenAIAssistantMessageParam,
OpenAIChatCompletionContentPartImageParam,
OpenAIChatCompletionContentPartParam,
@ -18,6 +21,8 @@ from llama_stack_api import (
OpenAIChatCompletionToolCallFunction,
OpenAIChoice,
OpenAIDeveloperMessageParam,
OpenAIFile,
OpenAIFileFile,
OpenAIImageURL,
OpenAIJSONSchema,
OpenAIMessageParam,
@ -29,6 +34,7 @@ from llama_stack_api import (
OpenAIResponseInput,
OpenAIResponseInputFunctionToolCallOutput,
OpenAIResponseInputMessageContent,
OpenAIResponseInputMessageContentFile,
OpenAIResponseInputMessageContentImage,
OpenAIResponseInputMessageContentText,
OpenAIResponseInputTool,
@ -37,9 +43,11 @@ from llama_stack_api import (
OpenAIResponseMessage,
OpenAIResponseOutputMessageContent,
OpenAIResponseOutputMessageContentOutputText,
OpenAIResponseOutputMessageFileSearchToolCall,
OpenAIResponseOutputMessageFunctionToolCall,
OpenAIResponseOutputMessageMCPCall,
OpenAIResponseOutputMessageMCPListTools,
OpenAIResponseOutputMessageWebSearchToolCall,
OpenAIResponseText,
OpenAISystemMessageParam,
OpenAIToolMessageParam,
@ -49,6 +57,46 @@ from llama_stack_api import (
)
async def extract_bytes_from_file(file_id: str, files_api: Files) -> bytes:
"""
Extract raw bytes from file using the Files API.
:param file_id: The file identifier (e.g., "file-abc123")
:param files_api: Files API instance
:returns: Raw file content as bytes
:raises: ValueError if file cannot be retrieved
"""
try:
response = await files_api.openai_retrieve_file_content(file_id)
return bytes(response.body)
except Exception as e:
raise ValueError(f"Failed to retrieve file content for file_id '{file_id}': {str(e)}") from e
def generate_base64_ascii_text_from_bytes(raw_bytes: bytes) -> str:
"""
Convert raw bytes into base64-encoded ASCII text, suitable for embedding in a data URL
:param raw_bytes: the bytes that represent the file content
:returns: base64-encoded string
"""
return base64.b64encode(raw_bytes).decode("utf-8")
def construct_data_url(ascii_text: str, mime_type: str | None) -> str:
"""
Construct a data URL with base64-encoded data inside
:param ascii_text: base64-encoded ASCII content
:param mime_type: MIME type of file
:returns: data URL string (e.g. data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==)
"""
if not mime_type:
mime_type = "application/octet-stream"
return f"data:{mime_type};base64,{ascii_text}"
async def convert_chat_choice_to_response_message(
choice: OpenAIChoice,
citation_files: dict[str, str] | None = None,
@ -78,11 +126,15 @@ async def convert_chat_choice_to_response_message(
async def convert_response_content_to_chat_content(
content: str | Sequence[OpenAIResponseInputMessageContent | OpenAIResponseOutputMessageContent],
files_api: Files | None,
) -> str | list[OpenAIChatCompletionContentPartParam]:
"""
Convert the content parts from an OpenAI Response API request into OpenAI Chat Completion content parts.
The content schemas of each API look similar, but are not exactly the same.
:param content: The content to convert
:param files_api: Files API for resolving file_id to raw file content (required if content contains files/images)
"""
if isinstance(content, str):
return content
@ -95,9 +147,68 @@ async def convert_response_content_to_chat_content(
elif isinstance(content_part, OpenAIResponseOutputMessageContentOutputText):
converted_parts.append(OpenAIChatCompletionContentPartTextParam(text=content_part.text))
elif isinstance(content_part, OpenAIResponseInputMessageContentImage):
detail = content_part.detail
image_mime_type = None
if content_part.image_url:
image_url = OpenAIImageURL(url=content_part.image_url, detail=content_part.detail)
image_url = OpenAIImageURL(url=content_part.image_url, detail=detail)
converted_parts.append(OpenAIChatCompletionContentPartImageParam(image_url=image_url))
elif content_part.file_id:
if files_api is None:
raise ValueError("file_ids are not supported by this implementation of the Stack")
image_file_response = await files_api.openai_retrieve_file(content_part.file_id)
if image_file_response.filename:
image_mime_type, _ = mimetypes.guess_type(image_file_response.filename)
raw_image_bytes = await extract_bytes_from_file(content_part.file_id, files_api)
ascii_text = generate_base64_ascii_text_from_bytes(raw_image_bytes)
image_data_url = construct_data_url(ascii_text, image_mime_type)
image_url = OpenAIImageURL(url=image_data_url, detail=detail)
converted_parts.append(OpenAIChatCompletionContentPartImageParam(image_url=image_url))
else:
raise ValueError(
f"Image content must have either 'image_url' or 'file_id'. "
f"Got image_url={content_part.image_url}, file_id={content_part.file_id}"
)
elif isinstance(content_part, OpenAIResponseInputMessageContentFile):
resolved_file_data = None
file_data = content_part.file_data
file_id = content_part.file_id
file_url = content_part.file_url
filename = content_part.filename
file_mime_type = None
if not any([file_data, file_id, file_url]):
raise ValueError(
f"File content must have at least one of 'file_data', 'file_id', or 'file_url'. "
f"Got file_data={file_data}, file_id={file_id}, file_url={file_url}"
)
if file_id:
if files_api is None:
raise ValueError("file_ids are not supported by this implementation of the Stack")
file_response = await files_api.openai_retrieve_file(file_id)
if not filename:
filename = file_response.filename
file_mime_type, _ = mimetypes.guess_type(file_response.filename)
raw_file_bytes = await extract_bytes_from_file(file_id, files_api)
ascii_text = generate_base64_ascii_text_from_bytes(raw_file_bytes)
resolved_file_data = construct_data_url(ascii_text, file_mime_type)
elif file_data:
if file_data.startswith("data:"):
resolved_file_data = file_data
else:
# Raw base64 data, wrap in data URL format
if filename:
file_mime_type, _ = mimetypes.guess_type(filename)
resolved_file_data = construct_data_url(file_data, file_mime_type)
elif file_url:
resolved_file_data = file_url
converted_parts.append(
OpenAIFile(
file=OpenAIFileFile(
file_data=resolved_file_data,
filename=filename,
)
)
)
elif isinstance(content_part, str):
converted_parts.append(OpenAIChatCompletionContentPartTextParam(text=content_part))
else:
@ -110,12 +221,14 @@ async def convert_response_content_to_chat_content(
async def convert_response_input_to_chat_messages(
input: str | list[OpenAIResponseInput],
previous_messages: list[OpenAIMessageParam] | None = None,
files_api: Files | None = None,
) -> list[OpenAIMessageParam]:
"""
Convert the input from an OpenAI Response API request into OpenAI Chat Completion messages.
:param input: The input to convert
:param previous_messages: Optional previous messages to check for function_call references
:param files_api: Files API for resolving file_id to raw file content (optional, required for file/image content)
"""
messages: list[OpenAIMessageParam] = []
if isinstance(input, list):
@ -169,6 +282,12 @@ async def convert_response_input_to_chat_messages(
elif isinstance(input_item, OpenAIResponseOutputMessageMCPListTools):
# the tool list will be handled separately
pass
elif isinstance(
input_item,
OpenAIResponseOutputMessageWebSearchToolCall | OpenAIResponseOutputMessageFileSearchToolCall,
):
# these tool calls are tracked internally but not converted to chat messages
pass
elif isinstance(input_item, OpenAIResponseMCPApprovalRequest) or isinstance(
input_item, OpenAIResponseMCPApprovalResponse
):
@ -176,7 +295,7 @@ async def convert_response_input_to_chat_messages(
pass
elif isinstance(input_item, OpenAIResponseMessage):
# Narrow type to OpenAIResponseMessage which has content and role attributes
content = await convert_response_content_to_chat_content(input_item.content)
content = await convert_response_content_to_chat_content(input_item.content, files_api)
message_type = await get_message_type_by_role(input_item.role)
if message_type is None:
raise ValueError(

View file

@ -34,6 +34,8 @@ def available_providers() -> list[ProviderSpec]:
Api.tool_runtime,
Api.tool_groups,
Api.conversations,
Api.prompts,
Api.files,
],
optional_api_dependencies=[
Api.safety,