Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-12 12:06:04 +00:00)
Document known limitations of Responses
Signed-off-by: Bill Murdock <bmurdock@redhat.com>
This commit is contained in:
parent 007efa6eb5
commit 6f669dcb12
3 changed files with 249 additions and 2 deletions
@@ -22,7 +22,6 @@ Importantly, Llama Stack always strives to provide at least one fully inline pro

## Provider Categories

- **[External Providers](external/index.mdx)** - Guide for building and using external providers
- **[OpenAI Compatibility](./openai.mdx)** - OpenAI API compatibility layer
- **[Inference](inference/index.mdx)** - LLM and embedding model providers
- **[Agents](agents/index.mdx)** - Agentic system providers
- **[DatasetIO](datasetio/index.mdx)** - Dataset and data loader providers

@@ -31,3 +30,7 @@ Importantly, Llama Stack always strives to provide at least one fully inline pro

- **[Vector IO](vector_io/index.mdx)** - Vector database providers
- **[Tool Runtime](tool_runtime/index.mdx)** - Tool and protocol providers
- **[Files](files/index.mdx)** - File system and storage providers

## Other information about Providers

- **[OpenAI Compatibility](./openai.mdx)** - OpenAI API compatibility layer
- **[OpenAI-Compatible Responses Limitations](./openai_responses_limitations.mdx)** - Known limitations of the Responses API in Llama Stack
@@ -1,3 +1,4 @@

---
title: OpenAI Compatibility
description: OpenAI API Compatibility
sidebar_label: OpenAI Compatibility

@@ -47,7 +48,7 @@ models = client.models.list()

#### Responses

> **Note:** The Responses API implementation is still in active development. While it is quite usable, there are still unimplemented parts of the API. We'd love feedback on any use-cases you try that do not work to help prioritize the pieces left to implement. Please open issues in the [meta-llama/llama-stack](https://github.com/meta-llama/llama-stack) GitHub repository with details of anything that does not work.
> **Note:** The Responses API implementation is still in active development. While it is quite usable, there are still unimplemented parts of the API. See [Known Limitations of the OpenAI-compatible Responses API in Llama Stack](./openai_responses_limitations.mdx) for more details.

##### Simple inference
docs/docs/providers/openai_responses_limitations.mdx (new file, 243 lines)

@@ -0,0 +1,243 @@
---
title: Known Limitations of the OpenAI-compatible Responses API in Llama Stack
description: Limitations of Responses API
sidebar_label: Limitations of Responses API
sidebar_position: 1
---

## Unresolved Issues

This document outlines known limitations and inconsistencies between Llama Stack's Responses API and OpenAI's Responses API. The comparison reflects OpenAI's API as of October 6, 2025; see the OpenAI [changelog](https://platform.openai.com/docs/changelog) for any functionality added since that date.

Links to issues are included so readers can check status, post comments, and subscribe for updates on any limitation of particular interest. We would also love feedback on any use cases you try that do not work, to help prioritize the pieces left to implement. Please open new issues in the [meta-llama/llama-stack](https://github.com/meta-llama/llama-stack) GitHub repository for anything that does not work and does not already have an open issue.

### Streaming

**Status:** Partial Implementation

**Issue:** [#2364](https://github.com/llamastack/llama-stack/issues/2364)

Streaming for the Responses API is partially implemented and works to some extent, but some of the streaming response objects needed for full compatibility are still missing.
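
For reference, here is a minimal streaming sketch using the OpenAI Python client against a Llama Stack server; the base URL and model name are assumptions to adjust for your deployment. With the current partial implementation, some event types that OpenAI emits may not appear in the stream.

```python
from openai import OpenAI

# Assumed local Llama Stack endpoint and model; adjust for your deployment.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

stream = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Write a haiku about llamas.",
    stream=True,
)
for event in stream:
    # With the partial implementation, not every OpenAI event type may be emitted.
    print(event.type)
```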

---

### Built-in Tools

**Status:** Partial Implementation

OpenAI's Responses API includes an ecosystem of built-in tools (e.g., code interpreter) that lower the barrier to entry for agentic workflows. These tools are typically aligned with specific model training.

**Current Status in Llama Stack:**

- Some built-in tools exist (file search, web search)
- Missing tools include code interpreter, computer use, and image generation
- Some built-in tools may require additional APIs (e.g., [containers API](https://platform.openai.com/docs/api-reference/containers) for code interpreter)

It's unclear whether there is demand for additional built-in tools in Llama Stack. No upstream issues have been filed for adding more built-in tools.
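
For orientation, a sketch of calling one of the built-in tools that does exist today (web search). The endpoint, model name, and exact tool type string are assumptions and may differ by provider and configuration.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model
    input="What changed in the most recent Llama Stack release?",
    tools=[{"type": "web_search"}],  # assumed type string for the built-in web search tool
)
print(response.output_text)
```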

---

### Prompt Templates

**Status:** Partial Implementation

**Issue:** [#3321](https://github.com/llamastack/llama-stack/issues/3321)

OpenAI's platform supports [templated prompts using a structured language](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts). These templates can be stored server-side for organizational sharing. This feature is under development for Llama Stack.
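
For context, this is roughly what the feature looks like when used against OpenAI's platform (the prompt ID and variables below are hypothetical); the equivalent request against Llama Stack is what the linked issue tracks.

```python
from openai import OpenAI

client = OpenAI()  # pointed at OpenAI here; Llama Stack support is still in progress

response = client.responses.create(
    model="gpt-4o-mini",
    prompt={
        "id": "pmpt_abc123",             # hypothetical stored prompt ID
        "variables": {"city": "Paris"},  # fills template placeholders server-side
    },
)
print(response.output_text)
```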

---

### Instructions

**Status:** Partial Implementation + Work in Progress

**Issue:** [#3566](https://github.com/llamastack/llama-stack/issues/3566)

In Llama Stack, the `instructions` parameter is already implemented for creating a response, but it is not yet included in the output response object.
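
A short sketch of the current behavior (endpoint and model name are assumptions): the parameter is honored at request time, but reading it back from the response object does not yet work.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model
    instructions="You are a terse assistant. Answer in one sentence.",
    input="What is Llama Stack?",
)

print(response.output_text)
# Not yet echoed back in the response object (see #3566).
print(response.instructions)
```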

---

### Response Branching

**Status:** Not Working

Response branching, as discussed in the [Agents vs OpenAI Responses API documentation](https://llamastack.github.io/docs/building_applications/responses_vs_agents), is not currently functional.
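
Branching means creating multiple independent continuations from the same parent response via `previous_response_id`, as in the sketch below (endpoint and model are assumptions); on current Llama Stack this pattern does not behave as expected.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint
MODEL = "meta-llama/Llama-3.3-70B-Instruct"  # assumed model

root = client.responses.create(model=MODEL, input="Name three facts about llamas.")

# Two independent branches that both continue from the same parent response.
branch_a = client.responses.create(
    model=MODEL, previous_response_id=root.id, input="Expand on the first fact."
)
branch_b = client.responses.create(
    model=MODEL, previous_response_id=root.id, input="Expand on the second fact."
)
```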

---

### Include

**Status:** Not Implemented

The `include` parameter allows you to provide a list of values that indicate additional information for the system to include in the model response. The [OpenAI API](https://platform.openai.com/docs/api-reference/responses/create) specifies the following allowed values for this parameter:

- `web_search_call.action.sources`
- `code_interpreter_call.outputs`
- `computer_call_output.output.image_url`
- `file_search_call.results`
- `message.input_image.image_url`
- `message.output_text.logprobs`
- `reasoning.encrypted_content`

Some of these are not relevant to Llama Stack in its current form. For example, code interpreter is not implemented (see "Built-in Tools" above), so `code_interpreter_call.outputs` would not be a useful directive to Llama Stack.

However, others might be useful. For example, `message.output_text.logprobs` can be useful for assessing how confident a model is in each token of its output.
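
For reference, requesting one of these values looks like the following (endpoint and model are assumptions); Llama Stack does not yet honor the parameter.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model
    input="Summarize the MCP specification in one paragraph.",
    # Ask for per-token logprobs alongside the output text; not yet honored by Llama Stack.
    include=["message.output_text.logprobs"],
)
```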

---

### Tool Choice

**Status:** Not Implemented

**Issue:** [#3548](https://github.com/llamastack/llama-stack/issues/3548)

In OpenAI's API, the `tool_choice` parameter allows you to set restrictions or requirements for which tools should be used when generating a response. This feature is not implemented in Llama Stack.
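
For reference, this is how the parameter is used per the OpenAI spec (the function tool, endpoint, and model below are hypothetical); `tool_choice="required"` forces the model to call some tool rather than answer directly.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

weather_tool = {
    "type": "function",
    "name": "get_weather",  # hypothetical function tool
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model
    input="What's the weather in Lima?",
    tools=[weather_tool],
    tool_choice="required",  # not yet implemented in Llama Stack (see #3548)
)
```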

---

### Safety Identification and Tracking

**Status:** Not Implemented

OpenAI's platform lets account holders track the end users of their agentic applications via a safety identifier passed with each request. When requests violate moderation or safety rules, account holders are alerted and automated actions can be taken. This capability is not currently available in Llama Stack.

---

### Connectors

**Status:** Not Implemented

Connectors are MCP servers maintained and managed by the Responses API provider. OpenAI has documented their connectors at [https://platform.openai.com/docs/guides/tools-connectors-mcp](https://platform.openai.com/docs/guides/tools-connectors-mcp).

**Open Questions:**

- Should Llama Stack include built-in support for some, all, or none of OpenAI's connectors?
- Should there be a mechanism for administrators to add custom connectors via `run.yaml` or an API?

---

### Reasoning

**Status:** Not Implemented

**Issue:** [#3551](https://github.com/llamastack/llama-stack/issues/3551)

---

### Service Tier

**Status:** Not Implemented

**Issue:** [#3550](https://github.com/llamastack/llama-stack/issues/3550)

---

### Top Logprobs

**Status:** Not Implemented

**Issue:** [#3552](https://github.com/llamastack/llama-stack/issues/3552)

The `top_logprobs` parameter from OpenAI's Responses API extends the functionality obtained by including `message.output_text.logprobs` in the `include` parameter list (as discussed in the Include section above). It enables users to also get logprobs for alternative tokens.
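
Per the OpenAI spec, the parameter is an integer bound on how many alternative tokens to return per position, as sketched below (endpoint and model are assumptions); Llama Stack does not implement it yet.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model
    input="Pick a color and explain your choice briefly.",
    include=["message.output_text.logprobs"],
    top_logprobs=5,  # also return the 5 most likely alternatives for each output token
)
```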

---

### Max Tool Calls

**Status:** Not Implemented

**Issue:** [#3563](https://github.com/llamastack/llama-stack/issues/3563)

The Responses API can accept a `max_tool_calls` parameter that limits the number of tool calls allowed to be executed for a given response. This feature needs full implementation and documentation.
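
A sketch of the intended usage per the OpenAI spec (endpoint, model, and tool type string are assumptions); Llama Stack does not enforce the limit yet.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model
    input="Find three recent blog posts about Llama Stack.",
    tools=[{"type": "web_search"}],  # assumed built-in tool type string
    max_tool_calls=3,  # stop after at most three tool invocations
)
```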

---

### Max Output Tokens

**Status:** Not Implemented

**Issue:** [#3562](https://github.com/llamastack/llama-stack/issues/3562)

---

### Metadata

**Status:** Not Implemented

**Issue:** [#3564](https://github.com/llamastack/llama-stack/issues/3564)

---

### Incomplete Details

**Status:** Not Implemented

**Issue:** [#3567](https://github.com/llamastack/llama-stack/issues/3567)

---

### Background

**Status:** Not Implemented

**Issue:** [#3568](https://github.com/llamastack/llama-stack/issues/3568)

---

### Global Guardrails

**Status:** Feature Request

When calling the OpenAI Responses API, model outputs go through safety models configured by OpenAI administrators. Perhaps Llama Stack should provide a mechanism to configure safety models (or non-model logic) for all Responses requests, either through `run.yaml` or an administrative API.

---

### User-Controlled Guardrails

**Status:** Feature Request

**Issue:** [#3325](https://github.com/llamastack/llama-stack/issues/3325)

OpenAI has not released a way for users to configure their own guardrails. However, Llama Stack users may want this capability to complement or replace global guardrails. This could be implemented as a non-breaking, additive difference from the OpenAI API.

---

### MCP Elicitations

**Status:** Unknown

Elicitations allow MCP servers to request additional information from users through the client during interactions (e.g., a tool requesting a username before proceeding). See the [MCP specification](https://modelcontextprotocol.io/specification/draft/client/elicitation) for details.

**Open Questions:**

- Does this work in OpenAI's Responses API reference implementation?
- If not, is there a reasonable way to make that work within the API as is? Or would the API need to change?
- Does this work in Llama Stack?

---

### MCP Sampling

**Status:** Unknown

Sampling allows MCP tools to query the generative AI model. See the [MCP specification](https://modelcontextprotocol.io/specification/draft/client/sampling) for details.

**Open Questions:**

- Does this work in OpenAI's Responses API reference implementation?
- If not, is there a reasonable way to make that work within the API as is? Or would the API need to change?
- Does this work in Llama Stack?

---

### Parallel Tool Calls

**Status:** Rumored Issue

There are reports that `parallel_tool_calls` may not work correctly. This needs verification, and a ticket should be opened if it is confirmed.
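
A minimal sketch that could be used to verify the behavior (the function tool, endpoint, and model are hypothetical): with `parallel_tool_calls=True` and a prompt that needs two lookups, the expectation is two `function_call` items in `response.output`.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

get_weather = {
    "type": "function",
    "name": "get_weather",  # hypothetical function tool
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model
    input="What's the weather in Lima and in Cusco?",
    tools=[get_weather],
    parallel_tool_calls=True,
)
# Expectation: two function_call items, one per city.
print([item.type for item in response.output])
```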

---

## Resolved Issues

The following limitations have been addressed in recent releases:

### MCP and Function Tools with No Arguments

**Status:** ✅ Resolved

MCP and function tools now work correctly even when they have no arguments.
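
For example, a function tool with an empty parameter schema (hypothetical tool name; assumed endpoint and model) is now handled correctly:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

current_time_tool = {
    "type": "function",
    "name": "get_current_time",  # hypothetical tool that takes no arguments
    "description": "Return the current server time.",
    "parameters": {"type": "object", "properties": {}},
}

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model
    input="What time is it?",
    tools=[current_time_tool],
)
```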

---

### `require_approval` Parameter for MCP Tools

**Status:** ✅ Resolved

The `require_approval` parameter for MCP tools in the Responses API now works correctly.
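
For example, attaching an MCP server with approval disabled (the server URL and label are hypothetical; endpoint and model are assumptions) now behaves as documented:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model
    input="List the tools you have available.",
    tools=[{
        "type": "mcp",
        "server_label": "example",
        "server_url": "http://localhost:3000/sse",  # hypothetical MCP server
        "require_approval": "never",  # do not pause for per-call approval
    }],
)
```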

---

### MCP Tools with Array-Type Arguments

**Status:** ✅ Resolved

**Fixed in:** [#3003](https://github.com/llamastack/llama-stack/pull/3003) (Agent API), [#3602](https://github.com/llamastack/llama-stack/pull/3602) (Responses API)

MCP tools now correctly handle array-type arguments in both the Agent API and Responses API.