Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-12 20:12:33 +00:00)
Try again to pass CI using different python and pre-commit versions

Signed-off-by: Bill Murdock <bmurdock@redhat.com>

commit 0a1cff3ccf (parent 6f669dcb12)
2 changed files with 13 additions and 3 deletions
@@ -7,7 +7,7 @@ sidebar_position: 1
 ## Unresolved Issues
 
-This document outlines known limitations and inconsistencies between Llama Stack's Responses API and OpenAI's Responses API. This comparison is based on OpenAI's API and reflects a comparison with the OpenAI APIs as of October 6, 2025.
+This document outlines known limitations and inconsistencies between Llama Stack's Responses API and OpenAI's Responses API. This comparison is based on OpenAI's API and reflects a comparison with the OpenAI APIs as of October 6, 2025 (OpenAI's client version `openai==1.107`).
 
 See the OpenAI [changelog](https://platform.openai.com/docs/changelog) for details of any new functionality that has been added since that date. Links to issues are included so readers can read about status, post comments, and/or subscribe for updates relating to any limitations that are of specific interest to them. We would also love any other feedback on any use-cases you try that do not work to help prioritize the pieces left to implement.
 
 Please open new issues in the [meta-llama/llama-stack](https://github.com/meta-llama/llama-stack) GitHub repository with details of anything that does not work that does not already have an open issue.
@@ -208,6 +208,16 @@ Sampling allows MCP tools to query the generative AI model. See the [MCP specifi
 - If not, is there a reasonable way to make that work within the API as is? Or would the API need to change?
 - Does this work in Llama Stack?
+### Prompt Caching
+
+**Status:** Unknown
+
+OpenAI provides a [prompt caching](https://platform.openai.com/docs/guides/prompt-caching) mechanism in Responses that is enabled for its most recent models.
+
+**Open Questions:**
+- Does this work in Llama Stack?
+- If not, is there a reasonable way to make that work for those inference providers that have this capability by passing through the provided `prompt_cache_key` to the inference provider?
+- Is there a reasonable way to make that work for inference providers that don't build in this capability by doing some sort of caching at the Llama Stack layer?
 
 ---
 
 ### Parallel Tool Calls
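As a concrete illustration of the `prompt_cache_key` pass-through raised in the Prompt Caching hunk above, here is a minimal client-side sketch. It is not part of this commit and rests on several assumptions: the Llama Stack base URL and model id are placeholders for whatever your deployment actually serves, and whether Llama Stack currently forwards `prompt_cache_key` to the inference provider is exactly the open question being documented. The parameter itself is the one described in OpenAI's prompt caching guide and accepted by recent `openai` client releases such as the `openai==1.107` referenced above.

```python
# Hypothetical sketch, not the Llama Stack implementation: issue a Responses call
# with a prompt_cache_key so that a provider supporting prompt caching can route
# repeated requests sharing the same long prefix to the same cache.
from openai import OpenAI

# Assumptions: a running Llama Stack server exposing its OpenAI-compatible API at
# this base URL, and a model id your distribution actually serves; adjust both.
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

# A long, identical prefix across calls is what makes prompt caching pay off.
triage_instructions = "You are a support triage assistant. " * 100

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",   # placeholder model id
    instructions=triage_instructions,            # identical across calls, so a cacheable prefix
    input="Summarize today's open support tickets.",
    prompt_cache_key="support-triage-v1",        # stable key the provider can use for cache routing
)
print(response.output_text)
```

For providers without built-in prompt caching (the third bullet), the analogous idea would be some cache maintained by Llama Stack itself, keyed on the model and the supplied `prompt_cache_key`; that is purely speculative here and not something this commit adds.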