Remove experimental from rerank models doc

2025-12-12 04:00:42 +00:00 · 2025-10-17 14:51:17 -07:00 · 2025-10-17 14:51:17 -07:00 · ad52849072
commit ad52849072
parent 51c923f096
9 changed files with 13 additions and 12 deletions
--- a/docs/docs/providers/inference/index.mdx
+++ b/docs/docs/providers/inference/index.mdx
@@ -6,7 +6,7 @@ description: "Inference
    This API provides the raw interface to the underlying models. Three kinds of models are supported:
    - LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
    - Embedding models: these models generate embeddings to be used for semantic search.
-    - Rerank models (Experimental): these models reorder the documents based on their relevance to a query."
+    - Rerank models: these models reorder the documents based on their relevance to a query."
 sidebar_label: Inference
 title: Inference
 ---
@@ -22,6 +22,6 @@ Inference
    This API provides the raw interface to the underlying models. Three kinds of models are supported:
    - LLM models: these models generate "raw" and "chat" (conversational) completions.
    - Embedding models: these models generate embeddings to be used for semantic search.
-    - Rerank models (Experimental): these models reorder the documents based on their relevance to a query.
+    - Rerank models: these models reorder the documents based on their relevance to a query.

 This section contains documentation for all available providers for the **inference** API.
--- a/docs/static/deprecated-llama-stack-spec.html
+++ b/docs/static/deprecated-llama-stack-spec.html
@@ -13459,7 +13459,7 @@
        },
        {
            "name": "Inference",
-            "description": "Llama Stack Inference API for generating completions, chat completions, and embeddings.\n\nThis API provides the raw interface to the underlying models. Three kinds of models are supported:\n- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.\n- Embedding models: these models generate embeddings to be used for semantic search.\n- Rerank models (Experimental): these models reorder the documents based on their relevance to a query.",
+            "description": "Llama Stack Inference API for generating completions, chat completions, and embeddings.\n\nThis API provides the raw interface to the underlying models. Three kinds of models are supported:\n- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.\n- Embedding models: these models generate embeddings to be used for semantic search.\n- Rerank models: these models reorder the documents based on their relevance to a query.",
            "x-displayName": "Inference"
        },
        {
--- a/docs/static/deprecated-llama-stack-spec.yaml
+++ b/docs/static/deprecated-llama-stack-spec.yaml
@@ -10218,8 +10218,8 @@ tags:
      - Embedding models: these models generate embeddings to be used for semantic
      search.

-      - Rerank models (Experimental): these models reorder the documents based on
-      their relevance to a query.
+      - Rerank models: these models reorder the documents based on their relevance
+      to a query.
    x-displayName: Inference
  - name: Models
    description: ''
--- a/docs/static/llama-stack-spec.html
+++ b/docs/static/llama-stack-spec.html
@@ -13262,7 +13262,7 @@
        },
        {
            "name": "Inference",
-            "description": "Llama Stack Inference API for generating completions, chat completions, and embeddings.\n\nThis API provides the raw interface to the underlying models. Three kinds of models are supported:\n- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.\n- Embedding models: these models generate embeddings to be used for semantic search.\n- Rerank models (Experimental): these models reorder the documents based on their relevance to a query.",
+            "description": "Llama Stack Inference API for generating completions, chat completions, and embeddings.\n\nThis API provides the raw interface to the underlying models. Three kinds of models are supported:\n- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.\n- Embedding models: these models generate embeddings to be used for semantic search.\n- Rerank models: these models reorder the documents based on their relevance to a query.",
            "x-displayName": "Inference"
        },
        {
--- a/docs/static/llama-stack-spec.yaml
+++ b/docs/static/llama-stack-spec.yaml
@@ -10191,8 +10191,8 @@ tags:
      - Embedding models: these models generate embeddings to be used for semantic
      search.

-      - Rerank models (Experimental): these models reorder the documents based on
-      their relevance to a query.
+      - Rerank models: these models reorder the documents based on their relevance
+      to a query.
    x-displayName: Inference
  - name: Inspect
    description: >-
--- a/docs/static/stainless-llama-stack-spec.html
+++ b/docs/static/stainless-llama-stack-spec.html
@@ -17952,7 +17952,7 @@
        },
        {
            "name": "Inference",
-            "description": "Llama Stack Inference API for generating completions, chat completions, and embeddings.\n\nThis API provides the raw interface to the underlying models. Three kinds of models are supported:\n- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.\n- Embedding models: these models generate embeddings to be used for semantic search.\n- Rerank models (Experimental): these models reorder the documents based on their relevance to a query.",
+            "description": "Llama Stack Inference API for generating completions, chat completions, and embeddings.\n\nThis API provides the raw interface to the underlying models. Three kinds of models are supported:\n- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.\n- Embedding models: these models generate embeddings to be used for semantic search.\n- Rerank models: these models reorder the documents based on their relevance to a query.",
            "x-displayName": "Inference"
        },
        {
--- a/docs/static/stainless-llama-stack-spec.yaml
+++ b/docs/static/stainless-llama-stack-spec.yaml
@@ -13586,8 +13586,8 @@ tags:
      - Embedding models: these models generate embeddings to be used for semantic
      search.

-      - Rerank models (Experimental): these models reorder the documents based on
-      their relevance to a query.
+      - Rerank models: these models reorder the documents based on their relevance
+      to a query.
    x-displayName: Inference
  - name: Inspect
    description: >-
--- a/llama_stack/apis/inference/inference.py
+++ b/llama_stack/apis/inference/inference.py
@@ -1237,7 +1237,7 @@ class Inference(InferenceProvider):
    This API provides the raw interface to the underlying models. Three kinds of models are supported:
    - LLM models: these models generate "raw" and "chat" (conversational) completions.
    - Embedding models: these models generate embeddings to be used for semantic search.
-    - Rerank models (Experimental): these models reorder the documents based on their relevance to a query.
+    - Rerank models: these models reorder the documents based on their relevance to a query.
    """

    @webmethod(route="/openai/v1/chat/completions", method="GET", level=LLAMA_STACK_API_V1, deprecated=True)
--- a/llama_stack/providers/utils/inference/openai_mixin.py
+++ b/llama_stack/providers/utils/inference/openai_mixin.py
@@ -48,6 +48,7 @@ class OpenAIMixin(NeedsRequestProviderData, ABC, BaseModel):
    - overwrite_completion_id: If True, overwrites the 'id' field in OpenAI responses
    - download_images: If True, downloads images and converts to base64 for providers that require it
    - embedding_model_metadata: A dictionary mapping model IDs to their embedding metadata
+    - rerank_model_list: A list of model IDs for rerank models
    - provider_data_api_key_field: Optional field name in provider data to look for API key
    - list_provider_model_ids: Method to list available models from the provider
    - get_extra_client_params: Method to provide extra parameters to the AsyncOpenAI client
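
For readers unfamiliar with the terms in the docstrings above, the following is a minimal sketch of what "reorder the documents based on their relevance to a query" and "a list of model IDs for rerank models" could look like. It is not the llama_stack implementation: `ToyProvider`, `model_type`, and `toy_rerank` are hypothetical names, the word-overlap score is a stand-in for a real rerank model, and only the attribute name `rerank_model_list` is taken from the diff.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProvider:
    """Hypothetical stand-in for a provider; not the real OpenAIMixin."""

    # Model IDs to treat as rerank models (attribute name taken from the diff).
    rerank_model_list: list[str] = field(default_factory=list)

    def model_type(self, model_id: str) -> str:
        # Assumed rule for illustration: IDs in rerank_model_list are rerank models.
        return "rerank" if model_id in self.rerank_model_list else "llm"


def toy_rerank(query: str, documents: list[str]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, most relevant first.

    The score is the fraction of query words found in the document,
    a toy substitute for a real rerank model's relevance score.
    """
    query_words = set(query.lower().split())
    scored = []
    for index, doc in enumerate(documents):
        doc_words = set(doc.lower().split())
        score = len(query_words & doc_words) / len(query_words) if query_words else 0.0
        scored.append((index, score))
    # sorted() is stable, so documents with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The result keeps the original document indices so a caller can map scores back to the input list.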