feat: Add rerank models and rerank API change (#3831)

# What does this PR do?

- Extend the model type to include rerank models.
- Implement a `rerank()` method in the inference router.
- Add `rerank_model_list` to `OpenAIMixin` so that providers can register and identify rerank models.
- Update documentation.

## Test Plan

```
pytest tests/unit/providers/utils/inference/test_openai_mixin.py
```
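The sketch below is a minimal, self-contained illustration of one way the pieces above could fit together. A provider-declared `rerank_model_list` tags models as `rerank`, and a router-side `rerank()` refuses models of any other type. It does not reproduce Llama Stack's actual `OpenAIMixin` or router code. Only the enum values `llm`, `embedding`, `rerank` (from the spec diff) and the attribute name `rerank_model_list` come from this change. `ProviderSketch`, `ModelEntry`, `embedding_model_ids`, and `model_type_for` are hypothetical. The relevance score is a naive term-overlap count that stands in for a real rerank model.

```python
from dataclasses import dataclass, field
from enum import Enum


class ModelType(str, Enum):
    # Values mirror the ModelType enum in the spec diff.
    llm = "llm"
    embedding = "embedding"
    rerank = "rerank"


@dataclass
class ModelEntry:
    # Hypothetical record for a registered model.
    identifier: str
    model_type: ModelType


@dataclass
class ProviderSketch:
    # Hypothetical provider; only the name `rerank_model_list` comes from this PR.
    embedding_model_ids: set[str] = field(default_factory=set)
    rerank_model_list: set[str] = field(default_factory=set)

    def model_type_for(self, model_id: str) -> ModelType:
        # Rerank membership is checked first; unlisted IDs default to LLM.
        if model_id in self.rerank_model_list:
            return ModelType.rerank
        if model_id in self.embedding_model_ids:
            return ModelType.embedding
        return ModelType.llm

    def list_models(self, model_ids: list[str]) -> list[ModelEntry]:
        # Tag every model ID the provider exposes with its type.
        return [ModelEntry(m, self.model_type_for(m)) for m in model_ids]


def rerank(entry: ModelEntry, query: str, documents: list[str]) -> list[str]:
    # Hypothetical router-side guard: only rerank models may serve rerank calls.
    if entry.model_type is not ModelType.rerank:
        raise ValueError(
            f"{entry.identifier} is a {entry.model_type.value} model, not a rerank model"
        )
    terms = set(query.lower().split())

    def score(doc: str) -> int:
        # Toy relevance: number of document words that also appear in the query.
        return sum(1 for word in doc.lower().split() if word in terms)

    # Highest score first; sorted() stays stable with reverse=True, so ties keep input order.
    return sorted(documents, key=score, reverse=True)


provider = ProviderSketch(embedding_model_ids={"embed-small"}, rerank_model_list={"rerank-v1"})
models = {m.identifier: m for m in provider.list_models(["chat-7b", "embed-small", "rerank-v1"])}
print(rerank(models["rerank-v1"], "llama stack", ["a cat", "llama stack docs", "stack of plates"]))
# → ['llama stack docs', 'stack of plates', 'a cat']
```

Keeping the type decision on the provider side means the router only checks `model_type` and does not need to know each provider's model catalog.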
2025-10-25 17:11:12 +00:00 · 2025-10-22 12:02:28 -07:00 · 2025-10-22 12:02:28 -07:00 · bb1ebb3c6b
commit bb1ebb3c6b
parent f2598d30e6
12 changed files with 186 additions and 43 deletions
--- a/docs/static/stainless-llama-stack-spec.yaml
+++ b/docs/static/stainless-llama-stack-spec.yaml
@@ -6482,6 +6482,7 @@ components:
      enum:
        - llm
        - embedding
+        - rerank
      title: ModelType
      description: >-
        Enumeration of supported model types in Llama Stack.
@@ -13585,13 +13586,16 @@ tags:
      embeddings.


-      This API provides the raw interface to the underlying models. Two kinds of models
-      are supported:
+      This API provides the raw interface to the underlying models. Three kinds of
+      models are supported:

      - LLM models: these models generate "raw" and "chat" (conversational) completions.

      - Embedding models: these models generate embeddings to be used for semantic
      search.
+
+      - Rerank models: these models reorder the documents based on their relevance
+      to a query.
    x-displayName: Inference
  - name: Inspect
    description: >-