feat: Add rerank models and rerank API change (#3831)

# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- Extend the model type to include rerank models.
- Implement `rerank()` method in inference router.
- Add `rerank_model_list` to `OpenAIMixin` to enable providers to
register and identify rerank models
- Update documentation.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
```
pytest tests/unit/providers/utils/inference/test_openai_mixin.py
```
This commit is contained in:
Jiayi Ni 2025-10-22 12:02:28 -07:00 committed by GitHub
parent f2598d30e6
commit bb1ebb3c6b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
12 changed files with 186 additions and 43 deletions

View file

@ -6482,6 +6482,7 @@ components:
enum:
- llm
- embedding
- rerank
title: ModelType
description: >-
Enumeration of supported model types in Llama Stack.
@ -13585,13 +13586,16 @@ tags:
embeddings.
This API provides the raw interface to the underlying models. Two kinds of models
are supported:
This API provides the raw interface to the underlying models. Three kinds of
models are supported:
- LLM models: these models generate "raw" and "chat" (conversational) completions.
- Embedding models: these models generate embeddings to be used for semantic
search.
- Rerank models: these models reorder the documents based on their relevance
to a query.
x-displayName: Inference
- name: Inspect
description: >-