feat: Add rerank models and rerank API change (#3831)

# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- Extend the model type to include rerank models.
- Implement `rerank()` method in inference router.
- Add `rerank_model_list` to `OpenAIMixin` to enable providers to
register and identify rerank models
- Update documentation.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
```
pytest tests/unit/providers/utils/inference/test_openai_mixin.py
```
This commit is contained in:
Jiayi Ni 2025-10-22 12:02:28 -07:00 committed by GitHub
parent f2598d30e6
commit bb1ebb3c6b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
12 changed files with 186 additions and 43 deletions

View file

@ -1234,9 +1234,10 @@ class Inference(InferenceProvider):
Llama Stack Inference API for generating completions, chat completions, and embeddings.
This API provides the raw interface to the underlying models. Two kinds of models are supported:
This API provides the raw interface to the underlying models. Three kinds of models are supported:
- LLM models: these models generate "raw" and "chat" (conversational) completions.
- Embedding models: these models generate embeddings to be used for semantic search.
- Rerank models: these models reorder the documents based on their relevance to a query.
"""
@webmethod(route="/openai/v1/chat/completions", method="GET", level=LLAMA_STACK_API_V1, deprecated=True)