mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-04 02:03:44 +00:00)
# What does this PR do?

- Extend the model type to include rerank models.
- Implement the `rerank()` method in the inference router.
- Add `rerank_model_list` to `OpenAIMixin` so providers can register and identify rerank models.
- Update documentation.

## Test Plan

```
pytest tests/unit/providers/utils/inference/test_openai_mixin.py
```
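To illustrate the provider-registration change, here is a hypothetical sketch of how a mixin can expose a rerank model list. The class and method names below (other than `rerank_model_list`, which the PR description mentions) are invented for illustration; the actual `OpenAIMixin` implementation may differ.

```python
# Hypothetical sketch: a mixin that lets providers declare which of
# their model IDs are rerank models. Only `rerank_model_list` comes
# from the PR description; everything else is illustrative.
class RerankAwareMixin:
    # Providers override this to list the model IDs that are rerank models.
    rerank_model_list: list[str] = []

    def is_rerank_model(self, model_id: str) -> bool:
        # Membership check against the provider-declared list.
        return model_id in self.rerank_model_list


class ExampleProvider(RerankAwareMixin):
    # An illustrative provider registering one rerank model.
    rerank_model_list = ["example/rerank-v1"]


provider = ExampleProvider()
print(provider.is_rerank_model("example/rerank-v1"))  # True
print(provider.is_rerank_model("example/llm-v1"))     # False
```

A registry like this lets the inference router route `rerank()` calls only to models declared as rerank-capable.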
---
description: "Inference

  Llama Stack Inference API for generating completions, chat completions, and embeddings.

  This API provides the raw interface to the underlying models. Three kinds of models are supported:

  - LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.

  - Embedding models: these models generate embeddings to be used for semantic search.

  - Rerank models: these models reorder the documents based on their relevance to a query."
sidebar_label: Inference
title: Inference
---

# Inference

## Overview

Llama Stack Inference API for generating completions, chat completions, and embeddings.

This API provides the raw interface to the underlying models. Three kinds of models are supported:

- LLM models: these models generate "raw" and "chat" (conversational) completions.
- Embedding models: these models generate embeddings to be used for semantic search.
- Rerank models: these models reorder the documents based on their relevance to a query.

This section contains documentation for all available providers for the **inference** API.
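To make the distinction concrete, the following is a minimal, self-contained sketch of what a rerank step does conceptually. It is plain Python, not the Llama Stack API: it scores each document by term overlap with the query and returns the documents in relevance order, which is the general shape of what a real rerank model produces with learned scores.

```python
def rerank(query: str, documents: list[str]) -> list[tuple[float, str]]:
    """Toy reranker: score each document by the fraction of query terms
    it contains, then return (score, document) pairs sorted from most
    to least relevant. A real rerank model uses learned scores instead."""
    query_terms = set(query.lower().split())
    scored = []
    for doc in documents:
        doc_terms = set(doc.lower().split())
        overlap = len(query_terms & doc_terms) / max(len(query_terms), 1)
        scored.append((overlap, doc))
    # Highest-scoring documents first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


docs = [
    "llamas are members of the camelid family",
    "semantic search uses embeddings",
    "rerank models reorder documents by relevance to a query",
]
ranked = rerank("how do rerank models order documents", docs)
print(ranked[0][1])  # the document most relevant to the query
```

In practice the caller passes a query plus a candidate set (often the top results of an embedding search) and the rerank model returns relevance-ordered results.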