fix: fixed document errors.

parent 237b822945 · commit 2c90d39a9f · 1 changed file with 286 additions and 194 deletions

1. [Abstract](#abstract)
2. [Introduction](#introduction)
3. [Problem Statement](#problem-statement)
4. [Proposed Reranking Solution](#proposed-reranking-solution)
4.1. [Extended API Endpoints](#41-extended-api-endpoints)
4.1.1. [Query Endpoint](#411-query-endpoint)
4.1.2. [Dedicated Rerank Endpoint](#412-dedicated-rerank-endpoint)
4.2. [Data Models and Schemas](#42-data-models-and-schemas)
4.3. [Implementation in the RAG Runtime](#43-implementation-in-the-rag-runtime)
4.3.1. [Inline Reranking Integration](#431-inline-reranking-integration)
4.3.2. [Reranker Service](#432-reranker-service)
4.4. [Example Configuration and Request](#44-example-configuration-and-request)
5. [API Design Overview](#api-design-overview)
6. [Considerations and Tradeoffs](#6-considerations-and-tradeoffs)
7. [Conclusion](#7-conclusion)
8. [Approvals](#8-approvals)

## Abstract

This RFC proposes an enhancement to the Llama-Stack Retrieval-Augmented Generation (RAG) system through a configurable reranking component. Many enterprise users rely on legacy keyword search systems that already have significant investments in content synchronization and indexing. In these cases, re-ranking can improve accuracy by refining search results without requiring a full transition to a vector-based retrieval system. By incorporating an additional scoring step—using either a remote inference API or a self-hosted model—the system enhances document retrieval, providing more precise context for downstream tasks. Users have the flexibility to enable or disable reranking and to select a reranker from remote providers (e.g., LlamaRank, Voyage AI, Cohere) or self-hosted models (e.g., sentence-transformers, LLM-based inference). Additionally, telemetry updates are integrated to capture and report reranking metrics for enhanced observability and performance tuning.

## Introduction

Current RAG implementations use embedding-based similarity search to retrieve document candidates; however, the preliminary ordering can be suboptimal for ambiguous or complex queries. For enterprise users who rely on keyword-based search systems, re-ranking can be especially impactful, as it enhances accuracy without requiring a full migration to vector search. This document outlines an approach that provides both API-based reranking and inline reranking, ensuring seamless integration with existing retrieval systems while emphasizing configurability, ease of implementation, and robust telemetry reporting.

## Problem Statement

Existing RAG systems efficiently index and retrieve document chunks from vector stores, but they often lack a mechanism to refine initial results. This can lead to suboptimal context for LLMs and hinder overall performance. The case for re-ranking is especially strong for enterprise users relying on legacy keyword search systems, where significant investments have already been made in content synchronization and indexing. In these environments, re-ranking can substantially improve accuracy by refining outputs from established search infrastructure. While new vector stores using state-of-the-art dense models also benefit from re-ranking, the improvements tend to be less pronounced and may not justify the additional complexity and latency. Moreover, different operational needs mean that some users prefer a managed API solution, while others require inline control for low latency or data privacy.

## Proposed Reranking Solution



## 4.1. Extended API Endpoints

### 4.1.1. Query Endpoint

The `query` method of the `RAGToolRuntime` protocol is extended with optional reranking parameters:

```python
class RAGToolRuntime(Protocol):
    async def query(
        self,
        content: InterleavedContent,
        vector_db_ids: List[str],
        query_config: Optional[RAGQueryConfig] = None,
        rerank_strategy: RerankingStrategy = RerankingStrategy.DEFAULT,
        reranker_model_id: str = "my_model_id",
        rerank_config: Optional[RAGRerankConfig] = None,
    ) -> RAGQueryResult: ...
```

> <sub>**Note:** When `rerank_strategy` is not `NONE`, the service will invoke the reranking process using the specified `reranker_model_id` and additional options defined in `rerank_config`.

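For illustration, a minimal sketch of how a caller might invoke the extended endpoint; the `rag_tool` runtime instance and the vector DB id are hypothetical:

```python
# Hypothetical sketch: `rag_tool` implements RAGToolRuntime and "docs-v1"
# is an already-registered vector DB id (both assumptions for illustration).
result = await rag_tool.query(
    content="What is the refund policy?",
    vector_db_ids=["docs-v1"],
    rerank_strategy=RerankingStrategy.DEFAULT,
    reranker_model_id="cross-encoder/ms-marco-MiniLM-L-6-v2",
    rerank_config=RAGRerankConfig(),  # no api_url/api_key -> local reranker
)
```
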
### 4.1.2. Dedicated Rerank Endpoint

A new endpoint, `/tool-runtime/rag-tool/rerank`, is introduced. It accepts the following parameters:

- **`query`** (_InterleavedContent_): The input search query.
- **`retrieved_docs`** (_List[RAGDocument]_): A list of retrieved documents.
- **`rerank_strategy`** (_RerankingStrategy_): Determines the ranking strategy to be applied. Options include:
  - `NONE` - No reranking applied.
  - `DEFAULT` (default) - Standard reranking based on computed relevance scores.
  - `BOOST` - Applies a boost factor to the computed scores.
  - `HYBRID` - Combines multiple ranking methods for improved results.
  - `LLM_RERANK` - Uses an LLM-based model for reranking.
  - etc… - More can be added when needed.
- **`reranker_model_id`** (_string_): Identifier of the reranker model.
- **`top_k`** (_integer_): The number of top documents to return.
- **`rerank_config`** (_Optional[RAGRerankConfig]_): Configures additional options for the reranking process (e.g. `api_url`, `api_key`).

Below is an example implementation of the endpoint:

```python
async def rerank(
    self,
    query: InterleavedContent,
    retrieved_docs: List[RAGDocument],
    top_k: int = 5,
    rerank_strategy: RerankingStrategy = RerankingStrategy.DEFAULT,
    reranker_model_id: Optional[str] = None,
    rerank_config: Optional[RAGRerankConfig] = None,
) -> RerankResponse:
    """Re-rank retrieved documents based on relevance."""
    ...
```

## 4.2. Data Models and Schemas

The following Pydantic schemas define the data models that support the reranking process. They ensure that the input and output data structures conform to expected types and constraints.

The `RerankingStrategy` enum defines the different strategies available for reranking, ranging from basic ranking to more advanced techniques leveraging LLMs and hybrid approaches.

Below is the updated schema definition:

```python
from enum import Enum
from pydantic import BaseModel, Field
from typing import List, Dict, Any, Optional


class RerankedDocument(BaseModel):
    """Represents a single document after reranking.

    Attributes:
        index (int): The original position of the document.
        relevance_score (float): The computed relevance score.
    """

    index: int
    relevance_score: float


class RerankResponse(BaseModel):
    """Defines the response structure for the rerank endpoint.

    Attributes:
        reranked_documents (List[RerankedDocument]): The reranked documents,
            sorted by relevance.
        metadata (Dict[str, Any]): Additional metadata related to the
            reranking process.
    """

    reranked_documents: List[RerankedDocument]
    metadata: Dict[str, Any] = Field(default_factory=dict)


@json_schema_type
class RAGRerankConfig(BaseModel):
    """Configuration settings for the reranking service.

    Attributes:
        api_url (Optional[str]): The API endpoint for the external
            service (if applicable).
        api_key (Optional[str]): The API key for authenticating with the
            external service.
    """

    api_url: Optional[str] = None
    api_key: Optional[str] = None


class RerankingStrategy(Enum):
    """Defines different strategies for reranking documents.

    Attributes:
        NONE (str): No reranking applied.
        DEFAULT (str): Standard reranking based on computed relevance scores.
        BOOST (str): Applies a boost factor to the computed scores.
        LLM_RERANK (str): Leverages an LLM-based model for reranking.
        HYBRID (str): Combines multiple ranking methods for improved results.
    """

    NONE = "none"
    DEFAULT = "default"
    BOOST = "boost"
    LLM_RERANK = "llm_rerank"
    HYBRID = "hybrid"
```
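
As a quick sanity check, the schemas above can be exercised directly; a small sketch (the field values are made up):

```python
# Made-up values, purely to illustrate the response shape.
response = RerankResponse(
    reranked_documents=[
        RerankedDocument(index=2, relevance_score=0.91),
        RerankedDocument(index=0, relevance_score=0.42),
    ],
    metadata={"execution_time_ms": 12.5, "mean_score": 0.665},
)
print(response.reranked_documents[0].index)  # -> 2
```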

## 4.3. Implementation in the RAG Runtime

### 4.3.1. Inline Reranking Integration

Reranking is applied inline in the RAG tool runtime, between the initial retrieval and the final context assembly. Below is an improved example of the integration:

```python
class MemoryToolRuntimeImpl(ToolsProtocolPrivate, ToolRuntime, RAGToolRuntime):
    async def query(
        self,
        content: InterleavedContent,
        vector_db_ids: List[str],
        query_config: Optional[RAGQueryConfig] = None,
        rerank_strategy: RerankingStrategy = RerankingStrategy.DEFAULT,
        reranker_model_id: str = "my_model_id",
        rerank_config: Optional[RAGRerankConfig] = None,
    ) -> RAGQueryResult:
        # ... [initial retrieval logic that produces `chunks` and `scores`] ...

        if rerank_strategy != RerankingStrategy.NONE:
            # Call RerankerService to obtain refined relevance scores.
            # Note: the reranker service uses the query, the retrieved document
            # chunks, and any additional config. Any external-provider settings
            # are forwarded from rerank_config.
            provider_kwargs = (
                {"api_url": rerank_config.api_url, "api_key": rerank_config.api_key}
                if rerank_config
                else {}
            )
            reranked_results = await RerankerService.rerank_documents(
                query=content,
                chunks=list(chunks),
                rerank_strategy=rerank_strategy,
                model_id=reranker_model_id,
                **provider_kwargs,
            )

            # Build a mapping from the original chunk index to its new reranked score.
            index_to_score = {
                doc.index: doc.relevance_score
                for doc in reranked_results.reranked_documents
            }

            # Combine each chunk with its corresponding reranked score
            # (defaulting to 0.0 if absent).
            sort_data = [
                (chunk, index_to_score.get(i, 0.0)) for i, chunk in enumerate(chunks)
            ]
        else:
            # Use the original scores when reranking is not enabled.
            sort_data = list(zip(chunks, scores))

        # Sort chunks by score in descending order and extract the sorted chunks.
        reranked_chunks = [
            chunk
            for chunk, _ in sorted(sort_data, key=lambda pair: pair[1], reverse=True)
        ]

        # ... [further processing to build and return a RAGQueryResult] ...
```

> <sub>**Note:** When a `rerank_strategy` is specified, an additional reranking process is applied. If the `rerank_config` includes a URL or API key, the system will use the external reranker; otherwise, it defaults to the local reranker. This design clearly separates enabling reranking from selecting the provider.

### 4.3.2. Reranker Service

Reranking is built around a small provider abstraction that both the local and external implementations satisfy:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

import numpy as np
from sentence_transformers import CrossEncoder


class RerankerProvider(ABC):
    @abstractmethod
    async def compute_scores(self, query: str, chunks: List[Any]) -> np.ndarray:
        ...
```

#### LocalReranker

Reranks query–document pairs based on a specified strategy. It requires a `model_id` and a `rerank_strategy`; if `model_id` is missing, a `ValueError` is raised.

##### Initialization

- Call `LocalReranker(model_id, rerank_strategy)` to create an instance.
- The constructor immediately runs `_initialize_model()` to set up the model.
- If `model_id` is not provided, a `ValueError` is raised.
- If `rerank_strategy == RerankingStrategy.LLM_RERANK`, the component returns the provided `model_id` as is (placeholder behavior).
- Otherwise, it assumes `model_id` corresponds to a cross-encoder or other model that supports `.predict()`.

##### Scoring

- Call `compute_scores(query, chunks)`.
- The method pairs the query with each chunk's content to form `(query, content)` pairs.
- The exact post-processing depends on `rerank_strategy`:
  - `NONE`: Returns an array of zeros, one per chunk.
  - `DEFAULT`: Returns the raw scores from `predict()`.
  - `BOOST`: Placeholder for boosted scores (currently the same as raw).
  - `HYBRID`: Placeholder for a hybrid approach (currently the same as raw).
  - `LLM_RERANK`: Placeholder for LLM-based logic.

```python
class LocalReranker(RerankerProvider):
    def __init__(
        self,
        model_id: Optional[str] = None,
        rerank_strategy: RerankingStrategy = RerankingStrategy.DEFAULT,
    ) -> None:
        self.model_id = model_id
        self.rerank_strategy = rerank_strategy
        self.model = asyncio.run(self._initialize_model())

    async def _initialize_model(self) -> Any:
        if not self.model_id:
            raise ValueError(
                "No model_id provided, but a model is required for this reranking strategy."
            )

        if self.rerank_strategy == RerankingStrategy.LLM_RERANK:
            # Placeholder for an LLM-based model loader
            return self.model_id

        # Default: load a cross-encoder (or any model exposing .predict()) by id
        return CrossEncoder(self.model_id)

    async def compute_scores(self, query: str, chunks: List[Any]) -> np.ndarray:
        if self.rerank_strategy == RerankingStrategy.NONE:
            return np.zeros(len(chunks))

        # Create (query, content) pairs from chunks
        pairs = [(query, chunk.content) for chunk in chunks]

        # Compute scores in a non-blocking way
        scores = await asyncio.to_thread(self.model.predict, pairs)

        if self.rerank_strategy == RerankingStrategy.DEFAULT:
            return scores
        elif self.rerank_strategy == RerankingStrategy.BOOST:
            # Placeholder: apply a boost factor to scores
            return scores
        elif self.rerank_strategy == RerankingStrategy.HYBRID:
            # Placeholder: combine cross-encoder scores with embedding-based scores
            return scores
        elif self.rerank_strategy == RerankingStrategy.LLM_RERANK:
            # Placeholder: use an LLM to compute scores
            return scores
        else:
            raise ValueError(f"Unknown reranking strategy: {self.rerank_strategy}")
```
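
A short usage sketch; the chunk objects are hypothetical, and anything exposing a `.content` attribute works:

```python
import asyncio
from types import SimpleNamespace

# Hypothetical chunks; any object with a `.content` attribute will do.
chunks = [
    SimpleNamespace(content="Refunds are issued within 30 days."),
    SimpleNamespace(content="Shipping typically takes one week."),
]

reranker = LocalReranker(
    model_id="cross-encoder/ms-marco-MiniLM-L-6-v2",
    rerank_strategy=RerankingStrategy.DEFAULT,
)
scores = asyncio.run(reranker.compute_scores("refund policy", chunks))
```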

#### ExternalReranker

For scenarios where reranking is handled externally, this provider sends the search query and document chunk contents to a specified API endpoint. It constructs a JSON payload—including the query, document contents, API key, and any extra parameters—and performs an asynchronous HTTP POST request to obtain relevance scores. If the API call fails (due to connectivity issues, HTTP errors, or other unexpected issues), it raises a `RuntimeError` with a descriptive error message.

```python
import httpx


class ExternalReranker(RerankerProvider):
    def __init__(
        self,
        api_url: str,
        api_key: str,
        headers: Optional[dict] = None,
        timeout: float = 5.0,
        **kwargs
    ) -> None:
        self.api_url = api_url
        self.api_key = api_key
        self.headers = headers or {}
        self.timeout = timeout
        self.kwargs = kwargs

    async def compute_scores(self, query: str, chunks: List[Any]) -> np.ndarray:
        # Build the JSON payload from the query, document contents, API key,
        # and any extra provider parameters.
        payload = {
            "query": query,
            "documents": [chunk.content for chunk in chunks],
            "api_key": self.api_key,
            **self.kwargs,
        }
        try:
            async with httpx.AsyncClient(timeout=self.timeout) as client:
                response = await client.post(
                    self.api_url, json=payload, headers=self.headers
                )
                response.raise_for_status()
                data = response.json()
                scores = np.array(data.get("scores", []))
                return scores
        except Exception as e:
            raise RuntimeError(
                f"Unexpected error while calling external reranker: {e}"
            ) from e
```
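
The exact request and response schema of the external service is provider-specific; the contract assumed above is only that the POST body carries the query and document texts and that the reply contains a `scores` array. A hypothetical FastAPI stub that satisfies this contract:

```python
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class RerankRequest(BaseModel):
    query: str
    documents: List[str]
    api_key: str


@app.post("/rerank")
async def rerank(request: RerankRequest) -> dict:
    # Toy lexical-overlap scoring, purely for illustration.
    query_terms = set(request.query.lower().split())
    scores = [
        len(query_terms & set(doc.lower().split())) / max(len(query_terms), 1)
        for doc in request.documents
    ]
    return {"scores": scores}
```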

#### RerankerProviderFactory

This factory class determines whether to instantiate a local or external reranker based on the provided input parameters. If an `api_url` is supplied, it creates an `ExternalReranker` instance with the given API URL, key, headers, and timeout; otherwise, it defaults to creating a `LocalReranker`, using the provided model identifier and reranking strategy.

```python
class RerankerProviderFactory:
    @staticmethod
    def get_provider(
        rerank_strategy: RerankingStrategy, **kwargs
    ) -> RerankerProvider:
        api_url = kwargs.get("api_url")
        if api_url:
            return ExternalReranker(
                api_url=api_url,
                api_key=kwargs.get("api_key", ""),
                headers=kwargs.get("headers", {}),
                timeout=kwargs.get("timeout", 60.0),
            )
        # Default to local provider
        return LocalReranker(
            model_id=kwargs.get("model_id"),
            rerank_strategy=rerank_strategy,
        )
```
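
A usage sketch showing how the same call resolves to either provider (the model id and URL are placeholders):

```python
# Resolves to LocalReranker: no api_url supplied.
local = RerankerProviderFactory.get_provider(
    RerankingStrategy.DEFAULT,
    model_id="cross-encoder/ms-marco-MiniLM-L-6-v2",
)

# Resolves to ExternalReranker: api_url (and api_key) supplied.
external = RerankerProviderFactory.get_provider(
    RerankingStrategy.DEFAULT,
    api_url="https://api.example.com/rerank",
    api_key="API_KEY",
)
```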

The `RerankerService` is the core component that orchestrates the entire document reranking process. It performs the following steps:

- #### Relevance Score Computation:

  It begins by computing relevance scores for each document chunk using a configured provider (either local or external).

- #### Metric Calculation:

  The service then calculates key statistical metrics from the scores:

  - **Mean Score**: The average relevance score.
  - **Std Score**: The standard deviation of the scores.
  - **Score Gap**: The difference between the highest and second-highest scores.

  For example, scores of `[0.9, 0.7, 0.2]` yield a mean score of 0.6 and a score gap of 0.2.

|
> <sub> **Note**: Metrics can be added or removed based on the specific insights or data points you want to collect.
|
||||||
|
|
||||||
#### Telemetry Logging

To ensure observability, telemetry data—including raw scores, execution time, and the calculated metrics—is logged via tracing utilities.

#### Reranked Document Construction

It builds a list of reranked documents, where each document is annotated with an index and its relevance score. The list is then sorted in descending order of relevance.

#### Reranked Response and Document Metadata

Finally, the service returns a detailed response that includes both the sorted reranked documents and associated metadata.

For convenience, a static method `rerank_documents` is provided. This method dynamically selects the appropriate provider based on configuration parameters and initiates the reranking process, offering a simple entry point for external callers.

```python
class RerankerService:
    # get_current_span and serialize_value are assumed to come from the
    # stack's tracing utilities.

    def __init__(self, provider: RerankerProvider) -> None:
        self.provider = provider

    def _compute_metrics(self, scores: np.ndarray) -> Dict[str, float]:
        """Computes statistical metrics from the relevance scores."""
        mean_score = float(np.mean(scores))
        std_score = float(np.std(scores))
        score_gap = (
            float(np.max(scores) - np.partition(scores, -2)[-2])
            if len(scores) > 1
            else 0.0
        )
        return {
            "mean_score": mean_score,
            "std_score": std_score,
            "score_gap": score_gap,
        }

    def _log_telemetry(
        self,
        scores: np.ndarray,
        chunks: List[Any],
        exec_time_ms: float,
        metrics: Dict[str, float],
    ) -> None:
        """Logs telemetry data for reranking performance and analytics."""
        current_span = get_current_span()
        if current_span:
            current_span.set_attribute("reranker.raw_scores", serialize_value(scores))
            current_span.set_attribute(
                "reranker.top_score",
                serialize_value(float(np.max(scores)) if scores.size > 0 else None),
            )
            current_span.set_attribute(
                "reranker.chunk_count", serialize_value(len(chunks))
            )
            current_span.set_attribute(
                "reranker.execution_time_ms", serialize_value(exec_time_ms)
            )
            current_span.set_attribute(
                "reranker.mean_score", serialize_value(metrics["mean_score"])
            )
            current_span.set_attribute(
                "reranker.std_score", serialize_value(metrics["std_score"])
            )
            current_span.set_attribute(
                "reranker.score_gap", serialize_value(metrics["score_gap"])
            )

    def _build_reranked_documents(self, scores: np.ndarray) -> List[RerankedDocument]:
        """Creates and sorts reranked documents based on computed scores."""
        return sorted(
            [
                RerankedDocument(index=i, relevance_score=score)
                for i, score in enumerate(scores)
            ],
            key=lambda doc: doc.relevance_score,
            reverse=True,
        )

    async def _rerank_documents(self, query: str, chunks: List[Any]) -> RerankResponse:
        # ... [compute scores via the provider, measure execution time,
        #      derive metrics, and log telemetry] ...
        return RerankResponse(
            reranked_documents=self._build_reranked_documents(scores),
            metadata={
                "raw_scores": scores,
                "execution_time_ms": exec_time_ms,
                **metrics,
            },
        )

    @staticmethod
    async def rerank_documents(
        query: str,
        chunks: List[Any],
        rerank_strategy: RerankingStrategy,
        **provider_kwargs
    ) -> RerankResponse:
        """Selects the appropriate provider and reranks the given chunks.

        Parameters:
            query (str): The search query.
            chunks (List[Any]): List of document chunks.
            rerank_strategy (RerankingStrategy): The reranking option to execute.
            provider_kwargs (dict): Additional parameters for the provider
                (e.g., 'api_url' for external providers).

        Returns:
            RerankResponse: The reranked documents with associated metadata.
        """
        provider = RerankerProviderFactory.get_provider(
            rerank_strategy, **provider_kwargs
        )
        return await RerankerService(provider)._rerank_documents(query, chunks)
```
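
Putting it together, a caller-side sketch of the convenience entry point (the chunk objects and credentials are placeholders):

```python
# `retrieved_chunks` stands in for the chunks produced by the initial retrieval.
response = await RerankerService.rerank_documents(
    query="refund policy",
    chunks=retrieved_chunks,
    rerank_strategy=RerankingStrategy.DEFAULT,
    api_url="https://api.example.com/rerank",  # omit to use the local reranker
    api_key="API_KEY",
)

for doc in response.reranked_documents[:3]:
    print(doc.index, doc.relevance_score)
```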

## 4.4. Example Configuration and Request

Below is an example agent configuration that enables reranking:

```python
agent_config = AgentConfig(
    # ... [model, instructions, and other agent settings elided] ...
    toolgroups=[
        {
            # ... [toolgroup name elided] ...
            "args": {
                "query_config": {
                    "max_tokens_in_context": 512,
                    "max_chunks": 10,
                    "query_generator_config": {"type": "simple", "separator": " "},
                },
                "rerank_strategy": "DEFAULT",
                "reranker_model_id": "model_name/model_path",
                "rerank_config": {
                    "api_url": "https://api.together.xyz/v1",
                    "api_key": "API_KEY",
                },
            },
        }
    ],
    tool_choice="auto",
)
```

And an example API call using cURL:

```bash
curl -X POST "http://localhost:8000/tool-runtime/rag-tool/rerank" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Find relevant documents for query text.",
    "retrieved_docs": [/* List of RAGDocument objects */],
    "top_k": 5,
    "rerank_strategy": "DEFAULT",
    "reranker_model_id": "LlamaRank",
    "rerank_config": {
      "api_url": "https://api.together.xyz/v1",
      "api_key": "API_KEY"
    }
  }'
```

## API Design Overview

### 5.1. Query Endpoint

**Method:** `POST`

#### Parameters:

- `content`: Input query content.
- `vector_db_ids`: List of vector database identifiers.
- `query_config`: Optional dictionary query configuration.
- `rerank_strategy`: The reranking strategy to apply (see the options in section 4.1.2).
- `reranker_model_id`: Identifier for the reranking model.
- `rerank_config`: Optional dictionary rerank configuration.
  - `api_url`: URL for the external reranking service.
  - `api_key`: Authentication key for the external service.

### 5.2. Dedicated Rerank Endpoint

**Endpoint:** `/tool-runtime/rag-tool/rerank`
**Method:** `POST`

#### Parameters:

- `query`: Search query content.
- `retrieved_docs`: List of initially retrieved documents.
- `top_k`: Number of top documents to return.
- `rerank_strategy`: Determines the ranking strategy to be applied. Options include:
  - `NONE` - No reranking applied.
  - `DEFAULT` (default) - Standard reranking based on computed relevance scores.
  - `BOOST` - Applies a boost factor to the computed scores.
  - `HYBRID` - Combines multiple ranking methods for improved results.
  - `LLM_RERANK` - Uses an LLM-based model for reranking.
  - etc… - More can be added when needed.
- `reranker_model_id`: Identifier for the reranking model.
- `rerank_config`: Optional dictionary rerank configuration.
  - `api_url`: URL for the external reranking service.
  - `api_key`: Authentication key for the external service.

## 6. Considerations and Tradeoffs

#### Flexibility vs. Complexity

- **Flexibility**: The design allows users to choose between local and external reranking solutions and even swap out the default model.
- **Complexity**: This added flexibility introduces extra configuration options, requiring careful management of different providers and error-handling scenarios.

#### Performance vs. Latency

- **Performance Improvement**: Reranking can enhance document relevance, providing more precise context for downstream tasks.
- **Latency Overhead**: The additional scoring step can introduce extra latency, especially when using external API calls or complex models.

#### Observability vs. Implementation Overhead

- **Observability**: Detailed telemetry (e.g., raw scores, computed metrics, execution time) improves debugging and performance tuning.
- **Overhead**: Collecting and processing this telemetry data can add to system overhead and complexity.

#### Local vs. External Provider Tradeoffs

- **Local Provider**: Offers lower latency and greater control, suitable for environments with strict data privacy or low latency requirements.
- **External Provider**: Enables managed, scalable inference but depends on network connectivity and may have higher operational costs or variability in response times.

#### Legal and Intellectual Property Risks

- **Legal Uncertainty:**
  Some reranking models, such as cross-encoders trained on MS MARCO, may be released under permissive licenses (e.g., Apache 2.0) but are trained on datasets with non-commercial use restrictions. This creates ambiguity regarding their legal use in enterprise or commercial environments.

- **Risk of Default Model Selection:**
  Given potential IP concerns, it may not be advisable to provide a default reranking model. Instead:

  - **User Selection:** Users should explicitly select their own reranking model to ensure compliance with their legal and licensing policies.
  - **Model-Agnostic System:** The system should remain model-agnostic, allowing integration with vetted rerankers that meet organizational requirements.
  - **Legal Disclaimer:** A legal disclaimer should be included, clarifying that users bear responsibility for verifying model licensing.

By not enforcing a default reranker, this approach shifts responsibility to users, allowing them to make informed decisions based on their legal and compliance needs.

## 7. Conclusion

The proposed reranking mechanism addresses the shortcomings of traditional document retrieval by refining initial results to deliver more relevant and precise context for downstream tasks. By offering both external API and local inference options, the solution provides a flexible and scalable approach that can be tailored to diverse operational scenarios. With defined API endpoints and telemetry, this design lays the foundation for iterative enhancements and further collaboration, ensuring the system can evolve to meet emerging requirements.

---

## 8. Approvals

| Person      | Role           | Approval Date |
| ----------- | -------------- | ------------- |
| Kevin Cogan | Author / ET IC |               |
| PM          |                |               |
| Architect   |                |               |