Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-12-19 01:39:39 +00:00
feat: Enhance Vector Stores config with full configurations (#4397)
# What does this PR do?
Enhances the Vector Stores config with a full set of appropriate configurations:
- Add FileIngestionParams, ChunkRetrievalParams, and FileBatchParams subconfigs
- Update RAG memory, the OpenAI vector store mixin, and vector store utils to use the configuration
- Fix import organization across vector store components
- Add comprehensive vector stores configuration documentation
- Update docs navigation to include the vector store configuration guide
- Delete `memory/constants.py` and move constant values directly into Pydantic models

## Test Plan
Tests updated + CI

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
This commit is contained in:
parent a7d509aaf9
commit 2d149e3d2d
22 changed files with 3249 additions and 110 deletions
261  docs/docs/concepts/vector_stores_configuration.mdx  Normal file
@ -0,0 +1,261 @@
# Vector Stores Configuration

## Overview

Llama Stack provides a variety of configuration options for vector stores through `VectorStoresConfig`. This configuration allows you to customize file processing, chunk retrieval, search behavior, and performance parameters to optimize File Search and your RAG (Retrieval-Augmented Generation) applications.

The configuration affects all vector store providers and operations across the entire stack, particularly the OpenAI-compatible vector store APIs.

## Configuration Structure

Vector store configuration is organized into logical subconfigs that group related settings. For example, the YAML below provides an example configuration for the Faiss provider.

```yaml
vector_stores:
  default_provider_id: "faiss"
  default_embedding_model:
    provider_id: "sentence-transformers"
    model_id: "all-MiniLM-L6-v2"

  # Query rewriting for enhanced search
  rewrite_query_params:
    model:
      provider_id: "ollama"
      model_id: "llama3.2:3b-instruct-fp16"
    prompt: "Rewrite this search query to improve retrieval results by expanding it with relevant synonyms and related terms: {query}"
    max_tokens: 100
    temperature: 0.3

  # File processing during file ingestion
  file_ingestion_params:
    default_chunk_size_tokens: 512
    default_chunk_overlap_tokens: 128

  # Chunk retrieval and ranking during search
  chunk_retrieval_params:
    chunk_multiplier: 5
    max_tokens_in_context: 4000
    default_reranker_strategy: "rrf"
    rrf_impact_factor: 60.0
    weighted_search_alpha: 0.5

  # Batch processing performance settings
  file_batch_params:
    max_concurrent_files_per_batch: 3
    file_batch_chunk_size: 10
    cleanup_interval_seconds: 86400

  # Tool output and prompt formatting
  file_search_params:
    header_template: "## Knowledge Search Results\n\nI found {num_chunks} relevant chunks:\n\n"
    footer_template: "\n---\n\nEnd of search results."

  context_prompt_params:
    chunk_annotation_template: "**Source {index}:**\n{chunk.content}\n\n"
    context_template: "Use the above information to answer: {query}"

  annotation_prompt_params:
    enable_annotations: true
    annotation_instruction_template: "Cite sources using [Source X] format."
    chunk_annotation_template: "[Source {index}] {chunk_text} (File: {file_id})"
```
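To make the relationship between the YAML above and the runtime object concrete, here is a minimal sketch (assuming a Python environment with Llama Stack installed and `VectorStoresConfig` importable from `llama_stack.core.datatypes`, as it is elsewhere in this PR). Every subconfig has defaults, so a partial configuration validates and the omitted sections fall back to the values documented in the tables below.

```python
# Minimal sketch: validate a partial vector stores configuration with Pydantic.
# Any subconfig that is omitted falls back to its documented defaults.
from llama_stack.core.datatypes import VectorStoresConfig

config = VectorStoresConfig.model_validate(
    {
        "default_provider_id": "faiss",
        "file_ingestion_params": {"default_chunk_size_tokens": 256},
    }
)

print(config.file_ingestion_params.default_chunk_overlap_tokens)  # 128 (default)
print(config.chunk_retrieval_params.chunk_multiplier)             # 5 (default)
print(config.file_batch_params.cleanup_interval_seconds)          # 86400 (default)
```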
## Configuration Sections

### File Ingestion Parameters

The `file_ingestion_params` configuration controls how files are processed during ingestion into vector stores when using `client.vector_stores.files.create()`:

#### `file_ingestion_params`

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `default_chunk_size_tokens` | `int` | `512` | Default token count for file/document chunks when not explicitly specified |
| `default_chunk_overlap_tokens` | `int` | `128` | Number of tokens to overlap between chunks (original default: 512 // 4) |

```yaml
file_ingestion_params:
  default_chunk_size_tokens: 512      # Smaller chunks for precision
  default_chunk_overlap_tokens: 128   # Fixed token overlap for context continuity
```

**Use Cases:**
- **Smaller chunks (256-512)**: Better for precise factual retrieval
- **Larger chunks (800-1200)**: Better for context-heavy applications
- **Higher overlap (200-300 tokens)**: Reduces context loss at chunk boundaries
- **Lower overlap (50-100 tokens)**: More efficient storage, faster processing
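For illustration, the sketch below mirrors how this PR applies these defaults in the RAG tool when the caller does not specify a chunk size (the chunking-strategy types are assumed to be importable from `llama_stack_api`, as in the updated memory tool runtime):

```python
# Sketch: fall back to the configured ingestion defaults when no chunk size
# is provided, then build the static chunking strategy from them.
from llama_stack_api import (
    VectorStoreChunkingStrategyStatic,
    VectorStoreChunkingStrategyStaticConfig,
)

ingestion = config.file_ingestion_params  # VectorStoresConfig from the earlier sketch

chunk_size_in_tokens = None  # e.g. not provided by the caller
if chunk_size_in_tokens is None:
    chunk_size_in_tokens = ingestion.default_chunk_size_tokens  # 512

chunking_strategy = VectorStoreChunkingStrategyStatic(
    static=VectorStoreChunkingStrategyStaticConfig(
        max_chunk_size_tokens=chunk_size_in_tokens,
        chunk_overlap_tokens=ingestion.default_chunk_overlap_tokens,  # 128
    )
)
```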
### Chunk Retrieval Parameters

The `chunk_retrieval_params` configuration controls search behavior and ranking strategies when using `client.vector_stores.search()`:

#### `chunk_retrieval_params`

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `chunk_multiplier` | `int` | `5` | Over-retrieval factor for OpenAI API compatibility (affects all providers) |
| `max_tokens_in_context` | `int` | `4000` | Maximum tokens allowed in RAG context before truncation |
| `default_reranker_strategy` | `str` | `"rrf"` | Default ranking strategy: `"rrf"`, `"weighted"`, or `"normalized"` |
| `rrf_impact_factor` | `float` | `60.0` | Impact factor for Reciprocal Rank Fusion (RRF) reranking |
| `weighted_search_alpha` | `float` | `0.5` | Alpha weight for weighted search reranking (0.0-1.0) |

```yaml
chunk_retrieval_params:
  chunk_multiplier: 5               # Retrieve 5x chunks for reranking
  max_tokens_in_context: 4000       # Context window limit
  default_reranker_strategy: "rrf"  # Use RRF for hybrid search
  rrf_impact_factor: 60.0           # RRF ranking parameter
  weighted_search_alpha: 0.5        # 50/50 vector/keyword weight
```

**Ranking Strategies:**

- **RRF (Reciprocal Rank Fusion)**: Combines vector and keyword rankings with a configurable impact factor
- **Weighted**: Linear combination with adjustable alpha (0 = keyword only, 1 = vector only)
- **Normalized**: Normalizes scores before combination
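As a rough illustration of how these values are consumed, the following is a simplified sketch of the reranker-selection logic this PR adds to `VectorStoreWithIndex.query_chunks` (illustrative only, not a public API):

```python
# Sketch: pick a reranker strategy and its parameters from chunk_retrieval_params
# when the caller does not pass an explicit ranker.
retrieval = config.chunk_retrieval_params  # VectorStoresConfig from the earlier sketch

def resolve_reranker(ranker: dict | None) -> tuple[str, dict]:
    if ranker is None:
        # No ranker given: use the configured default strategy and RRF impact factor.
        return retrieval.default_reranker_strategy, {"impact_factor": retrieval.rrf_impact_factor}

    strategy = ranker.get("strategy", retrieval.default_reranker_strategy)
    if strategy == "weighted":
        weights = ranker.get("params", {}).get("weights", [])
        alpha = weights[0] if weights else retrieval.weighted_search_alpha
        return "weighted", {"alpha": alpha}
    if strategy == "normalized":
        return "normalized", {}
    # Anything else falls back to RRF with a configurable k / impact factor.
    k_value = ranker.get("params", {}).get("k", retrieval.rrf_impact_factor)
    return "rrf", {"impact_factor": k_value}

print(resolve_reranker(None))                     # ('rrf', {'impact_factor': 60.0})
print(resolve_reranker({"strategy": "weighted"})) # ('weighted', {'alpha': 0.5})
```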
### File Batch Parameters

The `file_batch_params` configuration controls performance and concurrency for batch file processing when using `client.vector_stores.file_batches.*`:

#### `file_batch_params`

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `max_concurrent_files_per_batch` | `int` | `3` | Maximum files processed concurrently in file batches |
| `file_batch_chunk_size` | `int` | `10` | Number of files to process in each batch chunk |
| `cleanup_interval_seconds` | `int` | `86400` | Interval for cleaning up expired file batches (24 hours) |

```yaml
file_batch_params:
  max_concurrent_files_per_batch: 3  # Process 3 files simultaneously
  file_batch_chunk_size: 10          # Handle 10 files per chunk
  cleanup_interval_seconds: 86400    # Clean up daily
```

**Performance Tuning:**
- **Higher concurrency**: Faster processing, more memory usage
- **Lower concurrency**: Slower processing, less resource usage
- **Larger chunk size**: Fewer iterations, more memory per iteration
- **Smaller chunk size**: More iterations, better memory distribution
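The sketch below is a simplified version of how the OpenAI vector store mixin uses these settings: a semaphore caps in-flight files, and the batch is walked in chunks. It is illustrative only; `process_one` stands in for the real per-file processing coroutine.

```python
# Sketch: bound concurrency with a semaphore and iterate the batch in chunks,
# both driven by file_batch_params.
import asyncio
from collections.abc import Awaitable, Callable

batch_params = config.file_batch_params  # VectorStoresConfig from the earlier sketch

async def process_file_batch(
    file_ids: list[str],
    process_one: Callable[[str], Awaitable[None]],
) -> None:
    semaphore = asyncio.Semaphore(batch_params.max_concurrent_files_per_batch)

    async def guarded(file_id: str) -> None:
        async with semaphore:
            await process_one(file_id)

    chunk_size = batch_params.file_batch_chunk_size
    for start in range(0, len(file_ids), chunk_size):
        chunk = file_ids[start : start + chunk_size]
        await asyncio.gather(*(guarded(fid) for fid in chunk))
```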
## Advanced Configuration

### Default Provider and Model Settings

Set system-wide defaults for vector operations:

```yaml
vector_stores:
  default_provider_id: "faiss"   # Default vector store provider
  default_embedding_model:       # Default embedding model
    provider_id: "sentence-transformers"
    model_id: "all-MiniLM-L6-v2"
```

### Query Rewriting Configuration

Enable intelligent query expansion for better search results:

#### `rewrite_query_params`

| Parameter | Type | Description |
|-----------|------|-------------|
| `model` | `QualifiedModel` | LLM model for query rewriting/expansion |
| `prompt` | `str` | Prompt template (must contain `{query}` placeholder) |
| `max_tokens` | `int` | Maximum tokens for expansion (1-4096) |
| `temperature` | `float` | Generation temperature (0.0-2.0) |

```yaml
rewrite_query_params:
  model:
    provider_id: "meta-reference"
    model_id: "llama3.2"
  prompt: |
    Expand this search query with related terms and synonyms for better vector search.
    Keep the expansion focused and relevant.

    Original query: {query}

    Expanded query:
  max_tokens: 100
  temperature: 0.3
```

**Note**: Query rewriting is optional. Omit this section to disable query expansion.
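When query rewriting is enabled, the `{query}` placeholder is filled with the user's original query before the prompt is sent to the configured model. A minimal sketch of that substitution (illustrative, not the provider's exact code path):

```python
# Sketch: build the rewrite prompt from the configured template.
rewrite = config.rewrite_query_params  # None when query rewriting is disabled
if rewrite is not None:
    prompt_text = rewrite.prompt.format(query="how do I configure faiss in llama stack?")
    # prompt_text is then sent to rewrite.model, bounded by rewrite.max_tokens
    # and rewrite.temperature; the model's output replaces the original query
    # for the vector search.
```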
### Output Formatting Configuration

Customize how search results are formatted for RAG applications:

#### `file_search_params`

```yaml
file_search_params:
  header_template: |
    ## Knowledge Search Results

    I found {num_chunks} relevant chunks from your knowledge base:

  footer_template: |

    ---

    End of search results. Use this information to provide a comprehensive answer.
```

#### `context_prompt_params`

```yaml
context_prompt_params:
  chunk_annotation_template: |
    **Source {index}:**
    {chunk.content}

    *Metadata: {metadata}*

  context_template: |
    Based on the search results above, please answer this question: {query}

    Provide specific details from the sources and cite them appropriately.
```

#### `annotation_prompt_params`

```yaml
annotation_prompt_params:
  enable_annotations: true
  annotation_instruction_template: |
    When citing information, use the format [Source X] where X is the source number.
    Always cite specific sources for factual claims.
  chunk_annotation_template: |
    [Source {index}] {chunk_text}

    Source: {file_id}
```
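To show how these templates come together, here is a rough sketch of the assembly the tool executor performs (simplified; the real implementation also handles metadata formatting and the annotation toggle):

```python
# Sketch: assemble tool output from the configured templates using the
# documented placeholders.
fs = config.file_search_params
ann = config.annotation_prompt_params

# (file_id, chunk_text, metadata_text) triples standing in for real search results
chunks = [("file-abc123", "Llama Stack supports pluggable vector store providers.", "guide.md")]

parts = [fs.header_template.format(num_chunks=len(chunks))]
for index, (file_id, chunk_text, metadata_text) in enumerate(chunks, start=1):
    parts.append(
        ann.chunk_annotation_template.format(
            index=index,
            metadata_text=metadata_text,
            file_id=file_id,
            chunk_text=chunk_text,
        )
    )
parts.append(fs.footer_template)

tool_output = "".join(parts)
print(tool_output)
```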
## Provider-Specific Considerations

### OpenAI-Compatible API

All configuration options affect the OpenAI-compatible vector store API:

- `chunk_multiplier` affects over-retrieval in search operations
- `file_ingestion_params` controls chunking during file attachment
- `file_batch_params` controls batch processing performance

### RAG Tools

The RAG tool runtime respects these configurations:

- Uses `default_chunk_size_tokens` for file insertion
- Applies `max_tokens_in_context` for context window management
- Uses formatting templates for tool output

### All Vector Store Providers

These settings apply across all vector store providers:

- **Inline providers**: FAISS, SQLite-vec, Milvus
- **Remote providers**: ChromaDB, Qdrant, Weaviate, PGVector
- **Hybrid providers**: Milvus (supports both inline and remote)
@ -14,7 +14,7 @@ RAG (Retrieval-Augmented Generation) tool runtime for document ingestion, chunki
 | Field | Type | Required | Default | Description |
 |-------|------|----------|---------|-------------|
-| `vector_stores_config` | `VectorStoresConfig` | No | `default_provider_id=None default_embedding_model=None rewrite_query_params=None file_search_params=FileSearchParams(header_template='knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n', footer_template='END of knowledge_search tool results.\n') context_prompt_params=ContextPromptParams(chunk_annotation_template='Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n', context_template='The above results were retrieved to help answer the user\'s query: "{query}". Use them as supporting information only in answering this query.{annotation_instruction}\n') annotation_prompt_params=AnnotationPromptParams(enable_annotations=True, annotation_instruction_template=" Cite sources immediately at the end of sentences before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'. Do not add extra punctuation. Use only the file IDs provided, do not invent new ones.", chunk_annotation_template='[{index}] {metadata_text} cite as <|{file_id}|>\n{chunk_text}\n')` | Configuration for vector store prompt templates and behavior |
+| `vector_stores_config` | `VectorStoresConfig` | No | `default_provider_id=None default_embedding_model=None rewrite_query_params=None file_search_params=FileSearchParams(header_template='knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n', footer_template='END of knowledge_search tool results.\n') context_prompt_params=ContextPromptParams(chunk_annotation_template='Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n', context_template='The above results were retrieved to help answer the user\'s query: "{query}". Use them as supporting information only in answering this query. {annotation_instruction}\n') annotation_prompt_params=AnnotationPromptParams(enable_annotations=True, annotation_instruction_template="Cite sources immediately at the end of sentences before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'. Do not add extra punctuation. Use only the file IDs provided, do not invent new ones.", chunk_annotation_template='[{index}] {metadata_text} cite as <|{file_id}|>\n{chunk_text}\n') file_ingestion_params=FileIngestionParams(default_chunk_size_tokens=512, default_chunk_overlap_tokens=128) chunk_retrieval_params=ChunkRetrievalParams(chunk_multiplier=5, max_tokens_in_context=4000, default_reranker_strategy='rrf', rrf_impact_factor=60.0, weighted_search_alpha=0.5) file_batch_params=FileBatchParams(max_concurrent_files_per_batch=3, file_batch_chunk_size=10, cleanup_interval_seconds=86400)` | Configuration for vector store prompt templates and behavior |

 ## Sample Configuration
@ -41,6 +41,15 @@ const sidebars: SidebarsConfig = {
       'concepts/apis/api_leveling',
     ],
   },
+  {
+    type: 'category',
+    label: 'Vector Stores',
+    collapsed: true,
+    items: [
+      'concepts/file_operations_vector_stores',
+      'concepts/vector_stores_configuration',
+    ],
+  },
   'concepts/distributions',
   'concepts/resources',
 ],
@ -18,15 +18,6 @@ from llama_stack.core.storage.datatypes import (
     StorageConfig,
 )
 from llama_stack.log import LoggingConfig
-from llama_stack.providers.utils.memory.constants import (
-    DEFAULT_ANNOTATION_INSTRUCTION_TEMPLATE,
-    DEFAULT_CHUNK_ANNOTATION_TEMPLATE,
-    DEFAULT_CHUNK_WITH_SOURCES_TEMPLATE,
-    DEFAULT_CONTEXT_TEMPLATE,
-    DEFAULT_FILE_SEARCH_FOOTER_TEMPLATE,
-    DEFAULT_FILE_SEARCH_HEADER_TEMPLATE,
-    DEFAULT_QUERY_REWRITE_PROMPT,
-)
 from llama_stack_api import (
     Api,
     Benchmark,
@ -367,7 +358,7 @@ class RewriteQueryParams(BaseModel):
         description="LLM model for query rewriting/expansion in vector search.",
     )
     prompt: str = Field(
-        default=DEFAULT_QUERY_REWRITE_PROMPT,
+        default="Expand this query with relevant synonyms and related terms. Return only the improved query, no explanations:\n\n{query}\n\nImproved query:",
         description="Prompt template for query rewriting. Use {query} as placeholder for the original query.",
     )
     max_tokens: int = Field(
@ -407,11 +398,11 @@ class FileSearchParams(BaseModel):
     """Configuration for file search tool output formatting."""

     header_template: str = Field(
-        default=DEFAULT_FILE_SEARCH_HEADER_TEMPLATE,
+        default="knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n",
         description="Template for the header text shown before search results. Available placeholders: {num_chunks} number of chunks found.",
     )
     footer_template: str = Field(
-        default=DEFAULT_FILE_SEARCH_FOOTER_TEMPLATE,
+        default="END of knowledge_search tool results.\n",
         description="Template for the footer text shown after search results.",
     )
@ -433,11 +424,11 @@ class ContextPromptParams(BaseModel):
     """Configuration for LLM prompt content and chunk formatting."""

     chunk_annotation_template: str = Field(
-        default=DEFAULT_CHUNK_ANNOTATION_TEMPLATE,
+        default="Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n",
         description="Template for formatting individual chunks in search results. Available placeholders: {index} 1-based chunk index, {chunk.content} chunk content, {metadata} chunk metadata dict.",
     )
     context_template: str = Field(
-        default=DEFAULT_CONTEXT_TEMPLATE,
+        default='The above results were retrieved to help answer the user\'s query: "{query}". Use them as supporting information only in answering this query. {annotation_instruction}\n',
         description="Template for explaining the search results to the model. Available placeholders: {query} user's query, {num_chunks} number of chunks.",
     )
@ -470,11 +461,11 @@ class AnnotationPromptParams(BaseModel):
         description="Whether to include annotation information in results.",
     )
     annotation_instruction_template: str = Field(
-        default=DEFAULT_ANNOTATION_INSTRUCTION_TEMPLATE,
+        default="Cite sources immediately at the end of sentences before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'. Do not add extra punctuation. Use only the file IDs provided, do not invent new ones.",
         description="Instructions for how the model should cite sources. Used when enable_annotations is True.",
     )
     chunk_annotation_template: str = Field(
-        default=DEFAULT_CHUNK_WITH_SOURCES_TEMPLATE,
+        default="[{index}] {metadata_text} cite as <|{file_id}|>\n{chunk_text}\n",
         description="Template for chunks with annotation information. Available placeholders: {index} 1-based chunk index, {metadata_text} formatted metadata, {file_id} document identifier, {chunk_text} chunk content.",
     )
@ -499,6 +490,61 @@ class AnnotationPromptParams(BaseModel):
         return v


+class FileIngestionParams(BaseModel):
+    """Configuration for file processing during ingestion."""
+
+    default_chunk_size_tokens: int = Field(
+        default=512,
+        description="Default chunk size for RAG tool operations when not specified",
+    )
+    default_chunk_overlap_tokens: int = Field(
+        default=128,
+        description="Default overlap in tokens between chunks (original default: 512 // 4 = 128)",
+    )
+
+
+class ChunkRetrievalParams(BaseModel):
+    """Configuration for chunk retrieval and ranking during search."""
+
+    chunk_multiplier: int = Field(
+        default=5,
+        description="Multiplier for OpenAI API over-retrieval (affects all providers)",
+    )
+    max_tokens_in_context: int = Field(
+        default=4000,
+        description="Maximum tokens allowed in RAG context before truncation",
+    )
+    default_reranker_strategy: str = Field(
+        default="rrf",
+        description="Default reranker when not specified: 'rrf', 'weighted', or 'normalized'",
+    )
+    rrf_impact_factor: float = Field(
+        default=60.0,
+        description="Impact factor for RRF (Reciprocal Rank Fusion) reranking",
+    )
+    weighted_search_alpha: float = Field(
+        default=0.5,
+        description="Alpha weight for weighted search reranking (0.0-1.0)",
+    )
+
+
+class FileBatchParams(BaseModel):
+    """Configuration for file batch processing."""
+
+    max_concurrent_files_per_batch: int = Field(
+        default=3,
+        description="Maximum files processed concurrently in file batches",
+    )
+    file_batch_chunk_size: int = Field(
+        default=10,
+        description="Number of files to process in each batch chunk",
+    )
+    cleanup_interval_seconds: int = Field(
+        default=86400,  # 24 hours
+        description="Interval for cleaning up expired file batches (seconds)",
+    )
+
+
 class VectorStoresConfig(BaseModel):
     """Configuration for vector stores in the stack."""
@ -527,6 +573,19 @@ class VectorStoresConfig(BaseModel):
         description="Configuration for source annotation and attribution features.",
     )

+    file_ingestion_params: FileIngestionParams = Field(
+        default_factory=FileIngestionParams,
+        description="Configuration for file processing during ingestion.",
+    )
+    chunk_retrieval_params: ChunkRetrievalParams = Field(
+        default_factory=ChunkRetrievalParams,
+        description="Configuration for chunk retrieval and ranking during search.",
+    )
+    file_batch_params: FileBatchParams = Field(
+        default_factory=FileBatchParams,
+        description="Configuration for file batch processing.",
+    )
+

 class SafetyConfig(BaseModel):
     """Configuration for default moderations model."""
@ -11,6 +11,9 @@ def redact_sensitive_fields(data: dict[str, Any]) -> dict[str, Any]:
     """Redact sensitive information from config before printing."""
     sensitive_patterns = ["api_key", "api_token", "password", "secret", "token"]

+    # Specific configuration field names that should NOT be redacted despite containing "token"
+    safe_token_fields = ["chunk_size_tokens", "max_tokens", "default_chunk_overlap_tokens"]
+
     def _redact_value(v: Any) -> Any:
         if isinstance(v, dict):
             return _redact_dict(v)
@ -21,7 +24,10 @@ def redact_sensitive_fields(data: dict[str, Any]) -> dict[str, Any]:
     def _redact_dict(d: dict[str, Any]) -> dict[str, Any]:
         result = {}
         for k, v in d.items():
-            if any(pattern in k.lower() for pattern in sensitive_patterns):
+            # Don't redact if it's a safe field
+            if any(safe_field in k.lower() for safe_field in safe_token_fields):
+                result[k] = _redact_value(v)
+            elif any(pattern in k.lower() for pattern in sensitive_patterns):
                 result[k] = "********"
             else:
                 result[k] = _redact_value(v)
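A quick illustration of the intended behavior of this change (a hypothetical snippet, not taken from the repo's tests): token-count configuration fields such as `max_tokens` and `default_chunk_size_tokens` survive redaction, while real secrets are still masked.

```python
# Illustrative only: expected redaction behavior after the change above.
cfg = {
    "api_key": "sk-secret",
    "max_tokens": 100,
    "vector_stores": {"file_ingestion_params": {"default_chunk_size_tokens": 512}},
}
print(redact_sensitive_fields(cfg))
# expected: {'api_key': '********',
#            'max_tokens': 100,
#            'vector_stores': {'file_ingestion_params': {'default_chunk_size_tokens': 512}}}
```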
@ -296,19 +296,32 @@ vector_stores:
       '
     context_template: 'The above results were retrieved to help answer the user''s
       query: "{query}". Use them as supporting information only in answering this
-      query.{annotation_instruction}
+      query. {annotation_instruction}

       '
   annotation_prompt_params:
     enable_annotations: true
-    annotation_instruction_template: ' Cite sources immediately at the end of sentences
-      before punctuation, using `<|file-id|>` format like ''This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.''.
+    annotation_instruction_template: Cite sources immediately at the end of sentences
+      before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'.
       Do not add extra punctuation. Use only the file IDs provided, do not invent
-      new ones.'
+      new ones.
     chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|>

       {chunk_text}

       '
+  file_ingestion_params:
+    default_chunk_size_tokens: 512
+    default_chunk_overlap_tokens: 128
+  chunk_retrieval_params:
+    chunk_multiplier: 5
+    max_tokens_in_context: 4000
+    default_reranker_strategy: rrf
+    rrf_impact_factor: 60.0
+    weighted_search_alpha: 0.5
+  file_batch_params:
+    max_concurrent_files_per_batch: 3
+    file_batch_chunk_size: 10
+    cleanup_interval_seconds: 86400
 safety:
   default_shield_id: llama-guard
@ -305,19 +305,32 @@ vector_stores:
       '
     context_template: 'The above results were retrieved to help answer the user''s
       query: "{query}". Use them as supporting information only in answering this
-      query.{annotation_instruction}
+      query. {annotation_instruction}

       '
   annotation_prompt_params:
     enable_annotations: true
-    annotation_instruction_template: ' Cite sources immediately at the end of sentences
-      before punctuation, using `<|file-id|>` format like ''This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.''.
+    annotation_instruction_template: Cite sources immediately at the end of sentences
+      before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'.
       Do not add extra punctuation. Use only the file IDs provided, do not invent
-      new ones.'
+      new ones.
     chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|>

       {chunk_text}

       '
+  file_ingestion_params:
+    default_chunk_size_tokens: 512
+    default_chunk_overlap_tokens: 128
+  chunk_retrieval_params:
+    chunk_multiplier: 5
+    max_tokens_in_context: 4000
+    default_reranker_strategy: rrf
+    rrf_impact_factor: 60.0
+    weighted_search_alpha: 0.5
+  file_batch_params:
+    max_concurrent_files_per_batch: 3
+    file_batch_chunk_size: 10
+    cleanup_interval_seconds: 86400
 safety:
   default_shield_id: llama-guard
@ -299,19 +299,32 @@ vector_stores:
       '
     context_template: 'The above results were retrieved to help answer the user''s
      query: "{query}". Use them as supporting information only in answering this
-      query.{annotation_instruction}
+      query. {annotation_instruction}

       '
   annotation_prompt_params:
     enable_annotations: true
-    annotation_instruction_template: ' Cite sources immediately at the end of sentences
-      before punctuation, using `<|file-id|>` format like ''This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.''.
+    annotation_instruction_template: Cite sources immediately at the end of sentences
+      before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'.
       Do not add extra punctuation. Use only the file IDs provided, do not invent
-      new ones.'
+      new ones.
     chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|>

       {chunk_text}

       '
+  file_ingestion_params:
+    default_chunk_size_tokens: 512
+    default_chunk_overlap_tokens: 128
+  chunk_retrieval_params:
+    chunk_multiplier: 5
+    max_tokens_in_context: 4000
+    default_reranker_strategy: rrf
+    rrf_impact_factor: 60.0
+    weighted_search_alpha: 0.5
+  file_batch_params:
+    max_concurrent_files_per_batch: 3
+    file_batch_chunk_size: 10
+    cleanup_interval_seconds: 86400
 safety:
   default_shield_id: llama-guard
@ -308,19 +308,32 @@ vector_stores:
       '
     context_template: 'The above results were retrieved to help answer the user''s
       query: "{query}". Use them as supporting information only in answering this
-      query.{annotation_instruction}
+      query. {annotation_instruction}

       '
   annotation_prompt_params:
     enable_annotations: true
-    annotation_instruction_template: ' Cite sources immediately at the end of sentences
-      before punctuation, using `<|file-id|>` format like ''This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.''.
+    annotation_instruction_template: Cite sources immediately at the end of sentences
+      before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'.
       Do not add extra punctuation. Use only the file IDs provided, do not invent
-      new ones.'
+      new ones.
     chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|>

       {chunk_text}

       '
+  file_ingestion_params:
+    default_chunk_size_tokens: 512
+    default_chunk_overlap_tokens: 128
+  chunk_retrieval_params:
+    chunk_multiplier: 5
+    max_tokens_in_context: 4000
+    default_reranker_strategy: rrf
+    rrf_impact_factor: 60.0
+    weighted_search_alpha: 0.5
+  file_batch_params:
+    max_concurrent_files_per_batch: 3
+    file_batch_chunk_size: 10
+    cleanup_interval_seconds: 86400
 safety:
   default_shield_id: llama-guard
@ -296,19 +296,32 @@ vector_stores:
       '
     context_template: 'The above results were retrieved to help answer the user''s
       query: "{query}". Use them as supporting information only in answering this
-      query.{annotation_instruction}
+      query. {annotation_instruction}

       '
   annotation_prompt_params:
     enable_annotations: true
-    annotation_instruction_template: ' Cite sources immediately at the end of sentences
-      before punctuation, using `<|file-id|>` format like ''This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.''.
+    annotation_instruction_template: Cite sources immediately at the end of sentences
+      before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'.
       Do not add extra punctuation. Use only the file IDs provided, do not invent
-      new ones.'
+      new ones.
     chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|>

       {chunk_text}

       '
+  file_ingestion_params:
+    default_chunk_size_tokens: 512
+    default_chunk_overlap_tokens: 128
+  chunk_retrieval_params:
+    chunk_multiplier: 5
+    max_tokens_in_context: 4000
+    default_reranker_strategy: rrf
+    rrf_impact_factor: 60.0
+    weighted_search_alpha: 0.5
+  file_batch_params:
+    max_concurrent_files_per_batch: 3
+    file_batch_chunk_size: 10
+    cleanup_interval_seconds: 86400
 safety:
   default_shield_id: llama-guard
@ -305,19 +305,32 @@ vector_stores:
       '
     context_template: 'The above results were retrieved to help answer the user''s
       query: "{query}". Use them as supporting information only in answering this
-      query.{annotation_instruction}
+      query. {annotation_instruction}

       '
   annotation_prompt_params:
     enable_annotations: true
-    annotation_instruction_template: ' Cite sources immediately at the end of sentences
-      before punctuation, using `<|file-id|>` format like ''This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.''.
+    annotation_instruction_template: Cite sources immediately at the end of sentences
+      before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'.
       Do not add extra punctuation. Use only the file IDs provided, do not invent
-      new ones.'
+      new ones.
     chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|>

       {chunk_text}

       '
+  file_ingestion_params:
+    default_chunk_size_tokens: 512
+    default_chunk_overlap_tokens: 128
+  chunk_retrieval_params:
+    chunk_multiplier: 5
+    max_tokens_in_context: 4000
+    default_reranker_strategy: rrf
+    rrf_impact_factor: 60.0
+    weighted_search_alpha: 0.5
+  file_batch_params:
+    max_concurrent_files_per_batch: 3
+    file_batch_chunk_size: 10
+    cleanup_interval_seconds: 86400
 safety:
   default_shield_id: llama-guard
@ -11,11 +11,8 @@ from typing import Any

 from opentelemetry import trace

+from llama_stack.core.datatypes import VectorStoresConfig
 from llama_stack.log import get_logger
-from llama_stack.providers.utils.memory.constants import (
-    DEFAULT_ANNOTATION_INSTRUCTION_TEMPLATE,
-    DEFAULT_CHUNK_WITH_SOURCES_TEMPLATE,
-)
 from llama_stack_api import (
     ImageContentItem,
     OpenAIChatCompletionContentPartImageParam,
@ -175,8 +172,10 @@ class ToolExecutor:
                 self.vector_stores_config.annotation_prompt_params.annotation_instruction_template
             )
         else:
-            chunk_annotation_template = DEFAULT_CHUNK_WITH_SOURCES_TEMPLATE
-            annotation_instruction_template = DEFAULT_ANNOTATION_INSTRUCTION_TEMPLATE
+            # Use defaults from VectorStoresConfig when annotations disabled
+            default_config = VectorStoresConfig()
+            chunk_annotation_template = default_config.annotation_prompt_params.chunk_annotation_template
+            annotation_instruction_template = default_config.annotation_prompt_params.annotation_instruction_template

         content_items = []
         content_items.append(TextContentItem(text=header_template.format(num_chunks=len(search_results))))
@ -116,8 +116,10 @@ class MemoryToolRuntimeImpl(ToolGroupsProtocolPrivate, ToolRuntime):
         self,
         documents: list[RAGDocument],
         vector_store_id: str,
-        chunk_size_in_tokens: int = 512,
+        chunk_size_in_tokens: int | None = None,
     ) -> None:
+        if chunk_size_in_tokens is None:
+            chunk_size_in_tokens = self.config.vector_stores_config.file_ingestion_params.default_chunk_size_tokens
         if not documents:
             return
@ -145,10 +147,11 @@ class MemoryToolRuntimeImpl(ToolGroupsProtocolPrivate, ToolRuntime):
                 log.error(f"Failed to upload file for document {doc.document_id}: {e}")
                 continue

+        overlap_tokens = self.config.vector_stores_config.file_ingestion_params.default_chunk_overlap_tokens
         chunking_strategy = VectorStoreChunkingStrategyStatic(
             static=VectorStoreChunkingStrategyStaticConfig(
                 max_chunk_size_tokens=chunk_size_in_tokens,
-                chunk_overlap_tokens=chunk_size_in_tokens // 4,
+                chunk_overlap_tokens=overlap_tokens,
             )
         )
@ -180,7 +183,9 @@ class MemoryToolRuntimeImpl(ToolGroupsProtocolPrivate, ToolRuntime):
                 "No vector DBs were provided to the knowledge search tool. Please provide at least one vector DB ID."
             )

-        query_config = query_config or RAGQueryConfig()
+        query_config = query_config or RAGQueryConfig(
+            max_tokens_in_context=self.config.vector_stores_config.chunk_retrieval_params.max_tokens_in_context
+        )
         query = await generate_rag_query(
             query_config.query_generator_config,
             content,
@ -319,7 +324,9 @@ class MemoryToolRuntimeImpl(ToolGroupsProtocolPrivate, ToolRuntime):
         if query_config:
             query_config = TypeAdapter(RAGQueryConfig).validate_python(query_config)
         else:
-            query_config = RAGQueryConfig()
+            query_config = RAGQueryConfig(
+                max_tokens_in_context=self.config.vector_stores_config.chunk_retrieval_params.max_tokens_in_context
+            )

         query = kwargs["query"]
         result = await self.query(
@ -4,6 +4,4 @@
 # This source code is licensed under the terms described in the LICENSE file in
 # the root directory of this source tree.

-from .constants import DEFAULT_QUERY_REWRITE_PROMPT
-
-__all__ = ["DEFAULT_QUERY_REWRITE_PROMPT"]
+__all__ = []
@ -1,22 +0,0 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# All rights reserved.
-#
-# This source code is licensed under the terms described in the LICENSE file in
-# the root directory of this source tree.
-
-# Default prompt template for query rewriting in vector search
-DEFAULT_QUERY_REWRITE_PROMPT = "Expand this query with relevant synonyms and related terms. Return only the improved query, no explanations:\n\n{query}\n\nImproved query:"
-
-# Default templates for file search tool output formatting
-DEFAULT_FILE_SEARCH_HEADER_TEMPLATE = (
-    "knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n"
-)
-DEFAULT_FILE_SEARCH_FOOTER_TEMPLATE = "END of knowledge_search tool results.\n"
-
-# Default templates for LLM prompt content and chunk formatting
-DEFAULT_CHUNK_ANNOTATION_TEMPLATE = "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"
-DEFAULT_CONTEXT_TEMPLATE = 'The above results were retrieved to help answer the user\'s query: "{query}". Use them as supporting information only in answering this query.{annotation_instruction}\n'
-
-# Default templates for source annotation and attribution features
-DEFAULT_ANNOTATION_INSTRUCTION_TEMPLATE = " Cite sources immediately at the end of sentences before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'. Do not add extra punctuation. Use only the file IDs provided, do not invent new ones."
-DEFAULT_CHUNK_WITH_SOURCES_TEMPLATE = "[{index}] {metadata_text} cite as <|{file_id}|>\n{chunk_text}\n"
@ -15,6 +15,7 @@ from typing import Annotated, Any
 from fastapi import Body
 from pydantic import TypeAdapter

+from llama_stack.core.datatypes import VectorStoresConfig
 from llama_stack.core.id_generation import generate_object_id
 from llama_stack.log import get_logger
 from llama_stack.providers.utils.memory.vector_store import (
@ -59,10 +60,6 @@ EMBEDDING_DIMENSION = 768
 logger = get_logger(name=__name__, category="providers::utils")

 # Constants for OpenAI vector stores
-CHUNK_MULTIPLIER = 5
-FILE_BATCH_CLEANUP_INTERVAL_SECONDS = 24 * 60 * 60  # 1 day in seconds
-MAX_CONCURRENT_FILES_PER_BATCH = 3  # Maximum concurrent file processing within a batch
-FILE_BATCH_CHUNK_SIZE = 10  # Process files in chunks of this size

 VERSION = "v3"
 VECTOR_DBS_PREFIX = f"vector_stores:{VERSION}::"
@ -85,11 +82,13 @@ class OpenAIVectorStoreMixin(ABC):
         self,
         files_api: Files | None = None,
         kvstore: KVStore | None = None,
+        vector_stores_config: VectorStoresConfig | None = None,
     ):
         self.openai_vector_stores: dict[str, dict[str, Any]] = {}
         self.openai_file_batches: dict[str, dict[str, Any]] = {}
         self.files_api = files_api
         self.kvstore = kvstore
+        self.vector_stores_config = vector_stores_config or VectorStoresConfig()
         self._last_file_batch_cleanup_time = 0
         self._file_batch_tasks: dict[str, asyncio.Task[None]] = {}
         self._vector_store_locks: dict[str, asyncio.Lock] = {}
@ -619,7 +618,7 @@ class OpenAIVectorStoreMixin(ABC):
             else 0.0
         )
         params = {
-            "max_chunks": max_num_results * CHUNK_MULTIPLIER,
+            "max_chunks": max_num_results * self.vector_stores_config.chunk_retrieval_params.chunk_multiplier,
             "score_threshold": score_threshold,
             "mode": search_mode,
         }
@ -1072,7 +1071,10 @@ class OpenAIVectorStoreMixin(ABC):

         # Run cleanup if needed (throttled to once every 1 day)
         current_time = int(time.time())
-        if current_time - self._last_file_batch_cleanup_time >= FILE_BATCH_CLEANUP_INTERVAL_SECONDS:
+        if (
+            current_time - self._last_file_batch_cleanup_time
+            >= self.vector_stores_config.file_batch_params.cleanup_interval_seconds
+        ):
             logger.info("Running throttled cleanup of expired file batches")
             asyncio.create_task(self._cleanup_expired_file_batches())
             self._last_file_batch_cleanup_time = current_time
@ -1089,7 +1091,7 @@ class OpenAIVectorStoreMixin(ABC):
         batch_info: dict[str, Any],
     ) -> None:
         """Process files with controlled concurrency and chunking."""
-        semaphore = asyncio.Semaphore(MAX_CONCURRENT_FILES_PER_BATCH)
+        semaphore = asyncio.Semaphore(self.vector_stores_config.file_batch_params.max_concurrent_files_per_batch)

         async def process_single_file(file_id: str) -> tuple[str, bool]:
             """Process a single file with concurrency control."""
@ -1108,12 +1110,13 @@ class OpenAIVectorStoreMixin(ABC):

         # Process files in chunks to avoid creating too many tasks at once
         total_files = len(file_ids)
-        for chunk_start in range(0, total_files, FILE_BATCH_CHUNK_SIZE):
-            chunk_end = min(chunk_start + FILE_BATCH_CHUNK_SIZE, total_files)
+        chunk_size = self.vector_stores_config.file_batch_params.file_batch_chunk_size
+        for chunk_start in range(0, total_files, chunk_size):
+            chunk_end = min(chunk_start + chunk_size, total_files)
             chunk = file_ids[chunk_start:chunk_end]

-            chunk_num = chunk_start // FILE_BATCH_CHUNK_SIZE + 1
-            total_chunks = (total_files + FILE_BATCH_CHUNK_SIZE - 1) // FILE_BATCH_CHUNK_SIZE
+            chunk_num = chunk_start // chunk_size + 1
+            total_chunks = (total_files + chunk_size - 1) // chunk_size
             logger.info(
                 f"Processing chunk {chunk_num} of {total_chunks} ({len(chunk)} files, {chunk_start + 1}-{chunk_end} of {total_files} total files)"
             )
@ -17,6 +17,7 @@ import numpy as np
 from numpy.typing import NDArray
 from pydantic import BaseModel

+from llama_stack.core.datatypes import VectorStoresConfig
 from llama_stack.log import get_logger
 from llama_stack.models.llama.llama3.tokenizer import Tokenizer
 from llama_stack.providers.utils.inference.prompt_adapter import (
@ -262,6 +263,7 @@ class VectorStoreWithIndex:
     vector_store: VectorStore
     index: EmbeddingIndex
     inference_api: Api.inference
+    vector_stores_config: VectorStoresConfig | None = None

     async def insert_chunks(
         self,
@ -294,6 +296,8 @@ class VectorStoreWithIndex:
         query: InterleavedContent,
         params: dict[str, Any] | None = None,
     ) -> QueryChunksResponse:
+        config = self.vector_stores_config or VectorStoresConfig()
+
         if params is None:
             params = {}
         k = params.get("max_chunks", 3)
@ -302,19 +306,25 @@ class VectorStoreWithIndex:

         ranker = params.get("ranker")
         if ranker is None:
-            reranker_type = RERANKER_TYPE_RRF
-            reranker_params = {"impact_factor": 60.0}
+            reranker_type = (
+                RERANKER_TYPE_RRF
+                if config.chunk_retrieval_params.default_reranker_strategy == "rrf"
+                else config.chunk_retrieval_params.default_reranker_strategy
+            )
+            reranker_params = {"impact_factor": config.chunk_retrieval_params.rrf_impact_factor}
         else:
-            strategy = ranker.get("strategy", "rrf")
+            strategy = ranker.get("strategy", config.chunk_retrieval_params.default_reranker_strategy)
             if strategy == "weighted":
                 weights = ranker.get("params", {}).get("weights", [0.5, 0.5])
                 reranker_type = RERANKER_TYPE_WEIGHTED
-                reranker_params = {"alpha": weights[0] if len(weights) > 0 else 0.5}
+                reranker_params = {
+                    "alpha": weights[0] if len(weights) > 0 else config.chunk_retrieval_params.weighted_search_alpha
+                }
             elif strategy == "normalized":
                 reranker_type = RERANKER_TYPE_NORMALIZED
             else:
                 reranker_type = RERANKER_TYPE_RRF
-                k_value = ranker.get("params", {}).get("k", 60.0)
+                k_value = ranker.get("params", {}).get("k", config.chunk_retrieval_params.rrf_impact_factor)
                 reranker_params = {"impact_factor": k_value}

         query_string = interleaved_content_as_str(query)
1569  tests/integration/responses/recordings/0995df80c05acd7a1c386b09d5b4520ffff5233bf1fdd222607ec879cb5bcdb1.json  generated  Normal file
File diff suppressed because it is too large

1164  tests/integration/responses/recordings/b6ea82498b4cd08dbbfec50c2bf7e20bf3f40ed0acbe79695f18c787ad0e3ed7.json  generated  Normal file
File diff suppressed because it is too large
@ -156,7 +156,6 @@ async def test_query_rewrite_functionality():
     from unittest.mock import MagicMock

     from llama_stack.core.datatypes import QualifiedModel, RewriteQueryParams, VectorStoresConfig
-    from llama_stack.providers.utils.memory.constants import DEFAULT_QUERY_REWRITE_PROMPT
     from llama_stack_api import VectorStoreSearchResponsePage

     mock_routing_table = Mock()
@ -197,7 +196,7 @@ async def test_query_rewrite_functionality():

     # Verify default prompt is used
     prompt_text = chat_call_args.messages[0].content
-    expected_prompt = DEFAULT_QUERY_REWRITE_PROMPT.format(query="test query")
+    expected_prompt = "Expand this query with relevant synonyms and related terms. Return only the improved query, no explanations:\n\ntest query\n\nImproved query:"
    assert prompt_text == expected_prompt

     # Verify routing table was called with rewritten query and rewrite_query=False
|
|
@ -110,22 +110,23 @@ class TestOptionalArchitecture:
|
||||||
assert config.annotation_prompt_params is not None
|
assert config.annotation_prompt_params is not None
|
||||||
assert "{num_chunks}" in config.file_search_params.header_template
|
assert "{num_chunks}" in config.file_search_params.header_template
|
||||||
|
|
||||||
def test_guaranteed_defaults_match_constants(self):
|
def test_guaranteed_defaults_have_expected_values(self):
|
||||||
"""Test that guaranteed defaults match expected constant values."""
|
"""Test that guaranteed defaults have expected hardcoded values."""
|
||||||
from llama_stack.providers.utils.memory.constants import (
|
|
||||||
DEFAULT_CONTEXT_TEMPLATE,
|
|
||||||
DEFAULT_FILE_SEARCH_HEADER_TEMPLATE,
|
|
||||||
)
|
|
||||||
|
|
||||||
# Create config with guaranteed defaults
|
# Create config with guaranteed defaults
|
||||||
config = VectorStoresConfig()
|
config = VectorStoresConfig()
|
||||||
|
|
||||||
# Verify defaults match constants
|
# Verify defaults have expected values
|
||||||
header_template = config.file_search_params.header_template
|
header_template = config.file_search_params.header_template
|
||||||
context_template = config.context_prompt_params.context_template
|
context_template = config.context_prompt_params.context_template
|
||||||
|
|
||||||
assert header_template == DEFAULT_FILE_SEARCH_HEADER_TEMPLATE
|
assert (
|
||||||
assert context_template == DEFAULT_CONTEXT_TEMPLATE
|
header_template
|
||||||
|
== "knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n"
|
||||||
|
)
|
||||||
|
assert (
|
||||||
|
context_template
|
||||||
|
== 'The above results were retrieved to help answer the user\'s query: "{query}". Use them as supporting information only in answering this query. {annotation_instruction}\n'
|
||||||
|
)
|
||||||
|
|
||||||
# Verify templates can be formatted successfully
|
# Verify templates can be formatted successfully
|
||||||
formatted_header = header_template.format(num_chunks=3)
|
formatted_header = header_template.format(num_chunks=3)
|
||||||
|
|
|
||||||
|
|
@ -1091,13 +1091,11 @@ async def test_max_concurrent_files_per_batch(vector_io_adapter):
     # Give time for the semaphore logic to start processing files
     await asyncio.sleep(0.2)

-    # Verify that only MAX_CONCURRENT_FILES_PER_BATCH files are processing concurrently
+    # Verify that only max_concurrent_files_per_batch files are processing concurrently
     # The semaphore in _process_files_with_concurrency should limit this
-    from llama_stack.providers.utils.memory.openai_vector_store_mixin import MAX_CONCURRENT_FILES_PER_BATCH
+    max_concurrent_files = vector_io_adapter.vector_stores_config.file_batch_params.max_concurrent_files_per_batch

-    assert active_files == MAX_CONCURRENT_FILES_PER_BATCH, (
-        f"Expected {MAX_CONCURRENT_FILES_PER_BATCH} active files, got {active_files}"
-    )
+    assert active_files == max_concurrent_files, f"Expected {max_concurrent_files} active files, got {active_files}"

     # Verify batch is in progress
     assert batch.status == "in_progress"