feat: Enhance Vector Stores config with full configurations (#4397)

# What does this PR do?

Enhances the Vector Stores config with a full set of appropriate
configuration options:
- Add FileIngestionParams, ChunkRetrievalParams, and FileBatchParams
subconfigs
- Update RAG memory, the OpenAI vector store mixin, and vector store utils
to use the new configuration
  - Fix import organization across vector store components
  - Add comprehensive vector stores configuration documentation
  - Update docs navigation to include vector store configuration guide
- Delete `memory/constants.py` and move constant values directly into
Pydantic models

## Test Plan
Tests updated + CI

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Francisco Javier Arceo 2025-12-17 16:56:46 -05:00 committed by GitHub
parent a7d509aaf9
commit 2d149e3d2d
22 changed files with 3249 additions and 110 deletions


@ -0,0 +1,261 @@
# Vector Stores Configuration
## Overview
Llama Stack provides a variety of configuration options for vector stores through the `VectorStoresConfig` model. This configuration allows you to customize file processing, chunk retrieval, search behavior, and performance parameters to optimize File Search and your RAG (Retrieval-Augmented Generation) applications.
The configuration affects all vector store providers and operations across the entire stack, particularly the OpenAI-compatible vector store APIs.
## Configuration Structure
Vector store configuration is organized into logical subconfigs that group related settings. The YAML below shows an example configuration for the Faiss provider.
```yaml
vector_stores:
  default_provider_id: "faiss"
  default_embedding_model:
    provider_id: "sentence-transformers"
    model_id: "all-MiniLM-L6-v2"

  # Query rewriting for enhanced search
  rewrite_query_params:
    model:
      provider_id: "ollama"
      model_id: "llama3.2:3b-instruct-fp16"
    prompt: "Rewrite this search query to improve retrieval results by expanding it with relevant synonyms and related terms: {query}"
    max_tokens: 100
    temperature: 0.3

  # File processing during file ingestion
  file_ingestion_params:
    default_chunk_size_tokens: 512
    default_chunk_overlap_tokens: 128

  # Chunk retrieval and ranking during search
  chunk_retrieval_params:
    chunk_multiplier: 5
    max_tokens_in_context: 4000
    default_reranker_strategy: "rrf"
    rrf_impact_factor: 60.0
    weighted_search_alpha: 0.5

  # Batch processing performance settings
  file_batch_params:
    max_concurrent_files_per_batch: 3
    file_batch_chunk_size: 10
    cleanup_interval_seconds: 86400

  # Tool output and prompt formatting
  file_search_params:
    header_template: "## Knowledge Search Results\n\nI found {num_chunks} relevant chunks:\n\n"
    footer_template: "\n---\n\nEnd of search results."
  context_prompt_params:
    chunk_annotation_template: "**Source {index}:**\n{chunk.content}\n\n"
    context_template: "Use the above information to answer: {query}"
  annotation_prompt_params:
    enable_annotations: true
    annotation_instruction_template: "Cite sources using [Source X] format."
    chunk_annotation_template: "[Source {index}] {chunk_text} (File: {file_id})"
```
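To sanity-check a configuration offline, the following sketch validates the `vector_stores` section against the Pydantic models that back it. The `run.yaml` path is a placeholder, and PyYAML is assumed to be installed.

```python
# Minimal sketch: validate a vector_stores section with the Pydantic models that
# back it. "run.yaml" is a placeholder path containing a block like the one above.
import yaml

from llama_stack.core.datatypes import VectorStoresConfig

with open("run.yaml") as f:
    raw = yaml.safe_load(f)

config = VectorStoresConfig.model_validate(raw["vector_stores"])

# Subconfigs that are omitted fall back to their defaults.
print(config.chunk_retrieval_params.chunk_multiplier)           # 5
print(config.file_ingestion_params.default_chunk_size_tokens)   # 512
print(config.file_batch_params.cleanup_interval_seconds)        # 86400
```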
## Configuration Sections
### File Ingestion Parameters
The `file_ingestion_params` configuration controls how files are processed during ingestion into vector stores when using `client.vector_stores.files.create()`:
#### `file_ingestion_params`
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `default_chunk_size_tokens` | `int` | `512` | Default token count for file/document chunks when not explicitly specified |
| `default_chunk_overlap_tokens` | `int` | `128` | Number of tokens to overlap between chunks (original default: 512 // 4) |
```yaml
file_ingestion_params:
  default_chunk_size_tokens: 512      # Smaller chunks for precision
  default_chunk_overlap_tokens: 128   # Fixed token overlap for context continuity
```
**Use Cases:**
- **Smaller chunks (256-512)**: Better for precise factual retrieval
- **Larger chunks (800-1200)**: Better for context-heavy applications
- **Higher overlap (200-300 tokens)**: Reduces context loss at chunk boundaries
- **Lower overlap (50-100 tokens)**: More efficient storage, faster processing
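The sketch below illustrates attaching an uploaded file while overriding the configured defaults for a single file. The client setup and exact call shapes are assumptions based on the OpenAI-compatible surface referenced above; the file path is a placeholder.

```python
# Minimal sketch (assumed OpenAI-compatible client surface): ingest one file with
# an explicit chunking strategy instead of the file_ingestion_params defaults.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed local server

vector_store = client.vector_stores.create(name="docs")

with open("handbook.pdf", "rb") as f:  # placeholder file
    uploaded = client.files.create(file=f, purpose="assistants")

client.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=uploaded.id,
    # Omit chunking_strategy to fall back to default_chunk_size_tokens /
    # default_chunk_overlap_tokens from file_ingestion_params.
    chunking_strategy={
        "type": "static",
        "static": {"max_chunk_size_tokens": 800, "chunk_overlap_tokens": 200},
    },
)
```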
### Chunk Retrieval Parameters
The `chunk_retrieval_params` configuration controls search behavior and ranking strategies when using `client.vector_stores.search()`:
#### `chunk_retrieval_params`
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `chunk_multiplier` | `int` | `5` | Over-retrieval factor for OpenAI API compatibility (affects all providers) |
| `max_tokens_in_context` | `int` | `4000` | Maximum tokens allowed in RAG context before truncation |
| `default_reranker_strategy` | `str` | `"rrf"` | Default ranking strategy: `"rrf"`, `"weighted"`, or `"normalized"` |
| `rrf_impact_factor` | `float` | `60.0` | Impact factor for Reciprocal Rank Fusion (RRF) reranking |
| `weighted_search_alpha` | `float` | `0.5` | Alpha weight for weighted search reranking (0.0-1.0) |
```yaml
chunk_retrieval_params:
  chunk_multiplier: 5                 # Retrieve 5x chunks for reranking
  max_tokens_in_context: 4000         # Context window limit
  default_reranker_strategy: "rrf"    # Use RRF for hybrid search
  rrf_impact_factor: 60.0             # RRF ranking parameter
  weighted_search_alpha: 0.5          # 50/50 vector/keyword weight
```
**Ranking Strategies:**
- **RRF (Reciprocal Rank Fusion)**: Combines vector and keyword rankings with configurable impact factor
- **Weighted**: Linear combination with adjustable alpha (0=keyword only, 1=vector only)
- **Normalized**: Normalizes scores before combination
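For intuition, the following simplified sketch shows the math these parameters feed into; it is an illustration only, not the stack's actual reranking implementation.

```python
# Simplified illustration of hybrid-search score fusion (not the stack's code).
def rrf_score(vector_rank: int, keyword_rank: int, impact_factor: float = 60.0) -> float:
    """Reciprocal Rank Fusion: each ranking contributes 1 / (impact_factor + rank)."""
    return 1.0 / (impact_factor + vector_rank) + 1.0 / (impact_factor + keyword_rank)

def weighted_score(vector_score: float, keyword_score: float, alpha: float = 0.5) -> float:
    """Weighted fusion: alpha=1.0 is vector-only, alpha=0.0 is keyword-only."""
    return alpha * vector_score + (1.0 - alpha) * keyword_score

# A chunk ranked 1st by vector search and 3rd by keyword search:
print(round(rrf_score(1, 3), 4))                   # ~0.0323 with the default impact factor of 60
print(round(weighted_score(0.92, 0.40, 0.5), 2))   # 0.66 with the default alpha of 0.5
```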
### File Batch Parameters
The `file_batch_params` configuration controls performance and concurrency for batch file processing when using `client.vector_stores.file_batches.*`:
#### `file_batch_params`
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `max_concurrent_files_per_batch` | `int` | `3` | Maximum files processed concurrently in file batches |
| `file_batch_chunk_size` | `int` | `10` | Number of files to process in each batch chunk |
| `cleanup_interval_seconds` | `int` | `86400` | Interval for cleaning up expired file batches (24 hours) |
```yaml
file_batch_params:
  max_concurrent_files_per_batch: 3   # Process 3 files simultaneously
  file_batch_chunk_size: 10           # Handle 10 files per chunk
  cleanup_interval_seconds: 86400     # Clean up daily
```
**Performance Tuning:**
- **Higher concurrency**: Faster processing, more memory usage
- **Lower concurrency**: Slower processing, less resource usage
- **Larger chunk size**: Fewer iterations, more memory per iteration
- **Smaller chunk size**: More iterations, better memory distribution
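The sketch below shows submitting a file batch whose processing is then governed by these settings on the server. The client setup, file IDs, and exact call shapes are assumptions based on the OpenAI-compatible `file_batches` API referenced above.

```python
# Minimal sketch (assumed client surface): submit a batch of already-uploaded files
# and poll its status. Concurrency and chunking are handled server-side per
# file_batch_params; the file IDs below are placeholders.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed local server
vector_store = client.vector_stores.create(name="batch-docs")

batch = client.vector_stores.file_batches.create(
    vector_store_id=vector_store.id,
    file_ids=["file-aaa", "file-bbb", "file-ccc", "file-ddd"],  # placeholders
)

status = client.vector_stores.file_batches.retrieve(
    batch_id=batch.id,
    vector_store_id=vector_store.id,
)
print(status.status, status.file_counts)
```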
## Advanced Configuration
### Default Provider and Model Settings
Set system-wide defaults for vector operations:
```yaml
vector_stores:
  default_provider_id: "faiss"             # Default vector store provider
  default_embedding_model:                 # Default embedding model
    provider_id: "sentence-transformers"
    model_id: "all-MiniLM-L6-v2"
```
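With these defaults set, a vector store can be created without naming a provider or embedding model. The sketch below assumes the same OpenAI-compatible client surface as the earlier examples.

```python
# Minimal sketch (assumed client surface): rely on default_provider_id and
# default_embedding_model instead of passing them explicitly.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed local server

vs = client.vector_stores.create(name="notes")  # uses faiss + all-MiniLM-L6-v2 defaults
print(vs.id)
```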
### Query Rewriting Configuration
Enable intelligent query expansion for better search results:
#### `rewrite_query_params`
| Parameter | Type | Description |
|-----------|------|-------------|
| `model` | `QualifiedModel` | LLM model for query rewriting/expansion |
| `prompt` | `str` | Prompt template (must contain `{query}` placeholder) |
| `max_tokens` | `int` | Maximum tokens for expansion (1-4096) |
| `temperature` | `float` | Generation temperature (0.0-2.0) |
```yaml
rewrite_query_params:
  model:
    provider_id: "meta-reference"
    model_id: "llama3.2"
  prompt: |
    Expand this search query with related terms and synonyms for better vector search.
    Keep the expansion focused and relevant.
    Original query: {query}
    Expanded query:
  max_tokens: 100
  temperature: 0.3
```
**Note**: Query rewriting is optional. Omit this section to disable query expansion.
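When rewriting is enabled, the `{query}` placeholder is filled with the user's query before the prompt is sent to the configured model. The sketch below shows that substitution using the stack's default rewrite prompt; the example query is illustrative.

```python
# The default rewrite prompt from VectorStoresConfig; the placeholder is filled
# with str.format before the prompt is sent to the rewrite model.
DEFAULT_QUERY_REWRITE_PROMPT = (
    "Expand this query with relevant synonyms and related terms. "
    "Return only the improved query, no explanations:\n\n{query}\n\nImproved query:"
)

prompt = DEFAULT_QUERY_REWRITE_PROMPT.format(query="vector store chunk overlap")
print(prompt)
```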
### Output Formatting Configuration
Customize how search results are formatted for RAG applications:
#### `file_search_params`
```yaml
file_search_params:
  header_template: |
    ## Knowledge Search Results
    I found {num_chunks} relevant chunks from your knowledge base:
  footer_template: |
    ---
    End of search results. Use this information to provide a comprehensive answer.
```
#### `context_prompt_params`
```yaml
context_prompt_params:
  chunk_annotation_template: |
    **Source {index}:**
    {chunk.content}
    *Metadata: {metadata}*
  context_template: |
    Based on the search results above, please answer this question: {query}
    Provide specific details from the sources and cite them appropriately.
```
#### `annotation_prompt_params`
```yaml
annotation_prompt_params:
  enable_annotations: true
  annotation_instruction_template: |
    When citing information, use the format [Source X] where X is the source number.
    Always cite specific sources for factual claims.
  chunk_annotation_template: |
    [Source {index}] {chunk_text}
    Source: {file_id}
```
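To see how these templates combine, the following simplified sketch renders a response with the stack's default templates. It is not the actual tool executor code; the `Chunk` class is a stand-in for illustration.

```python
# Simplified sketch (not the real tool executor): combine header, per-chunk, and
# footer templates into one knowledge_search response. Chunk is a stand-in class.
from dataclasses import dataclass

@dataclass
class Chunk:
    content: str
    metadata: dict

header_template = "knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n"
chunk_template = "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"
footer_template = "END of knowledge_search tool results.\n"

chunks = [
    Chunk("FAISS is an inline vector store provider.", {"file_id": "file-123"}),
    Chunk("Chunks default to 512 tokens with 128 overlap.", {"file_id": "file-456"}),
]

parts = [header_template.format(num_chunks=len(chunks))]
for i, chunk in enumerate(chunks, start=1):
    # {chunk.content} resolves via attribute access inside str.format.
    parts.append(chunk_template.format(index=i, chunk=chunk, metadata=chunk.metadata))
parts.append(footer_template)
print("".join(parts))
```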
## Provider-Specific Considerations
### OpenAI-Compatible API
All configuration options affect the OpenAI-compatible vector store API:
- `chunk_multiplier` affects over-retrieval in search operations
- `file_ingestion_params` control chunking during file attachment
- `file_batch_params` control batch processing performance
### RAG Tools
The RAG tool runtime respects these configurations:
- Uses `default_chunk_size_tokens` for file insertion
- Applies `max_tokens_in_context` for context window management
- Uses formatting templates for tool output
### All Vector Store Providers
These settings apply across all vector store providers:
- **Inline providers**: FAISS, SQLite-vec, Milvus
- **Remote providers**: ChromaDB, Qdrant, Weaviate, PGVector
- **Hybrid providers**: Milvus (supports both inline and remote)


@ -14,7 +14,7 @@ RAG (Retrieval-Augmented Generation) tool runtime for document ingestion, chunki
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
-| `vector_stores_config` | `VectorStoresConfig` | No | `default_provider_id=None default_embedding_model=None rewrite_query_params=None file_search_params=FileSearchParams(header_template='knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n', footer_template='END of knowledge_search tool results.\n') context_prompt_params=ContextPromptParams(chunk_annotation_template='Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n', context_template='The above results were retrieved to help answer the user\'s query: "{query}". Use them as supporting information only in answering this query.{annotation_instruction}\n') annotation_prompt_params=AnnotationPromptParams(enable_annotations=True, annotation_instruction_template=" Cite sources immediately at the end of sentences before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'. Do not add extra punctuation. Use only the file IDs provided, do not invent new ones.", chunk_annotation_template='[{index}] {metadata_text} cite as <|{file_id}|>\n{chunk_text}\n')` | Configuration for vector store prompt templates and behavior |
+| `vector_stores_config` | `VectorStoresConfig` | No | `default_provider_id=None default_embedding_model=None rewrite_query_params=None file_search_params=FileSearchParams(header_template='knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n', footer_template='END of knowledge_search tool results.\n') context_prompt_params=ContextPromptParams(chunk_annotation_template='Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n', context_template='The above results were retrieved to help answer the user\'s query: "{query}". Use them as supporting information only in answering this query. {annotation_instruction}\n') annotation_prompt_params=AnnotationPromptParams(enable_annotations=True, annotation_instruction_template="Cite sources immediately at the end of sentences before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'. Do not add extra punctuation. Use only the file IDs provided, do not invent new ones.", chunk_annotation_template='[{index}] {metadata_text} cite as <|{file_id}|>\n{chunk_text}\n') file_ingestion_params=FileIngestionParams(default_chunk_size_tokens=512, default_chunk_overlap_tokens=128) chunk_retrieval_params=ChunkRetrievalParams(chunk_multiplier=5, max_tokens_in_context=4000, default_reranker_strategy='rrf', rrf_impact_factor=60.0, weighted_search_alpha=0.5) file_batch_params=FileBatchParams(max_concurrent_files_per_batch=3, file_batch_chunk_size=10, cleanup_interval_seconds=86400)` | Configuration for vector store prompt templates and behavior |
## Sample Configuration


@ -41,6 +41,15 @@ const sidebars: SidebarsConfig = {
        'concepts/apis/api_leveling',
      ],
    },
    {
      type: 'category',
      label: 'Vector Stores',
      collapsed: true,
      items: [
        'concepts/file_operations_vector_stores',
        'concepts/vector_stores_configuration',
      ],
    },
    'concepts/distributions',
    'concepts/resources',
  ],


@ -18,15 +18,6 @@ from llama_stack.core.storage.datatypes import (
StorageConfig, StorageConfig,
) )
from llama_stack.log import LoggingConfig from llama_stack.log import LoggingConfig
from llama_stack.providers.utils.memory.constants import (
DEFAULT_ANNOTATION_INSTRUCTION_TEMPLATE,
DEFAULT_CHUNK_ANNOTATION_TEMPLATE,
DEFAULT_CHUNK_WITH_SOURCES_TEMPLATE,
DEFAULT_CONTEXT_TEMPLATE,
DEFAULT_FILE_SEARCH_FOOTER_TEMPLATE,
DEFAULT_FILE_SEARCH_HEADER_TEMPLATE,
DEFAULT_QUERY_REWRITE_PROMPT,
)
from llama_stack_api import ( from llama_stack_api import (
Api, Api,
Benchmark, Benchmark,
@ -367,7 +358,7 @@ class RewriteQueryParams(BaseModel):
description="LLM model for query rewriting/expansion in vector search.", description="LLM model for query rewriting/expansion in vector search.",
) )
prompt: str = Field( prompt: str = Field(
default=DEFAULT_QUERY_REWRITE_PROMPT, default="Expand this query with relevant synonyms and related terms. Return only the improved query, no explanations:\n\n{query}\n\nImproved query:",
description="Prompt template for query rewriting. Use {query} as placeholder for the original query.", description="Prompt template for query rewriting. Use {query} as placeholder for the original query.",
) )
max_tokens: int = Field( max_tokens: int = Field(
@ -407,11 +398,11 @@ class FileSearchParams(BaseModel):
"""Configuration for file search tool output formatting.""" """Configuration for file search tool output formatting."""
header_template: str = Field( header_template: str = Field(
default=DEFAULT_FILE_SEARCH_HEADER_TEMPLATE, default="knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n",
description="Template for the header text shown before search results. Available placeholders: {num_chunks} number of chunks found.", description="Template for the header text shown before search results. Available placeholders: {num_chunks} number of chunks found.",
) )
footer_template: str = Field( footer_template: str = Field(
default=DEFAULT_FILE_SEARCH_FOOTER_TEMPLATE, default="END of knowledge_search tool results.\n",
description="Template for the footer text shown after search results.", description="Template for the footer text shown after search results.",
) )
@ -433,11 +424,11 @@ class ContextPromptParams(BaseModel):
"""Configuration for LLM prompt content and chunk formatting.""" """Configuration for LLM prompt content and chunk formatting."""
chunk_annotation_template: str = Field( chunk_annotation_template: str = Field(
default=DEFAULT_CHUNK_ANNOTATION_TEMPLATE, default="Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n",
description="Template for formatting individual chunks in search results. Available placeholders: {index} 1-based chunk index, {chunk.content} chunk content, {metadata} chunk metadata dict.", description="Template for formatting individual chunks in search results. Available placeholders: {index} 1-based chunk index, {chunk.content} chunk content, {metadata} chunk metadata dict.",
) )
context_template: str = Field( context_template: str = Field(
default=DEFAULT_CONTEXT_TEMPLATE, default='The above results were retrieved to help answer the user\'s query: "{query}". Use them as supporting information only in answering this query. {annotation_instruction}\n',
description="Template for explaining the search results to the model. Available placeholders: {query} user's query, {num_chunks} number of chunks.", description="Template for explaining the search results to the model. Available placeholders: {query} user's query, {num_chunks} number of chunks.",
) )
@ -470,11 +461,11 @@ class AnnotationPromptParams(BaseModel):
description="Whether to include annotation information in results.", description="Whether to include annotation information in results.",
) )
annotation_instruction_template: str = Field( annotation_instruction_template: str = Field(
default=DEFAULT_ANNOTATION_INSTRUCTION_TEMPLATE, default="Cite sources immediately at the end of sentences before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'. Do not add extra punctuation. Use only the file IDs provided, do not invent new ones.",
description="Instructions for how the model should cite sources. Used when enable_annotations is True.", description="Instructions for how the model should cite sources. Used when enable_annotations is True.",
) )
chunk_annotation_template: str = Field( chunk_annotation_template: str = Field(
default=DEFAULT_CHUNK_WITH_SOURCES_TEMPLATE, default="[{index}] {metadata_text} cite as <|{file_id}|>\n{chunk_text}\n",
description="Template for chunks with annotation information. Available placeholders: {index} 1-based chunk index, {metadata_text} formatted metadata, {file_id} document identifier, {chunk_text} chunk content.", description="Template for chunks with annotation information. Available placeholders: {index} 1-based chunk index, {metadata_text} formatted metadata, {file_id} document identifier, {chunk_text} chunk content.",
) )
@ -499,6 +490,61 @@ class AnnotationPromptParams(BaseModel):
return v return v
class FileIngestionParams(BaseModel):
"""Configuration for file processing during ingestion."""
default_chunk_size_tokens: int = Field(
default=512,
description="Default chunk size for RAG tool operations when not specified",
)
default_chunk_overlap_tokens: int = Field(
default=128,
description="Default overlap in tokens between chunks (original default: 512 // 4 = 128)",
)
class ChunkRetrievalParams(BaseModel):
"""Configuration for chunk retrieval and ranking during search."""
chunk_multiplier: int = Field(
default=5,
description="Multiplier for OpenAI API over-retrieval (affects all providers)",
)
max_tokens_in_context: int = Field(
default=4000,
description="Maximum tokens allowed in RAG context before truncation",
)
default_reranker_strategy: str = Field(
default="rrf",
description="Default reranker when not specified: 'rrf', 'weighted', or 'normalized'",
)
rrf_impact_factor: float = Field(
default=60.0,
description="Impact factor for RRF (Reciprocal Rank Fusion) reranking",
)
weighted_search_alpha: float = Field(
default=0.5,
description="Alpha weight for weighted search reranking (0.0-1.0)",
)
class FileBatchParams(BaseModel):
"""Configuration for file batch processing."""
max_concurrent_files_per_batch: int = Field(
default=3,
description="Maximum files processed concurrently in file batches",
)
file_batch_chunk_size: int = Field(
default=10,
description="Number of files to process in each batch chunk",
)
cleanup_interval_seconds: int = Field(
default=86400, # 24 hours
description="Interval for cleaning up expired file batches (seconds)",
)
class VectorStoresConfig(BaseModel): class VectorStoresConfig(BaseModel):
"""Configuration for vector stores in the stack.""" """Configuration for vector stores in the stack."""
@ -527,6 +573,19 @@ class VectorStoresConfig(BaseModel):
description="Configuration for source annotation and attribution features.", description="Configuration for source annotation and attribution features.",
) )
file_ingestion_params: FileIngestionParams = Field(
default_factory=FileIngestionParams,
description="Configuration for file processing during ingestion.",
)
chunk_retrieval_params: ChunkRetrievalParams = Field(
default_factory=ChunkRetrievalParams,
description="Configuration for chunk retrieval and ranking during search.",
)
file_batch_params: FileBatchParams = Field(
default_factory=FileBatchParams,
description="Configuration for file batch processing.",
)
class SafetyConfig(BaseModel): class SafetyConfig(BaseModel):
"""Configuration for default moderations model.""" """Configuration for default moderations model."""


@ -11,6 +11,9 @@ def redact_sensitive_fields(data: dict[str, Any]) -> dict[str, Any]:
"""Redact sensitive information from config before printing.""" """Redact sensitive information from config before printing."""
sensitive_patterns = ["api_key", "api_token", "password", "secret", "token"] sensitive_patterns = ["api_key", "api_token", "password", "secret", "token"]
# Specific configuration field names that should NOT be redacted despite containing "token"
safe_token_fields = ["chunk_size_tokens", "max_tokens", "default_chunk_overlap_tokens"]
def _redact_value(v: Any) -> Any: def _redact_value(v: Any) -> Any:
if isinstance(v, dict): if isinstance(v, dict):
return _redact_dict(v) return _redact_dict(v)
@ -21,7 +24,10 @@ def redact_sensitive_fields(data: dict[str, Any]) -> dict[str, Any]:
def _redact_dict(d: dict[str, Any]) -> dict[str, Any]: def _redact_dict(d: dict[str, Any]) -> dict[str, Any]:
result = {} result = {}
for k, v in d.items(): for k, v in d.items():
if any(pattern in k.lower() for pattern in sensitive_patterns): # Don't redact if it's a safe field
if any(safe_field in k.lower() for safe_field in safe_token_fields):
result[k] = _redact_value(v)
elif any(pattern in k.lower() for pattern in sensitive_patterns):
result[k] = "********" result[k] = "********"
else: else:
result[k] = _redact_value(v) result[k] = _redact_value(v)


@ -296,19 +296,32 @@ vector_stores:
' '
context_template: 'The above results were retrieved to help answer the user''s context_template: 'The above results were retrieved to help answer the user''s
query: "{query}". Use them as supporting information only in answering this query: "{query}". Use them as supporting information only in answering this
query.{annotation_instruction} query. {annotation_instruction}
' '
annotation_prompt_params: annotation_prompt_params:
enable_annotations: true enable_annotations: true
annotation_instruction_template: ' Cite sources immediately at the end of sentences annotation_instruction_template: Cite sources immediately at the end of sentences
before punctuation, using `<|file-id|>` format like ''This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.''. before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'.
Do not add extra punctuation. Use only the file IDs provided, do not invent Do not add extra punctuation. Use only the file IDs provided, do not invent
new ones.' new ones.
chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|> chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|>
{chunk_text} {chunk_text}
' '
file_ingestion_params:
default_chunk_size_tokens: 512
default_chunk_overlap_tokens: 128
chunk_retrieval_params:
chunk_multiplier: 5
max_tokens_in_context: 4000
default_reranker_strategy: rrf
rrf_impact_factor: 60.0
weighted_search_alpha: 0.5
file_batch_params:
max_concurrent_files_per_batch: 3
file_batch_chunk_size: 10
cleanup_interval_seconds: 86400
safety: safety:
default_shield_id: llama-guard default_shield_id: llama-guard


@ -305,19 +305,32 @@ vector_stores:
' '
context_template: 'The above results were retrieved to help answer the user''s context_template: 'The above results were retrieved to help answer the user''s
query: "{query}". Use them as supporting information only in answering this query: "{query}". Use them as supporting information only in answering this
query.{annotation_instruction} query. {annotation_instruction}
' '
annotation_prompt_params: annotation_prompt_params:
enable_annotations: true enable_annotations: true
annotation_instruction_template: ' Cite sources immediately at the end of sentences annotation_instruction_template: Cite sources immediately at the end of sentences
before punctuation, using `<|file-id|>` format like ''This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.''. before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'.
Do not add extra punctuation. Use only the file IDs provided, do not invent Do not add extra punctuation. Use only the file IDs provided, do not invent
new ones.' new ones.
chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|> chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|>
{chunk_text} {chunk_text}
' '
file_ingestion_params:
default_chunk_size_tokens: 512
default_chunk_overlap_tokens: 128
chunk_retrieval_params:
chunk_multiplier: 5
max_tokens_in_context: 4000
default_reranker_strategy: rrf
rrf_impact_factor: 60.0
weighted_search_alpha: 0.5
file_batch_params:
max_concurrent_files_per_batch: 3
file_batch_chunk_size: 10
cleanup_interval_seconds: 86400
safety: safety:
default_shield_id: llama-guard default_shield_id: llama-guard


@ -299,19 +299,32 @@ vector_stores:
' '
context_template: 'The above results were retrieved to help answer the user''s context_template: 'The above results were retrieved to help answer the user''s
query: "{query}". Use them as supporting information only in answering this query: "{query}". Use them as supporting information only in answering this
query.{annotation_instruction} query. {annotation_instruction}
' '
annotation_prompt_params: annotation_prompt_params:
enable_annotations: true enable_annotations: true
annotation_instruction_template: ' Cite sources immediately at the end of sentences annotation_instruction_template: Cite sources immediately at the end of sentences
before punctuation, using `<|file-id|>` format like ''This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.''. before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'.
Do not add extra punctuation. Use only the file IDs provided, do not invent Do not add extra punctuation. Use only the file IDs provided, do not invent
new ones.' new ones.
chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|> chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|>
{chunk_text} {chunk_text}
' '
file_ingestion_params:
default_chunk_size_tokens: 512
default_chunk_overlap_tokens: 128
chunk_retrieval_params:
chunk_multiplier: 5
max_tokens_in_context: 4000
default_reranker_strategy: rrf
rrf_impact_factor: 60.0
weighted_search_alpha: 0.5
file_batch_params:
max_concurrent_files_per_batch: 3
file_batch_chunk_size: 10
cleanup_interval_seconds: 86400
safety: safety:
default_shield_id: llama-guard default_shield_id: llama-guard


@ -308,19 +308,32 @@ vector_stores:
' '
context_template: 'The above results were retrieved to help answer the user''s context_template: 'The above results were retrieved to help answer the user''s
query: "{query}". Use them as supporting information only in answering this query: "{query}". Use them as supporting information only in answering this
query.{annotation_instruction} query. {annotation_instruction}
' '
annotation_prompt_params: annotation_prompt_params:
enable_annotations: true enable_annotations: true
annotation_instruction_template: ' Cite sources immediately at the end of sentences annotation_instruction_template: Cite sources immediately at the end of sentences
before punctuation, using `<|file-id|>` format like ''This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.''. before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'.
Do not add extra punctuation. Use only the file IDs provided, do not invent Do not add extra punctuation. Use only the file IDs provided, do not invent
new ones.' new ones.
chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|> chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|>
{chunk_text} {chunk_text}
' '
file_ingestion_params:
default_chunk_size_tokens: 512
default_chunk_overlap_tokens: 128
chunk_retrieval_params:
chunk_multiplier: 5
max_tokens_in_context: 4000
default_reranker_strategy: rrf
rrf_impact_factor: 60.0
weighted_search_alpha: 0.5
file_batch_params:
max_concurrent_files_per_batch: 3
file_batch_chunk_size: 10
cleanup_interval_seconds: 86400
safety: safety:
default_shield_id: llama-guard default_shield_id: llama-guard


@ -296,19 +296,32 @@ vector_stores:
' '
context_template: 'The above results were retrieved to help answer the user''s context_template: 'The above results were retrieved to help answer the user''s
query: "{query}". Use them as supporting information only in answering this query: "{query}". Use them as supporting information only in answering this
query.{annotation_instruction} query. {annotation_instruction}
' '
annotation_prompt_params: annotation_prompt_params:
enable_annotations: true enable_annotations: true
annotation_instruction_template: ' Cite sources immediately at the end of sentences annotation_instruction_template: Cite sources immediately at the end of sentences
before punctuation, using `<|file-id|>` format like ''This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.''. before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'.
Do not add extra punctuation. Use only the file IDs provided, do not invent Do not add extra punctuation. Use only the file IDs provided, do not invent
new ones.' new ones.
chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|> chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|>
{chunk_text} {chunk_text}
' '
file_ingestion_params:
default_chunk_size_tokens: 512
default_chunk_overlap_tokens: 128
chunk_retrieval_params:
chunk_multiplier: 5
max_tokens_in_context: 4000
default_reranker_strategy: rrf
rrf_impact_factor: 60.0
weighted_search_alpha: 0.5
file_batch_params:
max_concurrent_files_per_batch: 3
file_batch_chunk_size: 10
cleanup_interval_seconds: 86400
safety: safety:
default_shield_id: llama-guard default_shield_id: llama-guard


@ -305,19 +305,32 @@ vector_stores:
' '
context_template: 'The above results were retrieved to help answer the user''s context_template: 'The above results were retrieved to help answer the user''s
query: "{query}". Use them as supporting information only in answering this query: "{query}". Use them as supporting information only in answering this
query.{annotation_instruction} query. {annotation_instruction}
' '
annotation_prompt_params: annotation_prompt_params:
enable_annotations: true enable_annotations: true
annotation_instruction_template: ' Cite sources immediately at the end of sentences annotation_instruction_template: Cite sources immediately at the end of sentences
before punctuation, using `<|file-id|>` format like ''This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.''. before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'.
Do not add extra punctuation. Use only the file IDs provided, do not invent Do not add extra punctuation. Use only the file IDs provided, do not invent
new ones.' new ones.
chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|> chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|>
{chunk_text} {chunk_text}
' '
file_ingestion_params:
default_chunk_size_tokens: 512
default_chunk_overlap_tokens: 128
chunk_retrieval_params:
chunk_multiplier: 5
max_tokens_in_context: 4000
default_reranker_strategy: rrf
rrf_impact_factor: 60.0
weighted_search_alpha: 0.5
file_batch_params:
max_concurrent_files_per_batch: 3
file_batch_chunk_size: 10
cleanup_interval_seconds: 86400
safety: safety:
default_shield_id: llama-guard default_shield_id: llama-guard


@ -11,11 +11,8 @@ from typing import Any
from opentelemetry import trace from opentelemetry import trace
from llama_stack.core.datatypes import VectorStoresConfig
from llama_stack.log import get_logger from llama_stack.log import get_logger
from llama_stack.providers.utils.memory.constants import (
DEFAULT_ANNOTATION_INSTRUCTION_TEMPLATE,
DEFAULT_CHUNK_WITH_SOURCES_TEMPLATE,
)
from llama_stack_api import ( from llama_stack_api import (
ImageContentItem, ImageContentItem,
OpenAIChatCompletionContentPartImageParam, OpenAIChatCompletionContentPartImageParam,
@ -175,8 +172,10 @@ class ToolExecutor:
self.vector_stores_config.annotation_prompt_params.annotation_instruction_template self.vector_stores_config.annotation_prompt_params.annotation_instruction_template
) )
else: else:
chunk_annotation_template = DEFAULT_CHUNK_WITH_SOURCES_TEMPLATE # Use defaults from VectorStoresConfig when annotations disabled
annotation_instruction_template = DEFAULT_ANNOTATION_INSTRUCTION_TEMPLATE default_config = VectorStoresConfig()
chunk_annotation_template = default_config.annotation_prompt_params.chunk_annotation_template
annotation_instruction_template = default_config.annotation_prompt_params.annotation_instruction_template
content_items = [] content_items = []
content_items.append(TextContentItem(text=header_template.format(num_chunks=len(search_results)))) content_items.append(TextContentItem(text=header_template.format(num_chunks=len(search_results))))


@ -116,8 +116,10 @@ class MemoryToolRuntimeImpl(ToolGroupsProtocolPrivate, ToolRuntime):
self, self,
documents: list[RAGDocument], documents: list[RAGDocument],
vector_store_id: str, vector_store_id: str,
chunk_size_in_tokens: int = 512, chunk_size_in_tokens: int | None = None,
) -> None: ) -> None:
if chunk_size_in_tokens is None:
chunk_size_in_tokens = self.config.vector_stores_config.file_ingestion_params.default_chunk_size_tokens
if not documents: if not documents:
return return
@ -145,10 +147,11 @@ class MemoryToolRuntimeImpl(ToolGroupsProtocolPrivate, ToolRuntime):
log.error(f"Failed to upload file for document {doc.document_id}: {e}") log.error(f"Failed to upload file for document {doc.document_id}: {e}")
continue continue
overlap_tokens = self.config.vector_stores_config.file_ingestion_params.default_chunk_overlap_tokens
chunking_strategy = VectorStoreChunkingStrategyStatic( chunking_strategy = VectorStoreChunkingStrategyStatic(
static=VectorStoreChunkingStrategyStaticConfig( static=VectorStoreChunkingStrategyStaticConfig(
max_chunk_size_tokens=chunk_size_in_tokens, max_chunk_size_tokens=chunk_size_in_tokens,
chunk_overlap_tokens=chunk_size_in_tokens // 4, chunk_overlap_tokens=overlap_tokens,
) )
) )
@ -180,7 +183,9 @@ class MemoryToolRuntimeImpl(ToolGroupsProtocolPrivate, ToolRuntime):
"No vector DBs were provided to the knowledge search tool. Please provide at least one vector DB ID." "No vector DBs were provided to the knowledge search tool. Please provide at least one vector DB ID."
) )
query_config = query_config or RAGQueryConfig() query_config = query_config or RAGQueryConfig(
max_tokens_in_context=self.config.vector_stores_config.chunk_retrieval_params.max_tokens_in_context
)
query = await generate_rag_query( query = await generate_rag_query(
query_config.query_generator_config, query_config.query_generator_config,
content, content,
@ -319,7 +324,9 @@ class MemoryToolRuntimeImpl(ToolGroupsProtocolPrivate, ToolRuntime):
if query_config: if query_config:
query_config = TypeAdapter(RAGQueryConfig).validate_python(query_config) query_config = TypeAdapter(RAGQueryConfig).validate_python(query_config)
else: else:
query_config = RAGQueryConfig() query_config = RAGQueryConfig(
max_tokens_in_context=self.config.vector_stores_config.chunk_retrieval_params.max_tokens_in_context
)
query = kwargs["query"] query = kwargs["query"]
result = await self.query( result = await self.query(


@ -4,6 +4,4 @@
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
-from .constants import DEFAULT_QUERY_REWRITE_PROMPT
-__all__ = ["DEFAULT_QUERY_REWRITE_PROMPT"]
+__all__ = []


@ -1,22 +0,0 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
# Default prompt template for query rewriting in vector search
DEFAULT_QUERY_REWRITE_PROMPT = "Expand this query with relevant synonyms and related terms. Return only the improved query, no explanations:\n\n{query}\n\nImproved query:"
# Default templates for file search tool output formatting
DEFAULT_FILE_SEARCH_HEADER_TEMPLATE = (
"knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n"
)
DEFAULT_FILE_SEARCH_FOOTER_TEMPLATE = "END of knowledge_search tool results.\n"
# Default templates for LLM prompt content and chunk formatting
DEFAULT_CHUNK_ANNOTATION_TEMPLATE = "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"
DEFAULT_CONTEXT_TEMPLATE = 'The above results were retrieved to help answer the user\'s query: "{query}". Use them as supporting information only in answering this query.{annotation_instruction}\n'
# Default templates for source annotation and attribution features
DEFAULT_ANNOTATION_INSTRUCTION_TEMPLATE = " Cite sources immediately at the end of sentences before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'. Do not add extra punctuation. Use only the file IDs provided, do not invent new ones."
DEFAULT_CHUNK_WITH_SOURCES_TEMPLATE = "[{index}] {metadata_text} cite as <|{file_id}|>\n{chunk_text}\n"


@ -15,6 +15,7 @@ from typing import Annotated, Any
from fastapi import Body from fastapi import Body
from pydantic import TypeAdapter from pydantic import TypeAdapter
from llama_stack.core.datatypes import VectorStoresConfig
from llama_stack.core.id_generation import generate_object_id from llama_stack.core.id_generation import generate_object_id
from llama_stack.log import get_logger from llama_stack.log import get_logger
from llama_stack.providers.utils.memory.vector_store import ( from llama_stack.providers.utils.memory.vector_store import (
@ -59,10 +60,6 @@ EMBEDDING_DIMENSION = 768
logger = get_logger(name=__name__, category="providers::utils") logger = get_logger(name=__name__, category="providers::utils")
# Constants for OpenAI vector stores # Constants for OpenAI vector stores
CHUNK_MULTIPLIER = 5
FILE_BATCH_CLEANUP_INTERVAL_SECONDS = 24 * 60 * 60 # 1 day in seconds
MAX_CONCURRENT_FILES_PER_BATCH = 3 # Maximum concurrent file processing within a batch
FILE_BATCH_CHUNK_SIZE = 10 # Process files in chunks of this size
VERSION = "v3" VERSION = "v3"
VECTOR_DBS_PREFIX = f"vector_stores:{VERSION}::" VECTOR_DBS_PREFIX = f"vector_stores:{VERSION}::"
@ -85,11 +82,13 @@ class OpenAIVectorStoreMixin(ABC):
self, self,
files_api: Files | None = None, files_api: Files | None = None,
kvstore: KVStore | None = None, kvstore: KVStore | None = None,
vector_stores_config: VectorStoresConfig | None = None,
): ):
self.openai_vector_stores: dict[str, dict[str, Any]] = {} self.openai_vector_stores: dict[str, dict[str, Any]] = {}
self.openai_file_batches: dict[str, dict[str, Any]] = {} self.openai_file_batches: dict[str, dict[str, Any]] = {}
self.files_api = files_api self.files_api = files_api
self.kvstore = kvstore self.kvstore = kvstore
self.vector_stores_config = vector_stores_config or VectorStoresConfig()
self._last_file_batch_cleanup_time = 0 self._last_file_batch_cleanup_time = 0
self._file_batch_tasks: dict[str, asyncio.Task[None]] = {} self._file_batch_tasks: dict[str, asyncio.Task[None]] = {}
self._vector_store_locks: dict[str, asyncio.Lock] = {} self._vector_store_locks: dict[str, asyncio.Lock] = {}
@ -619,7 +618,7 @@ class OpenAIVectorStoreMixin(ABC):
else 0.0 else 0.0
) )
params = { params = {
"max_chunks": max_num_results * CHUNK_MULTIPLIER, "max_chunks": max_num_results * self.vector_stores_config.chunk_retrieval_params.chunk_multiplier,
"score_threshold": score_threshold, "score_threshold": score_threshold,
"mode": search_mode, "mode": search_mode,
} }
@ -1072,7 +1071,10 @@ class OpenAIVectorStoreMixin(ABC):
# Run cleanup if needed (throttled to once every 1 day) # Run cleanup if needed (throttled to once every 1 day)
current_time = int(time.time()) current_time = int(time.time())
if current_time - self._last_file_batch_cleanup_time >= FILE_BATCH_CLEANUP_INTERVAL_SECONDS: if (
current_time - self._last_file_batch_cleanup_time
>= self.vector_stores_config.file_batch_params.cleanup_interval_seconds
):
logger.info("Running throttled cleanup of expired file batches") logger.info("Running throttled cleanup of expired file batches")
asyncio.create_task(self._cleanup_expired_file_batches()) asyncio.create_task(self._cleanup_expired_file_batches())
self._last_file_batch_cleanup_time = current_time self._last_file_batch_cleanup_time = current_time
@ -1089,7 +1091,7 @@ class OpenAIVectorStoreMixin(ABC):
batch_info: dict[str, Any], batch_info: dict[str, Any],
) -> None: ) -> None:
"""Process files with controlled concurrency and chunking.""" """Process files with controlled concurrency and chunking."""
semaphore = asyncio.Semaphore(MAX_CONCURRENT_FILES_PER_BATCH) semaphore = asyncio.Semaphore(self.vector_stores_config.file_batch_params.max_concurrent_files_per_batch)
async def process_single_file(file_id: str) -> tuple[str, bool]: async def process_single_file(file_id: str) -> tuple[str, bool]:
"""Process a single file with concurrency control.""" """Process a single file with concurrency control."""
@ -1108,12 +1110,13 @@ class OpenAIVectorStoreMixin(ABC):
# Process files in chunks to avoid creating too many tasks at once # Process files in chunks to avoid creating too many tasks at once
total_files = len(file_ids) total_files = len(file_ids)
for chunk_start in range(0, total_files, FILE_BATCH_CHUNK_SIZE): chunk_size = self.vector_stores_config.file_batch_params.file_batch_chunk_size
chunk_end = min(chunk_start + FILE_BATCH_CHUNK_SIZE, total_files) for chunk_start in range(0, total_files, chunk_size):
chunk_end = min(chunk_start + chunk_size, total_files)
chunk = file_ids[chunk_start:chunk_end] chunk = file_ids[chunk_start:chunk_end]
chunk_num = chunk_start // FILE_BATCH_CHUNK_SIZE + 1 chunk_num = chunk_start // chunk_size + 1
total_chunks = (total_files + FILE_BATCH_CHUNK_SIZE - 1) // FILE_BATCH_CHUNK_SIZE total_chunks = (total_files + chunk_size - 1) // chunk_size
logger.info( logger.info(
f"Processing chunk {chunk_num} of {total_chunks} ({len(chunk)} files, {chunk_start + 1}-{chunk_end} of {total_files} total files)" f"Processing chunk {chunk_num} of {total_chunks} ({len(chunk)} files, {chunk_start + 1}-{chunk_end} of {total_files} total files)"
) )


@ -17,6 +17,7 @@ import numpy as np
from numpy.typing import NDArray from numpy.typing import NDArray
from pydantic import BaseModel from pydantic import BaseModel
from llama_stack.core.datatypes import VectorStoresConfig
from llama_stack.log import get_logger from llama_stack.log import get_logger
from llama_stack.models.llama.llama3.tokenizer import Tokenizer from llama_stack.models.llama.llama3.tokenizer import Tokenizer
from llama_stack.providers.utils.inference.prompt_adapter import ( from llama_stack.providers.utils.inference.prompt_adapter import (
@ -262,6 +263,7 @@ class VectorStoreWithIndex:
vector_store: VectorStore vector_store: VectorStore
index: EmbeddingIndex index: EmbeddingIndex
inference_api: Api.inference inference_api: Api.inference
vector_stores_config: VectorStoresConfig | None = None
async def insert_chunks( async def insert_chunks(
self, self,
@ -294,6 +296,8 @@ class VectorStoreWithIndex:
query: InterleavedContent, query: InterleavedContent,
params: dict[str, Any] | None = None, params: dict[str, Any] | None = None,
) -> QueryChunksResponse: ) -> QueryChunksResponse:
config = self.vector_stores_config or VectorStoresConfig()
if params is None: if params is None:
params = {} params = {}
k = params.get("max_chunks", 3) k = params.get("max_chunks", 3)
@ -302,19 +306,25 @@ class VectorStoreWithIndex:
ranker = params.get("ranker") ranker = params.get("ranker")
if ranker is None: if ranker is None:
reranker_type = RERANKER_TYPE_RRF reranker_type = (
reranker_params = {"impact_factor": 60.0} RERANKER_TYPE_RRF
if config.chunk_retrieval_params.default_reranker_strategy == "rrf"
else config.chunk_retrieval_params.default_reranker_strategy
)
reranker_params = {"impact_factor": config.chunk_retrieval_params.rrf_impact_factor}
else: else:
strategy = ranker.get("strategy", "rrf") strategy = ranker.get("strategy", config.chunk_retrieval_params.default_reranker_strategy)
if strategy == "weighted": if strategy == "weighted":
weights = ranker.get("params", {}).get("weights", [0.5, 0.5]) weights = ranker.get("params", {}).get("weights", [0.5, 0.5])
reranker_type = RERANKER_TYPE_WEIGHTED reranker_type = RERANKER_TYPE_WEIGHTED
reranker_params = {"alpha": weights[0] if len(weights) > 0 else 0.5} reranker_params = {
"alpha": weights[0] if len(weights) > 0 else config.chunk_retrieval_params.weighted_search_alpha
}
elif strategy == "normalized": elif strategy == "normalized":
reranker_type = RERANKER_TYPE_NORMALIZED reranker_type = RERANKER_TYPE_NORMALIZED
else: else:
reranker_type = RERANKER_TYPE_RRF reranker_type = RERANKER_TYPE_RRF
k_value = ranker.get("params", {}).get("k", 60.0) k_value = ranker.get("params", {}).get("k", config.chunk_retrieval_params.rrf_impact_factor)
reranker_params = {"impact_factor": k_value} reranker_params = {"impact_factor": k_value}
query_string = interleaved_content_as_str(query) query_string = interleaved_content_as_str(query)


@ -156,7 +156,6 @@ async def test_query_rewrite_functionality():
from unittest.mock import MagicMock from unittest.mock import MagicMock
from llama_stack.core.datatypes import QualifiedModel, RewriteQueryParams, VectorStoresConfig from llama_stack.core.datatypes import QualifiedModel, RewriteQueryParams, VectorStoresConfig
from llama_stack.providers.utils.memory.constants import DEFAULT_QUERY_REWRITE_PROMPT
from llama_stack_api import VectorStoreSearchResponsePage from llama_stack_api import VectorStoreSearchResponsePage
mock_routing_table = Mock() mock_routing_table = Mock()
@ -197,7 +196,7 @@ async def test_query_rewrite_functionality():
# Verify default prompt is used # Verify default prompt is used
prompt_text = chat_call_args.messages[0].content prompt_text = chat_call_args.messages[0].content
expected_prompt = DEFAULT_QUERY_REWRITE_PROMPT.format(query="test query") expected_prompt = "Expand this query with relevant synonyms and related terms. Return only the improved query, no explanations:\n\ntest query\n\nImproved query:"
assert prompt_text == expected_prompt assert prompt_text == expected_prompt
# Verify routing table was called with rewritten query and rewrite_query=False # Verify routing table was called with rewritten query and rewrite_query=False


@ -110,22 +110,23 @@ class TestOptionalArchitecture:
assert config.annotation_prompt_params is not None assert config.annotation_prompt_params is not None
assert "{num_chunks}" in config.file_search_params.header_template assert "{num_chunks}" in config.file_search_params.header_template
def test_guaranteed_defaults_match_constants(self): def test_guaranteed_defaults_have_expected_values(self):
"""Test that guaranteed defaults match expected constant values.""" """Test that guaranteed defaults have expected hardcoded values."""
from llama_stack.providers.utils.memory.constants import (
DEFAULT_CONTEXT_TEMPLATE,
DEFAULT_FILE_SEARCH_HEADER_TEMPLATE,
)
# Create config with guaranteed defaults # Create config with guaranteed defaults
config = VectorStoresConfig() config = VectorStoresConfig()
# Verify defaults match constants # Verify defaults have expected values
header_template = config.file_search_params.header_template header_template = config.file_search_params.header_template
context_template = config.context_prompt_params.context_template context_template = config.context_prompt_params.context_template
assert header_template == DEFAULT_FILE_SEARCH_HEADER_TEMPLATE assert (
assert context_template == DEFAULT_CONTEXT_TEMPLATE header_template
== "knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n"
)
assert (
context_template
== 'The above results were retrieved to help answer the user\'s query: "{query}". Use them as supporting information only in answering this query. {annotation_instruction}\n'
)
# Verify templates can be formatted successfully # Verify templates can be formatted successfully
formatted_header = header_template.format(num_chunks=3) formatted_header = header_template.format(num_chunks=3)


@ -1091,13 +1091,11 @@ async def test_max_concurrent_files_per_batch(vector_io_adapter):
# Give time for the semaphore logic to start processing files # Give time for the semaphore logic to start processing files
await asyncio.sleep(0.2) await asyncio.sleep(0.2)
# Verify that only MAX_CONCURRENT_FILES_PER_BATCH files are processing concurrently # Verify that only max_concurrent_files_per_batch files are processing concurrently
# The semaphore in _process_files_with_concurrency should limit this # The semaphore in _process_files_with_concurrency should limit this
from llama_stack.providers.utils.memory.openai_vector_store_mixin import MAX_CONCURRENT_FILES_PER_BATCH max_concurrent_files = vector_io_adapter.vector_stores_config.file_batch_params.max_concurrent_files_per_batch
assert active_files == MAX_CONCURRENT_FILES_PER_BATCH, ( assert active_files == max_concurrent_files, f"Expected {max_concurrent_files} active files, got {active_files}"
f"Expected {MAX_CONCURRENT_FILES_PER_BATCH} active files, got {active_files}"
)
# Verify batch is in progress # Verify batch is in progress
assert batch.status == "in_progress" assert batch.status == "in_progress"