fix: Pydantic validation error with list-type metadata in vector search (#3797) (#4173)

# Fix for Issue #3797

## Problem
Vector store search failed with a Pydantic `ValidationError` when chunk
metadata contained list-type values.

**Error:**
```
ValidationError: 3 validation errors for VectorStoreSearchResponse
attributes.tags.str: Input should be a valid string
attributes.tags.float: Input should be a valid number
attributes.tags.bool: Input should be a valid boolean
```

**Root Cause:**
- `Chunk.metadata` accepts `dict[str, Any]` (any value type allowed)
- `VectorStoreSearchResponse.attributes` requires
  `dict[str, str | float | bool]` (primitives only)
- Assigning the metadata directly to `attributes` at line 641 therefore
  failed validation whenever a value was non-primitive

## Solution

Added a utility function that filters metadata down to primitive types
before the search response is created.

## Impact

**Fixed:**
- Vector search works with list metadata (e.g., `tags: ["transformers", "gpu"]`)
- Lists become searchable as comma-separated strings
- No ValidationError on search responses

**Preserved:**
- Full metadata still available in `VectorStoreContent.metadata`
- No API schema changes
- Backward compatible with existing primitive metadata

**Affected:**
All vector store providers using `OpenAIVectorStoreMixin`: FAISS,
Chroma, Qdrant, Milvus, Weaviate, PGVector, SQLite-vec

## Testing

`tests/unit/providers/vector_io/test_vector_utils.py::test_sanitize_metadata_for_attributes`

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Roy Belio 2025-11-19 20:16:34 +02:00 committed by GitHub
parent 1e4e02e622
commit f18870a221
GPG key ID: B5690EEEBB952194
7 changed files with 207 additions and 8 deletions


```diff
@@ -9862,9 +9862,21 @@ components:
           title: Object
           default: vector_store.file
         attributes:
-          additionalProperties: true
+          additionalProperties:
+            anyOf:
+            - type: string
+              maxLength: 512
+            - type: number
+            - type: boolean
+            title: string | number | boolean
+          propertyNames:
+            type: string
+            maxLength: 64
           type: object
+          maxProperties: 16
           title: Attributes
+          description: Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers.
+          x-oaiTypeLabel: map
         chunking_strategy:
           oneOf:
           - $ref: '#/components/schemas/VectorStoreChunkingStrategyAuto'
```


```diff
@@ -6705,9 +6705,21 @@ components:
           title: Object
           default: vector_store.file
         attributes:
-          additionalProperties: true
+          additionalProperties:
+            anyOf:
+            - type: string
+              maxLength: 512
+            - type: number
+            - type: boolean
+            title: string | number | boolean
+          propertyNames:
+            type: string
+            maxLength: 64
           type: object
+          maxProperties: 16
           title: Attributes
+          description: Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers.
+          x-oaiTypeLabel: map
         chunking_strategy:
           oneOf:
           - $ref: '#/components/schemas/VectorStoreChunkingStrategyAuto'
```


```diff
@@ -6061,9 +6061,21 @@ components:
           title: Object
           default: vector_store.file
         attributes:
-          additionalProperties: true
+          additionalProperties:
+            anyOf:
+            - type: string
+              maxLength: 512
+            - type: number
+            - type: boolean
+            title: string | number | boolean
+          propertyNames:
+            type: string
+            maxLength: 64
           type: object
+          maxProperties: 16
           title: Attributes
+          description: Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers.
+          x-oaiTypeLabel: map
         chunking_strategy:
           oneOf:
           - $ref: '#/components/schemas/VectorStoreChunkingStrategyAuto'
```


```diff
@@ -8883,9 +8883,21 @@ components:
           title: Object
           default: vector_store.file
         attributes:
-          additionalProperties: true
+          additionalProperties:
+            anyOf:
+            - type: string
+              maxLength: 512
+            - type: number
+            - type: boolean
+            title: string | number | boolean
+          propertyNames:
+            type: string
+            maxLength: 64
           type: object
+          maxProperties: 16
           title: Attributes
+          description: Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers.
+          x-oaiTypeLabel: map
         chunking_strategy:
           oneOf:
           - $ref: '#/components/schemas/VectorStoreChunkingStrategyAuto'
```


```diff
@@ -9862,9 +9862,21 @@ components:
           title: Object
           default: vector_store.file
         attributes:
-          additionalProperties: true
+          additionalProperties:
+            anyOf:
+            - type: string
+              maxLength: 512
+            - type: number
+            - type: boolean
+            title: string | number | boolean
+          propertyNames:
+            type: string
+            maxLength: 64
           type: object
+          maxProperties: 16
           title: Attributes
+          description: Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers.
+          x-oaiTypeLabel: map
         chunking_strategy:
           oneOf:
           - $ref: '#/components/schemas/VectorStoreChunkingStrategyAuto'
```


```diff
@@ -11,7 +11,7 @@
 from typing import Annotated, Any, Literal, Protocol, runtime_checkable
 
 from fastapi import Body, Query
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, field_validator
 
 from llama_stack_api.common.tracing import telemetry_traceable
 from llama_stack_api.inference import InterleavedContent
@@ -372,6 +372,65 @@ VectorStoreFileStatus = Literal["completed"] | Literal["in_progress"] | Literal[
 register_schema(VectorStoreFileStatus, name="VectorStoreFileStatus")
 
 
+# VectorStoreFileAttributes type with OpenAPI constraints
+VectorStoreFileAttributes = Annotated[
+    dict[str, Annotated[str, Field(max_length=512)] | float | bool],
+    Field(
+        max_length=16,
+        json_schema_extra={
+            "propertyNames": {"type": "string", "maxLength": 64},
+            "x-oaiTypeLabel": "map",
+        },
+        description=(
+            "Set of 16 key-value pairs that can be attached to an object. This can be "
+            "useful for storing additional information about the object in a structured "
+            "format, and querying for objects via API or the dashboard. Keys are strings "
+            "with a maximum length of 64 characters. Values are strings with a maximum "
+            "length of 512 characters, booleans, or numbers."
+        ),
+    ),
+]
+
+
+def _sanitize_vector_store_attributes(metadata: dict[str, Any] | None) -> dict[str, str | float | bool]:
+    """
+    Sanitize metadata to VectorStoreFileAttributes spec (max 16 properties, primitives only).
+
+    Converts dict[str, Any] to dict[str, str | float | bool]:
+    - Preserves: str (truncated to 512 chars), bool, int/float (as float)
+    - Converts: list -> comma-separated string
+    - Filters: dict, None, other types
+    - Enforces: max 16 properties, max 64 char keys, max 512 char string values
+    """
+    if not metadata:
+        return {}
+
+    sanitized: dict[str, str | float | bool] = {}
+    for key, value in metadata.items():
+        # Enforce max 16 properties
+        if len(sanitized) >= 16:
+            break
+
+        # Enforce max 64 char keys
+        if len(key) > 64:
+            continue
+
+        # Convert to supported primitive types
+        if isinstance(value, bool):
+            sanitized[key] = value
+        elif isinstance(value, int | float):
+            sanitized[key] = float(value)
+        elif isinstance(value, str):
+            # Enforce max 512 char string values
+            sanitized[key] = value[:512] if len(value) > 512 else value
+        elif isinstance(value, list):
+            # Convert lists to comma-separated strings (max 512 chars)
+            list_str = ", ".join(str(item) for item in value)
+            sanitized[key] = list_str[:512] if len(list_str) > 512 else list_str
+
+    return sanitized
+
+
 @json_schema_type
 class VectorStoreFileObject(BaseModel):
     """OpenAI Vector Store File object.
@@ -389,7 +448,7 @@ class VectorStoreFileObject(BaseModel):
 
     id: str
     object: str = "vector_store.file"
-    attributes: dict[str, Any] = Field(default_factory=dict)
+    attributes: VectorStoreFileAttributes = Field(default_factory=dict)
     chunking_strategy: VectorStoreChunkingStrategy
     created_at: int
     last_error: VectorStoreFileLastError | None = None
@@ -397,6 +456,12 @@ class VectorStoreFileObject(BaseModel):
     usage_bytes: int = 0
     vector_store_id: str
 
+    @field_validator("attributes", mode="before")
+    @classmethod
+    def _validate_attributes(cls, v: dict[str, Any] | None) -> dict[str, str | float | bool]:
+        """Sanitize attributes to match VectorStoreFileAttributes OpenAPI spec."""
+        return _sanitize_vector_store_attributes(v)
+
 
 @json_schema_type
 class VectorStoreListFilesResponse(BaseModel):
```


```diff
@@ -5,7 +5,7 @@
 # the root directory of this source tree.
 
 from llama_stack.providers.utils.vector_io.vector_utils import generate_chunk_id
-from llama_stack_api import Chunk, ChunkMetadata
+from llama_stack_api import Chunk, ChunkMetadata, VectorStoreFileObject
 
 # This test is a unit test for the chunk_utils.py helpers. This should only contain
 # tests which are specific to this file. More general (API-level) tests should be placed in
@@ -78,3 +78,77 @@ def test_chunk_serialization():
     serialized_chunk = chunk.model_dump()
     assert serialized_chunk["chunk_id"] == "test-chunk-id"
     assert "chunk_id" in serialized_chunk
+
+
+def test_vector_store_file_object_attributes_validation():
+    """Test VectorStoreFileObject validates and sanitizes attributes at input boundary."""
+    # Test with metadata containing lists, nested dicts, and primitives
+    from llama_stack_api.vector_io import VectorStoreChunkingStrategyAuto
+
+    file_obj = VectorStoreFileObject(
+        id="file-123",
+        attributes={
+            "tags": ["transformers", "h100-compatible", "region:us"],  # List -> string
+            "model_name": "granite-3.3-8b",  # String preserved
+            "score": 0.95,  # Float preserved
+            "active": True,  # Bool preserved
+            "count": 42,  # Int -> float
+            "nested": {"key": "value"},  # Dict filtered out
+        },
+        chunking_strategy=VectorStoreChunkingStrategyAuto(),
+        created_at=1234567890,
+        status="completed",
+        vector_store_id="vs-123",
+    )
+
+    # Lists converted to comma-separated strings
+    assert file_obj.attributes["tags"] == "transformers, h100-compatible, region:us"
+    # Primitives preserved
+    assert file_obj.attributes["model_name"] == "granite-3.3-8b"
+    assert file_obj.attributes["score"] == 0.95
+    assert file_obj.attributes["active"] is True
+    assert file_obj.attributes["count"] == 42.0  # int -> float
+    # Complex types filtered out
+    assert "nested" not in file_obj.attributes
+
+
+def test_vector_store_file_object_attributes_constraints():
+    """Test VectorStoreFileObject enforces OpenAPI constraints on attributes."""
+    from llama_stack_api.vector_io import VectorStoreChunkingStrategyAuto
+
+    # Test max 16 properties
+    many_attrs = {f"key{i}": f"value{i}" for i in range(20)}
+    file_obj = VectorStoreFileObject(
+        id="file-123",
+        attributes=many_attrs,
+        chunking_strategy=VectorStoreChunkingStrategyAuto(),
+        created_at=1234567890,
+        status="completed",
+        vector_store_id="vs-123",
+    )
+    assert len(file_obj.attributes) == 16  # Max 16 properties
+
+    # Test max 64 char keys are filtered
+    long_key_attrs = {"a" * 65: "value", "valid_key": "value"}
+    file_obj = VectorStoreFileObject(
+        id="file-124",
+        attributes=long_key_attrs,
+        chunking_strategy=VectorStoreChunkingStrategyAuto(),
+        created_at=1234567890,
+        status="completed",
+        vector_store_id="vs-123",
+    )
+    assert "a" * 65 not in file_obj.attributes
+    assert "valid_key" in file_obj.attributes
+
+    # Test max 512 char string values are truncated
+    long_value_attrs = {"key": "x" * 600}
+    file_obj = VectorStoreFileObject(
+        id="file-125",
+        attributes=long_value_attrs,
+        chunking_strategy=VectorStoreChunkingStrategyAuto(),
+        created_at=1234567890,
+        status="completed",
+        vector_store_id="vs-123",
+    )
+    assert len(file_obj.attributes["key"]) == 512
```