fix(vector-io): handle missing document_id in insert_chunks (#3521)

Fixed KeyError when chunks don't have document_id in metadata or chunk_metadata. Updated logging to safely extract document_id using getattr and RAG memory to handle different document_id locations. Added test for missing document_id scenarios. Fixes issue #3494 where /v1/vector-io/insert would crash with KeyError. Fixed KeyError when chunks don't have document_id in metadata or chunk_metadata. Updated logging to safely extract document_id using getattr and RAG memory to handle different document_id locations. Added test for missing document_id scenarios. # What does this PR do? Fixes a KeyError crash in `/v1/vector-io/insert` when chunks are missing `document_id` fields. The API was failing even though `document_id` is optional according to the schema. Closes #3494 ## Test Plan **Before fix:** - POST to `/v1/vector-io/insert` with chunks → 500 KeyError - Happened regardless of where `document_id` was placed **After fix:** - Same request works fine → 200 OK - Tested with Postman using FAISS backend - Added unit test covering missing `document_id` scenarios
2025-12-04 02:03:44 +00:00 · 2025-10-15 14:02:48 -04:00 · 2025-10-15 14:02:48 -04:00 · bc8b377a7c
commit bc8b377a7c
parent e9b4278a51
4 changed files with 51 additions and 2 deletions
--- a/llama_stack/core/routers/vector_io.py
+++ b/llama_stack/core/routers/vector_io.py
@ -93,8 +93,10 @@ class VectorIORouter(VectorIO):
        chunks: list[Chunk],
        ttl_seconds: int | None = None,
    ) -> None:
+        doc_ids = [chunk.document_id for chunk in chunks[:3]]
        logger.debug(
-            f"VectorIORouter.insert_chunks: {vector_db_id}, {len(chunks)} chunks, ttl_seconds={ttl_seconds}, chunk_ids={[chunk.metadata['document_id'] for chunk in chunks[:3]]}{' and more...' if len(chunks) > 3 else ''}",
+            f"VectorIORouter.insert_chunks: {vector_db_id}, {len(chunks)} chunks, "
+            f"ttl_seconds={ttl_seconds}, chunk_ids={doc_ids}{' and more...' if len(chunks) > 3 else ''}"
        )
        provider = await self.routing_table.get_provider_impl(vector_db_id)
        return await provider.insert_chunks(vector_db_id, chunks, ttl_seconds)