chore: Add support for vector-stores files api for Milvus (#2582)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 14s
Integration Tests / test-matrix (library, 3.13, datasets) (push) Failing after 11s
Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 15s
Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 17s
Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 12s
Integration Tests / test-matrix (library, 3.13, inspect) (push) Failing after 14s
Integration Tests / test-matrix (library, 3.13, agents) (push) Failing after 16s
Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 16s
Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 22s
Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 20s
Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 21s
Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 18s
Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 18s
Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 11s
Integration Tests / test-matrix (server, 3.12, agents) (push) Failing after 5s
Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.12, datasets) (push) Failing after 6s
Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 18s
Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 11s
Integration Tests / test-matrix (server, 3.12, post_training) (push) Failing after 8s
Integration Tests / test-matrix (server, 3.12, scoring) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.12, inspect) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.12, inference) (push) Failing after 8s
Integration Tests / test-matrix (server, 3.12, providers) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.12, vector_io) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.12, tool_runtime) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.13, agents) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.13, datasets) (push) Failing after 12s
Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.13, inspect) (push) Failing after 13s
Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 24s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 18s
Test Llama Stack Build / generate-matrix (push) Successful in 20s
Python Package Build Test / build (3.13) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 28s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 4s
Test External Providers / test-external-providers (venv) (push) Failing after 6s
Update ReadTheDocs / update-readthedocs (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 9s
Python Package Build Test / build (3.12) (push) Failing after 51s
Test Llama Stack Build / build-single-provider (push) Failing after 55s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 54s
Pre-commit / pre-commit (push) Successful in 1m44s

# What does this PR do?
### Summary

This pull request implements support for the OpenAI Vector Store Files
API for the Milvus vector store provider in `llama_stack`. It enables
storing, loading, updating, and deleting file metadata and file contents
in Milvus collections, allowing OpenAI vector store files to be managed
directly within Milvus.

### Main Changes

- **Milvus Vector Store Files API Implementation**
- Implements all required methods for storing, loading, updating, and
deleting vector store file metadata and contents
(`_save_openai_vector_store_file`, `_load_openai_vector_store_file`,
`_load_openai_vector_store_file_contents`,
`_update_openai_vector_store_file`,
`_delete_openai_vector_store_file_from_storage`).
- Uses two Milvus collections: `openai_vector_store_files` (for
metadata) and `openai_vector_store_files_contents` (for chunked file
contents).
- Collections are created dynamically if they do not exist, with
appropriate schema definitions.
- **Collection Name Sanitization**
- Adds a `sanitize_collection_name` utility to ensure Milvus collection
names only contain valid characters (letters, numbers, underscores).
- **Testing**
- Updates test skip logic to include `"inline::milvus"` for cases where
the OpenAI Vector Store Files API is not supported, improving
integration test accuracy.
- **Other Improvements**
  - Passes `kvstore` to `MilvusIndex` for consistency.
- Removes obsolete NotImplementedErrors and legacy code for file
storage.

## Test Plan
CI and tested via a test script

## Notes
- `VectorDB` currently uses the `name` as the `identifier` in
`openai_create_vector_store`. We need to add `name` as a field to
`VectorDB` and generate the `identifier` upon creation. OpenAI is not
idempotent with respect to the `name` field that they pass (i.e., you
can pass the same name multiple times and OpenAI will generate a new
identifier). I'll add a follow up PR for this.
- The `Files` api needs to use `files-` as a prefix in the identifier. I
have updated the Vector Store to use the OpenAI prefix `vs_*`.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
This commit is contained in:
Francisco Arceo 2025-07-03 15:15:33 -04:00 committed by GitHub
parent dae1fcd3c2
commit 4afd619c56
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
2 changed files with 179 additions and 20 deletions

View file

@ -31,7 +31,7 @@ def skip_if_provider_doesnt_support_openai_vector_stores(client_with_models):
def skip_if_provider_doesnt_support_openai_vector_store_files_api(client_with_models):
vector_io_providers = [p for p in client_with_models.providers.list() if p.api == "vector_io"]
for p in vector_io_providers:
if p.provider_type in ["inline::faiss", "inline::sqlite-vec"]:
if p.provider_type in ["inline::faiss", "inline::sqlite-vec", "inline::milvus"]:
return
pytest.skip("OpenAI vector stores are not supported by any provider")
@ -524,7 +524,6 @@ def test_openai_vector_store_attach_files_on_creation(compat_client_with_empty_s
file_ids = valid_file_ids + [failed_file_id]
num_failed = len(file_ids) - len(valid_file_ids)
# Create a vector store
vector_store = compat_client.vector_stores.create(
name="test_store",
file_ids=file_ids,