feat: Enable ingestion of precomputed embeddings (#2317)

This commit is contained in:
Francisco Arceo 2025-05-31 04:03:37 -06:00 committed by GitHub
parent 31ce208bda
commit f328436831
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
9 changed files with 366 additions and 15 deletions

View file

@ -57,6 +57,31 @@ chunks = [
]
client.vector_io.insert(vector_db_id=vector_db_id, chunks=chunks)
```
#### Using Precomputed Embeddings
If you decide to precompute embeddings for your documents, you can insert them directly into the vector database by
including the embedding vectors in the chunk data. This is useful if you have a separate embedding service or if you
want to customize the ingestion process.
```python
chunks_with_embeddings = [
{
"content": "First chunk of text",
"mime_type": "text/plain",
"embedding": [0.1, 0.2, 0.3, ...], # Your precomputed embedding vector
"metadata": {"document_id": "doc1", "section": "introduction"},
},
{
"content": "Second chunk of text",
"mime_type": "text/plain",
"embedding": [0.2, 0.3, 0.4, ...], # Your precomputed embedding vector
"metadata": {"document_id": "doc1", "section": "methodology"},
},
]
client.vector_io.insert(vector_db_id=vector_db_id, chunks=chunks_with_embeddings)
```
When providing precomputed embeddings, ensure the embedding dimension matches the embedding_dimension specified when
registering the vector database.
### Retrieval
You can query the vector database to retrieve documents based on their embeddings.
```python