llama-stack-mirror/llama_stack/apis
Varsha e92301f2d7
feat(sqlite-vec): enable keyword search for sqlite-vec (#1439)
# What does this PR do?
This PR introduces support for keyword based FTS5 search with BM25
relevance scoring. It makes changes to the existing EmbeddingIndex base
class in order to support a search_mode and query_str parameter, that
can be used for keyword based search implementations.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
run 
```
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
```
Output:
```
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
/Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
====================================================== test session starts =======================================================
platform darwin -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.4-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0
asyncio: mode=auto, asyncio_default_fixture_loop_scope=None
collected 7 items                                                                                                                

llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks_vector PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks_fts PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_register_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_unregister_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
```


For reference, with the implementation, the fts table looks like below:
```
Chunk ID: 9fbc39ce-c729-64a2-260f-c5ec9bb2a33e, Content: Sentence 0 from document 0
Chunk ID: 94062914-3e23-44cf-1e50-9e25821ba882, Content: Sentence 1 from document 0
Chunk ID: e6cfd559-4641-33ba-6ce1-7038226495eb, Content: Sentence 2 from document 0
Chunk ID: 1383af9b-f1f0-f417-4de5-65fe9456cc20, Content: Sentence 3 from document 0
Chunk ID: 2db19b1a-de14-353b-f4e1-085e8463361c, Content: Sentence 4 from document 0
Chunk ID: 9faf986a-f028-7714-068a-1c795e8f2598, Content: Sentence 5 from document 0
Chunk ID: ef593ead-5a4a-392f-7ad8-471a50f033e8, Content: Sentence 6 from document 0
Chunk ID: e161950f-021f-7300-4d05-3166738b94cf, Content: Sentence 7 from document 0
Chunk ID: 90610fc4-67c1-e740-f043-709c5978867a, Content: Sentence 8 from document 0
Chunk ID: 97712879-6fff-98ad-0558-e9f42e6b81d3, Content: Sentence 9 from document 0
Chunk ID: aea70411-51df-61ba-d2f0-cb2b5972c210, Content: Sentence 0 from document 1
Chunk ID: b678a463-7b84-92b8-abb2-27e9a1977e3c, Content: Sentence 1 from document 1
Chunk ID: 27bd63da-909c-1606-a109-75bdb9479882, Content: Sentence 2 from document 1
Chunk ID: a2ad49ad-f9be-5372-e0c7-7b0221d0b53e, Content: Sentence 3 from document 1
Chunk ID: cac53bcd-1965-082a-c0f4-ceee7323fc70, Content: Sentence 4 from document 1
```

Query results:
Result 1: Sentence 5 from document 0
Result 2: Sentence 5 from document 1
Result 3: Sentence 5 from document 2

[//]: # (## Documentation)

---------

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-05-21 15:24:24 -04:00
..
agents feat: Add "instructions" support to responses API (#2205) 2025-05-20 09:52:10 -07:00
batch_inference chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
benchmarks chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
common chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
datasetio chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
datasets chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
eval chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
files chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
inference feat: introduce APIs for retrieving chat completion requests (#2145) 2025-05-18 21:43:19 -07:00
inspect chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
models chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
post_training chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
providers chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
safety chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
scoring chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
scoring_functions chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
shields chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
synthetic_data_generation chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
telemetry chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
tools feat(sqlite-vec): enable keyword search for sqlite-vec (#1439) 2025-05-21 15:24:24 -04:00
vector_dbs chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
vector_io chore: more API validators (#2165) 2025-05-15 11:22:51 -07:00
__init__.py API Updates (#73) 2024-09-17 19:51:35 -07:00
datatypes.py chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
resource.py chore: more mypy fixes (#2029) 2025-05-06 09:52:31 -07:00
version.py llama-stack version alpha -> v1 2025-01-15 05:58:09 -08:00