Commit graph

20 commits

Author SHA1 Message Date
Varsha
2e8054bede
feat: Implement hybrid search in SQLite-vec (#2312)
# What does this PR do?
Add support for a hybrid search mode in the SQLite-vec provider, which combines
keyword and vector search for better results. The implementation:

- Adds hybrid search mode as a new option alongside vector and keyword search
- Implements a `query_hybrid` method in `SQLiteVecIndex` that:
  - First performs keyword search to get candidate matches
  - Then applies vector similarity search on those candidates
- Updates documentation to reflect the new search mode

This change improves search quality by leveraging both semantic similarity
and keyword matching, while maintaining backward compatibility with the existing
vector and keyword search modes.
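
The two-stage flow described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the provider's code: `keyword_candidates`, `cosine`, `query_hybrid_sketch`, and the in-memory `CHUNKS` dict are hypothetical names, and a plain term-overlap filter stands in for the real keyword search.

```python
import math

def keyword_candidates(chunks, query):
    """Return ids of chunks sharing at least one query term (stand-in for the keyword step)."""
    terms = set(query.lower().split())
    return [cid for cid, (text, _) in chunks.items() if terms & set(text.lower().split())]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def query_hybrid_sketch(chunks, query, query_embedding, k=3):
    """Keyword search first, then rank only those candidates by vector similarity."""
    candidates = keyword_candidates(chunks, query)
    scored = [(cid, cosine(chunks[cid][1], query_embedding)) for cid in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical data: chunk id -> (text, embedding)
CHUNKS = {
    "a": ("sqlite vector search", [1.0, 0.0]),
    "b": ("keyword search with sqlite", [0.6, 0.8]),
    "c": ("unrelated text", [1.0, 0.0]),
}

result = query_hybrid_sketch(CHUNKS, "sqlite", [1.0, 0.0])
print(result)
```

Note that chunk `c` never reaches the vector stage even though its embedding matches the query exactly, because it fails the keyword filter. That is the trade-off of filtering before ranking.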

## Test Plan
```
pytest tests/unit/providers/vector_io/test_sqlite_vec.py -v -s --tb=short
/Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:217: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
=============================================================================================== test session starts ===============================================================================================
platform darwin -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '8.3.5', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'asyncio': '0.26.0', 'nbval': '0.11.0', 'cov': '6.1.1'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, anyio-4.8.0, asyncio-0.26.0, nbval-0.11.0, cov-6.1.1
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 10 items                                                                                                                                                                                                

tests/unit/providers/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_vector PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search_k_greater_than_results PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_no_keyword_matches PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_score_threshold PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_different_embedding PASSED
```

---------

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-06-13 15:54:06 -04:00
Charlie Doern
a7ecc92be1
docs: add post training to providers list (#2280)
# What does this PR do?

The providers list is missing `post_training`. Add that column, with
`HuggingFace`, `TorchTune`, and `NVIDIA NEMO` as supported providers.

Also point to these providers in docs/source/providers/index.md, and
describe their basic functionality.

Other provider types are missing here as well, but this PR starts with
`post_training`.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-05-28 09:32:00 -04:00
Varsha
e92301f2d7
feat(sqlite-vec): enable keyword search for sqlite-vec (#1439)
# What does this PR do?
This PR introduces support for keyword-based FTS5 search with BM25
relevance scoring. It changes the existing `EmbeddingIndex` base
class to support `search_mode` and `query_str` parameters, which
can be used for keyword-based search implementations.
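
As a rough illustration of FTS5 keyword search with BM25 ranking, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes the local SQLite build has FTS5 compiled in; the table name `fts_chunks` and the helper `keyword_search` are hypothetical and not taken from the provider.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table holding the chunk text (hypothetical schema).
conn.execute("CREATE VIRTUAL TABLE fts_chunks USING fts5(content)")
conn.executemany(
    "INSERT INTO fts_chunks (content) VALUES (?)",
    [
        ("Sentence 5 from document 0",),
        ("Sentence 3 from document 1",),
        ("Sentence 5 from document 1",),
    ],
)

def keyword_search(conn, query_str, k=3):
    """Return up to k (content, score) rows; FTS5's bm25() is lower for better matches."""
    return conn.execute(
        "SELECT content, bm25(fts_chunks) AS score FROM fts_chunks "
        "WHERE fts_chunks MATCH ? ORDER BY score LIMIT ?",
        (query_str, k),
    ).fetchall()

# Double quotes inside the query string make this an FTS5 phrase query.
rows = keyword_search(conn, '"Sentence 5"')
for content, score in rows:
    print(content, score)
```

Because `bm25()` returns smaller values for better matches, the ascending `ORDER BY` puts the most relevant rows first.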

## Test Plan
Run:
```
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
```
Output:
```
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
/Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
====================================================== test session starts =======================================================
platform darwin -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.4-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0
asyncio: mode=auto, asyncio_default_fixture_loop_scope=None
collected 7 items                                                                                                                

llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks_vector PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks_fts PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_register_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_unregister_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
```


For reference, with this implementation the FTS table looks like this:
```
Chunk ID: 9fbc39ce-c729-64a2-260f-c5ec9bb2a33e, Content: Sentence 0 from document 0
Chunk ID: 94062914-3e23-44cf-1e50-9e25821ba882, Content: Sentence 1 from document 0
Chunk ID: e6cfd559-4641-33ba-6ce1-7038226495eb, Content: Sentence 2 from document 0
Chunk ID: 1383af9b-f1f0-f417-4de5-65fe9456cc20, Content: Sentence 3 from document 0
Chunk ID: 2db19b1a-de14-353b-f4e1-085e8463361c, Content: Sentence 4 from document 0
Chunk ID: 9faf986a-f028-7714-068a-1c795e8f2598, Content: Sentence 5 from document 0
Chunk ID: ef593ead-5a4a-392f-7ad8-471a50f033e8, Content: Sentence 6 from document 0
Chunk ID: e161950f-021f-7300-4d05-3166738b94cf, Content: Sentence 7 from document 0
Chunk ID: 90610fc4-67c1-e740-f043-709c5978867a, Content: Sentence 8 from document 0
Chunk ID: 97712879-6fff-98ad-0558-e9f42e6b81d3, Content: Sentence 9 from document 0
Chunk ID: aea70411-51df-61ba-d2f0-cb2b5972c210, Content: Sentence 0 from document 1
Chunk ID: b678a463-7b84-92b8-abb2-27e9a1977e3c, Content: Sentence 1 from document 1
Chunk ID: 27bd63da-909c-1606-a109-75bdb9479882, Content: Sentence 2 from document 1
Chunk ID: a2ad49ad-f9be-5372-e0c7-7b0221d0b53e, Content: Sentence 3 from document 1
Chunk ID: cac53bcd-1965-082a-c0f4-ceee7323fc70, Content: Sentence 4 from document 1
```

Query results:
Result 1: Sentence 5 from document 0
Result 2: Sentence 5 from document 1
Result 3: Sentence 5 from document 2

---------

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-05-21 15:24:24 -04:00
Charlie Doern
e46de23be6
feat: refactor external providers dir (#2049)
# What does this PR do?

Currently, the "default" dir for external providers is
`/etc/llama-stack/providers.d`.

This dir is neither used anywhere nor created.

Switch to the friendlier `~/.llama/providers.d/`.

This allows external providers to create this dir and/or populate it
upon installation, since `pip` cannot create directories in `/etc`.

If a user does not specify a dir, default to this one.

see https://github.com/containers/ramalama-stack/issues/36
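
The fallback behaviour described above can be sketched as follows. The constant and function names are hypothetical and only illustrate the idea of falling back to the per-user directory when nothing is configured.

```python
from pathlib import Path

# Default taken from the description above; the constant name is hypothetical.
DEFAULT_EXTERNAL_PROVIDERS_DIR = "~/.llama/providers.d/"

def resolve_external_providers_dir(configured=None):
    """Use the configured dir if given, otherwise the per-user default, with ~ expanded."""
    raw = configured if configured else DEFAULT_EXTERNAL_PROVIDERS_DIR
    return Path(raw).expanduser()

print(resolve_external_providers_dir())
print(resolve_external_providers_dir("/opt/providers.d"))
```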

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-05-15 20:17:03 +02:00
Ihar Hrachyshka
1de0dfaab5
docs: Clarify kfp provider is both inline and remote (#2144)
The provider selling point *is* using the same provider for both.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-05-14 09:37:07 +02:00
Divya
3022f7b642
feat: Adding TLS support for Remote::Milvus vector_io (#2011)
# What does this PR do?
Addresses issue
[#2010](https://github.com/meta-llama/llama-stack/issues/2010).
Currently, if we try to connect the Llama Stack server to a remote
Milvus instance that has TLS enabled, the connection fails because TLS
support is not implemented in the Llama Stack codebase. As a result,
users are unable to use secured Milvus deployments out of the box.

With this change, users can connect to a TLS-enabled `remote::milvus`
instance. Example configuration with TLS enabled:
```
vector_io:
  - provider_id: milvus
    provider_type: remote::milvus
    config:
      uri: "http://<host>:<port>"
      token: "<user>:<password>"
      secure: True
      server_pem_path: "path/to/server.pem"
```
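
For readers who want to sanity-check such a config before starting the server, here is a small sketch. The class `MilvusTLSConfigSketch` and its rule that `secure: True` requires `server_pem_path` are assumptions made for illustration; the provider's actual config class and validation may differ.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MilvusTLSConfigSketch:
    """Hypothetical mirror of the YAML fields shown above."""
    uri: str
    token: Optional[str] = None
    secure: bool = False
    server_pem_path: Optional[str] = None

    def validate(self) -> None:
        # Assumed rule: a TLS connection needs the server certificate path.
        if self.secure and not self.server_pem_path:
            raise ValueError("secure=True requires server_pem_path")

    def client_kwargs(self) -> dict:
        """Return only the fields that are set, e.g. to forward to a client constructor."""
        fields = {
            "uri": self.uri,
            "token": self.token,
            "secure": self.secure,
            "server_pem_path": self.server_pem_path,
        }
        return {key: value for key, value in fields.items() if value is not None}

cfg = MilvusTLSConfigSketch(
    uri="http://<host>:<port>",
    token="<user>:<password>",
    secure=True,
    server_pem_path="path/to/server.pem",
)
cfg.validate()
print(cfg.client_kwargs())
```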
## Test Plan
I tested it by connecting to a Milvus instance with TLS enabled and was
able to start the Llama Stack server.
2025-05-06 14:15:34 +02:00
Christina Xu
65cc971877
docs: Add TrustyAI LM-Eval to list of known external providers (#2020)
# What does this PR do?
Adds documentation for the remote [TrustyAI LM-Eval Eval
Provider](https://github.com/trustyai-explainability/llama-stack-provider-lmeval).
LM-Eval is a service for large language model evaluation based on the
open source project
[lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
and is integrated into the [TrustyAI Kubernetes
Operator](https://trustyai-explainability.github.io/trustyai-site/main/trustyai-operator.html).
2025-05-06 14:11:55 +02:00
Sébastien Han
a5d151e912
docs: fix typo mivus.md -> milvus.md (#2102)
Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-05 09:48:38 -07:00
Ihar Hrachyshka
16e163da0e
docs: List external kubeflow pipelines provider prototype (#2100)
# What does this PR do?

Lists another external provider example (kfp).

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-05-05 10:24:52 +02:00
Charlie Doern
a673697858
chore: rename ramalama provider (#2008)
# What does this PR do?

the ramalama team has decided to rename their external provider
`ramalama-stack` (more catchy!). Update docs accordingly

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-04-24 09:34:15 +02:00
Nathan Weinberg
6a44e7ba20
docs: add API to external providers table (#2006)
Also does a minor reorg of the columns

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-04-23 15:58:10 +02:00
Nathan Weinberg
d6e88e0bc6
docs: add RamaLama to list of known external providers (#2004)
The RamaLama project now has an external provider offering for Llama
Stack: https://github.com/containers/llama-stack-provider-ramalama

See also: https://github.com/meta-llama/llama-stack/pull/1676

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-04-23 09:44:18 +02:00
Francisco Arceo
49955a06b1
docs: Update quickstart page to structure things a little more for the novices (#1873)
# What does this PR do?
Another doc enhancement for
https://github.com/meta-llama/llama-stack/issues/1818

Summary of changes:
- `docs/source/distributions/configuration.md`
   - Updated dropdown title to include a more user-friendly description.

- `docs/_static/css/my_theme.css`
   - Added styling for `<h3>` elements to set a normal font weight.

- `docs/source/distributions/starting_llama_stack_server.md`
   - Changed section headers from bold text to proper markdown headers
     (e.g., `##`).
   - Improved descriptions for starting Llama Stack server using different
     methods (library, container, conda, Kubernetes).
   - Enhanced clarity and structure by converting instructions into
     markdown headers and improved formatting.

- `docs/source/getting_started/index.md`
   - Major restructuring of the "Quick Start" guide:
      - Added new introductory section for Llama Stack and its capabilities.
      - Reorganized steps into clearer subsections with proper markdown
        headers.
      - Replaced dropdowns with tabbed content for OS-specific instructions.
      - Added detailed steps for setting up and running the Llama Stack server
        and client.
      - Introduced new sections for running basic inference and building
        agents.
      - Enhanced readability and visual structure with emojis, admonitions,
        and examples.

- `docs/source/providers/index.md`
   - Updated the list of LLM inference providers to include "Ollama."
   - Expanded the list of vector databases to include "SQLite-Vec."

Let me know if you need further details!

## Test Plan
Renders locally, included screenshot.

# Documentation

For https://github.com/meta-llama/llama-stack/issues/1818

<img width="1332" alt="Screenshot 2025-04-09 at 11 07 12 AM"
src="https://github.com/user-attachments/assets/c106efb9-076c-4059-a4e0-a30fa738585b"
/>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-04-10 14:09:00 -07:00
Sébastien Han
389767010b
feat: ability to execute external providers (#1672)
# What does this PR do?

Providers that live outside of the llama-stack codebase are now
supported.
A new property `external_providers_dir` has been added to the main
config and can be configured as follows:

```
external_providers_dir: /etc/llama-stack/providers.d/
```

Where the expected structure is:

```
providers.d/
  inference/
    custom_ollama.yaml
    vllm.yaml
  vector_io/
    qdrant.yaml
```

Where `custom_ollama.yaml` is:

```
adapter:
  adapter_type: custom_ollama
  pip_packages: ["ollama", "aiohttp"]
  config_class: llama_stack_ollama_provider.config.OllamaImplConfig
  module: llama_stack_ollama_provider
api_dependencies: []
optional_api_dependencies: []
```
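
To make the expected layout concrete, the sketch below scans a `providers.d/` tree and groups spec files by API directory. `list_provider_specs` is a hypothetical helper, not the loader llama-stack uses, and it only looks at file names rather than parsing the YAML.

```python
import tempfile
from pathlib import Path

def list_provider_specs(root):
    """Map each API subdirectory name to the sorted YAML spec names inside it."""
    specs = {}
    for api_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        names = sorted(f.stem for f in api_dir.glob("*.yaml"))
        if names:
            specs[api_dir.name] = names
    return specs

# Build a throwaway tree mirroring the layout above, then scan it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "providers.d"
    for rel in ("inference/custom_ollama.yaml", "inference/vllm.yaml", "vector_io/qdrant.yaml"):
        spec = root / rel
        spec.parent.mkdir(parents=True, exist_ok=True)
        spec.write_text("adapter: {}\n")
    found = list_provider_specs(root)

print(found)
```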

The package must of course be installed on the system. Here is the
`llama_stack_ollama_provider` example:

```
$ uv pip show llama-stack-ollama-provider
Using Python 3.10.16 environment at: /Users/leseb/Documents/AI/llama-stack/.venv
Name: llama-stack-ollama-provider
Version: 0.1.0
Location: /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages
Editable project location: /private/var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.ZBHU5Ezxg4/ollama/llama-stack-ollama-provider
Requires:
Required-by:
```

Closes: https://github.com/meta-llama/llama-stack/issues/658

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-09 10:30:41 +02:00
Francisco Arceo
37b6da37ba
docs: Document sqlite-vec faiss comparison (#1821)
# What does this PR do?
This PR documents and benchmarks the performance tradeoffs between
sqlite-vec and FAISS inline VectorDB providers.

# Closes https://github.com/meta-llama/llama-stack/issues/1165

## Test Plan

The test was run using this script:

<details>
<summary>CLICK TO SHOW SCRIPT 👋  </summary>

```python

import cProfile
import os
import uuid
import time
import random
import string
import matplotlib.pyplot as plt
import pandas as pd
from termcolor import cprint
from llama_stack_client.types import Document
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
from memory_profiler import profile
from line_profiler import LineProfiler

os.environ["INFERENCE_MODEL"] = "llama3.2:3b-instruct-fp16"
os.environ["LLAMA_STACK_CONFIG"] = "ollama"

def generate_random_chars(count=400):
    return ''.join(random.choices(string.ascii_letters, k=count))

def generate_documents(num_docs: int, num_chars: int):
    documents = [
        Document(
            document_id=f"doc-{i}",
            content=f"Document content for document {i} - {generate_random_chars(count=num_chars)}",
            mime_type="text/plain",
            metadata={},
        )
        for i in range(num_docs)
    ]
    return documents


@profile
def benchmark_write(client, vector_db_id, documents, batch_size=100):
    write_times = []
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        start_time = time.time()
        client.tool_runtime.rag_tool.insert(
            documents=batch,
            vector_db_id=vector_db_id,
            chunk_size_in_tokens=512,
        )
        end_time = time.time()
        write_times.append(end_time - start_time)

    return write_times

@profile
def benchmark_read(client, provider_id, vector_db_id, user_prompts):
    response_times = []
    for prompt in user_prompts:
        start_time = time.time()
        response = client.vector_io.query(
            vector_db_id=vector_db_id,
            query=prompt,
        )
        end_time = time.time()
        response_times.append(end_time - start_time)
    return response_times

def profile_functions():
    profiler = LineProfiler()
    profiler.add_function(benchmark_write)
    profiler.add_function(benchmark_read)
    return profiler


def plot_results(output, batch_size):
    # Create a DataFrame for easy manipulation
    df_sqlite = pd.DataFrame(output['sqlite-vec'])
    df_faiss = pd.DataFrame(output['faiss'])

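    # Convert seconds to milliseconds so the values match the axis labels
    df_sqlite['write_times'] *= 1000
    df_faiss['write_times'] *= 1000
    df_sqlite['read_times'] *= 1000
    df_faiss['read_times'] *= 1000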

    avg_write_sqlite = df_sqlite['write_times'].mean()
    avg_write_faiss = df_faiss['write_times'].mean()
    avg_read_sqlite = df_sqlite['read_times'].mean()
    avg_read_faiss = df_faiss['read_times'].mean()

    plt.figure(figsize=(12, 6))
    plt.hist(df_sqlite['write_times'], bins=10, alpha=0.5, color='blue', label='sqlite-vec Write Times')
    plt.hist(df_faiss['write_times'], bins=10, alpha=0.5, color='red', label='faiss Write Times')
    plt.axvline(avg_write_sqlite, color='blue', linestyle='--',
                label=f'Average Write Time (sqlite-vec): {avg_write_sqlite:.3f} ms')
    plt.axvline(avg_write_faiss, color='red', linestyle='--',
                label=f'Average Write Time (faiss): {avg_write_faiss:.3f} ms')
    plt.title(f'Histogram of Write Times for sqlite-vec and faiss\nn = {df_faiss.shape[0]} with batch size = {batch_size}')
    plt.xlabel('Time (milliseconds)')
    plt.ylabel('Density')
    plt.legend()
    plt.savefig('write_time_comparison.png')
    plt.close()

    plt.figure(figsize=(12, 6))
    plt.hist(df_sqlite['read_times'], bins=10, alpha=0.5, color='blue', label='sqlite-vec Read Times')
    plt.hist(df_faiss['read_times'], bins=10, alpha=0.5, color='red', label='faiss Read Times')
    plt.axvline(avg_read_sqlite, color='blue', linestyle='--',
                label=f'Average Read Time (sqlite-vec): {avg_read_sqlite:.3f} ms')
    plt.axvline(avg_read_faiss, color='red', linestyle='--',
                label=f'Average Read Time (faiss): {avg_read_faiss:.3f} ms')
    plt.title(f'Histogram of Read Times for sqlite-vec and faiss\nn = {df_faiss.shape[0]}')
    plt.xlabel('Time (milliseconds)')
    plt.ylabel('Density')
    plt.legend()
    plt.savefig('read_time_comparison.png')
    plt.close()

    plt.figure(figsize=(12, 6))
    plt.plot(df_sqlite.index, df_sqlite['write_times'],
             marker='o', markersize=4, linestyle='-', color='blue',
             label='sqlite-vec Write Times')
    plt.plot(df_faiss.index, df_faiss['write_times'],
             marker='x', markersize=4, linestyle='-', color='red',
             label='faiss Write Times')

    plt.title(f'Write Times by Operation Sequence\n(batch size = {batch_size})')
    plt.xlabel('Write Operation Sequence')
    plt.ylabel('Time (milliseconds)')
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.savefig('write_time_sequence.png')
    plt.close()
    # Print out the summary table
    print("\nPerformance Summary for sqlite-vec:")
    print(df_sqlite)

    # Print out the summary table
    print("\nPerformance Summary for faiss:")
    print(df_faiss)


def main():
    # Initialize the client
    client = LlamaStackAsLibraryClient("ollama")
    vector_db_id = f"test-vector-db-{uuid.uuid4().hex}"
    _ = client.initialize()

    # Generate a large dataset
    num_chars = 50
    num_docs = 100
    num_writes = 100
    write_batch_size = 100
    num_reads = 100

    documents = generate_documents(num_docs * write_batch_size, num_chars)
    user_prompts = [
        f"Tell me about document {i}" for i in range(1, num_reads + 1)
    ]

    providers = ["sqlite-vec", "faiss"]
    output = {
        provider_id: {"write_times": None, "read_times": None} for provider_id in providers
    }

    # Benchmark writes and reads for SQLite and Faiss
    for provider_id in providers:
        cprint(f"Benchmarking provider: {provider_id}", "yellow")
        client.vector_dbs.register(
            provider_id=provider_id,
            vector_db_id=vector_db_id,
            embedding_model="all-MiniLM-L6-v2",
            embedding_dimension=384,
        )
        write_times = benchmark_write(client, vector_db_id, documents, write_batch_size)

        average_write_time_ms = sum(write_times) / len(write_times) * 1000.
        cprint(f"Average write time for {provider_id} is {average_write_time_ms:.2f} milliseconds for {num_writes} runs", "blue")

        cprint(f"Benchmarking reads for provider: {provider_id}", "yellow")
        read_times = benchmark_read(client, provider_id, vector_db_id, user_prompts)

        average_read_time_ms = sum(read_times) / len(read_times) * 1000.
        cprint(f"Average read time for {provider_id} is {average_read_time_ms:.2f} milliseconds for {num_reads} runs", "blue")

        client.vector_dbs.unregister(vector_db_id=vector_db_id)
        output[provider_id]['write_times'] = write_times
        output[provider_id]['read_times'] = read_times
    # Generate plots and summary
    plot_results(output, write_batch_size)


if __name__ == "__main__":
    cProfile.run('main()', 'profile_output.prof')
```
</details>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-03-28 17:41:33 +01:00
Daniele Martinoli
cca9bd6cc3
feat: Qdrant inline provider (#1273)
# What does this PR do?
Removed local execution option from the remote Qdrant provider and
introduced an explicit inline provider for the embedded execution.
Updated the ollama template to include this option: this part can be
reverted in case we don't want to have two default `vector_io`
providers.

(Closes #1082)

## Test Plan
Build and run an ollama distro:
```bash
llama stack build --template ollama --image-type conda
llama stack run --image-type conda ollama
```

Run one of the sample ingestion applications like
[rag_with_vector_db.py](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py),
but replace this line:
```py
    selected_vector_provider = vector_providers[0]
```
with the following, to use the `qdrant` provider:
```py
    selected_vector_provider = vector_providers[1]
```

After running the test code, verify the timestamp of the Qdrant store:
```bash
% ls -ltr ~/.llama/distributions/ollama/qdrant.db/collection/test_vector_db_*
total 784
-rw-r--r--@ 1 dmartino  staff  401408 Feb 26 10:07 storage.sqlite
```

---------

Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
Co-authored-by: Francisco Arceo <farceo@redhat.com>
2025-03-18 14:04:21 -07:00
Ashwin Bharambe
330cc9d09d
feat: add Milvus vectorDB (#1467)
# What does this PR do?
See https://github.com/meta-llama/llama-stack/pull/1171 which is the
original PR. Author: @zc277584121

feat: add [Milvus](https://milvus.io/) vectorDB

Note: I use `MilvusClient` rather than `AsyncMilvusClient`, because when I
tested `AsyncMilvusClient` it raised event loop errors. I think the
`AsyncMilvusClient` SDK is not yet robust enough to be compatible with the
llama_stack framework.
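
One common way to call a synchronous client from async code without blocking the event loop is to run each call in a worker thread. The sketch below shows that pattern with a hypothetical stand-in class; it is not necessarily how this PR wires up `MilvusClient`.

```python
import asyncio

class BlockingClientSketch:
    """Hypothetical stand-in for a synchronous vector DB client."""

    def __init__(self):
        self._rows = []

    def insert(self, row):
        self._rows.append(row)
        return len(self._rows)

    def count(self):
        return len(self._rows)

async def insert_async(client, row):
    # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker thread.
    return await asyncio.to_thread(client.insert, row)

async def main():
    client = BlockingClientSketch()
    await insert_async(client, {"id": 1})
    await insert_async(client, {"id": 2})
    return client.count()

result = asyncio.run(main())
print(result)
```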

## Test Plan
The unit tests and end-to-end tests pass.
Here are my end-to-end test logs, including the client code, client log,
and server logs from the inline and remote settings.

[test_end2end_logs.zip](https://github.com/user-attachments/files/18964391/test_end2end_logs.zip)

---------

Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Cheney Zhang <chen.zhang@zilliz.com>
2025-03-06 20:59:31 -08:00
Ashwin Bharambe
8bbd52bb9f
chore: remove dependency on llama_models completely (#1344)
2025-03-01 12:48:08 -08:00
Yuan Tang
17162b9978
docs: Add vLLM to the list of inference providers in concepts and providers pages (#1227)
This increases visibility of the vLLM provider.
2025-02-23 20:16:30 -08:00
Francisco Arceo
19ae4b35d9
docs: Adding Provider sections to docs (#1195)
# What does this PR do?
Adding Provider sections to docs (some of these will be empty and need
updating).


This PR is still a draft while I seek feedback from other contributors.
I opened it to make the structure visible in the linked GitHub Issue.

# Closes https://github.com/meta-llama/llama-stack/issues/1189

- Providers Overview Page
![Screenshot 2025-02-21 at 12 15
09 PM](https://github.com/user-attachments/assets/e83e5a17-0d96-4de0-8251-68161799a054)

- SQLite-Vec specific page
![Screenshot 2025-02-21 at 12 15
34 PM](https://github.com/user-attachments/assets/14773900-fc8f-49e9-832a-b060b7ca010a)

## Test Plan
N/A

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-02-22 11:59:34 -08:00