This was missed in https://github.com/meta-llama/llama-stack/pull/1023.
```
Traceback (most recent call last):
File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 488, in <module>
main()
File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 389, in main
impls = asyncio.run(construct_stack(config))
File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/yutang/repos/llama-stack/llama_stack/distribution/stack.py", line 202, in construct_stack
impls = await resolve_impls(run_config, provider_registry or get_provider_registry(), dist_registry)
File "/home/yutang/repos/llama-stack/llama_stack/distribution/resolver.py", line 230, in resolve_impls
impl = await instantiate_provider(
File "/home/yutang/repos/llama-stack/llama_stack/distribution/resolver.py", line 312, in instantiate_provider
config_type = instantiate_class_type(provider_spec.config_class)
File "/home/yutang/repos/llama-stack/llama_stack/distribution/utils/dynamic.py", line 13, in instantiate_class_type
return getattr(module, class_name)
AttributeError: module 'llama_stack.providers.inline.vector_io.faiss' has no attribute 'FaissImplConfig'
```
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This PR adds `sqlite_vec` as an additional inline vectordb.
Tested with `ollama` by adding the `vector_io` object in
`./llama_stack/templates/ollama/run.yaml` :
```yaml
vector_io:
- provider_id: sqlite_vec
provider_type: inline::sqlite_vec
config:
kvstore:
type: sqlite
namespace: null
db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db
db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db
```
I also updated the `./tests/client-sdk/vector_io/test_vector_io.py` test
file with:
```python
INLINE_VECTOR_DB_PROVIDERS = ["faiss", "sqlite_vec"]
```
And parameterized the relevant tests.
[//]: # (If resolving an issue, uncomment and update the line below)
# Closes
https://github.com/meta-llama/llama-stack/issues/1005
## Test Plan
I ran the tests with:
```bash
INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py
```
Which outputs:
```python
...
PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED
```
In addition, I ran the `rag_with_vector_db.py`
[example](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py)
using the script below with `uv run rag_example.py`.
<details>
<summary>CLICK TO SHOW SCRIPT 👋 </summary>
```python
#!/usr/bin/env python3
import os
import uuid
from termcolor import cprint
# Set environment variables
os.environ['INFERENCE_MODEL'] = 'llama3.2:3b-instruct-fp16'
os.environ['LLAMA_STACK_CONFIG'] = 'ollama'
# Import libraries after setting environment variables
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig
from llama_stack_client.types import Document
def main():
# Initialize the client
client = LlamaStackAsLibraryClient("ollama")
vector_db_id = f"test-vector-db-{uuid.uuid4().hex}"
_ = client.initialize()
model_id = 'llama3.2:3b-instruct-fp16'
# Define the list of document URLs and create Document objects
urls = [
"chat.rst",
"llama3.rst",
"memory_optimizations.rst",
"lora_finetune.rst",
]
documents = [
Document(
document_id=f"num-{i}",
content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
mime_type="text/plain",
metadata={},
)
for i, url in enumerate(urls)
]
# (Optional) Use the documents as needed with your client here
client.vector_dbs.register(
provider_id='sqlite_vec',
vector_db_id=vector_db_id,
embedding_model="all-MiniLM-L6-v2",
embedding_dimension=384,
)
client.tool_runtime.rag_tool.insert(
documents=documents,
vector_db_id=vector_db_id,
chunk_size_in_tokens=512,
)
# Create agent configuration
agent_config = AgentConfig(
model=model_id,
instructions="You are a helpful assistant",
enable_session_persistence=False,
toolgroups=[
{
"name": "builtin::rag",
"args": {
"vector_db_ids": [vector_db_id],
}
}
],
)
# Instantiate the Agent
agent = Agent(client, agent_config)
# List of user prompts
user_prompts = [
"What are the top 5 topics that were explained in the documentation? Only list succinct bullet points.",
"Was anything related to 'Llama3' discussed, if so what?",
"Tell me how to use LoRA",
"What about Quantization?",
]
# Create a session for the agent
session_id = agent.create_session("test-session")
# Process each prompt and display the output
for prompt in user_prompts:
cprint(f"User> {prompt}", "green")
response = agent.create_turn(
messages=[
{
"role": "user",
"content": prompt,
}
],
session_id=session_id,
)
# Log and print events from the response
for log in EventLogger().log(response):
log.print()
if __name__ == "__main__":
main()
```
</details>
Which outputs a large summary of RAG generation.
# Documentation
Will handle documentation updates in follow-up PR.
# (- [ ] Added a Changelog entry if the change is significant)
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.
This is the first part:
- delete other kinds of memory banks (keyvalue, keyword, graph) for now;
we will introduce a keyvalue store API as part of this design but not
use it in the RAG tool yet.
- renaming of the APIs
2025-01-22 09:59:30 -08:00
Renamed from llama_stack/providers/registry/memory.py (Browse further)