llama-stack

1723 commits 21 branches 64 tags 62 MiB

Author	SHA1	Message	Date
Daniele Martinoli	cca9bd6cc3	feat: Qdrant inline provider (#1273 ) # What does this PR do? Removed local execution option from the remote Qdrant provider and introduced an explicit inline provider for the embedded execution. Updated the ollama template to include this option: this part can be reverted in case we don't want to have two default `vector_io` providers. (Closes #1082) ## Test Plan Build and run an ollama distro: ```bash llama stack build --template ollama --image-type conda llama stack run --image-type conda ollama ``` Run one of the sample ingestionapplicatinos like [rag_with_vector_db.py](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py), but replace this line: ```py selected_vector_provider = vector_providers[0] ``` with the following, to use the `qdrant` provider: ```py selected_vector_provider = vector_providers[1] ``` After running the test code, verify the timestamp of the Qdrant store: ```bash % ls -ltr ~/.llama/distributions/ollama/qdrant.db/collection/test_vector_db_* total 784 -rw-r--r--@ 1 dmartino staff 401408 Feb 26 10:07 storage.sqlite ``` [//]: # (## Documentation) --------- Signed-off-by: Daniele Martinoli <dmartino@redhat.com> Co-authored-by: Francisco Arceo <farceo@redhat.com>	2025-03-18 14:04:21 -07:00
Ashwin Bharambe	d072b5fa0c	test: add unit test to ensure all config types are instantiable (#1601 )	2025-03-12 22:29:58 -07:00
Ashwin Bharambe	dc84bc755a	fix: revert to using faiss for ollama distro (#1530 ) This is unfortunate because `sqlite-vec` seems promising. But its PIP package is not quite complete. It does not have binary for arm64 (I think, or maybe it even lacks 64 bit builds?) which results in the arm64 container resulting in ``` File "/usr/local/lib/python3.10/site-packages/sqlite_vec/init.py", line 17, in load conn.load_extension(loadable_path()) sqlite3.OperationalError: /usr/local/lib/python3.10/site-packages/sqlite_vec/vec0.so: wrong ELF class: ELFCLASS32 ``` To get around I tried to install from source via `uv pip install sqlite-vec --no-binary=sqlite-vec` however it even lacks a source distribution which makes that impossible. ## Test Plan Build the container locally using: ```bash LLAMA_STACK_DIR=. llama stack build --template ollama --image-type container ``` Run the container as: ``` podman run --privileged -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \ -v ~/.llama:/root/.llama \ --env INFERENCE_MODEL=$INFERENCE_MODEL \ --env OLLAMA_URL=http://host.containers.internal:11434 \ -v ~/local/llama-stack:/app/llama-stack-source localhost/distribution-ollama:dev --port $LLAMA_STACK_PORT ``` Verify the container starts up correctly. Without this patch, it would encounter the ELFCLASS32 error.	2025-03-10 16:15:17 -07:00
Ashwin Bharambe	330cc9d09d	feat: add Milvus vectorDB (#1467 ) # What does this PR do? See https://github.com/meta-llama/llama-stack/pull/1171 which is the original PR. Author: @zc277584121 feat: add [Milvus](https://milvus.io/) vectorDB note: I use the MilvusClient to implement it instead of AsyncMilvusClient, because when I tested AsyncMilvusClient, it would raise issues about evenloop, which I think AsyncMilvusClient SDK is not robust enough to be compatible with llama_stack framework. ## Test Plan have passed the unit test and ene2end test Here is my end2end test logs, including the client code, client log, server logs from inline and remote settings [test_end2end_logs.zip](https://github.com/user-attachments/files/18964391/test_end2end_logs.zip) --------- Signed-off-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Cheney Zhang <chen.zhang@zilliz.com>	2025-03-06 20:59:31 -08:00
Ashwin Bharambe	992f865b2e	chore: move embedding deps to RAG tool where they are needed (#1210 ) `EMBEDDING_DEPS` were wrongly associated with `vector_io` providers. They are needed by https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/utils/memory/vector_store.py#L142 and related code and is used by the RAG tool and as such should only be needed by the `inline::rag-runtime` provider.	2025-02-21 11:33:41 -08:00
Yuan Tang	5858777ff0	fix: Update VectorIO config classes in registry (#1079 ) This was missed in https://github.com/meta-llama/llama-stack/pull/1023. ``` Traceback (most recent call last): File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/runpy.py", line 86, in _run_code exec(code, run_globals) File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 488, in <module> main() File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 389, in main impls = asyncio.run(construct_stack(config)) File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/asyncio/runners.py", line 44, in run return loop.run_until_complete(main) File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete return future.result() File "/home/yutang/repos/llama-stack/llama_stack/distribution/stack.py", line 202, in construct_stack impls = await resolve_impls(run_config, provider_registry or get_provider_registry(), dist_registry) File "/home/yutang/repos/llama-stack/llama_stack/distribution/resolver.py", line 230, in resolve_impls impl = await instantiate_provider( File "/home/yutang/repos/llama-stack/llama_stack/distribution/resolver.py", line 312, in instantiate_provider config_type = instantiate_class_type(provider_spec.config_class) File "/home/yutang/repos/llama-stack/llama_stack/distribution/utils/dynamic.py", line 13, in instantiate_class_type return getattr(module, class_name) AttributeError: module 'llama_stack.providers.inline.vector_io.faiss' has no attribute 'FaissImplConfig' ``` --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-13 15:39:13 -08:00
Francisco Arceo	119fe8742a	feat: Adding sqlite-vec as a vectordb (#1040 ) # What does this PR do? This PR adds `sqlite_vec` as an additional inline vectordb. Tested with `ollama` by adding the `vector_io` object in `./llama_stack/templates/ollama/run.yaml` : ```yaml vector_io: - provider_id: sqlite_vec provider_type: inline::sqlite_vec config: kvstore: type: sqlite namespace: null db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db ``` I also updated the `./tests/client-sdk/vector_io/test_vector_io.py` test file with: ```python INLINE_VECTOR_DB_PROVIDERS = ["faiss", "sqlite_vec"] ``` And parameterized the relevant tests. [//]: # (If resolving an issue, uncomment and update the line below) # Closes https://github.com/meta-llama/llama-stack/issues/1005 ## Test Plan I ran the tests with: ```bash INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py ``` Which outputs: ```python ... PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED ``` In addition, I ran the `rag_with_vector_db.py` [example](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py) using the script below with `uv run rag_example.py`. <details> <summary>CLICK TO SHOW SCRIPT 👋 </summary> ```python #!/usr/bin/env python3 import os import uuid from termcolor import cprint # Set environment variables os.environ['INFERENCE_MODEL'] = 'llama3.2:3b-instruct-fp16' os.environ['LLAMA_STACK_CONFIG'] = 'ollama' # Import libraries after setting environment variables from llama_stack.distribution.library_client import LlamaStackAsLibraryClient from llama_stack_client.lib.agents.agent import Agent from llama_stack_client.lib.agents.event_logger import EventLogger from llama_stack_client.types.agent_create_params import AgentConfig from llama_stack_client.types import Document def main(): # Initialize the client client = LlamaStackAsLibraryClient("ollama") vector_db_id = f"test-vector-db-{uuid.uuid4().hex}" _ = client.initialize() model_id = 'llama3.2:3b-instruct-fp16' # Define the list of document URLs and create Document objects urls = [ "chat.rst", "llama3.rst", "memory_optimizations.rst", "lora_finetune.rst", ] documents = [ Document( document_id=f"num-{i}", content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}", mime_type="text/plain", metadata={}, ) for i, url in enumerate(urls) ] # (Optional) Use the documents as needed with your client here client.vector_dbs.register( provider_id='sqlite_vec', vector_db_id=vector_db_id, embedding_model="all-MiniLM-L6-v2", embedding_dimension=384, ) client.tool_runtime.rag_tool.insert( documents=documents, vector_db_id=vector_db_id, chunk_size_in_tokens=512, ) # Create agent configuration agent_config = AgentConfig( model=model_id, instructions="You are a helpful assistant", enable_session_persistence=False, toolgroups=[ { "name": "builtin::rag", "args": { "vector_db_ids": [vector_db_id], } } ], ) # Instantiate the Agent agent = Agent(client, agent_config) # List of user prompts user_prompts = [ "What are the top 5 topics that were explained in the documentation? Only list succinct bullet points.", "Was anything related to 'Llama3' discussed, if so what?", "Tell me how to use LoRA", "What about Quantization?", ] # Create a session for the agent session_id = agent.create_session("test-session") # Process each prompt and display the output for prompt in user_prompts: cprint(f"User> {prompt}", "green") response = agent.create_turn( messages=[ { "role": "user", "content": prompt, } ], session_id=session_id, ) # Log and print events from the response for log in EventLogger().log(response): log.print() if __name__ == "__main__": main() ``` </details> Which outputs a large summary of RAG generation. # Documentation Will handle documentation updates in follow-up PR. # (- [ ] Added a Changelog entry if the change is significant) --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-02-12 10:50:03 -08:00
Ashwin Bharambe	f98efe68c9	Misc fixes (#944 ) - Make sure torch + torchvision go together as deps, otherwise bad stuff happens - Add a pre-commit for requirements.txt	2025-02-03 14:08:47 -08:00
Ashwin Bharambe	3ae8585b65	[memory refactor][1/n] Rename Memory -> VectorIO, MemoryBanks -> VectorDBs (#828 ) See https://github.com/meta-llama/llama-stack/issues/827 for the broader design. This is the first part: - delete other kinds of memory banks (keyvalue, keyword, graph) for now; we will introduce a keyvalue store API as part of this design but not use it in the RAG tool yet. - renaming of the APIs	2025-01-22 09:59:30 -08:00

Renamed from llama_stack/providers/registry/memory.py (Browse further)

9 commits