llama-stack


Author	SHA1	Message	Date
Ashwin Bharambe	992f865b2e	chore: move embedding deps to RAG tool where they are needed (#1210) `EMBEDDING_DEPS` were wrongly associated with `vector_io` providers. They are needed by https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/utils/memory/vector_store.py#L142 and related code, which is used by the RAG tool; as such, they should only be needed by the `inline::rag-runtime` provider.	2025-02-21 11:33:41 -08:00
Ben Browning	406465622e	fix: Update QdrantConfig to QdrantVectorIOConfig (#1104) # What does this PR do? This fixes an import that broke because #1079 was merged before #1039: the code from #1039 needed to be updated to use `QdrantVectorIOConfig` instead of `QdrantConfig`. ## Test Plan I ran the remote vLLM provider inference tests against the latest main: ``` VLLM_URL="http://localhost:8001/v1" python -m pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py --providers "inference=vllm_remote" ``` That failed with: ``` File "/home/bbrownin/src/llama-stack/llama_stack/providers/tests/vector_io/fixtures.py", line 20, in <module> from llama_stack.providers.remote.vector_io.qdrant import QdrantConfig ImportError: Error importing plugin "llama_stack.providers.tests.vector_io.fixtures": cannot import name 'QdrantConfig' from 'llama_stack.providers.remote.vector_io.qdrant' (/home/bbrownin/src/llama-stack/llama_stack/providers/remote/vector_io/qdrant/__init__.py) ``` After this change, the import no longer fails and the tests pass. Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-02-14 06:31:00 -08:00
Bill Murdock	32d1e50a6f	test: Add qdrant to provider tests (#1039) # What does this PR do? This is a follow-up to #1022. It includes the changes I needed to be able to test the Qdrant support, as requested by @terrytangyuan. I uncovered many bigger, more systemic issues with the vector DB testing and will open a new issue for those. For now, I am just delivering the work I have already done. ## Test Plan As discussed on #1022: ``` podman pull qdrant/qdrant mkdir qdrant-data podman run -p 6333:6333 -v $(pwd)/qdrant-data:/qdrant/storage qdrant/qdrant ``` ``` ollama pull all-minilm:l6-v2 curl http://localhost:11434/api/embeddings -d '{"model": "all-minilm", "prompt": "Hello world"}' ``` ``` EMBEDDING_DIMENSION=384 QDRANT_URL=http://localhost pytest llama_stack/providers/tests/vector_io/test_vector_io.py -m "qdrant" -v -s --tb=short --embedding-model all-minilm:latest --disable-warnings ``` These show 3 tests passing and 15 deselected, which is presumably working as intended. --------- Signed-off-by: Bill Murdock <bmurdock@redhat.com>	2025-02-13 15:44:55 -08:00
Yuan Tang	8ff27b58fa	chore: Consistent naming for VectorIO providers (#1023) # What does this PR do? This renames all VectorIO provider classes to follow the pattern `<ProviderName>VectorIOConfig` and `<ProviderName>VectorIOAdapter`. All API endpoints for VectorIO providers are currently consistent with `/vector-io`. Note that the API endpoint for VectorDBs stays unchanged as `/vector-dbs`. ## Test Plan I don't have a way to test all providers. This is a simple renaming, so things should work as expected. --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-13 13:15:49 -05:00
Sébastien Han	e4a1579e63	build: format codebase imports using ruff linter (#1028) # What does this PR do? - Configured the ruff linter to automatically fix import sorting issues. - Set `--exit-non-zero-on-fix` to ensure a non-zero exit code when fixes are applied. - Enabled the `I` rule selection to focus on import-related linting rules. - Ran the linter and formatted all codebase imports accordingly. - Removed the `black` dependency from the "dev" group since we use ruff. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-13 10:06:21 -08:00
Francisco Arceo	119fe8742a	feat: Adding sqlite-vec as a vectordb (#1040) # What does this PR do? This PR adds `sqlite_vec` as an additional inline vector DB. I tested it with `ollama` by adding the following `vector_io` entry to `./llama_stack/templates/ollama/run.yaml`: ```yaml vector_io: - provider_id: sqlite_vec provider_type: inline::sqlite_vec config: kvstore: type: sqlite namespace: null db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db ``` I also updated the `./tests/client-sdk/vector_io/test_vector_io.py` test file with: ```python INLINE_VECTOR_DB_PROVIDERS = ["faiss", "sqlite_vec"] ``` and parameterized the relevant tests. # Closes https://github.com/meta-llama/llama-stack/issues/1005 ## Test Plan I ran the tests with: ```bash INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py ``` This outputs: ```python ... PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED ``` In addition, I ran the `rag_with_vector_db.py` [example](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py) using the script below with `uv run rag_example.py`.
```python
#!/usr/bin/env python3
import os
import uuid

from termcolor import cprint

# Set environment variables
os.environ['INFERENCE_MODEL'] = 'llama3.2:3b-instruct-fp16'
os.environ['LLAMA_STACK_CONFIG'] = 'ollama'

# Import libraries after setting environment variables
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig
from llama_stack_client.types import Document


def main():
    # Initialize the client
    client = LlamaStackAsLibraryClient("ollama")
    vector_db_id = f"test-vector-db-{uuid.uuid4().hex}"
    _ = client.initialize()

    model_id = 'llama3.2:3b-instruct-fp16'

    # Define the list of document URLs and create Document objects
    urls = [
        "chat.rst",
        "llama3.rst",
        "memory_optimizations.rst",
        "lora_finetune.rst",
    ]
    documents = [
        Document(
            document_id=f"num-{i}",
            content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
            mime_type="text/plain",
            metadata={},
        )
        for i, url in enumerate(urls)
    ]

    # (Optional) Use the documents as needed with your client here
    client.vector_dbs.register(
        provider_id='sqlite_vec',
        vector_db_id=vector_db_id,
        embedding_model="all-MiniLM-L6-v2",
        embedding_dimension=384,
    )

    client.tool_runtime.rag_tool.insert(
        documents=documents,
        vector_db_id=vector_db_id,
        chunk_size_in_tokens=512,
    )

    # Create agent configuration
    agent_config = AgentConfig(
        model=model_id,
        instructions="You are a helpful assistant",
        enable_session_persistence=False,
        toolgroups=[
            {
                "name": "builtin::rag",
                "args": {
                    "vector_db_ids": [vector_db_id],
                }
            }
        ],
    )

    # Instantiate the Agent
    agent = Agent(client, agent_config)

    # List of user prompts
    user_prompts = [
        "What are the top 5 topics that were explained in the documentation? Only list succinct bullet points.",
        "Was anything related to 'Llama3' discussed, if so what?",
        "Tell me how to use LoRA",
        "What about Quantization?",
    ]

    # Create a session for the agent
    session_id = agent.create_session("test-session")

    # Process each prompt and display the output
    for prompt in user_prompts:
        cprint(f"User> {prompt}", "green")
        response = agent.create_turn(
            messages=[
                {
                    "role": "user",
                    "content": prompt,
                }
            ],
            session_id=session_id,
        )
        # Log and print events from the response
        for log in EventLogger().log(response):
            log.print()


if __name__ == "__main__":
    main()
```
This outputs a lengthy RAG-generated summary. # Documentation Documentation updates will be handled in a follow-up PR. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-02-12 10:50:03 -08:00
Ashwin Bharambe	78a481bb22	[memory refactor][2/n] Update faiss and make it pass tests (#830) See https://github.com/meta-llama/llama-stack/issues/827 for the broader design. Second part: - updates routing table / router code - updates the faiss implementation ## Test Plan ``` pytest -s -v -k sentence test_vector_io.py --env EMBEDDING_DIMENSION=384 ```	2025-01-22 10:02:15 -08:00

Renamed from llama_stack/providers/tests/memory/fixtures.py

7 commits