diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index e42d6db75..0da1fcdab 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -2,15 +2,44 @@
 We want to make contributing to this project as easy and transparent as possible.
 
-## Pull Requests
-We actively welcome your pull requests.
+## Discussions -> Issues -> Pull Requests
+
+We actively welcome your pull requests. However, please read the following. This is heavily inspired by [Ghostty](https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md).
+
+If in doubt, please open a [discussion](https://github.com/meta-llama/llama-stack/discussions); we can always convert that to an issue later.
+
+**I'd like to contribute!**
+
+All issues are actionable (please report if they are not). Pick one and start working on it. Thank you.
+If you need help or guidance, comment on the issue. Issues that are extra friendly to new contributors are tagged with "contributor friendly".
+
+**I have a bug!**
+
+1. Search the issue tracker and discussions for similar issues.
+2. If you don't have steps to reproduce, open a discussion.
+3. If you have steps to reproduce, open an issue.
+
+**I have an idea for a feature!**
+
+1. Open a discussion.
+
+**I've implemented a feature!**
+
+1. If there is an issue for the feature, open a pull request.
+2. If there is no issue, open a discussion and link to your branch.
+
+**I have a question!**
+
+1. Open a discussion or use [Discord](https://discord.gg/llama-stack).
+
+
+**Opening a Pull Request**
 
 1. Fork the repo and create your branch from `main`.
-2. If you've added code that should be tested, add tests.
-3. If you've changed APIs, update the documentation.
-4. Ensure the test suite passes.
-5. Make sure your code lints.
-6. If you haven't already, complete the Contributor License Agreement ("CLA").
+2. If you've changed APIs, update the documentation.
+3. Ensure the test suite passes.
+4. Make sure your code lints using `pre-commit`.
+5. If you haven't already, complete the Contributor License Agreement ("CLA").
 
 ## Contributor License Agreement ("CLA")
 In order to accept your pull request, we need you to submit a CLA. You only need
diff --git a/docs/source/building_applications/rag.md b/docs/source/building_applications/rag.md
index 17ecd2046..485973aed 100644
--- a/docs/source/building_applications/rag.md
+++ b/docs/source/building_applications/rag.md
@@ -1,71 +1,99 @@
-## Memory & RAG
+## Using "Memory" or Retrieval Augmented Generation (RAG)
 
-Memory enables your applications to reference and recall information from previous interactions or external documents. Llama Stack's memory system is built around the concept of Memory Banks:
+Memory enables your applications to reference and recall information from previous interactions or external documents.
 
-1. **Vector Memory Banks**: For semantic search and retrieval
-2. **Key-Value Memory Banks**: For structured data storage
-3. **Keyword Memory Banks**: For basic text search
-4. **Graph Memory Banks**: For relationship-based retrieval
+Llama Stack organizes the memory APIs into three layers:
+- the lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon).
+- next is the "RAG Tool", a first-class tool as part of the Tools API that allows you to ingest documents (from URLs, files, etc.) with various chunking strategies and query them smartly.
+- finally, it all comes together with the top-level "Agents" API that allows you to create agents that can use the tools to answer questions, perform tasks, and more.
 
-Here's how to set up a vector memory bank for RAG:
+
+![RAG System](rag.png)
+
+The RAG system uses lower-level storage for different types of data:
+* **Vector IO**: For semantic search and retrieval
+* **Key-Value and Relational IO**: For structured data storage
+
+We may add more storage types like Graph IO in the future.
+
+### Setting up Vector DBs
+
+Here's how to set up a vector database for RAG:
 
 ```python
-# Register a memory bank
-bank_id = "my_documents"
-response = client.memory_banks.register(
-    memory_bank_id=bank_id,
-    params={
-        "memory_bank_type": "vector",
-        "embedding_model": "all-MiniLM-L6-v2",
-        "chunk_size_in_tokens": 512
-    }
+# Register a vector db
+vector_db_id = "my_documents"
+response = client.vector_dbs.register(
+    vector_db_id=vector_db_id,
+    embedding_model="all-MiniLM-L6-v2",
+    embedding_dimension=384,
+    provider_id="faiss",
 )
 
-# Insert documents
-documents = [
+# You can insert a pre-chunked document directly into the vector db
+chunks = [
     {
         "document_id": "doc1",
         "content": "Your document text here",
         "mime_type": "text/plain"
-    }
+    },
+    ...
 ]
-client.memory.insert(bank_id, documents)
+client.vector_io.insert(vector_db_id, chunks)
+
+# You can then query for these chunks
+chunks_response = client.vector_io.query(vector_db_id, query="What do you know about...")
+
+```
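+
+The exact shape of the query response depends on your client version and provider. As a minimal sketch (assuming the response exposes the matched `chunks` alongside parallel similarity `scores`), you could consume it like this:
+
+```python
+# Sketch: pair each matched chunk with its similarity score.
+# The `chunks`/`scores` attribute names are assumptions, not a contract;
+# check the response type in your client version.
+for chunk, score in zip(chunks_response.chunks, chunks_response.scores):
+    print(f"score={score:.3f} content={chunk.content}")
+```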
+
+### Using the RAG Tool
+
+A better way to ingest documents is to use the RAG Tool. This tool allows you to ingest documents from URLs, files, etc. and automatically chunks them into smaller pieces.
+
+```python
+from llama_stack_client.types import Document
+
+urls = ["memory_optimizations.rst", "chat.rst", "llama3.rst"]
+documents = [
+    Document(
+        document_id=f"num-{i}",
+        content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
+        mime_type="text/plain",
+        metadata={},
+    )
+    for i, url in enumerate(urls)
+]
+
+client.tool_runtime.rag_tool.insert(
+    documents=documents,
+    vector_db_id=vector_db_id,
+    chunk_size_in_tokens=512,
+)
 
 # Query documents
-results = client.memory.query(
-    bank_id=bank_id,
+results = client.tool_runtime.rag_tool.query(
+    vector_db_id=vector_db_id,
     query="What do you know about...",
 )
 ```
-
 ### Building RAG-Enhanced Agents
 
 One of the most powerful patterns is combining agents with RAG capabilities. Here's a complete example:
 
 ```python
-from llama_stack_client.types import Attachment
-
-# Create attachments from documents
-attachments = [
-    Attachment(
-        content="https://raw.githubusercontent.com/example/doc.rst",
-        mime_type="text/plain"
-    )
-]
 
 # Configure agent with memory
 agent_config = AgentConfig(
     model="Llama3.2-3B-Instruct",
     instructions="You are a helpful assistant",
-    tools=[{
-        "type": "memory",
-        "memory_bank_configs": [],
-        "query_generator_config": {"type": "default", "sep": " "},
-        "max_tokens_in_context": 4096,
-        "max_chunks": 10
-    }],
-    enable_session_persistence=True
+    toolgroups=[
+        {
+            "name": "builtin::rag",
+            "args": {
+                "vector_db_ids": [vector_db_id],
+            }
+        }
+    ]
 )
 
 agent = Agent(client, agent_config)
@@ -77,7 +105,12 @@ response = agent.create_turn(
     messages=[{
         "role": "user",
         "content": "I am providing some documents for reference."
     }],
-    attachments=attachments,
+    documents=[
+        dict(
+            content="https://raw.githubusercontent.com/example/doc.rst",
+            mime_type="text/plain"
+        )
+    ],
     session_id=session_id
 )
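+
+# The turn streams back events. A minimal sketch for printing them,
+# assuming the client library's EventLogger helper is available:
+from llama_stack_client.lib.agents.event_logger import EventLogger
+
+for log in EventLogger().log(response):
+    log.print()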
diff --git a/docs/source/building_applications/rag.png b/docs/source/building_applications/rag.png
new file mode 100644
index 000000000..a5e5b8cdb
Binary files /dev/null and b/docs/source/building_applications/rag.png differ