Update docs for RAG and improve CONTRIBUTING.md

2025-01-28 06:08:14 -08:00 · 2025-01-28 06:08:14 -08:00 · d123e9d3d7
commit d123e9d3d7
parent 229f0d5f7c
3 changed files with 110 additions and 48 deletions
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -2,15 +2,44 @@
 We want to make contributing to this project as easy and transparent as
 possible.
-## Pull Requests
+## Discussions -> Issues -> Pull Requests
-We actively welcome your pull requests.
+
 We actively welcome your pull requests. However, please read the following. This is heavily inspired by [Ghostty](https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md).
 If in doubt, please open a [discussion](https://github.com/meta-llama/llama-stack/discussions); we can always convert that to an issue later.
 **I'd like to contribute!**
 All issues are actionable (please report any that are not). Pick one and start working on it. Thank you.
 If you need help or guidance, comment on the issue. Issues that are extra friendly to new contributors are tagged with "contributor friendly".
 **I have a bug!**
 1. Search the issue tracker and discussions for similar issues.
 2. If you don't have steps to reproduce, open a discussion.
 3. If you have steps to reproduce, open an issue.
 **I have an idea for a feature!**
 1. Open a discussion.
 **I've implemented a feature!**
 1. If there is an issue for the feature, open a pull request.
 2. If there is no issue, open a discussion and link to your branch.
 **I have a question!**
 1. Open a discussion or use [Discord](https://discord.gg/llama-stack).
 **Opening a Pull Request**
 1. Fork the repo and create your branch from `main`.
-2. If you've added code that should be tested, add tests.
+2. If you've changed APIs, update the documentation.
-3. If you've changed APIs, update the documentation.
+3. Ensure the test suite passes.
-4. Ensure the test suite passes.
+4. Make sure your code lints using `pre-commit`.
-5. Make sure your code lints.
+5. If you haven't already, complete the Contributor License Agreement ("CLA").
 6. If you haven't already, complete the Contributor License Agreement ("CLA").
 ## Contributor License Agreement ("CLA")
 In order to accept your pull request, we need you to submit a CLA. You only need
--- a/docs/source/building_applications/rag.md
+++ b/docs/source/building_applications/rag.md
@@ -1,71 +1,99 @@
-## Memory & RAG
+## Using "Memory" or Retrieval Augmented Generation (RAG)
-Memory enables your applications to reference and recall information from previous interactions or external documents. Llama Stack's memory system is built around the concept of Memory Banks:
+Memory enables your applications to reference and recall information from previous interactions or external documents.
-1. **Vector Memory Banks**: For semantic search and retrieval
+Llama Stack organizes the memory APIs into three layers:
-2. **Key-Value Memory Banks**: For structured data storage
+- the lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon).
-3. **Keyword Memory Banks**: For basic text search
+- next is the "RAG Tool", a first-class tool in the Tools API that lets you ingest documents (from URLs, files, etc.) with various chunking strategies and then query them.
-4. **Graph Memory Banks**: For relationship-based retrieval
+- finally, it all comes together with the top-level "Agents" API that allows you to create agents that can use the tools to answer questions, perform tasks, and more.
-Here's how to set up a vector memory bank for RAG:
+<img src="rag.png" alt="RAG System" width="50%">
 The RAG system uses lower-level storage for different types of data:
 * **Vector IO**: For semantic search and retrieval
 * **Key-Value and Relational IO**: For structured data storage
 We may add more storage types like Graph IO in the future.
 ### Setting up Vector DBs
 Here's how to set up a vector database for RAG:
 ```python
-# Register a memory bank
+# Register a vector db
-bank_id = "my_documents"
+vector_db_id = "my_documents"
-response = client.memory_banks.register(
+response = client.vector_dbs.register(
-    memory_bank_id=bank_id,
+    vector_db_id=vector_db_id,
-    params={
+    embedding_model="all-MiniLM-L6-v2",
-        "memory_bank_type": "vector",
+    embedding_dimension=384,
-        "embedding_model": "all-MiniLM-L6-v2",
+    provider_id="faiss",
-        "chunk_size_in_tokens": 512
-    }
 )
-# Insert documents
+# You can insert a pre-chunked document directly into the vector db
-documents = [
+chunks = [
    {
        "document_id": "doc1",
        "content": "Your document text here",
        "mime_type": "text/plain"
-    }
+    },
    ...
 ]
-client.memory.insert(bank_id, documents)
+client.vector_io.insert(vector_db_id, chunks)
 # You can then query for these chunks
 chunks_response = client.vector_io.query(vector_db_id, query="What do you know about...")
 ```
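To make the idea of semantic retrieval concrete, here is a minimal, self-contained sketch of what a vector store does conceptually on `insert` and `query`: embed each chunk, then rank stored chunks by cosine similarity to the embedded query. It does not call Llama Stack. `toy_embed` and `ToyVectorStore` are hypothetical names introduced only for illustration, and the word-count "embedding" is a crude stand-in for a real embedding model such as `all-MiniLM-L6-v2`.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """In-memory illustration only; not the Vector IO implementation."""

    def __init__(self):
        self._rows = []  # list of (chunk dict, embedding) pairs

    def insert(self, chunks):
        for chunk in chunks:
            self._rows.append((chunk, toy_embed(chunk["content"])))

    def query(self, query, top_k=2):
        q = toy_embed(query)
        scored = [(cosine(q, vec), chunk) for chunk, vec in self._rows]
        # Sort on the score only, so the chunk dicts are never compared.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for score, chunk in scored[:top_k] if score > 0]


store = ToyVectorStore()
store.insert([
    {"document_id": "doc1", "content": "LoRA reduces memory use during fine-tuning"},
    {"document_id": "doc2", "content": "Chat templates format multi-turn conversations"},
])
hits = store.query("how to reduce memory during fine-tuning", top_k=1)
print([h["document_id"] for h in hits])
```

A real provider such as `faiss` replaces the linear scan with an index and the word counts with dense vectors of the registered `embedding_dimension`, but the insert-then-rank flow is the same in spirit.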
 ### Using the RAG Tool
 A better way to ingest documents is to use the RAG Tool. This tool allows you to ingest documents from URLs, files, etc. and automatically chunks them into smaller pieces.
 ```python
 from llama_stack_client.types import Document
 urls = ["memory_optimizations.rst", "chat.rst", "llama3.rst"]
 documents = [
    Document(
        document_id=f"num-{i}",
        content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
        mime_type="text/plain",
        metadata={},
    )
    for i, url in enumerate(urls)
 ]
 client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=512,
 )
 # Query documents
-results = client.memory.query(
+results = client.tool_runtime.rag_tool.query(
-    bank_id=bank_id,
+    vector_db_id=vector_db_id,
    query="What do you know about...",
 )
 ```
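The automatic chunking mentioned above can be pictured as cutting each document into fixed-size windows of tokens. The following is a rough sketch under stated assumptions, not the RAG Tool's actual algorithm: `chunk_text` is a hypothetical helper, whitespace-separated words stand in for real tokenizer tokens, and `overlap_in_tokens` is an illustrative parameter that is not taken from the documentation.

```python
def chunk_text(text, chunk_size_in_tokens=512, overlap_in_tokens=0):
    """Split text into windows of at most chunk_size_in_tokens words.

    Assumption: a "token" here is a whitespace-separated word, which only
    approximates what a real tokenizer would produce.
    """
    if chunk_size_in_tokens <= 0:
        raise ValueError("chunk_size_in_tokens must be positive")
    if not 0 <= overlap_in_tokens < chunk_size_in_tokens:
        raise ValueError("overlap must be non-negative and smaller than the chunk size")

    tokens = text.split()
    step = chunk_size_in_tokens - overlap_in_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size_in_tokens]))
        # Stop once this window has reached the end of the document.
        if start + chunk_size_in_tokens >= len(tokens):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_text(sample, chunk_size_in_tokens=4))
print(chunk_text(sample, chunk_size_in_tokens=4, overlap_in_tokens=1))
```

Smaller chunks tend to give more precise matches but less surrounding context per hit; overlapping windows are one common way to avoid splitting a relevant passage across a chunk boundary.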
 ### Building RAG-Enhanced Agents
 One of the most powerful patterns is combining agents with RAG capabilities. Here's a complete example:
 ```python
 from llama_stack_client.types import Attachment
 # Create attachments from documents
 attachments = [
    Attachment(
        content="https://raw.githubusercontent.com/example/doc.rst",
        mime_type="text/plain"
    )
 ]
 # Configure agent with memory
 agent_config = AgentConfig(
    model="Llama3.2-3B-Instruct",
    instructions="You are a helpful assistant",
-    tools=[{
+    toolgroups=[
-        "type": "memory",
+        {
-        "memory_bank_configs": [],
+            "name": "builtin::rag",
-        "query_generator_config": {"type": "default", "sep": " "},
+            "args": {
-        "max_tokens_in_context": 4096,
+                "vector_db_ids": [vector_db_id],
-        "max_chunks": 10
+            }
-    }],
+        }
-    enable_session_persistence=True
+    ]
 )
 agent = Agent(client, agent_config)
@@ -77,7 +105,12 @@ response = agent.create_turn(
        "role": "user",
        "content": "I am providing some documents for reference."
    }],
-    attachments=attachments,
+    documents=[
        dict(
            content="https://raw.githubusercontent.com/example/doc.rst",
            mime_type="text/plain"
        )
    ],
    session_id=session_id
 )
--- a/docs/source/building_applications/rag.png
+++ b/docs/source/building_applications/rag.png