forked from phoenix-oss/llama-stack-mirror
Update docs for RAG and improve CONTRIBUTING.md
This commit is contained in:
parent 229f0d5f7c
commit d123e9d3d7
3 changed files with 110 additions and 48 deletions

CONTRIBUTING.md
@@ -2,15 +2,44 @@
 We want to make contributing to this project as easy and transparent as
 possible.
 
-## Pull Requests
-We actively welcome your pull requests.
+## Discussions -> Issues -> Pull Requests
+
+We actively welcome your pull requests. However, please read the following. This is heavily inspired by [Ghostty](https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md).
+
+If in doubt, please open a [discussion](https://github.com/meta-llama/llama-stack/discussions); we can always convert that to an issue later.
+
+**I'd like to contribute!**
+
+All issues are actionable (please report it if one is not). Pick one and start working on it. Thank you!
+If you need help or guidance, comment on the issue. Issues that are extra friendly to new contributors are tagged with "contributor friendly".
+
+**I have a bug!**
+
+1. Search the issue tracker and discussions for similar issues.
+2. If you don't have steps to reproduce, open a discussion.
+3. If you have steps to reproduce, open an issue.
+
+**I have an idea for a feature!**
+
+1. Open a discussion.
+
+**I've implemented a feature!**
+
+1. If there is an issue for the feature, open a pull request.
+2. If there is no issue, open a discussion and link to your branch.
+
+**I have a question!**
+
+1. Open a discussion or use [Discord](https://discord.gg/llama-stack).
+
+**Opening a Pull Request**
+
 1. Fork the repo and create your branch from `main`.
-2. If you've added code that should be tested, add tests.
-3. If you've changed APIs, update the documentation.
-4. Ensure the test suite passes.
-5. Make sure your code lints.
-6. If you haven't already, complete the Contributor License Agreement ("CLA").
+2. If you've changed APIs, update the documentation.
+3. Ensure the test suite passes.
+4. Make sure your code lints using `pre-commit`.
+5. If you haven't already, complete the Contributor License Agreement ("CLA").
 
 ## Contributor License Agreement ("CLA")
 In order to accept your pull request, we need you to submit a CLA. You only need

docs/source/building_applications/rag.md
@@ -1,71 +1,99 @@
-## Memory & RAG
+## Using "Memory" or Retrieval Augmented Generation (RAG)
 
-Memory enables your applications to reference and recall information from previous interactions or external documents. Llama Stack's memory system is built around the concept of Memory Banks:
+Memory enables your applications to reference and recall information from previous interactions or external documents.
 
-1. **Vector Memory Banks**: For semantic search and retrieval
-2. **Key-Value Memory Banks**: For structured data storage
-3. **Keyword Memory Banks**: For basic text search
-4. **Graph Memory Banks**: For relationship-based retrieval
+Llama Stack organizes the memory APIs into three layers:
+- the lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon), and Relational IO (also coming soon).
+- next is the "RAG Tool", a first-class tool in the Tools API that lets you ingest documents (from URLs, files, etc.) with various chunking strategies and query them smartly.
+- finally, it all comes together with the top-level "Agents" API, which lets you create agents that use these tools to answer questions, perform tasks, and more.
 
-Here's how to set up a vector memory bank for RAG:
+<img src="rag.png" alt="RAG System" width="50%">
+
+The RAG system uses lower-level storage for different types of data:
+* **Vector IO**: For semantic search and retrieval
+* **Key-Value and Relational IO**: For structured data storage
+
+We may add more storage types like Graph IO in the future.
+
+### Setting up Vector DBs
+
+Here's how to set up a vector database for RAG:
+
 ```python
-# Register a memory bank
-bank_id = "my_documents"
-response = client.memory_banks.register(
-    memory_bank_id=bank_id,
-    params={
-        "memory_bank_type": "vector",
-        "embedding_model": "all-MiniLM-L6-v2",
-        "chunk_size_in_tokens": 512
-    }
+# Register a vector db
+vector_db_id = "my_documents"
+response = client.vector_dbs.register(
+    vector_db_id=vector_db_id,
+    embedding_model="all-MiniLM-L6-v2",
+    embedding_dimension=384,
+    provider_id="faiss",
 )
 
-# Insert documents
-documents = [
+# You can insert a pre-chunked document directly into the vector db
+chunks = [
     {
         "document_id": "doc1",
         "content": "Your document text here",
         "mime_type": "text/plain"
-    }
+    },
+    ...
 ]
-client.memory.insert(bank_id, documents)
+client.vector_io.insert(vector_db_id, chunks)
+
+# You can then query for these chunks
+chunks_response = client.vector_io.query(vector_db_id, query="What do you know about...")
 ```
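
As an aside, here is a minimal sketch of how you might inspect what the query returns. The parallel `chunks` and `scores` attributes are an assumption about the client's response shape, so verify them against the client version you have installed:

```python
# Hypothetical inspection of the query response; the `chunks` / `scores`
# attribute names are assumptions, not a documented contract.
for chunk, score in zip(chunks_response.chunks, chunks_response.scores):
    print(f"score={score:.3f}")
    print(chunk.content)
```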
+
+### Using the RAG Tool
+
+A better way to ingest documents is to use the RAG Tool. This tool can ingest documents from URLs, files, and more, and it automatically chunks them into smaller pieces.
+
+```python
+from llama_stack_client.types import Document
+
+urls = ["memory_optimizations.rst", "chat.rst", "llama3.rst"]
+documents = [
+    Document(
+        document_id=f"num-{i}",
+        content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
+        mime_type="text/plain",
+        metadata={},
+    )
+    for i, url in enumerate(urls)
+]
+
+client.tool_runtime.rag_tool.insert(
+    documents=documents,
+    vector_db_id=vector_db_id,
+    chunk_size_in_tokens=512,
+)
+
 # Query documents
-results = client.memory.query(
-    bank_id=bank_id,
+results = client.tool_runtime.rag_tool.query(
+    vector_db_id=vector_db_id,
     query="What do you know about...",
 )
 ```
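
To make the layering concrete, here is a sketch of doing the final RAG step by hand: feeding the retrieved context into a plain chat completion. The `results.content` attribute and the exact `client.inference.chat_completion` signature are assumptions to check against your client version:

```python
# Hypothetical manual RAG step: prompt a model with the retrieved context.
# `results.content` and the chat_completion signature are assumptions.
response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        {"role": "system", "content": f"Answer using this context:\n{results.content}"},
        {"role": "user", "content": "What do you know about..."},
    ],
)
print(response.completion_message.content)
```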
 
 ### Building RAG-Enhanced Agents
 
 One of the most powerful patterns is combining agents with RAG capabilities. Here's a complete example:
 
 ```python
-from llama_stack_client.types import Attachment
-
-# Create attachments from documents
-attachments = [
-    Attachment(
-        content="https://raw.githubusercontent.com/example/doc.rst",
-        mime_type="text/plain"
-    )
-]
-
-# Configure agent with memory
 agent_config = AgentConfig(
     model="Llama3.2-3B-Instruct",
     instructions="You are a helpful assistant",
-    tools=[{
-        "type": "memory",
-        "memory_bank_configs": [],
-        "query_generator_config": {"type": "default", "sep": " "},
-        "max_tokens_in_context": 4096,
-        "max_chunks": 10
-    }],
-    enable_session_persistence=True
+    enable_session_persistence=True,
+    toolgroups=[
+        {
+            "name": "builtin::rag",
+            "args": {
+                "vector_db_ids": [vector_db_id],
+            }
+        }
+    ],
 )
 
 agent = Agent(client, agent_config)
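
The `session_id` used in the next hunk is presumably created in the lines elided between the two hunks. A minimal sketch of that step, with the session name as a placeholder:

```python
# Hypothetical session setup; "rag_session" is a placeholder name.
session_id = agent.create_session("rag_session")
```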
@@ -77,7 +105,12 @@ response = agent.create_turn(
         "role": "user",
         "content": "I am providing some documents for reference."
     }],
-    attachments=attachments,
+    documents=[
+        dict(
+            content="https://raw.githubusercontent.com/example/doc.rst",
+            mime_type="text/plain"
+        )
+    ],
     session_id=session_id
 )
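
If you want to see the turn's output as it arrives, one option is the client's agent event logger; its import path and API here are assumptions based on examples from this era of the client:

```python
# Hypothetical streaming printout of the agent turn; EventLogger's
# location and API are assumptions, so verify before relying on them.
from llama_stack_client.lib.agents.event_logger import EventLogger

for log in EventLogger().log(response):
    log.print()
```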

BIN docs/source/building_applications/rag.png (Normal file)
Binary file not shown.
After Width: | Height: | Size: 145 KiB