docs: update Agent documentation (#1333)

Summary:
- [new] Agent concepts (session, turn)
- [new] how to write custom tools
- [new] non-streaming API and how to get outputs
- [update] remaining `memory` -> `rag` rename
- [new] note importance of `instructions`

Test Plan:
read
This commit is contained in:
ehhuang 2025-03-01 22:34:52 -08:00 committed by GitHub
parent 46b0a404e8
commit 52977e56a8
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
6 changed files with 170 additions and 64 deletions

View file

@ -1,8 +1,8 @@
## Using "Memory" or Retrieval Augmented Generation (RAG)
## Using Retrieval Augmented Generation (RAG)
Memory enables your applications to reference and recall information from previous interactions or external documents.
RAG enables your applications to reference and recall information from previous interactions or external documents.
Llama Stack organizes the memory APIs into three layers:
Llama Stack organizes the APIs that enable RAG into three layers:
- the lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon.)
- next is the "Rag Tool", a first-class tool as part of the Tools API that allows you to ingest documents (from URLs, files, etc) with various chunking strategies and query them smartly.
- finally, it all comes together with the top-level "Agents" API that allows you to create agents that can use the tools to answer questions, perform tasks, and more.
@ -86,7 +86,7 @@ from llama_stack_client.lib.agents.agent import Agent
# Configure agent with memory
agent_config = AgentConfig(
model="meta-llama/Llama-3.2-3B-Instruct",
model="meta-llama/Llama-3.3-70B-Instruct",
instructions="You are a helpful assistant",
enable_session_persistence=False,
toolgroups=[
@ -102,6 +102,19 @@ agent_config = AgentConfig(
agent = Agent(client, agent_config)
session_id = agent.create_session("rag_session")
# Ask questions about documents in the vector db, and the agent will query the db to answer the question.
response = agent.create_turn(
messages=[{"role": "user", "content": "How to optimize memory in PyTorch?"}],
session_id=session_id,
)
```
> **NOTE:** the `instructions` field in the `AgentConfig` can be used to guide the agent's behavior. It is important to experiment with different instructions to see what works best for your use case.
You can also pass documents along with the user's message and ask questions about them.
```python
# Initial document ingestion
response = agent.create_turn(
messages=[