forked from phoenix-oss/llama-stack-mirror
docs: update Agent documentation (#1333)
Summary:
- [new] Agent concepts (session, turn)
- [new] how to write custom tools
- [new] non-streaming API and how to get outputs
- [update] remaining `memory` -> `rag` rename
- [new] note importance of `instructions`

Test Plan: read
This commit is contained in:
parent
46b0a404e8
commit
52977e56a8
6 changed files with 170 additions and 64 deletions
````diff
@@ -1,8 +1,8 @@
-## Using "Memory" or Retrieval Augmented Generation (RAG)
+## Using Retrieval Augmented Generation (RAG)
 
-Memory enables your applications to reference and recall information from previous interactions or external documents.
+RAG enables your applications to reference and recall information from previous interactions or external documents.
 
-Llama Stack organizes the memory APIs into three layers:
+Llama Stack organizes the APIs that enable RAG into three layers:
 - the lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon).
 - next is the "RAG Tool", a first-class tool as part of the Tools API that allows you to ingest documents (from URLs, files, etc.) with various chunking strategies and query them smartly.
 - finally, it all comes together with the top-level "Agents" API that allows you to create agents that can use the tools to answer questions, perform tasks, and more.
````
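The "various chunking strategies" mentioned for the RAG Tool boil down to splitting each document into retrievable pieces before indexing. As a minimal sketch of the idea (this is an illustrative helper, not the actual Llama Stack API; the function name and parameters are assumptions):

```python
# Hypothetical sketch of fixed-size, overlapping word-window chunking,
# the simplest of the chunking strategies a RAG ingestion step might use.
# Not the real Llama Stack implementation.

def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split `text` into chunks of `chunk_size` words, each overlapping
    the previous chunk by `overlap` words (overlap preserves context
    that would otherwise be cut at a chunk boundary)."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks


chunks = chunk_words(
    "one two three four five six seven eight nine ten",
    chunk_size=4,
    overlap=1,
)
# Each chunk repeats the last word of the previous one.
```

Real ingestion pipelines typically chunk by tokens rather than words and tune chunk size to the embedding model's context window, but the overlap trade-off is the same.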
````diff
@@ -86,7 +86,7 @@ from llama_stack_client.lib.agents.agent import Agent
 
 # Configure agent with memory
 agent_config = AgentConfig(
-    model="meta-llama/Llama-3.2-3B-Instruct",
+    model="meta-llama/Llama-3.3-70B-Instruct",
     instructions="You are a helpful assistant",
     enable_session_persistence=False,
     toolgroups=[
````
````diff
@@ -102,6 +102,19 @@ agent_config = AgentConfig(
 agent = Agent(client, agent_config)
 session_id = agent.create_session("rag_session")
 
+# Ask questions about documents in the vector db, and the agent will query the db to answer the question.
+response = agent.create_turn(
+    messages=[{"role": "user", "content": "How to optimize memory in PyTorch?"}],
+    session_id=session_id,
+)
 ```
 
+> **NOTE:** the `instructions` field in the `AgentConfig` can be used to guide the agent's behavior. It is important to experiment with different instructions to see what works best for your use case.
+
+You can also pass documents along with the user's message and ask questions about them.
+
+```python
+# Initial document ingestion
+response = agent.create_turn(
+    messages=[
````
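The session/turn model the updated docs introduce can be pictured with plain data structures: an agent owns sessions, a session is an ordered list of turns, and each turn pairs a user message with the model's eventual output. A minimal sketch (assumed illustration, not the real `llama-stack-client` classes):

```python
# Toy model of the Agent concepts (session, turn) from the doc update.
# The real client creates turns server-side via agent.create_turn(...).
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Turn:
    user_message: str
    output_message: Optional[str] = None  # filled in when the turn completes


@dataclass
class Session:
    session_id: str
    turns: list = field(default_factory=list)

    def create_turn(self, user_message: str) -> Turn:
        """Append a new turn; its output arrives when the model responds."""
        turn = Turn(user_message=user_message)
        self.turns.append(turn)
        return turn


session = Session(session_id="rag_session")
turn = session.create_turn("How to optimize memory in PyTorch?")
turn.output_message = "(model answer)"
```

This is also why `enable_session_persistence` matters in `AgentConfig`: with persistence on, the accumulated turns survive across client restarts; with it off, the history lives only for the life of the process.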