From 24d70cedcaf2cc373ecf2418da80281b0ca6f9fb Mon Sep 17 00:00:00 2001
From: Francisco Arceo
Date: Fri, 11 Apr 2025 12:50:36 -0600
Subject: [PATCH] docs: Updated docs to show minimal RAG example and some other minor changes (#1935)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

# What does this PR do?
Incorporating some feedback into the docs.

- **`docs/source/getting_started/index.md`:**
  - Demo actually does RAG now
  - Simplified the installation command for dependencies.
  - Updated demo script examples to align with the latest API changes.
  - Replaced manual document manipulation with `RAGDocument` for clarity and maintainability.
  - Introduced new logic for model and embedding selection using the Llama Stack Client SDK.
  - Enhanced examples to showcase proper agent initialization and logging.
- **`docs/source/getting_started/detailed_tutorial.md`:**
  - Updated the section for listing models to include proper code formatting with `bash`.
  - Removed and reorganized the "Run the Demos" section for clarity.
  - Adjusted tab-item structures and added new instructions for demo scripts.
- **`docs/_static/css/my_theme.css`:**
  - Updated heading styles to include `h2`, `h3`, and `h4` for consistent font weight.
  - Added a new style for `pre` tags to wrap text and break long words; this is particularly useful for rendering long output from generation.

## Test Plan
Tested locally. Screenshot for reference: [Screenshot 2025-04-10 at 10 12 12 PM]

---------

Signed-off-by: Francisco Javier Arceo
---
 docs/_static/css/my_theme.css                 |   6 +-
 .../getting_started/detailed_tutorial.md      |  26 ++---
 docs/source/getting_started/index.md          | 101 ++++++++----------
 3 files changed, 62 insertions(+), 71 deletions(-)

diff --git a/docs/_static/css/my_theme.css b/docs/_static/css/my_theme.css
index 6f82f6358..a587f866d 100644
--- a/docs/_static/css/my_theme.css
+++ b/docs/_static/css/my_theme.css
@@ -17,9 +17,13 @@
    display: none;
}

-h3 {
+h2, h3, h4 {
  font-weight: normal;
}

html[data-theme="dark"] .rst-content div[class^="highlight"] {
    background-color: #0b0b0b;
}
+pre {
+    white-space: pre-wrap !important;
+    word-break: break-all;
+}
diff --git a/docs/source/getting_started/detailed_tutorial.md b/docs/source/getting_started/detailed_tutorial.md
index 65582e8d8..911b35437 100644
--- a/docs/source/getting_started/detailed_tutorial.md
+++ b/docs/source/getting_started/detailed_tutorial.md
@@ -173,9 +173,8 @@ You will see the below:
Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321
```

-#### iii. List Available Models
List the models
-```
+```bash
llama-stack-client models list

Available Models

Total models: 2
```
-
-## Step 4: Run the Demos
-
-Note that these demos show the [Python Client SDK](../references/python_sdk_reference/index.md).
-Other SDKs are also available, please refer to the [Client SDK](../index.md#client-sdks) list for the complete options.
-
-::::{tab-set}
-
-:::{tab-item} Basic Inference with the CLI
You can test basic Llama inference completion using the CLI.

```bash
llama-stack-client inference chat-completion --message "tell me a joke"
```

```
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content='Here is one for you: Why did the scarecrow win an award? Because he was outstanding in his field!',
        role='assistant',
        stop_reason='end_of_turn',
        tool_calls=[]
    ),
    logprobs=None,
    metrics=[
        Metric(metric='prompt_tokens', value=14.0, unit=None),
        Metric(metric='completion_tokens', value=27.0, unit=None),
        Metric(metric='total_tokens', value=41.0, unit=None)
    ],
)
```
-:::
-:::{tab-item} Basic Inference with a Script
-Alternatively, you can run inference using the Llama Stack client SDK.
+## Step 4: Run the Demos
+
+Note that these demos show the [Python Client SDK](../references/python_sdk_reference/index.md).
+Other SDKs are also available; please refer to the [Client SDK](../index.md#client-sdks) list for the complete options.
+
+::::{tab-set}
+
+:::{tab-item} Basic Inference
+Now you can run inference using the Llama Stack client SDK.

### i. Create the Script
Create a file `inference.py` and add the following code:
@@ -269,7 +265,7 @@ Beauty in the bits
:::
:::{tab-item} Build a Simple Agent
-Now we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server.
+Next we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server.
### i. Create the Script
Create a file `agent.py` and add the following code:
diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index 63fa5ae6e..ce7dbe973 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -12,9 +12,8 @@ as the inference [provider](../providers/index.md#inference) for a Llama Model.
Install [uv](https://docs.astral.sh/uv/), set up your virtual environment, and run inference on a Llama model with
[Ollama](https://ollama.com/download).
```bash
-uv pip install llama-stack aiosqlite faiss-cpu ollama openai datasets opentelemetry-exporter-otlp-proto-http mcp autoevals
+uv pip install llama-stack
source .venv/bin/activate
-export INFERENCE_MODEL="llama3.2:3b"
ollama run llama3.2:3b --keepalive 60m
```
## Step 2: Run the Llama Stack Server
@@ -24,70 +23,62 @@ INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run
## Step 3: Run the Demo
Now open up a new terminal using the same virtual environment; you can run this demo as a script with `uv run demo_script.py` or in an interactive shell.
```python
-from termcolor import cprint
-from llama_stack_client.types import Document
-from llama_stack_client import LlamaStackClient
-
-
-vector_db = "faiss"
-vector_db_id = "test-vector-db"
-model_id = "llama3.2:3b-instruct-fp16"
-query = "Can you give me the arxiv link for Lora Fine Tuning in Pytorch?"
-documents = [
-    Document(
-        document_id="document_1",
-        content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/lora_finetune.rst",
-        mime_type="text/plain",
-        metadata={},
-    )
-]
+from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient

+vector_db_id = "my_demo_vector_db"
client = LlamaStackClient(base_url="http://localhost:8321")

-client.vector_dbs.register(
-    provider_id=vector_db,
-    vector_db_id=vector_db_id,
-    embedding_model="all-MiniLM-L6-v2",
-    embedding_dimension=384,
-)
+models = client.models.list()
+
+# Select the first LLM and the first embedding model
+model_id = next(m for m in models if m.model_type == "llm").identifier
+embedding_model_id = (
+    em := next(m for m in models if m.model_type == "embedding")
+).identifier
+embedding_dimension = em.metadata["embedding_dimension"]
+
+_ = client.vector_dbs.register(
+    vector_db_id=vector_db_id,
+    embedding_model=embedding_model_id,
+    embedding_dimension=embedding_dimension,
+    provider_id="faiss",
+)
+document = RAGDocument(
+    document_id="document_1",
+    content="https://www.paulgraham.com/greatwork.html",
+    mime_type="text/html",
+    metadata={},
+)
client.tool_runtime.rag_tool.insert(
-    documents=documents,
+    documents=[document],
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=50,
)
-
-response = client.tool_runtime.rag_tool.query(
-    vector_db_ids=[vector_db_id],
-    content=query,
+agent = Agent(
+    client,
+    model=model_id,
+    instructions="You are a helpful assistant",
+    tools=[
+        {
+            "name": "builtin::rag/knowledge_search",
+            "args": {"vector_db_ids": [vector_db_id]},
+        }
+    ],
)
-cprint("" + "-" * 50, "yellow")
-cprint(f"Query> {query}", "red")
-cprint("" + "-" * 50, "yellow")
-for chunk in response.content:
-    cprint(f"Chunk ID> {chunk.text}", "green")
-    cprint("" + "-" * 50, "yellow")
+response = agent.create_turn(
+    messages=[{"role": "user", "content": "How do you do great work?"}],
+    session_id=agent.create_session("rag_session"),
+)
+
+for log in AgentEventLogger().log(response):
+    log.print()
```

And you should see output like the following.
-```
---------------------------------------------------
-Query> Can you give me the arxiv link for Lora Fine Tuning in Pytorch?
---------------------------------------------------
-Chunk ID> knowledge_search tool found 5 chunks:
-BEGIN of knowledge_search tool results.
-
---------------------------------------------------
-Chunk ID> Result 1:
-Document_id:docum
-Content: .. _lora_finetune_label:
-
-============================
-Fine-Tuning Llama2 with LoRA
-============================
-
-This guide will teach you about `LoRA `_, a
-
---------------------------------------------------
+```bash
+inference> [knowledge_search(query="What does it mean to do great work")]
+tool_execution> Tool:knowledge_search Args:{'query': 'What does it mean to do great work'}
+tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text="Result 1:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text='Result 2:\nDocument_id:docum\nContent: [1]\nI don\'t think you could give a precise definition of what\ncounts as great work. Doing great work means doing something important\nso well\n', type='text'), TextContentItem(text="Result 3:\nDocument_id:docum\nContent: . And if so\nyou're already further along than you might realize, because the\nset of people willing to want to is small.\n\nThe factors in doing great work are factors in the literal,\nmathematical sense, and\n", type='text'), TextContentItem(text="Result 4:\nDocument_id:docum\nContent: \nincreases your morale and helps you do even better work. But this\ncycle also operates in the other direction: if you're not doing\ngood work, that can demoralize you and make it even harder to. Since\nit matters\n", type='text'), TextContentItem(text="Result 5:\nDocument_id:docum\nContent: to try to do\ngreat work. But that's what's going on subconsciously; they shy\naway from the question.\n\nSo I'm going to pull a sneaky trick on you. Do you want to do great\n", type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')]
```

Congratulations! You've successfully built your first RAG application using Llama Stack! 🎉🥳
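
If you want to experiment with the model-selection logic the new demo introduces before running the full script, here is a minimal sketch. It assumes a Llama Stack server is already running at `http://localhost:8321` (as in Step 2) with at least one LLM and one embedding model registered; the attribute names (`model_type`, `identifier`, `metadata`) mirror the demo above, and the `print` calls are only for inspection.

```python
from llama_stack_client import LlamaStackClient

# Connect to the locally running Llama Stack server from Step 2.
client = LlamaStackClient(base_url="http://localhost:8321")

models = client.models.list()

# The same selection the demo performs, unrolled without the walrus
# operator: take the first registered LLM and the first registered
# embedding model. next() raises StopIteration if either is missing.
llm = next(m for m in models if m.model_type == "llm")
embedding = next(m for m in models if m.model_type == "embedding")

print("LLM:", llm.identifier)
print("Embedding model:", embedding.identifier)
print("Embedding dimension:", embedding.metadata["embedding_dimension"])
```

The walrus form in the demo keeps the selection to one statement per value; unrolled like this, it is easier to add a fallback or a clearer error message when no embedding model is registered.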