From 1639fd8b751e9d6980f81eed235fca6e4b2e8fbd Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Sun, 6 Apr 2025 22:54:40 -0400
Subject: [PATCH] rebased

Signed-off-by: Francisco Javier Arceo
---
 docs/source/getting_started/index.md | 135 ++++++++++++++-------------
 1 file changed, 70 insertions(+), 65 deletions(-)

diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index ef258a9cf..71cd2ef43 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -1,100 +1,110 @@
 # Quick Start
 
+In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to test a simple RAG agent.
+A Llama Stack agent is a simple integrated system that can perform tasks by combining a Llama model for reasoning with
+tools (e.g., RAG, web search, code execution, etc.) for taking actions.
+In Llama Stack, we provide a server exposing multiple APIs. These APIs are backed by implementations from different providers.
 Llama Stack is a stateful service with REST APIs to support seamless transition of AI applications across different environments. The server can be run in a variety of ways, including as a standalone binary, Docker container, or hosted service. You can build and test using a local server first and deploy to a hosted endpoint for production.
 
-In this guide, we'll walk through how to build a RAG agent locally using Llama Stack with [Ollama](https://ollama.com/) to run inference on a Llama Model.
+In this guide, we'll walk through how to build a RAG agent locally using Llama Stack with [Ollama](https://ollama.com/)
+as the inference [provider](../providers/index.md#inference) for a Llama Model.
+
+## Step 1: Installation and Setup
 
-### 1. Start Ollama
+### i. Install and Start Ollama for Inference
+Install Ollama by following the instructions on the [Ollama website](https://ollama.com/download).
+
+To start Ollama, run:
 ```bash
 ollama run llama3.2:3b --keepalive 60m
 ```
 By default, Ollama keeps the model loaded in memory for 5 minutes, which can be too short. We set the `--keepalive` flag to 60 minutes to ensure the model remains loaded for some time.
 
-```{admonition} Note
-:class: tip
-
-If you do not have ollama, you can install it from [here](https://ollama.com/download).
-```
+### ii. Install `uv` to Manage your Python packages
 
-### 2. Run Llama Stack locally
-
-We use `uv` to setup a virtual environment and install the Llama Stack package.
-
-:::{dropdown} [Click to Open] Instructions to setup uv
-
-Install [uv](https://docs.astral.sh/uv/) to setup your virtual environment.
-
-
-#### For macOS and Linux:
-```bash
+Install [uv](https://docs.astral.sh/uv/) to set up your virtual environment.
+
+::::{tab-set}
+
+:::{tab-item} macOS and Linux
+Use `curl` to download the script and execute it with `sh`:
+```console
 curl -LsSf https://astral.sh/uv/install.sh | sh
 ```
-#### For Windows:
+:::
+
+:::{tab-item} Windows
 Use `irm` to download the script and execute it with `iex`:
-```powershell
+
+```console
 powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
 ```
+:::
+::::
 
-Setup venv
+### iii. Set Up your Virtual Environment
 ```bash
 uv venv --python 3.10
 source .venv/bin/activate
 ```
-:::
+## Step 2: Install Llama Stack
+Llama Stack is a server that exposes multiple APIs; you connect to it using the Llama Stack client SDK.
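+
+As a quick preview of where this is heading: after you install the client below and start the server in Step 3, the client SDK simply talks to the server over HTTP. Here is a minimal sketch (it assumes the default port `8321`; the models listed will depend on your setup):
+
+```python
+from llama_stack_client import LlamaStackClient
+
+# Point the client at the locally running Llama Stack server.
+client = LlamaStackClient(base_url="http://localhost:8321")
+
+# Ask the server which models it knows about.
+for model in client.models.list():
+    print(model.identifier)
+```
+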
-**Install the Llama Stack package**
+### Install the Llama Stack Server
 ```bash
-uv pip install -U llama-stack
+uv pip install llama-stack
 ```
-**Build and Run the Llama Stack server for Ollama.**
+### Install the Llama Stack Client
+```bash
+uv pip install llama-stack-client
+```
+
+## Step 3: Build and Run Llama Stack
+Llama Stack uses a [configuration file](../distributions/configuration.md) to define the stack.
+The config file is a YAML file that specifies the providers and their configurations (see the abridged sketch at the end of this guide).
+
+### i. Build and Run the Llama Stack Config for Ollama
 ```bash
 INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run
 ```
-You will see the output end like below:
+You will see output like the following:
 ```
 ...
 INFO:     Application startup complete.
 INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
 ```
+### ii. Using the Llama Stack Client
 Now you can use the llama stack client to run inference and build agents!
 
-### 3. Client CLI
+:::{dropdown} You can reuse the server setup to run the client
 
-Install the client package
-```bash
-pip install llama-stack-client
-```
-
-:::{dropdown} OR reuse server setup
 Open a new terminal and navigate to the same directory you started the server from.
 
-Setup venv (llama-stack already includes the llama-stack-client package)
+Set up the venv (llama-stack already includes the client package):
 ```bash
 source .venv/bin/activate
 ```
-:::
+Let's use the `llama-stack-client` CLI to check connectivity to the server.
 
-#### 3.1 Configure the client to point to the local server
 ```bash
 llama-stack-client configure --endpoint http://localhost:8321 --api-key none
 ```
 You will see the below:
 ```
 Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321
 ```
 
-#### 3.2 List available models
+### iii. List available models
+List the available models:
 ```
 llama-stack-client models list
-```
-
 ```
 
 Available Models
 
@@ -110,7 +120,9 @@
 Total models: 2
 
 ```
 
-#### 3.3 Test basic inference
+## Step 4: Run Inference with Llama Stack
+You can also test basic inference using the CLI:
+
 ```bash
 llama-stack-client inference chat-completion --message "tell me a joke"
 ```
@@ -132,19 +144,6 @@ ChatCompletionResponse(
 )
 ```
 
-### 4. Python SDK
-Install the python client
-```bash
-pip install llama-stack-client
-```
-:::{dropdown} OR reuse server setup
-Open a new terminal and navigate to the same directory you started the server from.
-
-Setup venv (llama-stack already includes the llama-stack-client package)
-```bash
-source .venv/bin/activate
-```
-:::
 #### 4.1 Basic Inference
 Create a file `inference.py` and add the following code:
 ```python
@@ -170,11 +169,11 @@ response = client.inference.chat_completion(
 )
 print(response.completion_message.content)
 ```
-Run the script
+Let's run the script using `uv`:
 ```bash
-python inference.py
+uv run python inference.py
 ```
-Sample output:
+This will output something like:
 ```
 Model: llama3.2:3b-instruct-fp16
 Here is a haiku about coding:
@@ -226,9 +225,9 @@ for event in AgentEventLogger().log(stream):
     event.print()
 ```
 
-Run the script:
+Let's run the script using `uv`:
 ```bash
-python agent.py
+uv run python agent.py
 ```
 
 :::{dropdown} `Sample output`
 ```
@@ -419,19 +418,23 @@ ragagent = Agent(
 
 s_id = ragagent.create_session(session_name=f"s{uuid.uuid4().hex}")
 
-turns = ["what is torchtune", "tell me about dora"]
+user_prompts = [
+    "How to optimize memory usage in torchtune? use the knowledge_search tool to get information.",
+]
 
-for t in turns:
-    print("user>", t)
-    stream = ragagent.create_turn(
-        messages=[{"role": "user", "content": t}], session_id=s_id, stream=True
+# Run the agent loop by calling the `create_turn` method
+for prompt in user_prompts:
+    cprint(f"User> {prompt}", "green")
+    stream = ragagent.create_turn(
+        messages=[{"role": "user", "content": prompt}],
+        session_id=s_id, stream=True,
     )
     for event in AgentEventLogger().log(stream):
        event.print()
 ```
-Run the script:
-```
-python rag_agent.py
+Let's run the script using `uv`:
+```bash
+uv run python rag_agent.py
 ```
 
 :::{dropdown} `Sample output`
 ```
@@ -451,5 +454,7 @@ Overall, DORA is a powerful reinforcement learning algorithm that can learn comp
 ## Next Steps
 - Go through the [Getting Started Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)
 - Check out more [Notebooks on GitHub](https://github.com/meta-llama/llama-stack/tree/main/docs/notebooks)
+- Learn more about Llama Stack [Concepts](../concepts/index.md)
+- Learn how to [Build Llama Stacks](../distributions/index.md)
 - See [References](../references/index.md) for more details about the llama CLI and Python SDK
 - For example applications and more detailed tutorials, visit our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository.
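+
+### A sketch of the Ollama run configuration
+As a reference for Step 3, here is a rough, abridged sketch of the kind of YAML the `ollama` template generates. The fields shown are illustrative only; treat the `run.yaml` produced by `llama stack build` as the source of truth:
+```yaml
+version: '2'
+image_name: ollama
+apis:
+- inference
+- agents
+- vector_io
+providers:
+  inference:
+  - provider_id: ollama
+    provider_type: remote::ollama
+    config:
+      url: http://localhost:11434  # where the Ollama server from Step 1 listens
+```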