docs: concepts and building_applications migration (#3534)

# What does this PR do?
- Migrates the remaining documentation sections to the new documentation format

## Test Plan
- Partial migration
2025-12-05 02:17:31 +00:00 · 2025-09-24 14:05:30 -07:00 · 2025-09-24 14:05:30 -07:00 · c71ce8df61
commit c71ce8df61
parent 05ff4c4420
82 changed files with 2535 additions and 1237 deletions
--- a/docs/docs/getting_started/detailed_tutorial.mdx
+++ b/docs/docs/getting_started/detailed_tutorial.mdx
@@ -0,0 +1,563 @@
+---
+title: Detailed Tutorial
+description: Complete guide to using Llama Stack server and client SDK to build AI agents
+sidebar_label: Detailed Tutorial
+sidebar_position: 3
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Detailed Tutorial
+
+In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to test a simple agent.
+A Llama Stack agent is a simple integrated system that can perform tasks by combining a Llama model for reasoning with
+tools (e.g., RAG, web search, code execution, etc.) for taking actions.
+In Llama Stack, we provide a server exposing multiple APIs. These APIs are backed by implementations from different providers.
+
+Llama Stack is a stateful service with REST APIs to support seamless transition of AI applications across different environments. The server can be run in a variety of ways, including as a standalone binary, Docker container, or hosted service. You can build and test using a local server first and deploy to a hosted endpoint for production.
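
Because the same application code can target either a local or a hosted server, one option is to read the endpoint from the environment. The sketch below is only an illustration: the variable name `LLAMA_STACK_BASE_URL` and the helper `resolve_base_url` are assumptions made for this example, not part of Llama Stack. The default is the local address used later in this guide.

```python
import os

# Local default; matches the port used for the local server in this guide.
DEFAULT_BASE_URL = "http://localhost:8321"


def resolve_base_url(env=None):
    """Return the server URL from the environment, or the local default.

    LLAMA_STACK_BASE_URL is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    url = env.get("LLAMA_STACK_BASE_URL", "").strip()
    # Drop a trailing slash so paths can be appended consistently.
    return url.rstrip("/") if url else DEFAULT_BASE_URL


print(resolve_base_url({}))
```

The returned string can be passed as `base_url` when constructing `LlamaStackClient`, so switching from local testing to a hosted endpoint needs no code change.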
+
+In this guide, we'll walk through how to build a RAG agent locally using Llama Stack with [Ollama](https://ollama.com/)
+as the inference [provider](../providers/index.md#inference) for a Llama Model.
+
+### Step 1: Installation and Setup
+
+Install Ollama by following the instructions on the [Ollama website](https://ollama.com/download), then
+download the Llama 3.2 3B model and start the Ollama service.
+```bash
+ollama pull llama3.2:3b
+ollama run llama3.2:3b --keepalive 60m
+```
+
+Install [uv](https://docs.astral.sh/uv/) to set up your virtual environment:
+
+::::{tab-set}
+
+:::{tab-item} macOS and Linux
+Use `curl` to download the script and execute it with `sh`:
+```console
+curl -LsSf https://astral.sh/uv/install.sh | sh
+```
+:::
+
+:::{tab-item} Windows
+Use `irm` to download the script and execute it with `iex`:
+
+```console
+powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
+```
+:::
+::::
+
+Set up your virtual environment.
+
+```bash
+uv sync --python 3.12
+source .venv/bin/activate
+```
+
+### Step 2: Run Llama Stack
+Llama Stack is a server that exposes multiple APIs; you connect to it using the Llama Stack client SDK.
+
+::::{tab-set}
+
+:::{tab-item} Using `venv`
+You can use Python to build and run the Llama Stack server, which is useful for testing and development.
+
+Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup,
+which defines the providers and their settings. The generated configuration serves as a starting point that you can [customize for your specific needs](../distributions/customizing_run_yaml.md).
+Now let's build and run the Llama Stack config for Ollama.
+We use `starter` as the template. By default, all providers are disabled, so you need to enable Ollama by passing environment variables.
+
+```bash
+llama stack build --distro starter --image-type venv --run
+```
+:::
+:::{tab-item} Using a Container
+You can use a container image to run the Llama Stack server. We provide several container images for the server
+component that work with different inference providers out of the box. For this guide, we will use
+`llamastack/distribution-starter` as the container image. If you'd like to build your own image or customize the
+configurations, please check out [this guide](../distributions/building_distro.md).
+First, let's set up some environment variables and create a local directory to mount into the container’s file system.
+```bash
+export LLAMA_STACK_PORT=8321
+mkdir -p ~/.llama
+```
+Then start the server using the container tool of your choice. For example, if you are running Docker, you can use the
+following command:
+```bash
+docker run -it \
+  --pull always \
+  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
+  -v ~/.llama:/root/.llama \
+  llamastack/distribution-starter \
+  --port $LLAMA_STACK_PORT \
+  --env OLLAMA_URL=http://host.docker.internal:11434
+```
+To start the container with Podman, use the same command but replace `docker` at the start with
+`podman`. If you are using a `podman` version older than `4.7.0`, also replace `host.docker.internal` in the `OLLAMA_URL`
+with `host.containers.internal`.
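
The host alias rule for Docker and Podman can be written down as a small helper. This is a sketch under the assumptions stated in the text (Podman older than `4.7.0` needs `host.containers.internal`); the function name `ollama_url` is hypothetical and exists only for this example.

```python
def ollama_url(tool, version, port=11434):
    """Build an OLLAMA_URL value for a container runtime.

    tool: "docker" or "podman"; version: (major, minor, patch) tuple.
    """
    host = "host.docker.internal"
    # Podman releases before 4.7.0 need the containers-specific host alias.
    if tool == "podman" and tuple(version) < (4, 7, 0):
        host = "host.containers.internal"
    return f"http://{host}:{port}"


print(ollama_url("docker", (27, 0, 0)))
print(ollama_url("podman", (4, 6, 2)))
print(ollama_url("podman", (4, 7, 0)))
```

The resulting string is what you would pass after `--env OLLAMA_URL=` in the run command.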
+
+The configuration YAML for the Ollama distribution is available at `distributions/ollama/run.yaml`.
+
+```{tip}
+Docker containers run in their own isolated network namespaces on Linux. To allow the container to communicate with services running on the host via `localhost`, you need `--network=host`. This makes the container use the host’s network directly so it can connect to Ollama running on `localhost:11434`.
+```
+
+Linux users having issues running the above command should instead try the following:
+```bash
+docker run -it \
+  --pull always \
+  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
+  -v ~/.llama:/root/.llama \
+  --network=host \
+  llamastack/distribution-starter \
+  --port $LLAMA_STACK_PORT \
+  --env OLLAMA_URL=http://localhost:11434
+```
+:::
+::::
+You will see output like the following:
+```
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
+```
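
If you want to confirm from a script that the server is listening before moving on, a plain TCP check is one option. The sketch below uses only the Python standard library; `is_port_open` is a hypothetical helper, and a successful connection only shows that something accepts connections on the port, not that the server is healthy.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 8321 is the port exported as LLAMA_STACK_PORT earlier in this guide.
    print("server reachable:", is_port_open("localhost", 8321))
```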
+
+Now you can use the Llama Stack client to run inference and build agents!
+
+You can reuse the server setup or use the [Llama Stack Client](https://github.com/meta-llama/llama-stack-client-python/).
+Note that the client package is already included in the `llama-stack` package.
+
+### Step 3: Run Client CLI
+
+Open a new terminal and navigate to the same directory you started the server from. Then either activate your
+existing server virtual environment or set up a new one.
+
+::::{tab-set}
+
+:::{tab-item} Reuse Server `venv`
+```bash
+# The client is included in the llama-stack package so we just activate the server venv
+source .venv/bin/activate
+```
+:::
+
+:::{tab-item} Install with `venv`
+```bash
+uv venv client --python 3.12
+source client/bin/activate
+pip install llama-stack-client
+```
+:::
+
+
+::::
+
+Now let's use the `llama-stack-client` [CLI](../references/llama_stack_client_cli_reference.md) to check the
+connectivity to the server.
+
+```bash
+llama-stack-client configure --endpoint http://localhost:8321 --api-key none
+```
+You should see the following:
+```
+Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321
+```
+
+List the models:
+```bash
+llama-stack-client models list
+Available Models
+
+┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ model_type      ┃ identifier                          ┃ provider_resource_id                ┃ metadata                                  ┃ provider_id           ┃
+┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
+│ embedding       │ ollama/all-minilm:l6-v2             │ all-minilm:l6-v2                    │ {'embedding_dimension': 384.0}            │ ollama                │
+├─────────────────┼─────────────────────────────────────┼─────────────────────────────────────┼───────────────────────────────────────────┼───────────────────────┤
+│ ...             │ ...                                 │ ...                                 │                                           │ ...                   │
+├─────────────────┼─────────────────────────────────────┼─────────────────────────────────────┼───────────────────────────────────────────┼───────────────────────┤
+│ llm             │ ollama/llama3.2:3b                  │ llama3.2:3b                         │                                           │ ollama                │
+└─────────────────┴─────────────────────────────────────┴─────────────────────────────────────┴───────────────────────────────────────────┴───────────────────────┘
+
+```
+You can test basic Llama inference completion using the CLI.
+
+```bash
+llama-stack-client inference chat-completion --model-id "ollama/llama3.2:3b" --message "tell me a joke"
+
+```
+Sample output:
+```python
+OpenAIChatCompletion(
+    id="chatcmpl-08d7b2be-40f3-47ed-8f16-a6f29f2436af",
+    choices=[
+        OpenAIChatCompletionChoice(
+            finish_reason="stop",
+            index=0,
+            message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(
+                role="assistant",
+                content="Why couldn't the bicycle stand up by itself?\n\nBecause it was two-tired.",
+                name=None,
+                tool_calls=None,
+                refusal=None,
+                annotations=None,
+                audio=None,
+                function_call=None,
+            ),
+            logprobs=None,
+        )
+    ],
+    created=1751725254,
+    model="llama3.2:3b",
+    object="chat.completion",
+    service_tier=None,
+    system_fingerprint="fp_ollama",
+    usage={
+        "completion_tokens": 18,
+        "prompt_tokens": 29,
+        "total_tokens": 47,
+        "completion_tokens_details": None,
+        "prompt_tokens_details": None,
+    },
+)
+```
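
The reply text in the sample output above sits under `choices[0].message.content`. The sketch below shows how that field could be read in Python. It uses a stand-in object built with `SimpleNamespace` and shaped like the sample output, so it runs without a server; `first_reply` is a hypothetical helper.

```python
from types import SimpleNamespace

# Stand-in shaped like the sample output above; a real response comes from the server.
response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            index=0,
            message=SimpleNamespace(
                role="assistant",
                content="Why couldn't the bicycle stand up by itself?\n\nBecause it was two-tired.",
            ),
        )
    ],
    usage={"completion_tokens": 18, "prompt_tokens": 29, "total_tokens": 47},
)


def first_reply(resp):
    """Return the assistant text of the first choice, or None if there are no choices."""
    if not resp.choices:
        return None
    return resp.choices[0].message.content


print(first_reply(response))
print("total tokens:", response.usage["total_tokens"])
```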
+
+### Step 4: Run the Demos
+
+Note that these demos show the [Python Client SDK](../references/python_sdk_reference/index.md).
+Other SDKs are also available; please refer to the [Client SDK](../index.md#client-sdks) list for the complete set of options.
+
+::::{tab-set}
+
+:::{tab-item} Basic Inference
+Now you can run inference using the Llama Stack client SDK.
+
+#### i. Create the Script
+
+Create a file `inference.py` and add the following code:
+```python
+from llama_stack_client import LlamaStackClient
+
+client = LlamaStackClient(base_url="http://localhost:8321")
+
+# List available models
+models = client.models.list()
+
+# Select the first LLM served by the Ollama provider
+llm = next(m for m in models if m.model_type == "llm" and m.provider_id == "ollama")
+model_id = llm.identifier
+
+print("Model:", model_id)
+
+response = client.chat.completions.create(
+    model=model_id,
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Write a haiku about coding"},
+    ],
+)
+print(response)
+```
+
+#### ii. Run the Script
+Let's run the script using `uv`:
+```bash
+uv run python inference.py
+```
+This produces output similar to the following:
+```
+Model: ollama/llama3.2:3b
+OpenAIChatCompletion(id='chatcmpl-30cd0f28-a2ad-4b6d-934b-13707fc60ebf', choices=[OpenAIChatCompletionChoice(finish_reason='stop', index=0, message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(role='assistant', content="Lines of code unfold\nAlgorithms dance with ease\nLogic's gentle kiss", name=None, tool_calls=None, refusal=None, annotations=None, audio=None, function_call=None), logprobs=None)], created=1751732480, model='llama3.2:3b', object='chat.completion', service_tier=None, system_fingerprint='fp_ollama', usage={'completion_tokens': 16, 'prompt_tokens': 37, 'total_tokens': 53, 'completion_tokens_details': None, 'prompt_tokens_details': None})
+```
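
The script selects a model with `next(...)`, which raises a bare `StopIteration` when no model matches (for example, if the Ollama provider is not enabled). A slightly more defensive variant is sketched below. `pick_model` is a hypothetical helper, and the model list is a stand-in built from the identifiers shown in the CLI output, so the snippet runs without a server.

```python
from types import SimpleNamespace


def pick_model(models, model_type="llm", provider_id="ollama"):
    """Return the identifier of the first model matching type and provider."""
    match = next(
        (m for m in models if m.model_type == model_type and m.provider_id == provider_id),
        None,
    )
    if match is None:
        raise RuntimeError(
            f"no {model_type!r} model registered for provider {provider_id!r}"
        )
    return match.identifier


# Stand-in for client.models.list(); a real list comes from the server.
models = [
    SimpleNamespace(model_type="embedding", identifier="ollama/all-minilm:l6-v2", provider_id="ollama"),
    SimpleNamespace(model_type="llm", identifier="ollama/llama3.2:3b", provider_id="ollama"),
]

print(pick_model(models))
```

In the real script, `models` would be the result of `client.models.list()`.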
+:::
+
+:::{tab-item} Build a Simple Agent
+Next we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server.
+#### i. Create the Script
+Create a file `agent.py` and add the following code:
+
+```python
+from llama_stack_client import LlamaStackClient
+from llama_stack_client import Agent, AgentEventLogger
+from rich.pretty import pprint
+import uuid
+
+client = LlamaStackClient(base_url="http://localhost:8321")
+
+models = client.models.list()
+llm = next(m for m in models if m.model_type == "llm" and m.provider_id == "ollama")
+model_id = llm.identifier
+
+agent = Agent(client, model=model_id, instructions="You are a helpful assistant.")
+
+s_id = agent.create_session(session_name=f"s{uuid.uuid4().hex}")
+
+print("Non-streaming ...")
+response = agent.create_turn(
+    messages=[{"role": "user", "content": "Who are you?"}],
+    session_id=s_id,
+    stream=False,
+)
+print("agent>", response.output_message.content)
+
+print("Streaming ...")
+stream = agent.create_turn(
+    messages=[{"role": "user", "content": "Who are you?"}], session_id=s_id, stream=True
+)
+for event in stream:
+    pprint(event)
+
+print("Streaming with print helper...")
+stream = agent.create_turn(
+    messages=[{"role": "user", "content": "Who are you?"}], session_id=s_id, stream=True
+)
+for event in AgentEventLogger().log(stream):
+    event.print()
+```
+#### ii. Run the Script
+Let's run the script using `uv`:
+```bash
+uv run python agent.py
+```
+
+```{dropdown} 👋 Click here to see the sample output
+    Non-streaming ...
+    agent> I'm an artificial intelligence designed to assist and communicate with users like you. I don't have a personal identity, but I can provide information, answer questions, and help with tasks to the best of my abilities.
+
+    I'm a large language model, which means I've been trained on a massive dataset of text from various sources, allowing me to understand and respond to a wide range of topics and questions. My purpose is to provide helpful and accurate information, and I'm constantly learning and improving my responses based on the interactions I have with users like you.
+
+    I can help with:
+
+    * Answering questions on various subjects
+    * Providing definitions and explanations
+    * Offering suggestions and ideas
+    * Assisting with language-related tasks, such as proofreading and editing
+    * Generating text and content
+    * And more!
+
+    Feel free to ask me anything, and I'll do my best to help!
+    Streaming ...
+    AgentTurnResponseStreamChunk(
+    │   event=TurnResponseEvent(
+    │   │   payload=AgentTurnResponseStepStartPayload(
+    │   │   │   event_type='step_start',
+    │   │   │   step_id='69831607-fa75-424a-949b-e2049e3129d1',
+    │   │   │   step_type='inference',
+    │   │   │   metadata={}
+    │   │   )
+    │   )
+    )
+    AgentTurnResponseStreamChunk(
+    │   event=TurnResponseEvent(
+    │   │   payload=AgentTurnResponseStepProgressPayload(
+    │   │   │   delta=TextDelta(text='As', type='text'),
+    │   │   │   event_type='step_progress',
+    │   │   │   step_id='69831607-fa75-424a-949b-e2049e3129d1',
+    │   │   │   step_type='inference'
+    │   │   )
+    │   )
+    )
+    AgentTurnResponseStreamChunk(
+    │   event=TurnResponseEvent(
+    │   │   payload=AgentTurnResponseStepProgressPayload(
+    │   │   │   delta=TextDelta(text=' a', type='text'),
+    │   │   │   event_type='step_progress',
+    │   │   │   step_id='69831607-fa75-424a-949b-e2049e3129d1',
+    │   │   │   step_type='inference'
+    │   │   )
+    │   )
+    )
+    ...
+    AgentTurnResponseStreamChunk(
+    │   event=TurnResponseEvent(
+    │   │   payload=AgentTurnResponseStepCompletePayload(
+    │   │   │   event_type='step_complete',
+    │   │   │   step_details=InferenceStep(
+    │   │   │   │   api_model_response=CompletionMessage(
+    │   │   │   │   │   content='As a conversational AI, I don\'t have a personal identity in the classical sense. I exist as a program running on computer servers, designed to process and respond to text-based inputs.\n\nI\'m an instance of a type of artificial intelligence called a "language model," which is trained on vast amounts of text data to generate human-like responses. My primary function is to understand and respond to natural language inputs, like our conversation right now.\n\nThink of me as a virtual assistant, a chatbot, or a conversational interface – I\'m here to provide information, answer questions, and engage in conversation to the best of my abilities. I don\'t have feelings, emotions, or consciousness like humans do, but I\'m designed to simulate human-like interactions to make our conversations feel more natural and helpful.\n\nSo, that\'s me in a nutshell! What can I help you with today?',
+    │   │   │   │   │   role='assistant',
+    │   │   │   │   │   stop_reason='end_of_turn',
+    │   │   │   │   │   tool_calls=[]
+    │   │   │   │   ),
+    │   │   │   │   step_id='69831607-fa75-424a-949b-e2049e3129d1',
+    │   │   │   │   step_type='inference',
+    │   │   │   │   turn_id='8b360202-f7cb-4786-baa9-166a1b46e2ca',
+    │   │   │   │   completed_at=datetime.datetime(2025, 4, 3, 1, 15, 21, 716174, tzinfo=TzInfo(UTC)),
+    │   │   │   │   started_at=datetime.datetime(2025, 4, 3, 1, 15, 14, 28823, tzinfo=TzInfo(UTC))
+    │   │   │   ),
+    │   │   │   step_id='69831607-fa75-424a-949b-e2049e3129d1',
+    │   │   │   step_type='inference'
+    │   │   )
+    │   )
+    )
+    AgentTurnResponseStreamChunk(
+    │   event=TurnResponseEvent(
+    │   │   payload=AgentTurnResponseTurnCompletePayload(
+    │   │   │   event_type='turn_complete',
+    │   │   │   turn=Turn(
+    │   │   │   │   input_messages=[UserMessage(content='Who are you?', role='user', context=None)],
+    │   │   │   │   output_message=CompletionMessage(
+    │   │   │   │   │   content='As a conversational AI, I don\'t have a personal identity in the classical sense. I exist as a program running on computer servers, designed to process and respond to text-based inputs.\n\nI\'m an instance of a type of artificial intelligence called a "language model," which is trained on vast amounts of text data to generate human-like responses. My primary function is to understand and respond to natural language inputs, like our conversation right now.\n\nThink of me as a virtual assistant, a chatbot, or a conversational interface – I\'m here to provide information, answer questions, and engage in conversation to the best of my abilities. I don\'t have feelings, emotions, or consciousness like humans do, but I\'m designed to simulate human-like interactions to make our conversations feel more natural and helpful.\n\nSo, that\'s me in a nutshell! What can I help you with today?',
+    │   │   │   │   │   role='assistant',
+    │   │   │   │   │   stop_reason='end_of_turn',
+    │   │   │   │   │   tool_calls=[]
+    │   │   │   │   ),
+    │   │   │   │   session_id='abd4afea-4324-43f4-9513-cfe3970d92e8',
+    │   │   │   │   started_at=datetime.datetime(2025, 4, 3, 1, 15, 14, 28722, tzinfo=TzInfo(UTC)),
+    │   │   │   │   steps=[
+    │   │   │   │   │   InferenceStep(
+    │   │   │   │   │   │   api_model_response=CompletionMessage(
+    │   │   │   │   │   │   │   content='As a conversational AI, I don\'t have a personal identity in the classical sense. I exist as a program running on computer servers, designed to process and respond to text-based inputs.\n\nI\'m an instance of a type of artificial intelligence called a "language model," which is trained on vast amounts of text data to generate human-like responses. My primary function is to understand and respond to natural language inputs, like our conversation right now.\n\nThink of me as a virtual assistant, a chatbot, or a conversational interface – I\'m here to provide information, answer questions, and engage in conversation to the best of my abilities. I don\'t have feelings, emotions, or consciousness like humans do, but I\'m designed to simulate human-like interactions to make our conversations feel more natural and helpful.\n\nSo, that\'s me in a nutshell! What can I help you with today?',
+    │   │   │   │   │   │   │   role='assistant',
+    │   │   │   │   │   │   │   stop_reason='end_of_turn',
+    │   │   │   │   │   │   │   tool_calls=[]
+    │   │   │   │   │   │   ),
+    │   │   │   │   │   │   step_id='69831607-fa75-424a-949b-e2049e3129d1',
+    │   │   │   │   │   │   step_type='inference',
+    │   │   │   │   │   │   turn_id='8b360202-f7cb-4786-baa9-166a1b46e2ca',
+    │   │   │   │   │   │   completed_at=datetime.datetime(2025, 4, 3, 1, 15, 21, 716174, tzinfo=TzInfo(UTC)),
+    │   │   │   │   │   │   started_at=datetime.datetime(2025, 4, 3, 1, 15, 14, 28823, tzinfo=TzInfo(UTC))
+    │   │   │   │   │   )
+    │   │   │   │   ],
+    │   │   │   │   turn_id='8b360202-f7cb-4786-baa9-166a1b46e2ca',
+    │   │   │   │   completed_at=datetime.datetime(2025, 4, 3, 1, 15, 21, 727364, tzinfo=TzInfo(UTC)),
+    │   │   │   │   output_attachments=[]
+    │   │   │   )
+    │   │   )
+    │   )
+    )
+
+
+    Streaming with print helper...
+    inference> Déjà vu! You're asking me again!
+
+    As I mentioned earlier, I'm a computer program designed to simulate conversation and answer questions. I don't have a personal identity or consciousness like a human would. I exist solely as a digital entity, running on computer servers and responding to inputs from users like you.
+
+    I'm a type of artificial intelligence (AI) called a large language model, which means I've been trained on a massive dataset of text from various sources. This training allows me to understand and respond to a wide range of questions and topics.
+
+    My purpose is to provide helpful and accurate information, answer questions, and assist users like you with tasks and conversations. I don't have personal preferences, emotions, or opinions like humans do. My goal is to be informative, neutral, and respectful in my responses.
+
+    So, that's me in a nutshell!
+```
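
The raw streaming output above carries the reply in small pieces: each `step_progress` payload holds a `delta` with a `text` fragment. The sketch below shows one way those fragments could be joined into the full reply. The attribute names are taken from the sample output; the `chunk` factory and `collect_text` helper are hypothetical stand-ins so the snippet runs without a server.

```python
from types import SimpleNamespace as NS


def chunk(event_type, text=None):
    """Build a stand-in stream chunk shaped like the sample output."""
    payload = NS(event_type=event_type, step_type="inference")
    if text is not None:
        payload.delta = NS(text=text, type="text")
    return NS(event=NS(payload=payload))


def collect_text(chunks):
    """Concatenate the text deltas carried by step_progress events."""
    parts = []
    for c in chunks:
        payload = c.event.payload
        if payload.event_type != "step_progress":
            continue
        delta = getattr(payload, "delta", None)
        # Only text deltas are joined; other delta types are skipped.
        if delta is not None and getattr(delta, "type", None) == "text":
            parts.append(delta.text)
    return "".join(parts)


# The first two fragments match the sample output: 'As' followed by ' a'.
stream = [
    chunk("step_start"),
    chunk("step_progress", "As"),
    chunk("step_progress", " a"),
    chunk("step_complete"),
]

print(repr(collect_text(stream)))
```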
+:::
+
+:::{tab-item} Build a RAG Agent
+
+For our last demo, we can build a RAG agent that can answer questions about the Torchtune project using the documents
+in a vector database.
+#### i. Create the Script
+Create a file `rag_agent.py` and add the following code:
+
+```python
+from llama_stack_client import LlamaStackClient
+from llama_stack_client import Agent, AgentEventLogger
+from llama_stack_client.types import Document
+import uuid
+
+client = LlamaStackClient(base_url="http://localhost:8321")
+
+# Create a vector database instance
+embed_lm = next(m for m in client.models.list() if m.model_type == "embedding")
+embedding_model = embed_lm.identifier
+vector_db_id = f"v{uuid.uuid4().hex}"
+# The VectorDB API is deprecated; the server now returns its own authoritative ID.
+# We capture the correct ID from the response's .identifier attribute.
+vector_db_id = client.vector_dbs.register(
+    vector_db_id=vector_db_id,
+    embedding_model=embedding_model,
+).identifier
+
+# Create Documents
+urls = [
+    "memory_optimizations.rst",
+    "chat.rst",
+    "llama3.rst",
+    "qat_finetune.rst",
+    "lora_finetune.rst",
+]
+documents = [
+    Document(
+        document_id=f"num-{i}",
+        content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
+        mime_type="text/plain",
+        metadata={},
+    )
+    for i, url in enumerate(urls)
+]
+
+# Insert documents
+client.tool_runtime.rag_tool.insert(
+    documents=documents,
+    vector_db_id=vector_db_id,
+    chunk_size_in_tokens=512,
+)
+
+# Get the model being served
+llm = next(
+    m
+    for m in client.models.list()
+    if m.model_type == "llm" and m.provider_id == "ollama"
+)
+model = llm.identifier
+
+# Create the RAG agent
+rag_agent = Agent(
+    client,
+    model=model,
+    instructions="You are a helpful assistant. Use the RAG tool to answer questions as needed.",
+    tools=[
+        {
+            "name": "builtin::rag/knowledge_search",
+            "args": {"vector_db_ids": [vector_db_id]},
+        }
+    ],
+)
+
+session_id = rag_agent.create_session(session_name=f"s{uuid.uuid4().hex}")
+
+turns = ["what is torchtune", "tell me about dora"]
+
+for t in turns:
+    print("user>", t)
+    stream = rag_agent.create_turn(
+        messages=[{"role": "user", "content": t}], session_id=session_id, stream=True
+    )
+    for event in AgentEventLogger().log(stream):
+        event.print()
+```
+#### ii. Run the Script
+Let's run the script using `uv`:
+```bash
+uv run python rag_agent.py
+```
+
+```{dropdown} 👋 Click here to see the sample output
+    user> what is torchtune
+    inference> [knowledge_search(query='TorchTune')]
+    tool_execution> Tool:knowledge_search Args:{'query': 'TorchTune'}
+    tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text='Result 1:\nDocument_id:num-1\nContent:  conversational data, :func:`~torchtune.datasets.chat_dataset` seems to be a good fit. ..., type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')]
+    inference> Here is a high-level overview of the text:
+
+    **LoRA Finetuning with PyTorch Tune**
+
+    PyTorch Tune provides a recipe for LoRA (Low-Rank Adaptation) finetuning, which is a technique to adapt pre-trained models to new tasks. The recipe uses the `lora_finetune_distributed` command.
+    ...
+    Overall, DORA is a powerful reinforcement learning algorithm that can learn complex tasks from human demonstrations. However, it requires careful consideration of the challenges and limitations to achieve optimal results.
+```
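
To see how the `Document_id` values in the tool response relate to the source files, it helps to look at what the list comprehension in the script produces. The sketch below rebuilds the same IDs and URLs with plain dictionaries instead of `Document` objects, so it runs without the client package; `build_documents` is a hypothetical helper.

```python
# Base URL and file names are the ones used in rag_agent.py above.
BASE = "https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials"
NAMES = [
    "memory_optimizations.rst",
    "chat.rst",
    "llama3.rst",
    "qat_finetune.rst",
    "lora_finetune.rst",
]


def build_documents(names, base=BASE):
    """Return one dict per file, with the same fields the script passes to Document."""
    return [
        {
            "document_id": f"num-{i}",
            "content": f"{base}/{name}",
            "mime_type": "text/plain",
            "metadata": {},
        }
        for i, name in enumerate(names)
    ]


for doc in build_documents(NAMES):
    print(doc["document_id"], "->", doc["content"])
```

With this numbering, `num-1` in the sample tool response refers to `chat.rst`, which is consistent with the `chat_dataset` text in that result.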
+:::
+
+::::
+
+**You're Ready to Build Your Own Apps!**
+
+Congrats! 🥳 Now you're ready to [build your own Llama Stack applications](../building_applications/index)! 🚀