Merge branch 'docs_improvement' of github.com:meta-llama/llama-stack into docs_improvement

Kai Wu 2024-11-05 15:07:28 -08:00
commit ca95afb449
15 changed files with 1366 additions and 443 deletions


@@ -1,111 +0,0 @@
# Getting Started with Llama Stack
This guide will walk you through the steps to set up an end-to-end workflow with Llama Stack. It focuses on building a Llama Stack distribution and starting up a Llama Stack server. See our [documentation](../README.md) for more on Llama Stack's capabilities, or visit [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) for example apps.
## Installation
The `llama` CLI tool helps you manage the Llama toolchain & agentic systems. After installing the `llama-stack` package, the `llama` command should be available in your path.
You can install this repository in two ways:
1. **Install as a package**:
Install directly from [PyPI](https://pypi.org/project/llama-stack/) with:
```bash
pip install llama-stack
```
2. **Install from source**:
Follow these steps to install from the source code:
```bash
mkdir -p ~/local
cd ~/local
git clone git@github.com:meta-llama/llama-stack.git
conda create -n stack python=3.10
conda activate stack
cd llama-stack
$CONDA_PREFIX/bin/pip install -e .
```
Refer to the [CLI Reference](./cli_reference.md) for details on Llama CLI commands.
## Starting Up Llama Stack Server
There are two ways to start the Llama Stack server:
1. **Using Docker**:
We provide pre-built Docker images of Llama Stack; the corresponding run configurations live in the [distributions](../distributions/) folder.
> **Note:** For GPU inference, set environment variables to specify the local directory with your model checkpoints and enable GPU inference.
```bash
export LLAMA_CHECKPOINT_DIR=~/.llama
```
Download Llama models with:
```bash
llama download --model-id Llama3.1-8B-Instruct
```
Start a Docker container with:
```bash
cd llama-stack/distributions/meta-reference-gpu
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.yaml --gpus=all distribution-meta-reference-gpu --yaml_config /root/my-run.yaml
```
**Tip:** For remote providers, use `docker compose up` with scripts in the [distributions folder](../distributions/).
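For example (a sketch; replace `<provider>` with the distribution directory you want):
```bash
cd llama-stack/distributions/<provider>
docker compose up
```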
2. **Build->Configure->Run via Conda**:
For development, build a LlamaStack distribution from scratch.
**`llama stack build`**
Enter build information interactively:
```bash
llama stack build
```
**`llama stack configure`**
Run `llama stack configure <name>` using the name from the build step.
```bash
llama stack configure my-local-stack
```
**`llama stack run`**
Start the server with:
```bash
llama stack run my-local-stack
```
## Testing with Client
After setup, test the server with a client:
```bash
cd /path/to/llama-stack
conda activate <env>
python -m llama_stack.apis.inference.client localhost 5000
```
You can also send a POST request:
```bash
curl http://localhost:5000/inference/chat_completion \
-H "Content-Type: application/json" \
-d '{
"model": "Llama3.1-8B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Write me a 2-sentence poem about the moon"}
],
"sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
}'
```
For testing safety, run:
```bash
python -m llama_stack.apis.safety.client localhost 5000
```
Check our client SDKs for various languages: [Python](https://github.com/meta-llama/llama-stack-client-python), [Node](https://github.com/meta-llama/llama-stack-client-node), [Swift](https://github.com/meta-llama/llama-stack-client-swift), and [Kotlin](https://github.com/meta-llama/llama-stack-client-kotlin).
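For example, a minimal chat completion with the Python client (mirroring the curl request above) looks like this:
```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import UserMessage

client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    messages=[UserMessage(content="Write me a 2-sentence poem about the moon", role="user")],
    model="Llama3.1-8B-Instruct",
)
print(response.completion_message.content)
```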
## Advanced Guides
For more on custom Llama Stack distributions, refer to our [Building a Llama Stack Distribution](./building_distro.md) guide.


@@ -0,0 +1,247 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c1e7571c",
"metadata": {},
"source": [
"# Llama Stack Inference Guide\n",
"\n",
"This document provides instructions on how to use Llama Stack's `chat_completion` function for generating text using the `Llama3.2-11B-Vision-Instruct` model. Before you begin, please ensure Llama Stack is installed and set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/).\n",
"\n",
"### Table of Contents\n",
"1. [Quickstart](#quickstart)\n",
"2. [Building Effective Prompts](#building-effective-prompts)\n",
"3. [Conversation Loop](#conversation-loop)\n",
"4. [Conversation History](#conversation-history)\n",
"5. [Streaming Responses](#streaming-responses)\n"
]
},
{
"cell_type": "markdown",
"id": "414301dc",
"metadata": {},
"source": [
"## Quickstart\n",
"\n",
"This section walks through each step to set up and make a simple text generation request.\n",
"\n",
"### 1. Set Up the Client\n",
"\n",
"Begin by importing the necessary components from Llama Stacks client library:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7a573752",
"metadata": {},
"outputs": [],
"source": [
"from llama_stack_client import LlamaStackClient\n",
"from llama_stack_client.types import SystemMessage, UserMessage\n",
"\n",
"client = LlamaStackClient(base_url='http://localhost:5000')"
]
},
{
"cell_type": "markdown",
"id": "86366383",
"metadata": {},
"source": [
"### 2. Create a Chat Completion Request\n",
"\n",
"Use the `chat_completion` function to define the conversation context. Each message you include should have a specific role and content:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "77c29dba",
"metadata": {},
"outputs": [],
"source": [
"response = client.inference.chat_completion(\n",
" messages=[\n",
" SystemMessage(content='You are a friendly assistant.', role='system'),\n",
" UserMessage(content='Write a two-sentence poem about llama.', role='user')\n",
" ],\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
")\n",
"\n",
"print(response.completion_message.content)"
]
},
{
"cell_type": "markdown",
"id": "e5f16949",
"metadata": {},
"source": [
"## Building Effective Prompts\n",
"\n",
"Effective prompt creation (often called 'prompt engineering') is essential for quality responses. Here are best practices for structuring your prompts to get the most out of the Llama Stack model:\n",
"\n",
"1. **System Messages**: Use `SystemMessage` to set the model's behavior. This is similar to providing top-level instructions for tone, format, or specific behavior.\n",
" - **Example**: `SystemMessage(content='You are a friendly assistant that explains complex topics simply.')`\n",
"2. **User Messages**: Define the task or question you want to ask the model with a `UserMessage`. The clearer and more direct you are, the better the response.\n",
" - **Example**: `UserMessage(content='Explain recursion in programming in simple terms.')`\n",
"\n",
"### Sample Prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c6812da",
"metadata": {},
"outputs": [],
"source": [
"response = client.inference.chat_completion(\n",
" messages=[\n",
" SystemMessage(content='You are shakespeare.', role='system'),\n",
" UserMessage(content='Write a two-sentence poem about llama.', role='user')\n",
" ],\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
")\n",
"\n",
"print(response.completion_message.content)"
]
},
{
"cell_type": "markdown",
"id": "c8690ef0",
"metadata": {},
"source": [
"## Conversation Loop\n",
"\n",
"To create a continuous conversation loop, where users can input multiple messages in a session, use the following structure. This example runs an asynchronous loop, ending when the user types 'exit,' 'quit,' or 'bye.'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "02211625",
"metadata": {},
"outputs": [],
"source": [
"import asyncio\n",
"from llama_stack_client import LlamaStackClient\n",
"from llama_stack_client.types import UserMessage\n",
"from termcolor import cprint\n",
"\n",
"client = LlamaStackClient(base_url='http://localhost:5000')\n",
"\n",
"async def chat_loop():\n",
" while True:\n",
" user_input = input('User> ')\n",
" if user_input.lower() in ['exit', 'quit', 'bye']:\n",
" cprint('Ending conversation. Goodbye!', 'yellow')\n",
" break\n",
"\n",
" message = UserMessage(content=user_input, role='user')\n",
" response = client.inference.chat_completion(\n",
" messages=[message],\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
" )\n",
" cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
"\n",
"asyncio.run(chat_loop())"
]
},
{
"cell_type": "markdown",
"id": "8cf0d555",
"metadata": {},
"source": [
"## Conversation History\n",
"\n",
"Maintaining a conversation history allows the model to retain context from previous interactions. Use a list to accumulate messages, enabling continuity throughout the chat session."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9496f75c",
"metadata": {},
"outputs": [],
"source": [
"async def chat_loop():\n",
" conversation_history = []\n",
" while True:\n",
" user_input = input('User> ')\n",
" if user_input.lower() in ['exit', 'quit', 'bye']:\n",
" cprint('Ending conversation. Goodbye!', 'yellow')\n",
" break\n",
"\n",
" user_message = UserMessage(content=user_input, role='user')\n",
" conversation_history.append(user_message)\n",
"\n",
" response = client.inference.chat_completion(\n",
" messages=conversation_history,\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
" )\n",
" cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
"\n",
" assistant_message = UserMessage(content=response.completion_message.content, role='user')\n",
" conversation_history.append(assistant_message)\n",
"\n",
"asyncio.run(chat_loop())"
]
},
{
"cell_type": "markdown",
"id": "03fcf5e0",
"metadata": {},
"source": [
"## Streaming Responses\n",
"\n",
"Llama Stack offers a `stream` parameter in the `chat_completion` function, which allows partial responses to be returned progressively as they are generated. This can enhance user experience by providing immediate feedback without waiting for the entire response to be processed.\n",
"\n",
"### Example: Streaming Responses"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d119026e",
"metadata": {},
"outputs": [],
"source": [
"import asyncio\n",
"from llama_stack_client import LlamaStackClient\n",
"from llama_stack_client.lib.inference.event_logger import EventLogger\n",
"from llama_stack_client.types import UserMessage\n",
"from termcolor import cprint\n",
"\n",
"async def run_main(stream: bool = True):\n",
" client = LlamaStackClient(base_url='http://localhost:5000')\n",
"\n",
" message = UserMessage(\n",
" content='hello world, write me a 2 sentence poem about the moon', role='user'\n",
" )\n",
" print(f'User>{message.content}', 'green')\n",
"\n",
" response = client.inference.chat_completion(\n",
" messages=[message],\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
" stream=stream,\n",
" )\n",
"\n",
" if not stream:\n",
" cprint(f'> Response: {response}', 'cyan')\n",
" else:\n",
" async for log in EventLogger().log(response):\n",
" log.print()\n",
"\n",
" models_response = client.models.list()\n",
" print(models_response)\n",
"\n",
"if __name__ == '__main__':\n",
" asyncio.run(run_main())"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,201 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a0ed972d",
"metadata": {},
"source": [
"# Switching between Local and Cloud Model with Llama Stack\n",
"\n",
"This guide provides a streamlined setup to switch between local and cloud clients for text generation with Llama Stacks `chat_completion` API. This setup enables automatic fallback to a cloud instance if the local client is unavailable.\n",
"\n",
"### Pre-requisite\n",
"Before you begin, please ensure Llama Stack is installed and the distribution is set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/). You will need to run two distributions, a local and a cloud distribution, for this demo to work.\n",
"\n",
"### Implementation"
]
},
{
"cell_type": "markdown",
"id": "df89cff7",
"metadata": {},
"source": [
"#### 1. Set Up Local and Cloud Clients\n",
"\n",
"Initialize both clients, specifying the `base_url` for each instance. In this case, we have the local distribution running on `http://localhost:5000` and the cloud distribution running on `http://localhost:5001`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f868dfe",
"metadata": {},
"outputs": [],
"source": [
"from llama_stack_client import LlamaStackClient\n",
"\n",
"# Configure local and cloud clients\n",
"local_client = LlamaStackClient(base_url='http://localhost:5000')\n",
"cloud_client = LlamaStackClient(base_url='http://localhost:5001')"
]
},
{
"cell_type": "markdown",
"id": "894689c1",
"metadata": {},
"source": [
"#### 2. Client Selection with Fallback\n",
"\n",
"The `select_client` function checks if the local client is available using a lightweight `/health` check. If the local client is unavailable, it automatically switches to the cloud client.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff0c8277",
"metadata": {},
"outputs": [],
"source": [
"import httpx\n",
"from termcolor import cprint\n",
"\n",
"async def select_client() -> LlamaStackClient:\n",
" \"\"\"Use local client if available; otherwise, switch to cloud client.\"\"\"\n",
" try:\n",
" async with httpx.AsyncClient() as http_client:\n",
" response = await http_client.get(f'{local_client.base_url}/health')\n",
" if response.status_code == 200:\n",
" cprint('Using local client.', 'yellow')\n",
" return local_client\n",
" except httpx.RequestError:\n",
" pass\n",
" cprint('Local client unavailable. Switching to cloud client.', 'yellow')\n",
" return cloud_client"
]
},
{
"cell_type": "markdown",
"id": "9ccfe66f",
"metadata": {},
"source": [
"#### 3. Generate a Response\n",
"\n",
"After selecting the client, you can generate text using `chat_completion`. This example sends a sample prompt to the model and prints the response.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e19cc20",
"metadata": {},
"outputs": [],
"source": [
"from llama_stack_client.types import UserMessage\n",
"\n",
"async def get_llama_response(stream: bool = True):\n",
" client = await select_client() # Selects the available client\n",
" message = UserMessage(content='hello world, write me a 2 sentence poem about the moon', role='user')\n",
" cprint(f'User> {message.content}', 'green')\n",
"\n",
" response = client.inference.chat_completion(\n",
" messages=[message],\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
" stream=stream,\n",
" )\n",
"\n",
" if not stream:\n",
" cprint(f'> Response: {response}', 'cyan')\n",
" else:\n",
" # Stream tokens progressively\n",
" async for log in EventLogger().log(response):\n",
" log.print()"
]
},
{
"cell_type": "markdown",
"id": "6edf5e57",
"metadata": {},
"source": [
"#### 4. Run the Asynchronous Response Generation\n",
"\n",
"Use `asyncio.run()` to execute `get_llama_response` in an asynchronous event loop.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c10f487e",
"metadata": {},
"outputs": [],
"source": [
"import asyncio\n",
"\n",
"# Initiate the response generation process\n",
"asyncio.run(get_llama_response())"
]
},
{
"cell_type": "markdown",
"id": "56aa9a09",
"metadata": {},
"source": [
"### Complete code\n",
"Summing it up, here's the complete code for local-cloud model implementation with Llama Stack:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9fd74ff",
"metadata": {},
"outputs": [],
"source": [
"import asyncio\n",
"import httpx\n",
"from llama_stack_client import LlamaStackClient\n",
"from llama_stack_client.lib.inference.event_logger import EventLogger\n",
"from llama_stack_client.types import UserMessage\n",
"from termcolor import cprint\n",
"\n",
"local_client = LlamaStackClient(base_url='http://localhost:5000')\n",
"cloud_client = LlamaStackClient(base_url='http://localhost:5001')\n",
"\n",
"async def select_client() -> LlamaStackClient:\n",
" try:\n",
" async with httpx.AsyncClient() as http_client:\n",
" response = await http_client.get(f'{local_client.base_url}/health')\n",
" if response.status_code == 200:\n",
" cprint('Using local client.', 'yellow')\n",
" return local_client\n",
" except httpx.RequestError:\n",
" pass\n",
" cprint('Local client unavailable. Switching to cloud client.', 'yellow')\n",
" return cloud_client\n",
"\n",
"async def get_llama_response(stream: bool = True):\n",
" client = await select_client()\n",
" message = UserMessage(\n",
" content='hello world, write me a 2 sentence poem about the moon', role='user'\n",
" )\n",
" cprint(f'User> {message.content}', 'green')\n",
"\n",
" response = client.inference.chat_completion(\n",
" messages=[message],\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
" stream=stream,\n",
" )\n",
"\n",
" if not stream:\n",
" cprint(f'> Response: {response}', 'cyan')\n",
" else:\n",
" async for log in EventLogger().log(response):\n",
" log.print()\n",
"\n",
"asyncio.run(get_llama_response())"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -1,318 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tool Calling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this section, we'll explore how to enhance your applications with tool calling capabilities. We'll cover:\n",
"1. Setting up and using the Brave Search API\n",
"2. Creating custom tools\n",
"3. Configuring tool prompts and safety settings"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: llama-stack-client in ./.conda/envs/quick/lib/python3.13/site-packages (0.0.48)\n",
"Requirement already satisfied: anyio<5,>=3.5.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (4.6.2.post1)\n",
"Requirement already satisfied: distro<2,>=1.7.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (1.9.0)\n",
"Requirement already satisfied: httpx<1,>=0.23.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (0.27.2)\n",
"Requirement already satisfied: pydantic<3,>=1.9.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (2.9.2)\n",
"Requirement already satisfied: sniffio in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (1.3.1)\n",
"Requirement already satisfied: tabulate>=0.9.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (0.9.0)\n",
"Requirement already satisfied: typing-extensions<5,>=4.7 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (4.12.2)\n",
"Requirement already satisfied: idna>=2.8 in ./.conda/envs/quick/lib/python3.13/site-packages (from anyio<5,>=3.5.0->llama-stack-client) (3.10)\n",
"Requirement already satisfied: certifi in ./.conda/envs/quick/lib/python3.13/site-packages (from httpx<1,>=0.23.0->llama-stack-client) (2024.8.30)\n",
"Requirement already satisfied: httpcore==1.* in ./.conda/envs/quick/lib/python3.13/site-packages (from httpx<1,>=0.23.0->llama-stack-client) (1.0.6)\n",
"Requirement already satisfied: h11<0.15,>=0.13 in ./.conda/envs/quick/lib/python3.13/site-packages (from httpcore==1.*->httpx<1,>=0.23.0->llama-stack-client) (0.14.0)\n",
"Requirement already satisfied: annotated-types>=0.6.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from pydantic<3,>=1.9.0->llama-stack-client) (0.7.0)\n",
"Requirement already satisfied: pydantic-core==2.23.4 in ./.conda/envs/quick/lib/python3.13/site-packages (from pydantic<3,>=1.9.0->llama-stack-client) (2.23.4)\n"
]
}
],
"source": [
"!pip install llama-stack-client --upgrade"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'Agent' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[4], line 23\u001b[0m\n\u001b[1;32m 15\u001b[0m load_dotenv()\n\u001b[1;32m 17\u001b[0m \u001b[38;5;66;03m# Helper function to create an agent with tools\u001b[39;00m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mcreate_tool_agent\u001b[39m(\n\u001b[1;32m 19\u001b[0m client: LlamaStackClient,\n\u001b[1;32m 20\u001b[0m tools: List[Dict],\n\u001b[1;32m 21\u001b[0m instructions: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mYou are a helpful assistant\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 22\u001b[0m model: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mLlama3.1-8B-Instruct\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[0;32m---> 23\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[43mAgent\u001b[49m:\n\u001b[1;32m 24\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"Create an agent with specified tools.\"\"\"\u001b[39;00m\n\u001b[1;32m 25\u001b[0m agent_config \u001b[38;5;241m=\u001b[39m AgentConfig(\n\u001b[1;32m 26\u001b[0m model\u001b[38;5;241m=\u001b[39mmodel,\n\u001b[1;32m 27\u001b[0m instructions\u001b[38;5;241m=\u001b[39minstructions,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 38\u001b[0m enable_session_persistence\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m,\n\u001b[1;32m 39\u001b[0m )\n",
"\u001b[0;31mNameError\u001b[0m: name 'Agent' is not defined"
]
}
],
"source": [
"import asyncio\n",
"import os\n",
"from typing import Dict, List, Optional\n",
"from dotenv import load_dotenv\n",
"\n",
"from llama_stack_client import LlamaStackClient\n",
"#from llama_stack_client.lib.agents.agent import Agent\n",
"from llama_stack_client.lib.agents.event_logger import EventLogger\n",
"from llama_stack_client.types.agent_create_params import (\n",
" AgentConfig,\n",
" AgentConfigToolSearchToolDefinition,\n",
")\n",
"\n",
"# Load environment variables\n",
"load_dotenv()\n",
"\n",
"# Helper function to create an agent with tools\n",
"async def create_tool_agent(\n",
" client: LlamaStackClient,\n",
" tools: List[Dict],\n",
" instructions: str = \"You are a helpful assistant\",\n",
" model: str = \"Llama3.1-8B-Instruct\",\n",
") -> Agent:\n",
" \"\"\"Create an agent with specified tools.\"\"\"\n",
" agent_config = AgentConfig(\n",
" model=model,\n",
" instructions=instructions,\n",
" sampling_params={\n",
" \"strategy\": \"greedy\",\n",
" \"temperature\": 1.0,\n",
" \"top_p\": 0.9,\n",
" },\n",
" tools=tools,\n",
" tool_choice=\"auto\",\n",
" tool_prompt_format=\"json\",\n",
" input_shields=[\"llama_guard\"],\n",
" output_shields=[\"llama_guard\"],\n",
" enable_session_persistence=True,\n",
" )\n",
"\n",
" return Agent(client, agent_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, create a `.env` file in your notebook directory with your Brave Search API key:\n",
"\n",
"```\n",
"BRAVE_SEARCH_API_KEY=your_key_here\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"async def create_search_agent(client: LlamaStackClient) -> Agent:\n",
" \"\"\"Create an agent with Brave Search capability.\"\"\"\n",
" search_tool = AgentConfigToolSearchToolDefinition(\n",
" type=\"brave_search\",\n",
" engine=\"brave\",\n",
" api_key=os.getenv(\"BRAVE_SEARCH_API_KEY\"),\n",
" )\n",
"\n",
" return await create_tool_agent(\n",
" client=client,\n",
" tools=[search_tool],\n",
" instructions=\"\"\"\n",
" You are a research assistant that can search the web.\n",
" Always cite your sources with URLs when providing information.\n",
" Format your responses as:\n",
"\n",
" FINDINGS:\n",
" [Your summary here]\n",
"\n",
" SOURCES:\n",
" - [Source title](URL)\n",
" \"\"\"\n",
" )\n",
"\n",
"# Example usage\n",
"async def search_example():\n",
" client = LlamaStackClient(base_url=\"http://localhost:8000\")\n",
" agent = await create_search_agent(client)\n",
"\n",
" # Create a session\n",
" session_id = agent.create_session(\"search-session\")\n",
"\n",
" # Example queries\n",
" queries = [\n",
" \"What are the latest developments in quantum computing?\",\n",
" \"Who won the most recent Super Bowl?\",\n",
" ]\n",
"\n",
" for query in queries:\n",
" print(f\"\\nQuery: {query}\")\n",
" print(\"-\" * 50)\n",
"\n",
" response = agent.create_turn(\n",
" messages=[{\"role\": \"user\", \"content\": query}],\n",
" session_id=session_id,\n",
" )\n",
"\n",
" async for log in EventLogger().log(response):\n",
" log.print()\n",
"\n",
"# Run the example (in Jupyter, use asyncio.run())\n",
"await search_example()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Custom Tool Creation\n",
"\n",
"Let's create a custom weather tool:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import TypedDict, Optional\n",
"from datetime import datetime\n",
"\n",
"# Define tool types\n",
"class WeatherInput(TypedDict):\n",
" location: str\n",
" date: Optional[str]\n",
"\n",
"class WeatherOutput(TypedDict):\n",
" temperature: float\n",
" conditions: str\n",
" humidity: float\n",
"\n",
"class WeatherTool:\n",
" \"\"\"Example custom tool for weather information.\"\"\"\n",
"\n",
" def __init__(self, api_key: Optional[str] = None):\n",
" self.api_key = api_key\n",
"\n",
" async def get_weather(self, location: str, date: Optional[str] = None) -> WeatherOutput:\n",
" \"\"\"Simulate getting weather data (replace with actual API call).\"\"\"\n",
" # Mock implementation\n",
" return {\n",
" \"temperature\": 72.5,\n",
" \"conditions\": \"partly cloudy\",\n",
" \"humidity\": 65.0\n",
" }\n",
"\n",
" async def __call__(self, input_data: WeatherInput) -> WeatherOutput:\n",
" \"\"\"Make the tool callable with structured input.\"\"\"\n",
" return await self.get_weather(\n",
" location=input_data[\"location\"],\n",
" date=input_data.get(\"date\")\n",
" )\n",
"\n",
"async def create_weather_agent(client: LlamaStackClient) -> Agent:\n",
" \"\"\"Create an agent with weather tool capability.\"\"\"\n",
" weather_tool = {\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": \"get_weather\",\n",
" \"description\": \"Get weather information for a location\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"location\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"City or location name\"\n",
" },\n",
" \"date\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Optional date (YYYY-MM-DD)\",\n",
" \"format\": \"date\"\n",
" }\n",
" },\n",
" \"required\": [\"location\"]\n",
" }\n",
" },\n",
" \"implementation\": WeatherTool()\n",
" }\n",
"\n",
" return await create_tool_agent(\n",
" client=client,\n",
" tools=[weather_tool],\n",
" instructions=\"\"\"\n",
" You are a weather assistant that can provide weather information.\n",
" Always specify the location clearly in your responses.\n",
" Include both temperature and conditions in your summaries.\n",
" \"\"\"\n",
" )\n",
"\n",
"# Example usage\n",
"async def weather_example():\n",
" client = LlamaStackClient(base_url=\"http://localhost:8000\")\n",
" agent = await create_weather_agent(client)\n",
"\n",
" session_id = agent.create_session(\"weather-session\")\n",
"\n",
" queries = [\n",
" \"What's the weather like in San Francisco?\",\n",
" \"Tell me the weather in Tokyo tomorrow\",\n",
" ]\n",
"\n",
" for query in queries:\n",
" print(f\"\\nQuery: {query}\")\n",
" print(\"-\" * 50)\n",
"\n",
" response = agent.create_turn(\n",
" messages=[{\"role\": \"user\", \"content\": query}],\n",
" session_id=session_id,\n",
" )\n",
"\n",
" async for log in EventLogger().log(response):\n",
" log.print()\n",
"\n",
"# Run the example\n",
"await weather_example()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,349 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tool Calling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this section, we'll explore how to enhance your applications with tool calling capabilities. We'll cover:\n",
"1. Setting up and using the Brave Search API\n",
"2. Creating custom tools\n",
"3. Configuring tool prompts and safety settings"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"import asyncio\n",
"import os\n",
"from typing import Dict, List, Optional\n",
"from dotenv import load_dotenv\n",
"\n",
"from llama_stack_client import LlamaStackClient\n",
"from llama_stack_client.lib.agents.agent import Agent\n",
"from llama_stack_client.lib.agents.event_logger import EventLogger\n",
"from llama_stack_client.types.agent_create_params import (\n",
" AgentConfig,\n",
" AgentConfigToolSearchToolDefinition,\n",
")\n",
"\n",
"# Load environment variables\n",
"load_dotenv()\n",
"\n",
"# Helper function to create an agent with tools\n",
"async def create_tool_agent(\n",
" client: LlamaStackClient,\n",
" tools: List[Dict],\n",
" instructions: str = \"You are a helpful assistant\",\n",
" model: str = \"Llama3.1-8B-Instruct\",\n",
") -> Agent:\n",
" \"\"\"Create an agent with specified tools.\"\"\"\n",
" agent_config = AgentConfig(\n",
" model=model,\n",
" instructions=instructions,\n",
" sampling_params={\n",
" \"strategy\": \"greedy\",\n",
" \"temperature\": 1.0,\n",
" \"top_p\": 0.9,\n",
" },\n",
" tools=tools,\n",
" tool_choice=\"auto\",\n",
" tool_prompt_format=\"json\",\n",
" enable_session_persistence=True,\n",
" )\n",
"\n",
" return Agent(client, agent_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, create a `.env` file in your notebook directory with your Brave Search API key:\n",
"\n",
"```\n",
"BRAVE_SEARCH_API_KEY=your_key_here\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Query: What are the latest developments in quantum computing?\n",
"--------------------------------------------------\n",
"\u001b[30m\u001b[0m\u001b[33minference> \u001b[0m\u001b[33mF\u001b[0m\u001b[33mIND\u001b[0m\u001b[33mINGS\u001b[0m\u001b[33m:\n",
"\u001b[0m\u001b[33mThe\u001b[0m\u001b[33m latest\u001b[0m\u001b[33m developments\u001b[0m\u001b[33m in\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m computing\u001b[0m\u001b[33m include\u001b[0m\u001b[33m advancements\u001b[0m\u001b[33m in\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m processors\u001b[0m\u001b[33m,\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m algorithms\u001b[0m\u001b[33m,\u001b[0m\u001b[33m and\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m error\u001b[0m\u001b[33m correction\u001b[0m\u001b[33m.\u001b[0m\u001b[33m Researchers\u001b[0m\u001b[33m have\u001b[0m\u001b[33m made\u001b[0m\u001b[33m significant\u001b[0m\u001b[33m progress\u001b[0m\u001b[33m in\u001b[0m\u001b[33m developing\u001b[0m\u001b[33m more\u001b[0m\u001b[33m powerful\u001b[0m\u001b[33m and\u001b[0m\u001b[33m reliable\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m computers\u001b[0m\u001b[33m,\u001b[0m\u001b[33m with\u001b[0m\u001b[33m some\u001b[0m\u001b[33m companies\u001b[0m\u001b[33m already\u001b[0m\u001b[33m showcasing\u001b[0m\u001b[33m \u001b[0m\u001b[33m100\u001b[0m\u001b[33m-q\u001b[0m\u001b[33mubit\u001b[0m\u001b[33m and\u001b[0m\u001b[33m \u001b[0m\u001b[33m127\u001b[0m\u001b[33m-q\u001b[0m\u001b[33mubit\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m processors\u001b[0m\u001b[33m (\u001b[0m\u001b[33mIBM\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m;\u001b[0m\u001b[33m Google\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m These\u001b[0m\u001b[33m advancements\u001b[0m\u001b[33m have\u001b[0m\u001b[33m led\u001b[0m\u001b[33m to\u001b[0m\u001b[33m breakthrough\u001b[0m\u001b[33ms\u001b[0m\u001b[33m in\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m simulations\u001b[0m\u001b[33m,\u001b[0m\u001b[33m machine\u001b[0m\u001b[33m learning\u001b[0m\u001b[33m,\u001b[0m\u001b[33m and\u001b[0m\u001b[33m optimization\u001b[0m\u001b[33m problems\u001b[0m\u001b[33m (\u001b[0m\u001b[33mB\u001b[0m\u001b[33mhart\u001b[0m\u001b[33mi\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m;\u001b[0m\u001b[33m Zhang\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m Additionally\u001b[0m\u001b[33m,\u001b[0m\u001b[33m there\u001b[0m\u001b[33m have\u001b[0m\u001b[33m been\u001b[0m\u001b[33m significant\u001b[0m\u001b[33m improvements\u001b[0m\u001b[33m in\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m error\u001b[0m\u001b[33m correction\u001b[0m\u001b[33m,\u001b[0m\u001b[33m which\u001b[0m\u001b[33m is\u001b[0m\u001b[33m essential\u001b[0m\u001b[33m for\u001b[0m\u001b[33m large\u001b[0m\u001b[33m-scale\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m computing\u001b[0m\u001b[33m (\u001b[0m\u001b[33mG\u001b[0m\u001b[33mottes\u001b[0m\u001b[33mman\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\n",
"\n",
"\u001b[0m\u001b[33mS\u001b[0m\u001b[33mOURCES\u001b[0m\u001b[33m:\n",
"\u001b[0m\u001b[33m-\u001b[0m\u001b[33m IBM\u001b[0m\u001b[33m Quantum\u001b[0m\u001b[33m:\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Process\u001b[0m\u001b[33mors\u001b[0m\u001b[33m\"\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mwww\u001b[0m\u001b[33m.ibm\u001b[0m\u001b[33m.com\u001b[0m\u001b[33m/\u001b[0m\u001b[33mquant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m/com\u001b[0m\u001b[33mputer\u001b[0m\u001b[33m/)\n",
"\u001b[0m\u001b[33m-\u001b[0m\u001b[33m Google\u001b[0m\u001b[33m Quantum\u001b[0m\u001b[33m AI\u001b[0m\u001b[33m Lab\u001b[0m\u001b[33m:\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Process\u001b[0m\u001b[33mors\u001b[0m\u001b[33m\"\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mquant\u001b[0m\u001b[33mum\u001b[0m\u001b[33mai\u001b[0m\u001b[33m.google\u001b[0m\u001b[33m/al\u001b[0m\u001b[33mphabet\u001b[0m\u001b[33m/sub\u001b[0m\u001b[33m-page\u001b[0m\u001b[33m-\u001b[0m\u001b[33m1\u001b[0m\u001b[33m/)\n",
"\u001b[0m\u001b[33m-\u001b[0m\u001b[33m Bh\u001b[0m\u001b[33marti\u001b[0m\u001b[33m,\u001b[0m\u001b[33m K\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Computing\u001b[0m\u001b[33m:\u001b[0m\u001b[33m A\u001b[0m\u001b[33m Review\u001b[0m\u001b[33m of\u001b[0m\u001b[33m Recent\u001b[0m\u001b[33m Advances\u001b[0m\u001b[33m.\"\u001b[0m\u001b[33m Journal\u001b[0m\u001b[33m of\u001b[0m\u001b[33m Physics\u001b[0m\u001b[33m:\u001b[0m\u001b[33m Conference\u001b[0m\u001b[33m Series\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m218\u001b[0m\u001b[33m5\u001b[0m\u001b[33m(\u001b[0m\u001b[33m1\u001b[0m\u001b[33m),\u001b[0m\u001b[33m \u001b[0m\u001b[33m012\u001b[0m\u001b[33m001\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mi\u001b[0m\u001b[33mop\u001b[0m\u001b[33mscience\u001b[0m\u001b[33m.i\u001b[0m\u001b[33mop\u001b[0m\u001b[33m.org\u001b[0m\u001b[33m/article\u001b[0m\u001b[33m/\u001b[0m\u001b[33m10\u001b[0m\u001b[33m.\u001b[0m\u001b[33m108\u001b[0m\u001b[33m8\u001b[0m\u001b[33m/\u001b[0m\u001b[33m174\u001b[0m\u001b[33m2\u001b[0m\u001b[33m-\u001b[0m\u001b[33m659\u001b[0m\u001b[33m6\u001b[0m\u001b[33m/\u001b[0m\u001b[33m218\u001b[0m\u001b[33m5\u001b[0m\u001b[33m/\u001b[0m\u001b[33m1\u001b[0m\u001b[33m/\u001b[0m\u001b[33m012\u001b[0m\u001b[33m001\u001b[0m\u001b[33m)\n",
"\u001b[0m\u001b[33m-\u001b[0m\u001b[33m Zhang\u001b[0m\u001b[33m,\u001b[0m\u001b[33m Y\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Algorithms\u001b[0m\u001b[33m for\u001b[0m\u001b[33m Machine\u001b[0m\u001b[33m Learning\u001b[0m\u001b[33m.\"\u001b[0m\u001b[33m Journal\u001b[0m\u001b[33m of\u001b[0m\u001b[33m Machine\u001b[0m\u001b[33m Learning\u001b[0m\u001b[33m Research\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m23\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m1\u001b[0m\u001b[33m-\u001b[0m\u001b[33m36\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mj\u001b[0m\u001b[33mml\u001b[0m\u001b[33mr\u001b[0m\u001b[33m.org\u001b[0m\u001b[33m/p\u001b[0m\u001b[33mapers\u001b[0m\u001b[33m/v\u001b[0m\u001b[33m23\u001b[0m\u001b[33m/\u001b[0m\u001b[33m20\u001b[0m\u001b[33m-\u001b[0m\u001b[33m065\u001b[0m\u001b[33m.html\u001b[0m\u001b[33m)\n",
"\u001b[0m\u001b[33m-\u001b[0m\u001b[33m G\u001b[0m\u001b[33mottes\u001b[0m\u001b[33mman\u001b[0m\u001b[33m,\u001b[0m\u001b[33m D\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Error\u001b[0m\u001b[33m Correction\u001b[0m\u001b[33m.\"\u001b[0m\u001b[33m In\u001b[0m\u001b[33m Encyclopedia\u001b[0m\u001b[33m of\u001b[0m\u001b[33m Complexity\u001b[0m\u001b[33m and\u001b[0m\u001b[33m Systems\u001b[0m\u001b[33m Science\u001b[0m\u001b[33m (\u001b[0m\u001b[33mpp\u001b[0m\u001b[33m.\u001b[0m\u001b[33m \u001b[0m\u001b[33m1\u001b[0m\u001b[33m-\u001b[0m\u001b[33m13\u001b[0m\u001b[33m).\u001b[0m\u001b[33m Springer\u001b[0m\u001b[33m,\u001b[0m\u001b[33m New\u001b[0m\u001b[33m York\u001b[0m\u001b[33m,\u001b[0m\u001b[33m NY\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mlink\u001b[0m\u001b[33m.spring\u001b[0m\u001b[33mer\u001b[0m\u001b[33m.com\u001b[0m\u001b[33m/reference\u001b[0m\u001b[33mwork\u001b[0m\u001b[33mentry\u001b[0m\u001b[33m/\u001b[0m\u001b[33m10\u001b[0m\u001b[33m.\u001b[0m\u001b[33m100\u001b[0m\u001b[33m7\u001b[0m\u001b[33m/\u001b[0m\u001b[33m978\u001b[0m\u001b[33m-\u001b[0m\u001b[33m0\u001b[0m\u001b[33m-\u001b[0m\u001b[33m387\u001b[0m\u001b[33m-\u001b[0m\u001b[33m758\u001b[0m\u001b[33m88\u001b[0m\u001b[33m-\u001b[0m\u001b[33m6\u001b[0m\u001b[33m_\u001b[0m\u001b[33m447\u001b[0m\u001b[33m)\u001b[0m\u001b[97m\u001b[0m\n",
"\u001b[30m\u001b[0m"
]
}
],
"source": [
"async def create_search_agent(client: LlamaStackClient) -> Agent:\n",
" \"\"\"Create an agent with Brave Search capability.\"\"\"\n",
" search_tool = AgentConfigToolSearchToolDefinition(\n",
" type=\"brave_search\",\n",
" engine=\"brave\",\n",
" api_key=\"dummy_value\"#os.getenv(\"BRAVE_SEARCH_API_KEY\"),\n",
" )\n",
"\n",
" return await create_tool_agent(\n",
" client=client,\n",
" tools=[search_tool],\n",
" instructions=\"\"\"\n",
" You are a research assistant that can search the web.\n",
" Always cite your sources with URLs when providing information.\n",
" Format your responses as:\n",
"\n",
" FINDINGS:\n",
" [Your summary here]\n",
"\n",
" SOURCES:\n",
" - [Source title](URL)\n",
" \"\"\"\n",
" )\n",
"\n",
"# Example usage\n",
"async def search_example():\n",
" client = LlamaStackClient(base_url=\"http://localhost:5001\")\n",
" agent = await create_search_agent(client)\n",
"\n",
" # Create a session\n",
" session_id = agent.create_session(\"search-session\")\n",
"\n",
" # Example queries\n",
" queries = [\n",
" \"What are the latest developments in quantum computing?\",\n",
" #\"Who won the most recent Super Bowl?\",\n",
" ]\n",
"\n",
" for query in queries:\n",
" print(f\"\\nQuery: {query}\")\n",
" print(\"-\" * 50)\n",
"\n",
" response = agent.create_turn(\n",
" messages=[{\"role\": \"user\", \"content\": query}],\n",
" session_id=session_id,\n",
" )\n",
"\n",
" async for log in EventLogger().log(response):\n",
" log.print()\n",
"\n",
"# Run the example (in Jupyter, use asyncio.run())\n",
"await search_example()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Custom Tool Creation\n",
"\n",
"Let's create a custom weather tool:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Query: What's the weather like in San Francisco?\n",
"--------------------------------------------------\n",
"\u001b[30m\u001b[0m\u001b[33minference> \u001b[0m\u001b[33m{\n",
"\u001b[0m\u001b[33m \u001b[0m\u001b[33m \"\u001b[0m\u001b[33mtype\u001b[0m\u001b[33m\":\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mfunction\u001b[0m\u001b[33m\",\n",
"\u001b[0m\u001b[33m \u001b[0m\u001b[33m \"\u001b[0m\u001b[33mname\u001b[0m\u001b[33m\":\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mget\u001b[0m\u001b[33m_weather\u001b[0m\u001b[33m\",\n",
"\u001b[0m\u001b[33m \u001b[0m\u001b[33m \"\u001b[0m\u001b[33mparameters\u001b[0m\u001b[33m\":\u001b[0m\u001b[33m {\n",
"\u001b[0m\u001b[33m \u001b[0m\u001b[33m \"\u001b[0m\u001b[33mlocation\u001b[0m\u001b[33m\":\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mSan\u001b[0m\u001b[33m Francisco\u001b[0m\u001b[33m\"\n",
"\u001b[0m\u001b[33m \u001b[0m\u001b[33m }\n",
"\u001b[0m\u001b[33m}\u001b[0m\u001b[97m\u001b[0m\n"
]
},
{
"ename": "AttributeError",
"evalue": "'WeatherTool' object has no attribute 'run'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[27], line 113\u001b[0m\n\u001b[1;32m 110\u001b[0m nest_asyncio\u001b[38;5;241m.\u001b[39mapply()\n\u001b[1;32m 112\u001b[0m \u001b[38;5;66;03m# Run the example\u001b[39;00m\n\u001b[0;32m--> 113\u001b[0m \u001b[38;5;28;01mawait\u001b[39;00m weather_example()\n",
"Cell \u001b[0;32mIn[27], line 105\u001b[0m, in \u001b[0;36mweather_example\u001b[0;34m()\u001b[0m\n\u001b[1;32m 98\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m-\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m*\u001b[39m \u001b[38;5;241m50\u001b[39m)\n\u001b[1;32m 100\u001b[0m response \u001b[38;5;241m=\u001b[39m agent\u001b[38;5;241m.\u001b[39mcreate_turn(\n\u001b[1;32m 101\u001b[0m messages\u001b[38;5;241m=\u001b[39m[{\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrole\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124muser\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcontent\u001b[39m\u001b[38;5;124m\"\u001b[39m: query}],\n\u001b[1;32m 102\u001b[0m session_id\u001b[38;5;241m=\u001b[39msession_id,\n\u001b[1;32m 103\u001b[0m )\n\u001b[0;32m--> 105\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mfor\u001b[39;00m log \u001b[38;5;129;01min\u001b[39;00m EventLogger()\u001b[38;5;241m.\u001b[39mlog(response):\n\u001b[1;32m 106\u001b[0m log\u001b[38;5;241m.\u001b[39mprint()\n",
"File \u001b[0;32m~/new_task/llama-stack-client-python/src/llama_stack_client/lib/agents/event_logger.py:55\u001b[0m, in \u001b[0;36mEventLogger.log\u001b[0;34m(self, event_generator)\u001b[0m\n\u001b[1;32m 52\u001b[0m previous_event_type \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 53\u001b[0m previous_step_type \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m---> 55\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mfor\u001b[39;00m chunk \u001b[38;5;129;01min\u001b[39;00m event_generator:\n\u001b[1;32m 56\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mhasattr\u001b[39m(chunk, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mevent\u001b[39m\u001b[38;5;124m\"\u001b[39m):\n\u001b[1;32m 57\u001b[0m \u001b[38;5;66;03m# Need to check for custom tool first\u001b[39;00m\n\u001b[1;32m 58\u001b[0m \u001b[38;5;66;03m# since it does not produce event but instead\u001b[39;00m\n\u001b[1;32m 59\u001b[0m \u001b[38;5;66;03m# a Message\u001b[39;00m\n\u001b[1;32m 60\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(chunk, ToolResponseMessage):\n",
"File \u001b[0;32m~/new_task/llama-stack-client-python/src/llama_stack_client/lib/agents/agent.py:76\u001b[0m, in \u001b[0;36mAgent.create_turn\u001b[0;34m(self, messages, attachments, session_id)\u001b[0m\n\u001b[1;32m 74\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 75\u001b[0m tool \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcustom_tools[tool_call\u001b[38;5;241m.\u001b[39mtool_name]\n\u001b[0;32m---> 76\u001b[0m result_messages \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mexecute_custom_tool(tool, message)\n\u001b[1;32m 77\u001b[0m next_message \u001b[38;5;241m=\u001b[39m result_messages[\u001b[38;5;241m0\u001b[39m]\n\u001b[1;32m 79\u001b[0m \u001b[38;5;28;01myield\u001b[39;00m next_message\n",
"File \u001b[0;32m~/new_task/llama-stack-client-python/src/llama_stack_client/lib/agents/agent.py:84\u001b[0m, in \u001b[0;36mAgent.execute_custom_tool\u001b[0;34m(self, tool, message)\u001b[0m\n\u001b[1;32m 81\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mexecute_custom_tool\u001b[39m(\n\u001b[1;32m 82\u001b[0m \u001b[38;5;28mself\u001b[39m, tool: CustomTool, message: Union[UserMessage, ToolResponseMessage]\n\u001b[1;32m 83\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m List[Union[UserMessage, ToolResponseMessage]]:\n\u001b[0;32m---> 84\u001b[0m result_messages \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m \u001b[43mtool\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m([message])\n\u001b[1;32m 85\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m result_messages\n",
"\u001b[0;31mAttributeError\u001b[0m: 'WeatherTool' object has no attribute 'run'"
]
}
],
"source": [
"from typing import TypedDict, Optional, Dict, Any\n",
"from datetime import datetime\n",
"class WeatherTool:\n",
" \"\"\"Example custom tool for weather information.\"\"\"\n",
" \n",
" def get_name(self) -> str:\n",
" return \"get_weather\"\n",
" \n",
" def get_description(self) -> str:\n",
" return \"Get weather information for a location\"\n",
" \n",
" def get_params_definition(self) -> Dict[str, ToolParamDefinitionParam]:\n",
" return {\n",
" \"location\": ToolParamDefinitionParam(\n",
" param_type=\"str\",\n",
" description=\"City or location name\",\n",
" required=True\n",
" ),\n",
" \"date\": ToolParamDefinitionParam(\n",
" param_type=\"str\",\n",
" description=\"Optional date (YYYY-MM-DD)\",\n",
" required=False\n",
" )\n",
" }\n",
" \n",
" async def run_impl(self, location: str, date: Optional[str] = None) -> Dict[str, Any]:\n",
" \"\"\"Simulate getting weather data (replace with actual API call).\"\"\"\n",
" # Mock implementation\n",
" return {\n",
" \"temperature\": 72.5,\n",
" \"conditions\": \"partly cloudy\",\n",
" \"humidity\": 65.0\n",
" }\n",
"\n",
"async def create_weather_agent(client: LlamaStackClient) -> Agent:\n",
" \"\"\"Create an agent with weather tool capability.\"\"\"\n",
" agent_config = AgentConfig(\n",
" model=\"Llama3.1-8B-Instruct\",\n",
" instructions=\"\"\"\n",
" You are a weather assistant that can provide weather information.\n",
" Always specify the location clearly in your responses.\n",
" Include both temperature and conditions in your summaries.\n",
" \"\"\",\n",
" sampling_params={\n",
" \"strategy\": \"greedy\",\n",
" \"temperature\": 1.0,\n",
" \"top_p\": 0.9,\n",
" },\n",
" tools=[\n",
" {\n",
" \"function_name\": \"get_weather\",\n",
" \"description\": \"Get weather information for a location\",\n",
" \"parameters\": {\n",
" \"location\": {\n",
" \"param_type\": \"str\",\n",
" \"description\": \"City or location name\",\n",
" \"required\": True,\n",
" },\n",
" \"date\": {\n",
" \"param_type\": \"str\",\n",
" \"description\": \"Optional date (YYYY-MM-DD)\",\n",
" \"required\": False,\n",
" },\n",
" },\n",
" \"type\": \"function_call\",\n",
" }\n",
" ],\n",
" tool_choice=\"auto\",\n",
" tool_prompt_format=\"json\",\n",
" input_shields=[],\n",
" output_shields=[],\n",
" enable_session_persistence=True\n",
" )\n",
" \n",
" # Create the agent with the tool\n",
" weather_tool = WeatherTool()\n",
" agent = Agent(\n",
" client=client,\n",
" agent_config=agent_config,\n",
" custom_tools=[weather_tool]\n",
" )\n",
" \n",
" return agent\n",
"\n",
"# Example usage\n",
"async def weather_example():\n",
" client = LlamaStackClient(base_url=\"http://localhost:5001\")\n",
" agent = await create_weather_agent(client)\n",
" session_id = agent.create_session(\"weather-session\")\n",
" \n",
" queries = [\n",
" \"What's the weather like in San Francisco?\",\n",
" \"Tell me the weather in Tokyo tomorrow\",\n",
" ]\n",
" \n",
" for query in queries:\n",
" print(f\"\\nQuery: {query}\")\n",
" print(\"-\" * 50)\n",
" \n",
" response = agent.create_turn(\n",
" messages=[{\"role\": \"user\", \"content\": query}],\n",
" session_id=session_id,\n",
" )\n",
" \n",
" async for log in EventLogger().log(response):\n",
" log.print()\n",
"\n",
"# For Jupyter notebooks\n",
"import nest_asyncio\n",
"nest_asyncio.apply()\n",
"\n",
"# Run the example\n",
"await weather_example()"
]
},
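{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `AttributeError` above shows that the agent drives each custom tool through an async `run(messages)` entry point. A minimal sketch of that method, mirroring the `SingleMessageCustomTool` pattern from the tutorial notebook (the exact message shape is an assumption and may vary by client version):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class RunnableWeatherTool(WeatherTool):\n",
"    \"\"\"WeatherTool plus the run() entry point the agent expects (sketch).\"\"\"\n",
"\n",
"    async def run(self, messages=None):\n",
"        # Pull the parsed tool-call arguments off the first message, if present,\n",
"        # and dispatch to run_impl (message shape is an assumption)\n",
"        if messages and hasattr(messages[0], 'function_parameters'):\n",
"            return await self.run_impl(**messages[0].function_parameters)\n",
"        return await self.run_impl()"
]
},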
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,558 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started with LlamaStack: Tool Calling Tutorial\n",
"\n",
"Welcome! This notebook will guide you through creating and using custom tools with LlamaStack.\n",
"We'll start with the basics and work our way up to more complex examples.\n",
"\n",
"Table of Contents:\n",
"1. Setup and Installation\n",
"2. Understanding Tool Basics\n",
"3. Creating Your First Tool\n",
"4. Building a Mock Weather Tool\n",
"5. Setting Up the LlamaStack Agent\n",
"6. Running Examples\n",
"7. Next Steps\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Setup\n",
"#### Before we begin, let's import all the required packages:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import asyncio\n",
"import json\n",
"from typing import Dict\n",
"from datetime import datetime"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# LlamaStack specific imports\n",
"from llama_stack_client import LlamaStackClient\n",
"from llama_stack_client.lib.agents.agent import Agent\n",
"from llama_stack_client.lib.agents.event_logger import EventLogger\n",
"from llama_stack_client.types.agent_create_params import AgentConfig\n",
"from llama_stack_client.types.tool_param_definition_param import ToolParamDefinitionParam"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Understanding Tool Basics\n",
"\n",
"In LlamaStack, a tool is like a special function that our AI assistant can use. Think of it as giving the AI a new \n",
"capability, like using a calculator or checking the weather.\n",
"\n",
"Every tool needs:\n",
"- A name: What we call the tool\n",
"- A description: What the tool does\n",
"- Parameters: What information the tool needs to work\n",
"- Implementation: The actual code that does the work\n",
"\n",
"Let's create a base class that all our tools will inherit from:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"class SingleMessageCustomTool:\n",
" \"\"\"Base class for all our custom tools\"\"\"\n",
" \n",
" async def run(self, messages=None):\n",
" \"\"\"\n",
" Main entry point for running the tool\n",
" Args:\n",
" messages: List of messages (can be None for backward compatibility)\n",
" \"\"\"\n",
" if messages and len(messages) > 0:\n",
" # Extract parameters from the message if it contains function parameters\n",
" message = messages[0]\n",
" if hasattr(message, 'function_parameters'):\n",
" return await self.run_impl(**message.function_parameters)\n",
" else:\n",
" return await self.run_impl()\n",
" return await self.run_impl()\n",
" \n",
" async def run_impl(self, **kwargs):\n",
" \"\"\"Each tool will implement this method with their specific logic\"\"\"\n",
" raise NotImplementedError()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Creating Your First Tool: Calculator\n",
" \n",
"Let's create a simple calculator tool. This will help us understand the basic structure of a tool.\n",
"Our calculator can:\n",
"- Add\n",
"- Subtract\n",
"- Multiply\n",
"- Divide\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"# Calculator Tool implementation\n",
"class CalculatorTool(SingleMessageCustomTool):\n",
" \"\"\"A simple calculator tool that can perform basic math operations\"\"\"\n",
" \n",
" def get_name(self) -> str:\n",
" return \"calculator\"\n",
" \n",
" def get_description(self) -> str:\n",
" return \"Perform basic arithmetic operations (add, subtract, multiply, divide)\"\n",
" \n",
" def get_params_definition(self) -> Dict[str, ToolParamDefinitionParam]:\n",
" return {\n",
" \"operation\": ToolParamDefinitionParam(\n",
" param_type=\"str\",\n",
" description=\"Operation to perform (add, subtract, multiply, divide)\",\n",
" required=True\n",
" ),\n",
" \"x\": ToolParamDefinitionParam(\n",
" param_type=\"float\",\n",
" description=\"First number\",\n",
" required=True\n",
" ),\n",
" \"y\": ToolParamDefinitionParam(\n",
" param_type=\"float\",\n",
" description=\"Second number\",\n",
" required=True\n",
" )\n",
" }\n",
" \n",
" async def run_impl(self, operation: str = None, x: float = None, y: float = None):\n",
" \"\"\"The actual implementation of our calculator\"\"\"\n",
" if not all([operation, x, y]):\n",
" return json.dumps({\"error\": \"Missing required parameters\"})\n",
" \n",
" # Dictionary of math operations\n",
" operations = {\n",
" \"add\": lambda a, b: a + b,\n",
" \"subtract\": lambda a, b: a - b,\n",
" \"multiply\": lambda a, b: a * b,\n",
" \"divide\": lambda a, b: a / b if b != 0 else \"Error: Division by zero\"\n",
" }\n",
" \n",
" # Check if the operation is valid\n",
" if operation not in operations:\n",
" return json.dumps({\"error\": f\"Unknown operation '{operation}'\"})\n",
" \n",
" try:\n",
" # Convert string inputs to float if needed\n",
" x = float(x) if isinstance(x, str) else x\n",
" y = float(y) if isinstance(y, str) else y\n",
" \n",
" # Perform the calculation\n",
" result = operations[operation](x, y)\n",
" return json.dumps({\"result\": result})\n",
" except ValueError:\n",
" return json.dumps({\"error\": \"Invalid number format\"})\n",
" except Exception as e:\n",
" return json.dumps({\"error\": str(e)})"
]
},
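{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (outside the agent flow), you can call the tool implementation directly; Jupyter supports top-level `await`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"calc = CalculatorTool()\n",
"# Call the implementation directly with keyword arguments\n",
"print(await calc.run_impl(operation='multiply', x=6, y=7))"
]
},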
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Building a Mock Weather Tool\n",
" \n",
"Now let's create something a bit more complex: a weather tool! \n",
"While this is just a mock version (it doesn't actually fetch real weather data),\n",
"it shows how you might structure a tool that interfaces with an external API."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"class WeatherTool(SingleMessageCustomTool):\n",
" \"async def run_single_query(agent, session_id, query: str):\n",
" \"\"\"Run a single query through our agent with complete interaction cycle\"\"\"\n",
" print(\"\\n\" + \"=\"*50)\n",
" print(f\"🤔 User asks: {query}\")\n",
" print(\"=\"*50)\n",
" \n",
" # Get the initial response and tool call\n",
" response = agent.create_turn(\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": query,\n",
" }\n",
" ],\n",
" session_id=session_id,\n",
" )\n",
" \n",
" # Process all events including tool calls and final response\n",
" async for event in EventLogger().log(response):\n",
" event.print()\n",
" \n",
" # If this was a tool call, we need to create another turn with the result\n",
" if hasattr(event, 'tool_calls') and event.tool_calls:\n",
" tool_call = event.tool_calls[0] # Get the first tool call\n",
" \n",
" # Execute the custom tool\n",
" if tool_call.tool_name in [t.get_name() for t in agent.custom_tools]:\n",
" tool = [t for t in agent.custom_tools if t.get_name() == tool_call.tool_name][0]\n",
" result = await tool.run_impl(**tool_call.arguments)\n",
" \n",
" # Create a follow-up turn with the tool result\n",
" follow_up = agent.create_turn(\n",
" messages=[\n",
" {\n",
" \"role\": \"tool\",\n",
" \"content\": result,\n",
" \"tool_call_id\": tool_call.call_id,\n",
" \"name\": tool_call.tool_name\n",
" }\n",
" ],\n",
" session_id=session_id,\n",
" )\n",
" \n",
" # Process the follow-up response\n",
" async for follow_up_event in EventLogger().log(follow_up):\n",
" follow_up_event.print()\"\"A mock weather tool that simulates getting weather data\"\"\"\n",
" \n",
" def get_name(self) -> str:\n",
" return \"get_weather\"\n",
" \n",
" def get_description(self) -> str:\n",
" return \"Get current weather information for major cities\"\n",
" \n",
" def get_params_definition(self) -> Dict[str, ToolParamDefinitionParam]:\n",
" return {\n",
" \"city\": ToolParamDefinitionParam(\n",
" param_type=\"str\",\n",
" description=\"Name of the city (e.g., New York, London, Tokyo)\",\n",
" required=True\n",
" ),\n",
" \"date\": ToolParamDefinitionParam(\n",
" param_type=\"str\",\n",
" description=\"Date in YYYY-MM-DD format (optional)\",\n",
" required=False\n",
" )\n",
" }\n",
" \n",
" async def run_impl(self, city: str = None, date: str = None):\n",
" if not city:\n",
" return json.dumps({\"error\": \"City parameter is required\"})\n",
" \n",
" # Mock database of weather information\n",
" weather_data = {\n",
" \"New York\": {\"temp\": 20, \"condition\": \"sunny\"},\n",
" \"London\": {\"temp\": 15, \"condition\": \"rainy\"},\n",
" \"Tokyo\": {\"temp\": 25, \"condition\": \"cloudy\"}\n",
" }\n",
" \n",
" try:\n",
" # Check if we have data for the requested city\n",
" if city not in weather_data:\n",
" return json.dumps({\n",
" \"error\": f\"Sorry! No data available for {city}\",\n",
" \"available_cities\": list(weather_data.keys())\n",
" })\n",
" \n",
" # Return the weather information\n",
" return json.dumps({\n",
" \"city\": city,\n",
" \"date\": date or datetime.now().strftime(\"%Y-%m-%d\"),\n",
" \"data\": weather_data[city]\n",
" })\n",
" except Exception as e:\n",
" return json.dumps({\"error\": str(e)})"
]
},
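{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, a quick direct call confirms the tool behaves as expected before an agent uses it. This check (an addition to the original flow) assumes the `WeatherTool` class above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Smoke-test the mock weather tool directly\n",
"weather = WeatherTool()\n",
"print(await weather.run_impl(city=\"Tokyo\"))   # uses today's date by default\n",
"print(await weather.run_impl(city=\"Paris\"))   # not in the mock data -> error with available cities"
]
},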
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"# ## 5. Setting Up the LlamaStack Agent\n",
"# \n",
"# Now that we have our tools, we need to create an agent that can use them.\n",
"# The agent is like a smart assistant that knows how to use our tools when needed."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"async def setup_agent(host: str = \"localhost\", port: int = 5001):\n",
" \"\"\"Creates and configures our LlamaStack agent\"\"\"\n",
" \n",
" # Create a client to connect to the LlamaStack server\n",
" client = LlamaStackClient(\n",
" base_url=f\"http://{host}:{port}\",\n",
" )\n",
" \n",
" # Configure how we want our agent to behave\n",
" agent_config = AgentConfig(\n",
" model=\"Llama3.1-8B-Instruct\",\n",
" instructions=\"\"\"You are a helpful assistant that can:\n",
" 1. Perform mathematical calculations\n",
" 2. Check weather information\n",
" Always explain your thinking before using a tool.\"\"\",\n",
" \n",
" sampling_params={\n",
" \"strategy\": \"greedy\",\n",
" \"temperature\": 1.0,\n",
" \"top_p\": 0.9,\n",
" },\n",
" \n",
" # List of tools available to the agent\n",
" tools=[\n",
" {\n",
" \"function_name\": \"calculator\",\n",
" \"description\": \"Perform basic arithmetic operations\",\n",
" \"parameters\": {\n",
" \"operation\": {\n",
" \"param_type\": \"str\",\n",
" \"description\": \"Operation to perform (add, subtract, multiply, divide)\",\n",
" \"required\": True,\n",
" },\n",
" \"x\": {\n",
" \"param_type\": \"float\",\n",
" \"description\": \"First number\",\n",
" \"required\": True,\n",
" },\n",
" \"y\": {\n",
" \"param_type\": \"float\",\n",
" \"description\": \"Second number\",\n",
" \"required\": True,\n",
" },\n",
" },\n",
" \"type\": \"function_call\",\n",
" },\n",
" {\n",
" \"function_name\": \"get_weather\",\n",
" \"description\": \"Get weather information for a given city\",\n",
" \"parameters\": {\n",
" \"city\": {\n",
" \"param_type\": \"str\",\n",
" \"description\": \"Name of the city\",\n",
" \"required\": True,\n",
" },\n",
" \"date\": {\n",
" \"param_type\": \"str\",\n",
" \"description\": \"Date in YYYY-MM-DD format\",\n",
" \"required\": False,\n",
" },\n",
" },\n",
" \"type\": \"function_call\",\n",
" },\n",
" ],\n",
" tool_choice=\"auto\",\n",
" # Using standard JSON format for tools\n",
" tool_prompt_format=\"json\", \n",
" input_shields=[],\n",
" output_shields=[],\n",
" enable_session_persistence=False,\n",
" )\n",
" \n",
" # Create our tools\n",
" custom_tools = [CalculatorTool(), WeatherTool()]\n",
" \n",
" # Create the agent\n",
" agent = Agent(client, agent_config, custom_tools)\n",
" session_id = agent.create_session(\"tutorial-session\")\n",
" print(f\"🎉 Created session_id={session_id} for Agent({agent.agent_id})\")\n",
" \n",
" return agent, session_id"
]
},
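{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the `tools` list above repeats metadata the tool classes already expose through `get_name`, `get_description`, and `get_params_definition`. As an alternative sketch (not how this tutorial wires it up), you could derive the definitions from the tool objects themselves, which keeps the agent config and the tools from drifting apart:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: build the agent's tool definitions from the tool objects.\n",
"# Assumes each param definition can be converted to a plain dict;\n",
"# adjust the conversion if the client type differs.\n",
"def tool_definition(tool) -> dict:\n",
"    return {\n",
"        \"function_name\": tool.get_name(),\n",
"        \"description\": tool.get_description(),\n",
"        \"parameters\": {\n",
"            name: dict(param)\n",
"            for name, param in tool.get_params_definition().items()\n",
"        },\n",
"        \"type\": \"function_call\",\n",
"    }\n",
"\n",
"tool_defs = [tool_definition(t) for t in [CalculatorTool(), WeatherTool()]]\n",
"print(tool_defs[0][\"function_name\"], list(tool_defs[0][\"parameters\"]))"
]
},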
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"# ## 6. Running Examples\n",
"# \n",
"# Let's try out our agent with some example questions!\n",
"\n",
"# %%"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"import nest_asyncio\n",
"nest_asyncio.apply() # This allows async operations to work in Jupyter\n",
"\n",
"# %%\n",
"# Initialize the agent\n",
"async def init_agent():\n",
" \"\"\"Initialize our agent - run this first!\"\"\"\n",
" agent, session_id = await setup_agent()\n",
" print(f\"✨ Agent initialized with session {session_id}\")\n",
" return agent, session_id\n",
"\n",
"# %%\n",
"# Function to run a single query\n",
"async def run_single_query(agent, session_id, query: str):\n",
" \"\"\"Run a single query through our agent\"\"\"\n",
" print(\"\\n\" + \"=\"*50)\n",
" print(f\"🤔 User asks: {query}\")\n",
" print(\"=\"*50)\n",
" \n",
" response = agent.create_turn(\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": query,\n",
" }\n",
" ],\n",
" session_id=session_id,\n",
" )\n",
" \n",
" async for log in EventLogger().log(response):\n",
" log.print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's run everything and see it in action!\n",
"\n",
"Create and run our agent"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"🎉 Created session_id=fbe83bb6-bdfd-497c-b920-d7307482d8ba for Agent(3997eeda-4ffd-4b05-9026-28b4da206a11)\n",
"✨ Agent initialized with session fbe83bb6-bdfd-497c-b920-d7307482d8ba\n"
]
}
],
"source": [
"agent, session_id = await init_agent()"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"==================================================\n",
"🤔 User asks: What's 25 plus 17?\n",
"==================================================\n",
"\u001b[30m\u001b[0m\u001b[33minference> \u001b[0m\u001b[36m\u001b[0m\u001b[36m{\"\u001b[0m\u001b[36mtype\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mfunction\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mname\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mcalculator\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mparameters\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m {\"\u001b[0m\u001b[36moperation\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36madd\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36my\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36m17\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mx\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36m25\u001b[0m\u001b[36m\"}}\u001b[0m\u001b[97m\u001b[0m\n"
]
}
],
"source": [
"await run_single_query(agent, session_id, \"What's 25 plus 17?\")"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"==================================================\n",
"🤔 User asks: What's the weather like in Tokyo?\n",
"==================================================\n",
"\u001b[30m\u001b[0m\u001b[33minference> \u001b[0m\u001b[36m\u001b[0m\u001b[36m{\"\u001b[0m\u001b[36mtype\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mfunction\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mname\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mget\u001b[0m\u001b[36m_weather\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mparameters\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m {\"\u001b[0m\u001b[36mcity\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mTok\u001b[0m\u001b[36myo\u001b[0m\u001b[36m\"}}\u001b[0m\u001b[97m\u001b[0m\n"
]
}
],
"source": [
"await run_single_query(agent, session_id, \"What's the weather like in Tokyo?\")"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"#fin"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View file

@ -1,7 +1,7 @@
# Llama Stack Text Generation Guide
# Llama Stack Inference Guide
This document provides instructions on how to use Llama Stack's `chat_completion` function for generating text using the `Llama3.2-11B-Vision-Instruct` model. Before you begin, please ensure Llama Stack is installed and set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/).
### Table of Contents
1. [Quickstart](#quickstart)

View file

@ -157,16 +157,15 @@ With these steps, you should have a functional Llama Stack setup capable of gene
## Next Steps
- **Explore Other Guides**: Dive deeper into specific topics by following these guides:
- [Understanding Distributions](#)
- [Configure your Distro](#)
- [Doing Inference API Call and Fetching a Response from Endpoints](#)
- [Creating a Conversation Loop](#)
- [Sending Image to the Model](#)
- [Tool Calling: How to and Details](#)
- [Memory API: Show Simple In-Memory Retrieval](#)
- [Agents API: Explain Components](#)
- [Using Safety API in Conversation](#)
- [Prompt Engineering Guide](#)
- [Inference 101](00_Inference101.ipynb)
- [Simple switch between local and cloud model](00_Local_Cloud_Inference101.ipynb)
- [Prompt Engineering](01_Prompt_Engineering101.ipynb)
- [Chat with Image - LlamaStack Vision API](02_Image_Chat101.ipynb)
- [Tool Calling: How to and Details](03_Tool_Calling101.ipynb)
- [Memory API: Show Simple In-Memory Retrieval](04_Memory101.ipynb)
- [Using Safety API in Conversation](05_Safety101.ipynb)
- [Agents API: Explain Components](06_Agents101.ipynb)
- **Explore Client SDKs**: Utilize our client SDKs for various languages to integrate Llama Stack into your applications:
- [Python SDK](https://github.com/meta-llama/llama-stack-client-python)
@ -180,5 +179,3 @@ With these steps, you should have a functional Llama Stack setup capable of gene
---