diff --git a/docs/zero_to_hero_guide/01_Local_Cloud_Inference101.ipynb b/docs/zero_to_hero_guide/01_Local_Cloud_Inference101.ipynb
index 19a7fe3be..4f6ca4080 100644
--- a/docs/zero_to_hero_guide/01_Local_Cloud_Inference101.ipynb
+++ b/docs/zero_to_hero_guide/01_Local_Cloud_Inference101.ipynb
@@ -1,262 +1,239 @@
 {
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "id": "a0ed972d",
-      "metadata": {},
-      "source": [
-        "# Switching between Local and Cloud Model with Llama Stack\n",
-        "\n",
-        "This guide provides a streamlined setup to switch between local and cloud clients for text generation with Llama Stack’s `chat_completion` API. This setup enables automatic fallback to a cloud instance if the local client is unavailable.\n",
-        "\n",
-        "### Prerequisites\n",
-        "Before you begin, please ensure Llama Stack is installed and the distribution is set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/). You will need to run two distributions, a local and a cloud distribution, for this demo to work.\n",
-        "\n",
-        "### Implementation"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "bfac8382",
-      "metadata": {},
-      "source": [
-        "### 1. Configuration\n",
-        "Set up your connection parameters:"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 1,
-      "id": "d80c0926",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "HOST = \"localhost\"  # Replace with your host\n",
-        "LOCAL_PORT = 8321        # Replace with your local distro port\n",
-        "CLOUD_PORT = 8322        # Replace with your cloud distro port"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "df89cff7",
-      "metadata": {},
-      "source": [
-        "#### 2. Set Up Local and Cloud Clients\n",
-        "\n",
-        "Initialize both clients, specifying the `base_url` for each instance. In this case, we have the local distribution running on `http://localhost:8321` and the cloud distribution running on `http://localhost:8322`.\n"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 2,
-      "id": "7f868dfe",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "from llama_stack_client import LlamaStackClient\n",
-        "\n",
-        "# Configure local and cloud clients\n",
-        "local_client = LlamaStackClient(base_url=f'http://{HOST}:{LOCAL_PORT}')\n",
-        "cloud_client = LlamaStackClient(base_url=f'http://{HOST}:{CLOUD_PORT}')"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "894689c1",
-      "metadata": {},
-      "source": [
-        "#### 3. Client Selection with Fallback\n",
-        "\n",
-        "The `select_client` function checks if the local client is available using a lightweight `/health` check. If the local client is unavailable, it automatically switches to the cloud client.\n"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 3,
-      "id": "ff0c8277",
-      "metadata": {},
-      "outputs": [
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "\u001b[33mUsing local client.\u001b[0m\n"
-          ]
-        }
-      ],
-      "source": [
-        "import httpx\n",
-        "from termcolor import cprint\n",
-        "\n",
-        "async def check_client_health(client, client_name: str) -> bool:\n",
-        "    try:\n",
-        "        async with httpx.AsyncClient() as http_client:\n",
-        "            response = await http_client.get(f'{client.base_url}/health')\n",
-        "            if response.status_code == 200:\n",
-        "                cprint(f'Using {client_name} client.', 'yellow')\n",
-        "                return True\n",
-        "            else:\n",
-        "                cprint(f'{client_name} client health check failed.', 'red')\n",
-        "                return False\n",
-        "    except httpx.RequestError:\n",
-        "        cprint(f'Failed to connect to {client_name} client.', 'red')\n",
-        "        return False\n",
-        "\n",
-        "async def select_client(use_local: bool) -> LlamaStackClient:\n",
-        "    if use_local and await check_client_health(local_client, 'local'):\n",
-        "        return local_client\n",
-        "\n",
-        "    if await check_client_health(cloud_client, 'cloud'):\n",
-        "        return cloud_client\n",
-        "\n",
-        "    raise ConnectionError('Unable to connect to any client.')\n",
-        "\n",
-        "# Example usage: pass True for local, False for cloud\n",
-        "client = await select_client(use_local=True)\n"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "9ccfe66f",
-      "metadata": {},
-      "source": [
-        "#### 4. Generate a Response\n",
-        "\n",
-        "After selecting the client, you can generate text using `chat_completion`. This example sends a sample prompt to the model and prints the response.\n"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 4,
-      "id": "5e19cc20",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "from termcolor import cprint\n",
-        "from llama_stack_client.lib.inference.event_logger import EventLogger\n",
-        "\n",
-        "async def get_llama_response(stream: bool = True, use_local: bool = True):\n",
-        "    client = await select_client(use_local)  # Selects the available client\n",
-        "    message = {\n",
-        "        \"role\": \"user\",\n",
-        "        \"content\": 'hello world, write me a 2 sentence poem about the moon'\n",
-        "    }\n",
-        "    cprint(f'User> {message[\"content\"]}', 'green')\n",
-        "\n",
-        "    response = client.inference.chat_completion(\n",
-        "        messages=[message],\n",
-        "        model='Llama3.2-11B-Vision-Instruct',\n",
-        "        stream=stream,\n",
-        "    )\n",
-        "\n",
-        "    if not stream:\n",
-        "        cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
-        "    else:\n",
-        "        async for log in EventLogger().log(response):\n",
-        "            log.print()\n"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "6edf5e57",
-      "metadata": {},
-      "source": [
-        "#### 5. Run with Cloud Model\n",
-        "\n",
-        "Use `asyncio.run()` to execute `get_llama_response` in an asynchronous event loop.\n"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 7,
-      "id": "c10f487e",
-      "metadata": {},
-      "outputs": [
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "\u001b[33mUsing cloud client.\u001b[0m\n",
-            "\u001b[32mUser> hello world, write me a 2 sentence poem about the moon\u001b[0m\n",
-            "\u001b[36mAssistant> \u001b[0m\u001b[33mSilver\u001b[0m\u001b[33m cres\u001b[0m\u001b[33mcent\u001b[0m\u001b[33m in\u001b[0m\u001b[33m the\u001b[0m\u001b[33m midnight\u001b[0m\u001b[33m sky\u001b[0m\u001b[33m,\n",
-            "\u001b[0m\u001b[33mA\u001b[0m\u001b[33m gentle\u001b[0m\u001b[33m glow\u001b[0m\u001b[33m that\u001b[0m\u001b[33m whispers\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mI\u001b[0m\u001b[33m'm\u001b[0m\u001b[33m passing\u001b[0m\u001b[33m by\u001b[0m\u001b[33m.\"\u001b[0m\u001b[97m\u001b[0m\n"
-          ]
-        }
-      ],
-      "source": [
-        "import asyncio\n",
-        "\n",
-        "\n",
-        "# Run this function directly in a Jupyter Notebook cell with `await`\n",
-        "await get_llama_response(use_local=False)\n",
-        "# To run it in a python file, use this line instead\n",
-        "# asyncio.run(get_llama_response(use_local=False))"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "5c433511-9321-4718-ab7f-e21cf6b5ca79",
-      "metadata": {},
-      "source": [
-        "#### 6. Run with Local Model\n"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 8,
-      "id": "02eacfaf-c7f1-494b-ac28-129d2a0258e3",
-      "metadata": {},
-      "outputs": [
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "\u001b[33mUsing local client.\u001b[0m\n",
-            "\u001b[32mUser> hello world, write me a 2 sentence poem about the moon\u001b[0m\n",
-            "\u001b[36mAssistant> \u001b[0m\u001b[33mSilver\u001b[0m\u001b[33m cres\u001b[0m\u001b[33mcent\u001b[0m\u001b[33m in\u001b[0m\u001b[33m the\u001b[0m\u001b[33m midnight\u001b[0m\u001b[33m sky\u001b[0m\u001b[33m,\n",
-            "\u001b[0m\u001b[33mA\u001b[0m\u001b[33m gentle\u001b[0m\u001b[33m glow\u001b[0m\u001b[33m that\u001b[0m\u001b[33m whispers\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mI\u001b[0m\u001b[33m'm\u001b[0m\u001b[33m passing\u001b[0m\u001b[33m by\u001b[0m\u001b[33m.\"\u001b[0m\u001b[97m\u001b[0m\n"
-          ]
-        }
-      ],
-      "source": [
-        "import asyncio\n",
-        "\n",
-        "await get_llama_response(use_local=True)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "7e3a3ffa",
-      "metadata": {},
-      "source": [
-        "Thanks for checking out this notebook! \n",
-        "\n",
-        "The next one will be a guide on [Prompt Engineering](./02_Prompt_Engineering101.ipynb), please continue learning!"
-      ]
-    }
-  ],
-  "metadata": {
-    "fileHeader": "",
-    "fileUid": "e11939ac-dfbc-4a1c-83be-e494c7f803b8",
-    "isAdHoc": false,
-    "kernelspec": {
-      "display_name": "Python 3 (ipykernel)",
-      "language": "python",
-      "name": "python3"
-    },
-    "language_info": {
-      "codemirror_mode": {
-        "name": "ipython",
-        "version": 3
-      },
-      "file_extension": ".py",
-      "mimetype": "text/x-python",
-      "name": "python",
-      "nbconvert_exporter": "python",
-      "pygments_lexer": "ipython3",
-      "version": "3.10.15"
-    }
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "a0ed972d",
+   "metadata": {},
+   "source": [
+    "# Switching between Local and Cloud Model with Llama Stack\n",
+    "\n",
+    "This guide provides a streamlined setup to switch between local and cloud clients for text generation with Llama Stack’s `chat_completion` API. This setup enables automatic fallback to a cloud instance if the local client is unavailable.\n",
+    "\n",
+    "### Prerequisites\n",
+    "Before you begin, please ensure Llama Stack is installed and the distribution is set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/). You will need to run two distributions, a local and a cloud distribution, for this demo to work.\n",
+    "\n",
+    "### Implementation"
+   ]
   },
-  "nbformat": 4,
-  "nbformat_minor": 5
+  {
+   "cell_type": "markdown",
+   "id": "bfac8382",
+   "metadata": {},
+   "source": [
+    "#### 1. Configuration\n",
+    "Set up your connection parameters:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d80c0926",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "HOST = \"localhost\"  # Replace with your host\n",
+    "LOCAL_PORT = 8321        # Replace with your local distro port\n",
+    "CLOUD_PORT = 8322        # Replace with your cloud distro port"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "df89cff7",
+   "metadata": {},
+   "source": [
+    "#### 2. Set Up Local and Cloud Clients\n",
+    "\n",
+    "Initialize both clients, specifying the `base_url` for each instance. In this case, we have the local distribution running on `http://localhost:8321` and the cloud distribution running on `http://localhost:8322`.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7f868dfe",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_stack_client import AsyncLlamaStackClient\n",
+    "\n",
+    "# Configure local and cloud clients\n",
+    "local_client = AsyncLlamaStackClient(base_url=f'http://{HOST}:{LOCAL_PORT}')\n",
+    "cloud_client = AsyncLlamaStackClient(base_url=f'http://{HOST}:{CLOUD_PORT}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "894689c1",
+   "metadata": {},
+   "source": [
+    "#### 3. Client Selection with Fallback\n",
+    "\n",
+    "The `select_client` function checks if the local client is available using a lightweight `/v1/health` check. If the local client is unavailable, it automatically switches to the cloud client.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ff0c8277",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import httpx\n",
+    "from termcolor import cprint\n",
+    "\n",
+    "async def check_client_health(client, client_name: str) -> bool:\n",
+    "    try:\n",
+    "        async with httpx.AsyncClient() as http_client:\n",
+    "            response = await http_client.get(f'{client.base_url}/v1/health')\n",
+    "            if response.status_code == 200:\n",
+    "                cprint(f'Using {client_name} client.', 'yellow')\n",
+    "                return True\n",
+    "            else:\n",
+    "                cprint(f'{client_name} client health check failed.', 'red')\n",
+    "                return False\n",
+    "    except httpx.RequestError:\n",
+    "        cprint(f'Failed to connect to {client_name} client.', 'red')\n",
+    "        return False\n",
+    "\n",
+    "async def select_client(use_local: bool) -> AsyncLlamaStackClient:\n",
+    "    if use_local and await check_client_health(local_client, 'local'):\n",
+    "        return local_client\n",
+    "\n",
+    "    if await check_client_health(cloud_client, 'cloud'):\n",
+    "        return cloud_client\n",
+    "\n",
+    "    raise ConnectionError('Unable to connect to any client.')\n",
+    "\n",
+    "# Example usage: pass True for local, False for cloud\n",
+    "client = await select_client(use_local=True)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9ccfe66f",
+   "metadata": {},
+   "source": [
+    "#### 4. Generate a Response\n",
+    "\n",
+    "After selecting the client, you can generate text using `chat_completion`. This example sends a sample prompt to the model and prints the response.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5e19cc20",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from termcolor import cprint\n",
+    "\n",
+    "async def get_llama_response(stream: bool = True, use_local: bool = True):\n",
+    "    client = await select_client(use_local)  # Selects the available client\n",
+    "    message = {\n",
+    "        \"role\": \"user\",\n",
+    "        \"content\": 'hello world, write me a 2 sentence poem about the moon'\n",
+    "    }\n",
+    "    cprint(f'User> {message[\"content\"]}', 'green')\n",
+    "\n",
+    "    response = await client.inference.chat_completion(\n",
+    "        messages=[message],\n",
+    "        model_id='meta-llama/Llama3.2-11B-Vision-Instruct',\n",
+    "        stream=stream,\n",
+    "    )\n",
+    "\n",
+    "    cprint('Assistant> ', color='cyan', end='')\n",
+    "    if not stream:\n",
+    "        cprint(response.completion_message.content, color='yellow')\n",
+    "    else:\n",
+    "        async for chunk in response:\n",
+    "            cprint(chunk.event.delta.text, color='yellow', end='')\n",
+    "        cprint('')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6edf5e57",
+   "metadata": {},
+   "source": [
+    "#### 5. Run with Cloud Model\n",
+    "\n",
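+    "In a Jupyter notebook, `await` `get_llama_response` directly, because the notebook already runs an event loop. In a standalone Python script, wrap the call in `asyncio.run()` instead.\n"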
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c10f487e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import asyncio\n",
+    "\n",
+    "\n",
+    "# Run this function directly in a Jupyter Notebook cell with `await`\n",
+    "await get_llama_response(use_local=False)\n",
+    "# To run it in a python file, use this line instead\n",
+    "# asyncio.run(get_llama_response(use_local=False))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5c433511-9321-4718-ab7f-e21cf6b5ca79",
+   "metadata": {},
+   "source": [
+    "#### 6. Run with Local Model\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "02eacfaf-c7f1-494b-ac28-129d2a0258e3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import asyncio\n",
+    "\n",
+    "await get_llama_response(use_local=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7e3a3ffa",
+   "metadata": {},
+   "source": [
+    "Thanks for checking out this notebook!\n",
+    "\n",
+    "The next notebook is a guide on [Prompt Engineering](./02_Prompt_Engineering101.ipynb). Please continue learning!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3ad6db48",
+   "metadata": {},
+   "source": []
+  }
+ ],
+ "metadata": {
+  "fileHeader": "",
+  "fileUid": "e11939ac-dfbc-4a1c-83be-e494c7f803b8",
+  "isAdHoc": false,
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.15"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
 }