{ "cells": [ { "cell_type": "markdown", "id": "a0ed972d", "metadata": {}, "source": [ "# Switching between Local and Cloud Model with Llama Stack\n", "\n", "This guide provides a streamlined setup to switch between local and cloud clients for text generation with Llama Stack’s `chat_completion` API. This setup enables automatic fallback to a cloud instance if the local client is unavailable.\n", "\n", "### Prerequisites\n", "Before you begin, please ensure Llama Stack is installed and your distributions are set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/). You will need two running distributions, one local and one cloud, for this demo to work.\n", "\n", "### Implementation" ] }, { "cell_type": "markdown", "id": "bfac8382", "metadata": {}, "source": [ "#### 1. Configuration\n", "Set up your connection parameters:" ] }, { "cell_type": "code", "execution_count": 1, "id": "d80c0926", "metadata": {}, "outputs": [], "source": [ "HOST = \"localhost\" # Replace with your host\n", "LOCAL_PORT = 8321 # Replace with your local distro port\n", "CLOUD_PORT = 8322 # Replace with your cloud distro port" ] }, { "cell_type": "markdown", "id": "df89cff7", "metadata": {}, "source": [ "#### 2. Set Up Local and Cloud Clients\n", "\n", "Initialize both clients, specifying the `base_url` for each instance. In this case, we have the local distribution running on `http://localhost:8321` and the cloud distribution running on `http://localhost:8322`, matching the ports configured in step 1.\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "7f868dfe", "metadata": {}, "outputs": [], "source": [ "from llama_stack_client import LlamaStackClient\n", "\n", "# Configure local and cloud clients\n", "local_client = LlamaStackClient(base_url=f'http://{HOST}:{LOCAL_PORT}')\n", "cloud_client = LlamaStackClient(base_url=f'http://{HOST}:{CLOUD_PORT}')" ] }, { "cell_type": "markdown", "id": "894689c1", "metadata": {}, "source": [ "#### 3. Client Selection with Fallback\n", "\n", "The `select_client` function checks whether a client is reachable using a lightweight `/health` request: if `use_local` is set and the local client responds, it is used; otherwise the function automatically falls back to the cloud client. If neither is reachable, a `ConnectionError` is raised.\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "ff0c8277", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mUsing local client.\u001b[0m\n" ] } ], "source": [ "import httpx\n", "from termcolor import cprint\n", "\n", "async def check_client_health(client, client_name: str) -> bool:\n", "    try:\n", "        async with httpx.AsyncClient() as http_client:\n", "            response = await http_client.get(f'{client.base_url}/health')\n", "            if response.status_code == 200:\n", "                cprint(f'Using {client_name} client.', 'yellow')\n", "                return True\n", "            else:\n", "                cprint(f'{client_name} client health check failed.', 'red')\n", "                return False\n", "    except httpx.RequestError:\n", "        cprint(f'Failed to connect to {client_name} client.', 'red')\n", "        return False\n", "\n", "async def select_client(use_local: bool) -> LlamaStackClient:\n", "    if use_local and await check_client_health(local_client, 'local'):\n", "        return local_client\n", "\n", "    if await check_client_health(cloud_client, 'cloud'):\n", "        return cloud_client\n", "\n", "    raise ConnectionError('Unable to connect to any client.')\n", "\n", "# Example usage: pass True for local, False for cloud\n", "client = await select_client(use_local=True)\n" ] }, 
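{ "cell_type": "markdown", "id": "b3f1a2c9", "metadata": {}, "source": [ "Before generating a response, it can help to confirm which models the selected distribution actually serves, since the model name used in the next step must match one of them. The cell below is a small sketch added for illustration; it assumes the `llama-stack-client` SDK exposes a `models.list()` call on the client.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "e7d4c5a1", "metadata": {}, "outputs": [], "source": [ "# Illustrative sketch (not part of the original walkthrough): list the models\n", "# served by the selected distribution; assumes client.models.list() is available.\n", "available_models = client.models.list()\n", "print(available_models)" ] }, { "cell_type": "markdown", "id": "9ccfe66f", "metadata": {}, "source": [ "#### 4. 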
Generate a Response\n", "\n", "After selecting the client, you can generate text using `chat_completion`. The helper below sends a sample prompt to the model and prints the response, either streamed token by token or as a single completion, depending on the `stream` flag.\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "5e19cc20", "metadata": {}, "outputs": [], "source": [ "from termcolor import cprint\n", "from llama_stack_client.lib.inference.event_logger import EventLogger\n", "\n", "async def get_llama_response(stream: bool = True, use_local: bool = True):\n", "    client = await select_client(use_local)  # Selects the available client\n", "    message = {\n", "        \"role\": \"user\",\n", "        \"content\": 'hello world, write me a 2 sentence poem about the moon'\n", "    }\n", "    cprint(f'User> {message[\"content\"]}', 'green')\n", "\n", "    response = client.inference.chat_completion(\n", "        messages=[message],\n", "        model='Llama3.2-11B-Vision-Instruct',\n", "        stream=stream,\n", "    )\n", "\n", "    if not stream:\n", "        cprint(f'> Response: {response.completion_message.content}', 'cyan')\n", "    else:\n", "        async for log in EventLogger().log(response):\n", "            log.print()\n" ] }, { "cell_type": "markdown", "id": "6edf5e57", "metadata": {}, "source": [ "#### 5. Run with Cloud Model\n", "\n", "Since `get_llama_response` is a coroutine, `await` it directly in a notebook cell; in a standalone Python script, wrap the call in `asyncio.run()` instead.\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "c10f487e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mUsing cloud client.\u001b[0m\n", "\u001b[32mUser> hello world, write me a 2 sentence poem about the moon\u001b[0m\n", "\u001b[36mAssistant> \u001b[0m\u001b[33mSilver\u001b[0m\u001b[33m cres\u001b[0m\u001b[33mcent\u001b[0m\u001b[33m in\u001b[0m\u001b[33m the\u001b[0m\u001b[33m midnight\u001b[0m\u001b[33m sky\u001b[0m\u001b[33m,\n", "\u001b[0m\u001b[33mA\u001b[0m\u001b[33m gentle\u001b[0m\u001b[33m glow\u001b[0m\u001b[33m that\u001b[0m\u001b[33m whispers\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mI\u001b[0m\u001b[33m'm\u001b[0m\u001b[33m passing\u001b[0m\u001b[33m by\u001b[0m\u001b[33m.\"\u001b[0m\u001b[97m\u001b[0m\n" ] } ], "source": [ "import asyncio\n", "\n", "\n", "# Run this function directly in a Jupyter Notebook cell with `await`\n", "await get_llama_response(use_local=False)\n", "# To run it in a python file, use this line instead\n", "# asyncio.run(get_llama_response(use_local=False))" ] }, 
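{ "cell_type": "markdown", "id": "f8a9b0c1", "metadata": {}, "source": [ "Step 5 streamed the answer token by token. The same helper also supports a non-streaming call through its `stream` flag; the cell below is a small usage sketch added for illustration (output not shown).\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a2b3c4d5", "metadata": {}, "outputs": [], "source": [ "# Usage sketch: request a single, non-streamed completion from the cloud distribution.\n", "await get_llama_response(stream=False, use_local=False)" ] }, { "cell_type": "markdown", "id": "5c433511-9321-4718-ab7f-e21cf6b5ca79", "metadata": {}, "source": [ "#### 6. 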
Run with Local Model\n", "\n", "Now call the same helper with `use_local=True` to send the request to the local distribution.\n" ] }, { "cell_type": "code", "execution_count": 8, "id": "02eacfaf-c7f1-494b-ac28-129d2a0258e3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mUsing local client.\u001b[0m\n", "\u001b[32mUser> hello world, write me a 2 sentence poem about the moon\u001b[0m\n", "\u001b[36mAssistant> \u001b[0m\u001b[33mSilver\u001b[0m\u001b[33m cres\u001b[0m\u001b[33mcent\u001b[0m\u001b[33m in\u001b[0m\u001b[33m the\u001b[0m\u001b[33m midnight\u001b[0m\u001b[33m sky\u001b[0m\u001b[33m,\n", "\u001b[0m\u001b[33mA\u001b[0m\u001b[33m gentle\u001b[0m\u001b[33m glow\u001b[0m\u001b[33m that\u001b[0m\u001b[33m whispers\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mI\u001b[0m\u001b[33m'm\u001b[0m\u001b[33m passing\u001b[0m\u001b[33m by\u001b[0m\u001b[33m.\"\u001b[0m\u001b[97m\u001b[0m\n" ] } ], "source": [ "import asyncio\n", "\n", "await get_llama_response(use_local=True)" ] }, { "cell_type": "markdown", "id": "7e3a3ffa", "metadata": {}, "source": [ "Thanks for checking out this notebook!\n", "\n", "The next one is a guide on [Prompt Engineering](./02_Prompt_Engineering101.ipynb). Please continue learning!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.15" } }, "nbformat": 4, "nbformat_minor": 5 }