forked from phoenix-oss/llama-stack-mirror
# What does this PR do? Adding nbformat version fixes this issue. Not sure exactly why this needs to be done, but this version was rewritten to the bottom of a nb file when I changed its name trying to get to the bottom of this. When I opened it on GH the issue was no longer present Closes #1837 ## Test Plan N/A
262 lines
9.5 KiB
Text
262 lines
9.5 KiB
Text
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "a0ed972d",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Switching between Local and Cloud Model with Llama Stack\n",
|
||
"\n",
|
||
"This guide provides a streamlined setup to switch between local and cloud clients for text generation with Llama Stack’s `chat_completion` API. This setup enables automatic fallback to a cloud instance if the local client is unavailable.\n",
|
||
"\n",
|
||
"### Prerequisites\n",
|
||
"Before you begin, please ensure Llama Stack is installed and the distribution is set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/). You will need to run two distributions, a local and a cloud distribution, for this demo to work.\n",
|
||
"\n",
|
||
"### Implementation"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "bfac8382",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 1. Configuration\n",
|
||
"Set up your connection parameters:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"id": "d80c0926",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"HOST = \"localhost\" # Replace with your host\n",
|
||
"LOCAL_PORT = 8321 # Replace with your local distro port\n",
|
||
"CLOUD_PORT = 8322 # Replace with your cloud distro port"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "df89cff7",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### 2. Set Up Local and Cloud Clients\n",
|
||
"\n",
|
||
"Initialize both clients, specifying the `base_url` for each instance. In this case, we have the local distribution running on `http://localhost:8321` and the cloud distribution running on `http://localhost:8322`.\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"id": "7f868dfe",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from llama_stack_client import LlamaStackClient\n",
|
||
"\n",
|
||
"# Configure local and cloud clients\n",
|
||
"local_client = LlamaStackClient(base_url=f'http://{HOST}:{LOCAL_PORT}')\n",
|
||
"cloud_client = LlamaStackClient(base_url=f'http://{HOST}:{CLOUD_PORT}')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "894689c1",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### 3. Client Selection with Fallback\n",
|
||
"\n",
|
||
"The `select_client` function checks if the local client is available using a lightweight `/health` check. If the local client is unavailable, it automatically switches to the cloud client.\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"id": "ff0c8277",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"\u001b[33mUsing local client.\u001b[0m\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"import httpx\n",
|
||
"from termcolor import cprint\n",
|
||
"\n",
|
||
"async def check_client_health(client, client_name: str) -> bool:\n",
|
||
" try:\n",
|
||
" async with httpx.AsyncClient() as http_client:\n",
|
||
" response = await http_client.get(f'{client.base_url}/health')\n",
|
||
" if response.status_code == 200:\n",
|
||
" cprint(f'Using {client_name} client.', 'yellow')\n",
|
||
" return True\n",
|
||
" else:\n",
|
||
" cprint(f'{client_name} client health check failed.', 'red')\n",
|
||
" return False\n",
|
||
" except httpx.RequestError:\n",
|
||
" cprint(f'Failed to connect to {client_name} client.', 'red')\n",
|
||
" return False\n",
|
||
"\n",
|
||
"async def select_client(use_local: bool) -> LlamaStackClient:\n",
|
||
" if use_local and await check_client_health(local_client, 'local'):\n",
|
||
" return local_client\n",
|
||
"\n",
|
||
" if await check_client_health(cloud_client, 'cloud'):\n",
|
||
" return cloud_client\n",
|
||
"\n",
|
||
" raise ConnectionError('Unable to connect to any client.')\n",
|
||
"\n",
|
||
"# Example usage: pass True for local, False for cloud\n",
|
||
"client = await select_client(use_local=True)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "9ccfe66f",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### 4. Generate a Response\n",
|
||
"\n",
|
||
"After selecting the client, you can generate text using `chat_completion`. This example sends a sample prompt to the model and prints the response.\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"id": "5e19cc20",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from termcolor import cprint\n",
|
||
"from llama_stack_client.lib.inference.event_logger import EventLogger\n",
|
||
"\n",
|
||
"async def get_llama_response(stream: bool = True, use_local: bool = True):\n",
|
||
" client = await select_client(use_local) # Selects the available client\n",
|
||
" message = {\n",
|
||
" \"role\": \"user\",\n",
|
||
" \"content\": 'hello world, write me a 2 sentence poem about the moon'\n",
|
||
" }\n",
|
||
" cprint(f'User> {message[\"content\"]}', 'green')\n",
|
||
"\n",
|
||
" response = client.inference.chat_completion(\n",
|
||
" messages=[message],\n",
|
||
" model='Llama3.2-11B-Vision-Instruct',\n",
|
||
" stream=stream,\n",
|
||
" )\n",
|
||
"\n",
|
||
" if not stream:\n",
|
||
" cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
|
||
" else:\n",
|
||
" async for log in EventLogger().log(response):\n",
|
||
" log.print()\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "6edf5e57",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### 5. Run with Cloud Model\n",
|
||
"\n",
|
||
"Use `asyncio.run()` to execute `get_llama_response` in an asynchronous event loop.\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"id": "c10f487e",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"\u001b[33mUsing cloud client.\u001b[0m\n",
|
||
"\u001b[32mUser> hello world, write me a 2 sentence poem about the moon\u001b[0m\n",
|
||
"\u001b[36mAssistant> \u001b[0m\u001b[33mSilver\u001b[0m\u001b[33m cres\u001b[0m\u001b[33mcent\u001b[0m\u001b[33m in\u001b[0m\u001b[33m the\u001b[0m\u001b[33m midnight\u001b[0m\u001b[33m sky\u001b[0m\u001b[33m,\n",
|
||
"\u001b[0m\u001b[33mA\u001b[0m\u001b[33m gentle\u001b[0m\u001b[33m glow\u001b[0m\u001b[33m that\u001b[0m\u001b[33m whispers\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mI\u001b[0m\u001b[33m'm\u001b[0m\u001b[33m passing\u001b[0m\u001b[33m by\u001b[0m\u001b[33m.\"\u001b[0m\u001b[97m\u001b[0m\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"import asyncio\n",
|
||
"\n",
|
||
"\n",
|
||
"# Run this function directly in a Jupyter Notebook cell with `await`\n",
|
||
"await get_llama_response(use_local=False)\n",
|
||
"# To run it in a python file, use this line instead\n",
|
||
"# asyncio.run(get_llama_response(use_local=False))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "5c433511-9321-4718-ab7f-e21cf6b5ca79",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### 6. Run with Local Model\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"id": "02eacfaf-c7f1-494b-ac28-129d2a0258e3",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"\u001b[33mUsing local client.\u001b[0m\n",
|
||
"\u001b[32mUser> hello world, write me a 2 sentence poem about the moon\u001b[0m\n",
|
||
"\u001b[36mAssistant> \u001b[0m\u001b[33mSilver\u001b[0m\u001b[33m cres\u001b[0m\u001b[33mcent\u001b[0m\u001b[33m in\u001b[0m\u001b[33m the\u001b[0m\u001b[33m midnight\u001b[0m\u001b[33m sky\u001b[0m\u001b[33m,\n",
|
||
"\u001b[0m\u001b[33mA\u001b[0m\u001b[33m gentle\u001b[0m\u001b[33m glow\u001b[0m\u001b[33m that\u001b[0m\u001b[33m whispers\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mI\u001b[0m\u001b[33m'm\u001b[0m\u001b[33m passing\u001b[0m\u001b[33m by\u001b[0m\u001b[33m.\"\u001b[0m\u001b[97m\u001b[0m\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"import asyncio\n",
|
||
"\n",
|
||
"await get_llama_response(use_local=True)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "7e3a3ffa",
|
||
"metadata": {},
|
||
"source": [
|
||
"Thanks for checking out this notebook! \n",
|
||
"\n",
|
||
"The next one will be a guide on [Prompt Engineering](./02_Prompt_Engineering101.ipynb), please continue learning!"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"fileHeader": "",
|
||
"fileUid": "e11939ac-dfbc-4a1c-83be-e494c7f803b8",
|
||
"isAdHoc": false,
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.10.15"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|