{
"cells": [
{
"cell_type": "markdown",
"id": "785bd3ff",
"metadata": {},
"source": [
"<a href=\"https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/zero_to_hero_guide/01_Local_Cloud_Inference101.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"id": "a0ed972d",
"metadata": {},
"source": [
"# Switching between Local and Cloud Model with Llama Stack\n",
"\n",
"This guide provides a streamlined setup to switch between local and cloud clients for text generation with Llama Stacks `chat_completion` API. This setup enables automatic fallback to a cloud instance if the local client is unavailable.\n",
"\n",
"### Prerequisites\n",
"Before you begin, please ensure Llama Stack is installed and the distribution is set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/). You will need to run two distributions, a local and a cloud distribution, for this demo to work.\n",
"\n",
"### Implementation"
]
},
{
"cell_type": "markdown",
"id": "bfac8382",
"metadata": {},
"source": [
"### 1. Configuration\n",
"Set up your connection parameters:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d80c0926",
"metadata": {},
"outputs": [],
"source": [
"HOST = \"localhost\" # Replace with your host\n",
"LOCAL_PORT = 5000 # Replace with your local distro port\n",
"CLOUD_PORT = 5001 # Replace with your cloud distro port"
]
},
{
"cell_type": "markdown",
"id": "df89cff7",
"metadata": {},
"source": [
"#### 2. Set Up Local and Cloud Clients\n",
"\n",
"Initialize both clients, specifying the `base_url` for each instance. In this case, we have the local distribution running on `http://localhost:5000` and the cloud distribution running on `http://localhost:5001`.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7f868dfe",
"metadata": {},
"outputs": [],
"source": [
"from llama_stack_client import LlamaStackClient\n",
"\n",
"# Configure local and cloud clients\n",
"local_client = LlamaStackClient(base_url=f'http://{HOST}:{LOCAL_PORT}')\n",
"cloud_client = LlamaStackClient(base_url=f'http://{HOST}:{CLOUD_PORT}')"
]
},
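{
"cell_type": "markdown",
"id": "3f9a1c2e",
"metadata": {},
"source": [
"As an optional sanity check (assuming both distributions are already running), you can list the models each one serves via the client's `models.list()` endpoint. This is a minimal sketch; the exact output depends on your llama-stack-client version and configured models."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b4d6e2a",
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check: list the models served by each distribution.\n",
"# This assumes both servers are already running; the output format\n",
"# depends on your llama-stack-client version.\n",
"for name, c in [('local', local_client), ('cloud', cloud_client)]:\n",
"    try:\n",
"        print(f'{name}:', c.models.list())\n",
"    except Exception as e:\n",
"        print(f'{name}: not reachable ({e})')"
]
},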
{
"cell_type": "markdown",
"id": "894689c1",
"metadata": {},
"source": [
"#### 3. Client Selection with Fallback\n",
"\n",
"The `select_client` function checks if the local client is available using a lightweight `/health` check. If the local client is unavailable, it automatically switches to the cloud client.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ff0c8277",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mUsing local client.\u001b[0m\n"
]
}
],
"source": [
"import httpx\n",
"from termcolor import cprint\n",
"\n",
"async def check_client_health(client, client_name: str) -> bool:\n",
" try:\n",
" async with httpx.AsyncClient() as http_client:\n",
" response = await http_client.get(f'{client.base_url}/health')\n",
" if response.status_code == 200:\n",
" cprint(f'Using {client_name} client.', 'yellow')\n",
" return True\n",
" else:\n",
" cprint(f'{client_name} client health check failed.', 'red')\n",
" return False\n",
" except httpx.RequestError:\n",
" cprint(f'Failed to connect to {client_name} client.', 'red')\n",
" return False\n",
"\n",
"async def select_client(use_local: bool) -> LlamaStackClient:\n",
" if use_local and await check_client_health(local_client, 'local'):\n",
" return local_client\n",
"\n",
" if await check_client_health(cloud_client, 'cloud'):\n",
" return cloud_client\n",
"\n",
" raise ConnectionError('Unable to connect to any client.')\n",
"\n",
"# Example usage: pass True for local, False for cloud\n",
"client = await select_client(use_local=True)\n"
]
},
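{
"cell_type": "markdown",
"id": "5e7c9d1b",
"metadata": {},
"source": [
"To see the fallback path in isolation, you can run the health check against a client pointed at a port where nothing is listening. This is a hypothetical demonstration: port `5999` below is a made-up value assumed to be unused on your machine."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d2e4f6c",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical demonstration: port 5999 is assumed to be unused, so the\n",
"# health check should fail and print a red failure message. Inside\n",
"# select_client, a False result like this triggers the cloud fallback.\n",
"dead_client = LlamaStackClient(base_url=f'http://{HOST}:5999')\n",
"await check_client_health(dead_client, 'dead')"
]
},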
{
"cell_type": "markdown",
"id": "9ccfe66f",
"metadata": {},
"source": [
"#### 4. Generate a Response\n",
"\n",
"After selecting the client, you can generate text using `chat_completion`. This example sends a sample prompt to the model and prints the response.\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "5e19cc20",
"metadata": {},
"outputs": [],
"source": [
"from termcolor import cprint\n",
"from llama_stack_client.lib.inference.event_logger import EventLogger\n",
"\n",
"async def get_llama_response(stream: bool = True, use_local: bool = True):\n",
" client = await select_client(use_local) # Selects the available client\n",
" message = {\n",
" \"role\": \"user\",\n",
" \"content\": 'hello world, write me a 2 sentence poem about the moon'\n",
" }\n",
" cprint(f'User> {message[\"content\"]}', 'green')\n",
"\n",
" response = client.inference.chat_completion(\n",
" messages=[message],\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
" stream=stream,\n",
" )\n",
"\n",
" if not stream:\n",
" cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
" else:\n",
" async for log in EventLogger().log(response):\n",
" log.print()\n"
]
},
{
"cell_type": "markdown",
"id": "6edf5e57",
"metadata": {},
"source": [
"#### 5. Run with Cloud Model\n",
"\n",
"Use `asyncio.run()` to execute `get_llama_response` in an asynchronous event loop.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "c10f487e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mUsing cloud client.\u001b[0m\n",
"\u001b[32mUser> hello world, write me a 2 sentence poem about the moon\u001b[0m\n",
"\u001b[36mAssistant> \u001b[0m\u001b[33mSilver\u001b[0m\u001b[33m cres\u001b[0m\u001b[33mcent\u001b[0m\u001b[33m in\u001b[0m\u001b[33m the\u001b[0m\u001b[33m midnight\u001b[0m\u001b[33m sky\u001b[0m\u001b[33m,\n",
"\u001b[0m\u001b[33mA\u001b[0m\u001b[33m gentle\u001b[0m\u001b[33m glow\u001b[0m\u001b[33m that\u001b[0m\u001b[33m whispers\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mI\u001b[0m\u001b[33m'm\u001b[0m\u001b[33m passing\u001b[0m\u001b[33m by\u001b[0m\u001b[33m.\"\u001b[0m\u001b[97m\u001b[0m\n"
]
}
],
"source": [
"import asyncio\n",
"\n",
"\n",
"# Run this function directly in a Jupyter Notebook cell with `await`\n",
"await get_llama_response(use_local=False)\n",
"# To run it in a python file, use this line instead\n",
"# asyncio.run(get_llama_response(use_local=False))"
]
},
{
"cell_type": "markdown",
"id": "5c433511-9321-4718-ab7f-e21cf6b5ca79",
"metadata": {},
"source": [
"#### 6. Run with Local Model\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "02eacfaf-c7f1-494b-ac28-129d2a0258e3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mUsing local client.\u001b[0m\n",
"\u001b[32mUser> hello world, write me a 2 sentence poem about the moon\u001b[0m\n",
"\u001b[36mAssistant> \u001b[0m\u001b[33mSilver\u001b[0m\u001b[33m cres\u001b[0m\u001b[33mcent\u001b[0m\u001b[33m in\u001b[0m\u001b[33m the\u001b[0m\u001b[33m midnight\u001b[0m\u001b[33m sky\u001b[0m\u001b[33m,\n",
"\u001b[0m\u001b[33mA\u001b[0m\u001b[33m gentle\u001b[0m\u001b[33m glow\u001b[0m\u001b[33m that\u001b[0m\u001b[33m whispers\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mI\u001b[0m\u001b[33m'm\u001b[0m\u001b[33m passing\u001b[0m\u001b[33m by\u001b[0m\u001b[33m.\"\u001b[0m\u001b[97m\u001b[0m\n"
]
}
],
"source": [
"import asyncio\n",
"\n",
"await get_llama_response(use_local=True)"
]
},
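{
"cell_type": "markdown",
"id": "7a1f3b5d",
"metadata": {},
"source": [
"The runs above stream tokens as they are generated. To exercise the non-streaming branch of `get_llama_response` from step 4, pass `stream=False`; the full response is printed once generation completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c8e0a4f",
"metadata": {},
"outputs": [],
"source": [
"# Non-streaming variant: prints the complete response in one message.\n",
"await get_llama_response(stream=False, use_local=True)"
]
},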
{
"cell_type": "markdown",
"id": "7e3a3ffa",
"metadata": {},
"source": [
"Thanks for checking out this notebook! \n",
"\n",
"The next one will be a guide on [Prompt Engineering](./01_Prompt_Engineering101.ipynb), please continue learning!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.15"
}
},
"nbformat": 4,
"nbformat_minor": 5
}