{ "cells": [ { "cell_type": "markdown", "id": "d2bf5275", "metadata": {}, "source": [ "\"Open" ] }, { "cell_type": "markdown", "id": "cd96f85a", "metadata": {}, "source": [ "# Prompt Engineering with Llama Stack\n", "\n", "Prompt engineering is using natural language to produce a desired response from a large language model (LLM).\n", "\n", "This interactive guide covers prompt engineering & best practices with Llama 3.2 and Llama Stack.\n", "\n", "Before you begin, please ensure Llama Stack is installed and set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html)." ] }, { "cell_type": "markdown", "id": "3e1ef1c9", "metadata": {}, "source": [ "## Few-Shot Inference for LLMs\n", "\n", "This guide provides instructions on how to use Llama Stack’s `chat_completion` API with a few-shot learning approach to enhance text generation. Few-shot examples enable the model to recognize patterns by providing labeled prompts, allowing it to complete tasks based on minimal prior examples.\n", "\n", "### Overview\n", "\n", "Few-shot learning provides the model with multiple examples of input-output pairs. This is particularly useful for guiding the model's behavior in specific tasks, helping it understand the desired completion format and content based on a few sample interactions.\n", "\n", "### Implementation" ] }, { "cell_type": "markdown", "id": "e065af43", "metadata": {}, "source": [ "### 0. Configuration\n", "Set up your connection parameters:" ] }, { "cell_type": "code", "execution_count": 1, "id": "df35d1e2", "metadata": {}, "outputs": [], "source": [ "HOST = \"localhost\" # Replace with your host\n", "PORT = 5000 # Replace with your port" ] }, { "cell_type": "markdown", "id": "a7a25a7e", "metadata": {}, "source": [ "#### 1. Initialize the Client\n", "\n", "Begin by setting up the `LlamaStackClient` to connect to the inference endpoint.\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "c2a0e359", "metadata": {}, "outputs": [], "source": [ "from llama_stack_client import LlamaStackClient\n", "\n", "client = LlamaStackClient(base_url=f'http://{HOST}:{PORT}')" ] }, { "cell_type": "markdown", "id": "02cdf3f6", "metadata": {}, "source": [ "#### 2. Define Few-Shot Examples\n", "\n", "Construct a series of labeled `UserMessage` and `CompletionMessage` instances to demonstrate the task to the model. Each `UserMessage` represents an input prompt, and each `CompletionMessage` is the desired output. The model uses these examples to infer the appropriate response patterns.\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "da140b33", "metadata": {}, "outputs": [], "source": [ "few_shot_examples = [\n", " {\"role\": \"user\", \"content\": 'Have shorter, spear-shaped ears.'},\n", " {\n", " \"role\": \"assistant\",\n", " \"content\": \"That's Alpaca!\",\n", " \"stop_reason\": 'end_of_message',\n", " \"tool_calls\": []\n", " },\n", " {\n", " \"role\": \"user\",\n", " \"content\": 'Known for their calm nature and used as pack animals in mountainous regions.'\n", " },\n", " {\n", " \"role\": \"assistant\",\n", " \"content\": \"That's Llama!\",\n", " \"stop_reason\": 'end_of_message',\n", " \"tool_calls\": []\n", " },\n", " {\n", " \"role\": \"user\",\n", " \"content\": 'Has a straight, slender neck and is smaller in size compared to its relative.'\n", " },\n", " {\n", " \"role\": \"assistant\",\n", " \"content\": \"That's Alpaca!\",\n", " \"stop_reason\": 'end_of_message',\n", " \"tool_calls\": []\n", " },\n", " {\n", " \"role\": \"user\",\n", " \"content\": 'Generally taller and more robust, commonly seen as guard animals.'\n", " }\n", "]" ] }, { "cell_type": "markdown", "id": "6eece9cc", "metadata": {}, "source": [ "#### Note\n", "- **Few-Shot Examples**: These examples show the model the correct responses for specific prompts.\n", "- **CompletionMessage**: This defines the model's expected completion for each prompt.\n" ] }, { "cell_type": "markdown", "id": "5a0de6c7", "metadata": {}, "source": [ "#### 3. Invoke `chat_completion` with Few-Shot Examples\n", "\n", "Use the few-shot examples as the message input for `chat_completion`. The model will use the examples to generate contextually appropriate responses, allowing it to infer and complete new queries in a similar format.\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "8b321089", "metadata": {}, "outputs": [], "source": [ "response = client.inference.chat_completion(\n", " messages=few_shot_examples, model='Llama3.1-8B-Instruct'\n", ")" ] }, { "cell_type": "markdown", "id": "063265d2", "metadata": {}, "source": [ "#### 4. Display the Model’s Response\n", "\n", "The `completion_message` contains the assistant’s generated content based on the few-shot examples provided. Output this content to see the model's response directly in the console.\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "4ac1ac3e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[36m> Response: That's Llama!\u001b[0m\n" ] } ], "source": [ "from termcolor import cprint\n", "\n", "cprint(f'> Response: {response.completion_message.content}', 'cyan')" ] }, { "cell_type": "markdown", "id": "d936ab59", "metadata": {}, "source": [ "### Complete code\n", "Summing it up, here's the code for few-shot implementation with llama-stack:\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "524189bd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[36m> Response: That's Llama!\u001b[0m\n" ] } ], "source": [ "from llama_stack_client import LlamaStackClient\n", "from llama_stack_client.types import CompletionMessage, UserMessage\n", "from termcolor import cprint\n", "\n", "client = LlamaStackClient(base_url=f'http://{HOST}:{PORT}')\n", "\n", "response = client.inference.chat_completion(\n", " messages=[\n", " {\"role\": \"user\", \"content\": 'Have shorter, spear-shaped ears.'},\n", " {\n", " \"role\": \"assistant\",\n", " \"content\": \"That's Alpaca!\",\n", " \"stop_reason\": 'end_of_message',\n", " \"tool_calls\": []\n", " },\n", " {\n", " \"role\": \"user\",\n", " \"content\": 'Known for their calm nature and used as pack animals in mountainous regions.'\n", " },\n", " {\n", " \"role\": \"assistant\",\n", " \"content\": \"That's Llama!\",\n", " \"stop_reason\": 'end_of_message',\n", " \"tool_calls\": []\n", " },\n", " {\n", " \"role\": \"user\",\n", " \"content\": 'Has a straight, slender neck and is smaller in size compared to its relative.'\n", " },\n", " {\n", " \"role\": \"assistant\",\n", " \"content\": \"That's Alpaca!\",\n", " \"stop_reason\": 'end_of_message',\n", " \"tool_calls\": []\n", " },\n", " {\n", " \"role\": \"user\",\n", " \"content\": 'Generally taller and more robust, commonly seen as guard animals.'\n", " }\n", "],\n", " model='Llama3.2-11B-Vision-Instruct',\n", ")\n", "\n", "cprint(f'> Response: {response.completion_message.content}', 'cyan')" ] }, { "cell_type": "markdown", "id": "76d053b8", "metadata": {}, "source": [ "Thanks for checking out this notebook! \n", "\n", "The next one will be a guide on how to chat with images, continue to the notebook [here](./02_Image_Chat101.ipynb). Happy learning!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.15" } }, "nbformat": 4, "nbformat_minor": 5 }