mirror of
				https://github.com/meta-llama/llama-stack.git
				synced 2025-10-25 01:01:13 +00:00 
			
		
		
		
	# What does this PR do?
BREAKING CHANGE: removes /inference/chat-completion route and updates
relevant documentation
## Test Plan
🤷
		
	
			
		
			
				
	
	
		
			307 lines
		
	
	
	
		
			10 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			307 lines
		
	
	
	
		
			10 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
| {
 | ||
|   "cells": [
 | ||
|     {
 | ||
|       "cell_type": "markdown",
 | ||
|       "id": "cd96f85a",
 | ||
|       "metadata": {},
 | ||
|       "source": [
 | ||
|         "# Prompt Engineering with Llama Stack\n",
 | ||
|         "\n",
 | ||
|         "Prompt engineering is using natural language to produce a desired response from a large language model (LLM).\n",
 | ||
|         "\n",
 | ||
|         "This interactive guide covers prompt engineering & best practices with Llama 3.2 and Llama Stack.\n",
 | ||
|         "\n",
 | ||
|         "Before you begin, please ensure Llama Stack is installed and set up by following the [Getting Started Guide](https://llamastack.github.io/latest/getting_started/index.html)."
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "markdown",
 | ||
|       "id": "3e1ef1c9",
 | ||
|       "metadata": {},
 | ||
|       "source": [
 | ||
|         "## Few-Shot Inference for LLMs\n",
 | ||
|         "\n",
 | ||
|         "This guide provides instructions on how to use Llama Stack’s `chat_completion` API with a few-shot learning approach to enhance text generation. Few-shot examples enable the model to recognize patterns by providing labeled prompts, allowing it to complete tasks based on minimal prior examples.\n",
 | ||
|         "\n",
 | ||
|         "### Overview\n",
 | ||
|         "\n",
 | ||
|         "Few-shot learning provides the model with multiple examples of input-output pairs. This is particularly useful for guiding the model's behavior in specific tasks, helping it understand the desired completion format and content based on a few sample interactions.\n",
 | ||
|         "\n",
 | ||
|         "### Implementation"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "markdown",
 | ||
|       "id": "e065af43",
 | ||
|       "metadata": {},
 | ||
|       "source": [
 | ||
|         "### 0. Configuration\n",
 | ||
|         "Set up your connection parameters:"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "code",
 | ||
|       "execution_count": 1,
 | ||
|       "id": "df35d1e2",
 | ||
|       "metadata": {},
 | ||
|       "outputs": [],
 | ||
|       "source": [
 | ||
|         "HOST = \"localhost\"  # Replace with your host\n",
 | ||
|         "PORT = 8321        # Replace with your port\n",
 | ||
|         "MODEL_NAME='meta-llama/Llama-3.2-3B-Instruct'"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "markdown",
 | ||
|       "id": "a7a25a7e",
 | ||
|       "metadata": {},
 | ||
|       "source": [
 | ||
|         "#### 1. Initialize the Client\n",
 | ||
|         "\n",
 | ||
|         "Begin by setting up the `LlamaStackClient` to connect to the inference endpoint.\n"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "code",
 | ||
|       "execution_count": 2,
 | ||
|       "id": "c2a0e359",
 | ||
|       "metadata": {},
 | ||
|       "outputs": [],
 | ||
|       "source": [
 | ||
|         "from llama_stack_client import LlamaStackClient\n",
 | ||
|         "\n",
 | ||
|         "client = LlamaStackClient(base_url=f'http://{HOST}:{PORT}')"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "markdown",
 | ||
|       "id": "02cdf3f6",
 | ||
|       "metadata": {},
 | ||
|       "source": [
 | ||
|         "#### 2. Define Few-Shot Examples\n",
 | ||
|         "\n",
 | ||
|         "Construct a series of labeled `UserMessage` and `CompletionMessage` instances to demonstrate the task to the model. Each `UserMessage` represents an input prompt, and each `CompletionMessage` is the desired output. The model uses these examples to infer the appropriate response patterns.\n"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "code",
 | ||
|       "execution_count": 3,
 | ||
|       "id": "da140b33",
 | ||
|       "metadata": {},
 | ||
|       "outputs": [],
 | ||
|       "source": [
 | ||
|         "few_shot_examples = [\n",
 | ||
|         "    {\"role\": \"user\", \"content\": 'Have shorter, spear-shaped ears.'},\n",
 | ||
|         "    {\n",
 | ||
|         "        \"role\": \"assistant\",\n",
 | ||
|         "        \"content\": \"That's Alpaca!\",\n",
 | ||
|         "        \"stop_reason\": 'end_of_message',\n",
 | ||
|         "        \"tool_calls\": []\n",
 | ||
|         "    },\n",
 | ||
|         "    {\n",
 | ||
|         "        \"role\": \"user\",\n",
 | ||
|         "        \"content\": 'Known for their calm nature and used as pack animals in mountainous regions.'\n",
 | ||
|         "    },\n",
 | ||
|         "    {\n",
 | ||
|         "        \"role\": \"assistant\",\n",
 | ||
|         "        \"content\": \"That's Llama!\",\n",
 | ||
|         "        \"stop_reason\": 'end_of_message',\n",
 | ||
|         "        \"tool_calls\": []\n",
 | ||
|         "    },\n",
 | ||
|         "    {\n",
 | ||
|         "        \"role\": \"user\",\n",
 | ||
|         "        \"content\": 'Has a straight, slender neck and is smaller in size compared to its relative.'\n",
 | ||
|         "    },\n",
 | ||
|         "    {\n",
 | ||
|         "        \"role\": \"assistant\",\n",
 | ||
|         "        \"content\": \"That's Alpaca!\",\n",
 | ||
|         "        \"stop_reason\": 'end_of_message',\n",
 | ||
|         "        \"tool_calls\": []\n",
 | ||
|         "    },\n",
 | ||
|         "    {\n",
 | ||
|         "        \"role\": \"user\",\n",
 | ||
|         "        \"content\": 'Generally taller and more robust, commonly seen as guard animals.'\n",
 | ||
|         "    }\n",
 | ||
|         "]"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "markdown",
 | ||
|       "id": "6eece9cc",
 | ||
|       "metadata": {},
 | ||
|       "source": [
 | ||
|         "#### Note\n",
 | ||
|         "- **Few-Shot Examples**: These examples show the model the correct responses for specific prompts.\n",
 | ||
|         "- **CompletionMessage**: This defines the model's expected completion for each prompt.\n"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "markdown",
 | ||
|       "id": "5a0de6c7",
 | ||
|       "metadata": {},
 | ||
|       "source": [
 | ||
|         "#### 3. Invoke `chat_completion` with Few-Shot Examples\n",
 | ||
|         "\n",
 | ||
|         "Use the few-shot examples as the message input for `chat_completion`. The model will use the examples to generate contextually appropriate responses, allowing it to infer and complete new queries in a similar format.\n"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "code",
 | ||
|       "execution_count": 8,
 | ||
|       "id": "8b321089",
 | ||
|       "metadata": {},
 | ||
|       "outputs": [],
 | ||
|       "source": [
 | ||
|         "response = client.chat.completions.create(\n",
 | ||
|         "    messages=few_shot_examples, model=MODEL_NAME\n",
 | ||
|         ")"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "markdown",
 | ||
|       "id": "063265d2",
 | ||
|       "metadata": {},
 | ||
|       "source": [
 | ||
|         "#### 4. Display the Model’s Response\n",
 | ||
|         "\n",
 | ||
|         "The `choices[0].message.content` contains the assistant’s generated content based on the few-shot examples provided. Output this content to see the model's response directly in the console.\n"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "code",
 | ||
|       "execution_count": 9,
 | ||
|       "id": "4ac1ac3e",
 | ||
|       "metadata": {},
 | ||
|       "outputs": [
 | ||
|         {
 | ||
|           "name": "stdout",
 | ||
|           "output_type": "stream",
 | ||
|           "text": [
 | ||
|             "\u001b[36m> Response: That sounds like a Donkey or an Ass (also known as a Burro)!\u001b[0m\n"
 | ||
|           ]
 | ||
|         }
 | ||
|       ],
 | ||
|       "source": [
 | ||
|         "from termcolor import cprint\n",
 | ||
|         "\n",
 | ||
|         "cprint(f'> Response: {response.choices[0].message.content}', 'cyan')"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "markdown",
 | ||
|       "id": "d936ab59",
 | ||
|       "metadata": {},
 | ||
|       "source": [
 | ||
|         "### Complete code\n",
 | ||
|         "Summing it up, here's the code for few-shot implementation with llama-stack:\n"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "code",
 | ||
|       "execution_count": 15,
 | ||
|       "id": "524189bd",
 | ||
|       "metadata": {},
 | ||
|       "outputs": [
 | ||
|         {
 | ||
|           "name": "stdout",
 | ||
|           "output_type": "stream",
 | ||
|           "text": [
 | ||
|             "\u001b[36m> Response: You're thinking of a Llama again!\n",
 | ||
|             "\n",
 | ||
|             "Is that correct?\u001b[0m\n"
 | ||
|           ]
 | ||
|         }
 | ||
|       ],
 | ||
|       "source": [
 | ||
|         "from llama_stack_client import LlamaStackClient\n",
 | ||
|         "from llama_stack_client.types import CompletionMessage, UserMessage\n",
 | ||
|         "from termcolor import cprint\n",
 | ||
|         "\n",
 | ||
|         "client = LlamaStackClient(base_url=f'http://{HOST}:{PORT}')\n",
 | ||
|         "\n",
 | ||
|         "response = client.chat.completions.create(\n",
 | ||
|         "    messages=[\n",
 | ||
|         "    {\"role\": \"user\", \"content\": 'Have shorter, spear-shaped ears.'},\n",
 | ||
|         "    {\n",
 | ||
|         "        \"role\": \"assistant\",\n",
 | ||
|         "        \"content\": \"That's Alpaca!\",\n",
 | ||
|         "        \"stop_reason\": 'end_of_message',\n",
 | ||
|         "        \"tool_calls\": []\n",
 | ||
|         "    },\n",
 | ||
|         "    {\n",
 | ||
|         "        \"role\": \"user\",\n",
 | ||
|         "        \"content\": 'Known for their calm nature and used as pack animals in mountainous regions.'\n",
 | ||
|         "    },\n",
 | ||
|         "    {\n",
 | ||
|         "        \"role\": \"assistant\",\n",
 | ||
|         "        \"content\": \"That's Llama!\",\n",
 | ||
|         "        \"stop_reason\": 'end_of_message',\n",
 | ||
|         "        \"tool_calls\": []\n",
 | ||
|         "    },\n",
 | ||
|         "    {\n",
 | ||
|         "        \"role\": \"user\",\n",
 | ||
|         "        \"content\": 'Has a straight, slender neck and is smaller in size compared to its relative.'\n",
 | ||
|         "    },\n",
 | ||
|         "    {\n",
 | ||
|         "        \"role\": \"assistant\",\n",
 | ||
|         "        \"content\": \"That's Alpaca!\",\n",
 | ||
|         "        \"stop_reason\": 'end_of_message',\n",
 | ||
|         "        \"tool_calls\": []\n",
 | ||
|         "    },\n",
 | ||
|         "    {\n",
 | ||
|         "        \"role\": \"user\",\n",
 | ||
|         "        \"content\": 'Generally taller and more robust, commonly seen as guard animals.'\n",
 | ||
|         "    }\n",
 | ||
|         "],\n",
 | ||
|         "    model=MODEL_NAME,\n",
 | ||
|         ")\n",
 | ||
|         "\n",
 | ||
|         "cprint(f'> Response: {response.choices[0].message.content}', 'cyan')"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "code",
 | ||
|       "execution_count": 16,
 | ||
|       "id": "a38dcb91",
 | ||
|       "metadata": {},
 | ||
|       "outputs": [],
 | ||
|       "source": [
 | ||
|         "#fin"
 | ||
|       ]
 | ||
|     },
 | ||
|     {
 | ||
|       "cell_type": "markdown",
 | ||
|       "id": "76d053b8",
 | ||
|       "metadata": {},
 | ||
|       "source": [
 | ||
|         "Thanks for checking out this notebook! \n",
 | ||
|         "\n",
 | ||
|         "The next one will be a guide on how to chat with images, continue to the notebook [here](./03_Image_Chat101.ipynb). Happy learning!"
 | ||
|       ]
 | ||
|     }
 | ||
|   ],
 | ||
|   "metadata": {
 | ||
|     "fileHeader": "",
 | ||
|     "fileUid": "b1b93b6e-22a2-4c24-8cb0-161fdafff29a",
 | ||
|     "isAdHoc": false,
 | ||
|     "kernelspec": {
 | ||
|       "display_name": "base",
 | ||
|       "language": "python",
 | ||
|       "name": "python3"
 | ||
|     },
 | ||
|     "language_info": {
 | ||
|       "codemirror_mode": {
 | ||
|         "name": "ipython",
 | ||
|         "version": 3
 | ||
|       },
 | ||
|       "file_extension": ".py",
 | ||
|       "mimetype": "text/x-python",
 | ||
|       "name": "python",
 | ||
|       "nbconvert_exporter": "python",
 | ||
|       "pygments_lexer": "ipython3",
 | ||
|       "version": "3.12.2"
 | ||
|     }
 | ||
|   },
 | ||
|   "nbformat": 4,
 | ||
|   "nbformat_minor": 5
 | ||
| }
 |