From bfb04cdc0f49543bae36690551d9e4b4677262b5 Mon Sep 17 00:00:00 2001 From: Justin Lee Date: Tue, 5 Nov 2024 15:05:48 -0800 Subject: [PATCH] improvement on prompt_engineering --- .../01_Prompt_Engineering101.ipynb | 333 +++++++----------- .../_archive_01_Prompt_Engineering101.ipynb | 312 ++++++++++++++++ docs/zero_to_hero_guide/quickstart.md | 16 +- 3 files changed, 448 insertions(+), 213 deletions(-) create mode 100644 docs/zero_to_hero_guide/_archive_01_Prompt_Engineering101.ipynb diff --git a/docs/zero_to_hero_guide/01_Prompt_Engineering101.ipynb b/docs/zero_to_hero_guide/01_Prompt_Engineering101.ipynb index 681c2b8a8..167a43b70 100644 --- a/docs/zero_to_hero_guide/01_Prompt_Engineering101.ipynb +++ b/docs/zero_to_hero_guide/01_Prompt_Engineering101.ipynb @@ -1,312 +1,233 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", + "id": "cd96f85a", "metadata": {}, "source": [ "\"Open\n", "\n", - "# Prompt Engineering with Llama 3.1\n", + "# Prompt Engineering with Llama Stack\n", "\n", "Prompt engineering is using natural language to produce a desired response from a large language model (LLM).\n", "\n", - "This interactive guide covers prompt engineering & best practices with Llama 3.1." + "This interactive guide covers prompt engineering & best practices with Llama 3.1 and Llama Stack" ] }, { - "attachments": {}, "cell_type": "markdown", + "id": "3e1ef1c9", "metadata": {}, "source": [ - "## Introduction" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Why now?\n", + "## Few-Shot Inference for LLMs\n", "\n", - "[Vaswani et al. (2017)](https://arxiv.org/abs/1706.03762) introduced the world to transformer neural networks (originally for machine translation). Transformers ushered an era of generative AI with diffusion models for image creation and large language models (`LLMs`) as **programmable deep learning networks**.\n", + "This guide provides instructions on how to use Llama Stack’s `chat_completion` API with a few-shot learning approach to enhance text generation. Few-shot examples enable the model to recognize patterns by providing labeled prompts, allowing it to complete tasks based on minimal prior examples.\n", "\n", - "Programming foundational LLMs is done with natural language – it doesn't require training/tuning like ML models of the past. This has opened the door to a massive amount of innovation and a paradigm shift in how technology can be deployed. The science/art of using natural language to program language models to accomplish a task is referred to as **Prompt Engineering**." + "### Overview\n", + "\n", + "Few-shot learning provides the model with multiple examples of input-output pairs. This is particularly useful for guiding the model's behavior in specific tasks, helping it understand the desired completion format and content based on a few sample interactions.\n", + "\n", + "### Implementation" ] }, { - "attachments": {}, "cell_type": "markdown", + "id": "a7a25a7e", "metadata": {}, "source": [ - "## Prompting Techniques" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Explicit Instructions\n", + "#### 1. Initialize the Client\n", "\n", - "Detailed, explicit instructions produce better results than open-ended prompts:" + "Begin by setting up the `LlamaStackClient` to connect to the inference endpoint.\n" ] }, { "cell_type": "code", "execution_count": null, + "id": "c2a0e359", "metadata": {}, "outputs": [], "source": [ - "complete_and_print(prompt=\"Describe quantum physics in one short sentence of no more than 12 words\")\n", - "# Returns a succinct explanation of quantum physics that mentions particles and states existing simultaneously." + "from llama_stack_client import LlamaStackClient\n", + "\n", + "client = LlamaStackClient(base_url='http://localhost:5000')" ] }, { - "attachments": {}, "cell_type": "markdown", + "id": "02cdf3f6", "metadata": {}, "source": [ - "You can think about giving explicit instructions as using rules and restrictions to how Llama 3 responds to your prompt.\n", + "#### 2. Define Few-Shot Examples\n", "\n", - "- Stylization\n", - " - `Explain this to me like a topic on a children's educational network show teaching elementary students.`\n", - " - `I'm a software engineer using large language models for summarization. Summarize the following text in under 250 words:`\n", - " - `Give your answer like an old timey private investigator hunting down a case step by step.`\n", - "- Formatting\n", - " - `Use bullet points.`\n", - " - `Return as a JSON object.`\n", - " - `Use less technical terms and help me apply it in my work in communications.`\n", - "- Restrictions\n", - " - `Only use academic papers.`\n", - " - `Never give sources older than 2020.`\n", - " - `If you don't know the answer, say that you don't know.`\n", - "\n", - "Here's an example of giving explicit instructions to give more specific results by limiting the responses to recently created sources." + "Construct a series of labeled `UserMessage` and `CompletionMessage` instances to demonstrate the task to the model. Each `UserMessage` represents an input prompt, and each `CompletionMessage` is the desired output. The model uses these examples to infer the appropriate response patterns.\n" ] }, { "cell_type": "code", "execution_count": null, + "id": "da140b33", "metadata": {}, "outputs": [], "source": [ - "complete_and_print(\"Explain the latest advances in large language models to me.\")\n", - "# More likely to cite sources from 2017\n", + "from llama_stack_client.types import CompletionMessage, UserMessage\n", "\n", - "complete_and_print(\"Explain the latest advances in large language models to me. Always cite your sources. Never cite sources older than 2020.\")\n", - "# Gives more specific advances and only cites sources from 2020" + "few_shot_examples = messages=[\n", + " UserMessage(content='Have shorter, spear-shaped ears.', role='user'),\n", + " CompletionMessage(\n", + " content=\"That's Alpaca!\",\n", + " role='assistant',\n", + " stop_reason='end_of_message',\n", + " tool_calls=[],\n", + " ),\n", + " UserMessage(\n", + " content='Known for their calm nature and used as pack animals in mountainous regions.',\n", + " role='user',\n", + " ),\n", + " CompletionMessage(\n", + " content=\"That's Llama!\",\n", + " role='assistant',\n", + " stop_reason='end_of_message',\n", + " tool_calls=[],\n", + " ),\n", + " UserMessage(\n", + " content='Has a straight, slender neck and is smaller in size compared to its relative.',\n", + " role='user',\n", + " ),\n", + " CompletionMessage(\n", + " content=\"That's Alpaca!\",\n", + " role='assistant',\n", + " stop_reason='end_of_message',\n", + " tool_calls=[],\n", + " ),\n", + " UserMessage(\n", + " content='Generally taller and more robust, commonly seen as guard animals.',\n", + " role='user',\n", + " ),\n", + "]" ] }, { - "attachments": {}, "cell_type": "markdown", + "id": "6eece9cc", "metadata": {}, "source": [ - "### Example Prompting using Zero- and Few-Shot Learning\n", + "#### Note\n", + "- **Few-Shot Examples**: These examples show the model the correct responses for specific prompts.\n", + "- **CompletionMessage**: This defines the model's expected completion for each prompt.\n" + ] + }, + { + "cell_type": "markdown", + "id": "5a0de6c7", + "metadata": {}, + "source": [ + "#### 3. Invoke `chat_completion` with Few-Shot Examples\n", "\n", - "A shot is an example or demonstration of what type of prompt and response you expect from a large language model. This term originates from training computer vision models on photographs, where one shot was one example or instance that the model used to classify an image ([Fei-Fei et al. (2006)](http://vision.stanford.edu/documents/Fei-FeiFergusPerona2006.pdf)).\n", - "\n", - "#### Zero-Shot Prompting\n", - "\n", - "Large language models like Llama 3 are unique because they are capable of following instructions and producing responses without having previously seen an example of a task. Prompting without examples is called \"zero-shot prompting\".\n", - "\n", - "Let's try using Llama 3 as a sentiment detector. You may notice that output format varies - we can improve this with better prompting." + "Use the few-shot examples as the message input for `chat_completion`. The model will use the examples to generate contextually appropriate responses, allowing it to infer and complete new queries in a similar format.\n" ] }, { "cell_type": "code", "execution_count": null, + "id": "8b321089", "metadata": {}, "outputs": [], "source": [ - "complete_and_print(\"Text: This was the best movie I've ever seen! \\n The sentiment of the text is: \")\n", - "# Returns positive sentiment\n", - "\n", - "complete_and_print(\"Text: The director was trying too hard. \\n The sentiment of the text is: \")\n", - "# Returns negative sentiment" + "response = client.inference.chat_completion(\n", + " messages=few_shot_examples, model='Llama3.2-11B-Vision-Instruct'\n", + ")" ] }, { - "attachments": {}, "cell_type": "markdown", + "id": "063265d2", "metadata": {}, "source": [ + "#### 4. Display the Model’s Response\n", "\n", - "#### Few-Shot Prompting\n", - "\n", - "Adding specific examples of your desired output generally results in more accurate, consistent output. This technique is called \"few-shot prompting\".\n", - "\n", - "In this example, the generated response follows our desired format that offers a more nuanced sentiment classifer that gives a positive, neutral, and negative response confidence percentage.\n", - "\n", - "See also: [Zhao et al. (2021)](https://arxiv.org/abs/2102.09690), [Liu et al. (2021)](https://arxiv.org/abs/2101.06804), [Su et al. (2022)](https://arxiv.org/abs/2209.01975), [Rubin et al. (2022)](https://arxiv.org/abs/2112.08633).\n", - "\n" + "The `completion_message` contains the assistant’s generated content based on the few-shot examples provided. Output this content to see the model's response directly in the console.\n" ] }, { "cell_type": "code", "execution_count": null, + "id": "4ac1ac3e", "metadata": {}, "outputs": [], "source": [ - "def sentiment(text):\n", - " response = chat_completion(messages=[\n", - " user(\"You are a sentiment classifier. For each message, give the percentage of positive/netural/negative.\"),\n", - " user(\"I liked it\"),\n", - " assistant(\"70% positive 30% neutral 0% negative\"),\n", - " user(\"It could be better\"),\n", - " assistant(\"0% positive 50% neutral 50% negative\"),\n", - " user(\"It's fine\"),\n", - " assistant(\"25% positive 50% neutral 25% negative\"),\n", - " user(text),\n", - " ])\n", - " return response\n", + "from termcolor import cprint\n", "\n", - "def print_sentiment(text):\n", - " print(f'INPUT: {text}')\n", - " print(sentiment(text))\n", - "\n", - "print_sentiment(\"I thought it was okay\")\n", - "# More likely to return a balanced mix of positive, neutral, and negative\n", - "print_sentiment(\"I loved it!\")\n", - "# More likely to return 100% positive\n", - "print_sentiment(\"Terrible service 0/10\")\n", - "# More likely to return 100% negative" + "cprint(f'> Response: {response.completion_message.content}', 'cyan')" ] }, { - "attachments": {}, "cell_type": "markdown", + "id": "d936ab59", "metadata": {}, "source": [ - "### Role Prompting\n", - "\n", - "Llama will often give more consistent responses when given a role ([Kong et al. (2023)](https://browse.arxiv.org/pdf/2308.07702.pdf)). Roles give context to the LLM on what type of answers are desired.\n", - "\n", - "Let's use Llama 3 to create a more focused, technical response for a question around the pros and cons of using PyTorch." + "### Complete code\n", + "Summing it up, here's the code for few-shot implementation with llama-stack:\n" ] }, { "cell_type": "code", "execution_count": null, + "id": "524189bd", "metadata": {}, "outputs": [], "source": [ - "complete_and_print(\"Explain the pros and cons of using PyTorch.\")\n", - "# More likely to explain the pros and cons of PyTorch covers general areas like documentation, the PyTorch community, and mentions a steep learning curve\n", + "from llama_stack_client import LlamaStackClient\n", + "from llama_stack_client.types import CompletionMessage, UserMessage\n", + "from termcolor import cprint\n", "\n", - "complete_and_print(\"Your role is a machine learning expert who gives highly technical advice to senior engineers who work with complicated datasets. Explain the pros and cons of using PyTorch.\")\n", - "# Often results in more technical benefits and drawbacks that provide more technical details on how model layers" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Chain-of-Thought\n", + "client = LlamaStackClient(base_url='http://localhost:5000')\n", "\n", - "Simply adding a phrase encouraging step-by-step thinking \"significantly improves the ability of large language models to perform complex reasoning\" ([Wei et al. (2022)](https://arxiv.org/abs/2201.11903)). This technique is called \"CoT\" or \"Chain-of-Thought\" prompting.\n", + "response = client.inference.chat_completion(\n", + " messages=[\n", + " UserMessage(content='Have shorter, spear-shaped ears.', role='user'),\n", + " CompletionMessage(\n", + " content=\"That's Alpaca!\",\n", + " role='assistant',\n", + " stop_reason='end_of_message',\n", + " tool_calls=[],\n", + " ),\n", + " UserMessage(\n", + " content='Known for their calm nature and used as pack animals in mountainous regions.',\n", + " role='user',\n", + " ),\n", + " CompletionMessage(\n", + " content=\"That's Llama!\",\n", + " role='assistant',\n", + " stop_reason='end_of_message',\n", + " tool_calls=[],\n", + " ),\n", + " UserMessage(\n", + " content='Has a straight, slender neck and is smaller in size compared to its relative.',\n", + " role='user',\n", + " ),\n", + " CompletionMessage(\n", + " content=\"That's Alpaca!\",\n", + " role='assistant',\n", + " stop_reason='end_of_message',\n", + " tool_calls=[],\n", + " ),\n", + " UserMessage(\n", + " content='Generally taller and more robust, commonly seen as guard animals.',\n", + " role='user',\n", + " ),\n", + " ],\n", + " model='Llama3.2-11B-Vision-Instruct',\n", + ")\n", "\n", - "Llama 3.1 now reasons step-by-step naturally without the addition of the phrase. This section remains for completeness." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "prompt = \"Who lived longer, Mozart or Elvis?\"\n", - "\n", - "complete_and_print(prompt)\n", - "# Llama 2 would often give the incorrect answer of \"Mozart\"\n", - "\n", - "complete_and_print(f\"{prompt} Let's think through this carefully, step by step.\")\n", - "# Gives the correct answer \"Elvis\"" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Self-Consistency\n", - "\n", - "LLMs are probablistic, so even with Chain-of-Thought, a single generation might produce incorrect results. Self-Consistency ([Wang et al. (2022)](https://arxiv.org/abs/2203.11171)) introduces enhanced accuracy by selecting the most frequent answer from multiple generations (at the cost of higher compute):" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import re\n", - "from statistics import mode\n", - "\n", - "def gen_answer():\n", - " response = completion(\n", - " \"John found that the average of 15 numbers is 40.\"\n", - " \"If 10 is added to each number then the mean of the numbers is?\"\n", - " \"Report the answer surrounded by backticks (example: `123`)\",\n", - " )\n", - " match = re.search(r'`(\\d+)`', response)\n", - " if match is None:\n", - " return None\n", - " return match.group(1)\n", - "\n", - "answers = [gen_answer() for i in range(5)]\n", - "\n", - "print(\n", - " f\"Answers: {answers}\\n\",\n", - " f\"Final answer: {mode(answers)}\",\n", - " )\n", - "\n", - "# Sample runs of Llama-3-70B (all correct):\n", - "# ['60', '50', '50', '50', '50'] -> 50\n", - "# ['50', '50', '50', '60', '50'] -> 50\n", - "# ['50', '50', '60', '50', '50'] -> 50" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Author & Contact\n", - "\n", - "Edited by [Dalton Flanagan](https://www.linkedin.com/in/daltonflanagan/) (dalton@meta.com) with contributions from Mohsen Agsen, Bryce Bortree, Ricardo Juan Palma Duran, Kaolin Fire, Thomas Scialom." + "cprint(f'> Response: {response.completion_message.content}', 'cyan')" ] } ], "metadata": { - "captumWidgetMessage": [], - "dataExplorerConfig": [], - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.14" - }, - "last_base_url": "https://bento.edge.x2p.facebook.net/", - "last_kernel_id": "161e2a7b-2d2b-4995-87f3-d1539860ecac", - "last_msg_id": "4eab1242-d815b886ebe4f5b1966da982_543", - "last_server_session_id": "4a7b41c5-ed66-4dcb-a376-22673aebb469", - "operator_data": [], - "outputWidgetContext": [] + "name": "python" + } }, "nbformat": 4, - "nbformat_minor": 4 + "nbformat_minor": 5 } diff --git a/docs/zero_to_hero_guide/_archive_01_Prompt_Engineering101.ipynb b/docs/zero_to_hero_guide/_archive_01_Prompt_Engineering101.ipynb new file mode 100644 index 000000000..681c2b8a8 --- /dev/null +++ b/docs/zero_to_hero_guide/_archive_01_Prompt_Engineering101.ipynb @@ -0,0 +1,312 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\"Open\n", + "\n", + "# Prompt Engineering with Llama 3.1\n", + "\n", + "Prompt engineering is using natural language to produce a desired response from a large language model (LLM).\n", + "\n", + "This interactive guide covers prompt engineering & best practices with Llama 3.1." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Introduction" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Why now?\n", + "\n", + "[Vaswani et al. (2017)](https://arxiv.org/abs/1706.03762) introduced the world to transformer neural networks (originally for machine translation). Transformers ushered an era of generative AI with diffusion models for image creation and large language models (`LLMs`) as **programmable deep learning networks**.\n", + "\n", + "Programming foundational LLMs is done with natural language – it doesn't require training/tuning like ML models of the past. This has opened the door to a massive amount of innovation and a paradigm shift in how technology can be deployed. The science/art of using natural language to program language models to accomplish a task is referred to as **Prompt Engineering**." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prompting Techniques" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Explicit Instructions\n", + "\n", + "Detailed, explicit instructions produce better results than open-ended prompts:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "complete_and_print(prompt=\"Describe quantum physics in one short sentence of no more than 12 words\")\n", + "# Returns a succinct explanation of quantum physics that mentions particles and states existing simultaneously." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can think about giving explicit instructions as using rules and restrictions to how Llama 3 responds to your prompt.\n", + "\n", + "- Stylization\n", + " - `Explain this to me like a topic on a children's educational network show teaching elementary students.`\n", + " - `I'm a software engineer using large language models for summarization. Summarize the following text in under 250 words:`\n", + " - `Give your answer like an old timey private investigator hunting down a case step by step.`\n", + "- Formatting\n", + " - `Use bullet points.`\n", + " - `Return as a JSON object.`\n", + " - `Use less technical terms and help me apply it in my work in communications.`\n", + "- Restrictions\n", + " - `Only use academic papers.`\n", + " - `Never give sources older than 2020.`\n", + " - `If you don't know the answer, say that you don't know.`\n", + "\n", + "Here's an example of giving explicit instructions to give more specific results by limiting the responses to recently created sources." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "complete_and_print(\"Explain the latest advances in large language models to me.\")\n", + "# More likely to cite sources from 2017\n", + "\n", + "complete_and_print(\"Explain the latest advances in large language models to me. Always cite your sources. Never cite sources older than 2020.\")\n", + "# Gives more specific advances and only cites sources from 2020" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Example Prompting using Zero- and Few-Shot Learning\n", + "\n", + "A shot is an example or demonstration of what type of prompt and response you expect from a large language model. This term originates from training computer vision models on photographs, where one shot was one example or instance that the model used to classify an image ([Fei-Fei et al. (2006)](http://vision.stanford.edu/documents/Fei-FeiFergusPerona2006.pdf)).\n", + "\n", + "#### Zero-Shot Prompting\n", + "\n", + "Large language models like Llama 3 are unique because they are capable of following instructions and producing responses without having previously seen an example of a task. Prompting without examples is called \"zero-shot prompting\".\n", + "\n", + "Let's try using Llama 3 as a sentiment detector. You may notice that output format varies - we can improve this with better prompting." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "complete_and_print(\"Text: This was the best movie I've ever seen! \\n The sentiment of the text is: \")\n", + "# Returns positive sentiment\n", + "\n", + "complete_and_print(\"Text: The director was trying too hard. \\n The sentiment of the text is: \")\n", + "# Returns negative sentiment" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "#### Few-Shot Prompting\n", + "\n", + "Adding specific examples of your desired output generally results in more accurate, consistent output. This technique is called \"few-shot prompting\".\n", + "\n", + "In this example, the generated response follows our desired format that offers a more nuanced sentiment classifer that gives a positive, neutral, and negative response confidence percentage.\n", + "\n", + "See also: [Zhao et al. (2021)](https://arxiv.org/abs/2102.09690), [Liu et al. (2021)](https://arxiv.org/abs/2101.06804), [Su et al. (2022)](https://arxiv.org/abs/2209.01975), [Rubin et al. (2022)](https://arxiv.org/abs/2112.08633).\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def sentiment(text):\n", + " response = chat_completion(messages=[\n", + " user(\"You are a sentiment classifier. For each message, give the percentage of positive/netural/negative.\"),\n", + " user(\"I liked it\"),\n", + " assistant(\"70% positive 30% neutral 0% negative\"),\n", + " user(\"It could be better\"),\n", + " assistant(\"0% positive 50% neutral 50% negative\"),\n", + " user(\"It's fine\"),\n", + " assistant(\"25% positive 50% neutral 25% negative\"),\n", + " user(text),\n", + " ])\n", + " return response\n", + "\n", + "def print_sentiment(text):\n", + " print(f'INPUT: {text}')\n", + " print(sentiment(text))\n", + "\n", + "print_sentiment(\"I thought it was okay\")\n", + "# More likely to return a balanced mix of positive, neutral, and negative\n", + "print_sentiment(\"I loved it!\")\n", + "# More likely to return 100% positive\n", + "print_sentiment(\"Terrible service 0/10\")\n", + "# More likely to return 100% negative" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Role Prompting\n", + "\n", + "Llama will often give more consistent responses when given a role ([Kong et al. (2023)](https://browse.arxiv.org/pdf/2308.07702.pdf)). Roles give context to the LLM on what type of answers are desired.\n", + "\n", + "Let's use Llama 3 to create a more focused, technical response for a question around the pros and cons of using PyTorch." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "complete_and_print(\"Explain the pros and cons of using PyTorch.\")\n", + "# More likely to explain the pros and cons of PyTorch covers general areas like documentation, the PyTorch community, and mentions a steep learning curve\n", + "\n", + "complete_and_print(\"Your role is a machine learning expert who gives highly technical advice to senior engineers who work with complicated datasets. Explain the pros and cons of using PyTorch.\")\n", + "# Often results in more technical benefits and drawbacks that provide more technical details on how model layers" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Chain-of-Thought\n", + "\n", + "Simply adding a phrase encouraging step-by-step thinking \"significantly improves the ability of large language models to perform complex reasoning\" ([Wei et al. (2022)](https://arxiv.org/abs/2201.11903)). This technique is called \"CoT\" or \"Chain-of-Thought\" prompting.\n", + "\n", + "Llama 3.1 now reasons step-by-step naturally without the addition of the phrase. This section remains for completeness." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "prompt = \"Who lived longer, Mozart or Elvis?\"\n", + "\n", + "complete_and_print(prompt)\n", + "# Llama 2 would often give the incorrect answer of \"Mozart\"\n", + "\n", + "complete_and_print(f\"{prompt} Let's think through this carefully, step by step.\")\n", + "# Gives the correct answer \"Elvis\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Self-Consistency\n", + "\n", + "LLMs are probablistic, so even with Chain-of-Thought, a single generation might produce incorrect results. Self-Consistency ([Wang et al. (2022)](https://arxiv.org/abs/2203.11171)) introduces enhanced accuracy by selecting the most frequent answer from multiple generations (at the cost of higher compute):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "from statistics import mode\n", + "\n", + "def gen_answer():\n", + " response = completion(\n", + " \"John found that the average of 15 numbers is 40.\"\n", + " \"If 10 is added to each number then the mean of the numbers is?\"\n", + " \"Report the answer surrounded by backticks (example: `123`)\",\n", + " )\n", + " match = re.search(r'`(\\d+)`', response)\n", + " if match is None:\n", + " return None\n", + " return match.group(1)\n", + "\n", + "answers = [gen_answer() for i in range(5)]\n", + "\n", + "print(\n", + " f\"Answers: {answers}\\n\",\n", + " f\"Final answer: {mode(answers)}\",\n", + " )\n", + "\n", + "# Sample runs of Llama-3-70B (all correct):\n", + "# ['60', '50', '50', '50', '50'] -> 50\n", + "# ['50', '50', '50', '60', '50'] -> 50\n", + "# ['50', '50', '60', '50', '50'] -> 50" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Author & Contact\n", + "\n", + "Edited by [Dalton Flanagan](https://www.linkedin.com/in/daltonflanagan/) (dalton@meta.com) with contributions from Mohsen Agsen, Bryce Bortree, Ricardo Juan Palma Duran, Kaolin Fire, Thomas Scialom." + ] + } + ], + "metadata": { + "captumWidgetMessage": [], + "dataExplorerConfig": [], + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + }, + "last_base_url": "https://bento.edge.x2p.facebook.net/", + "last_kernel_id": "161e2a7b-2d2b-4995-87f3-d1539860ecac", + "last_msg_id": "4eab1242-d815b886ebe4f5b1966da982_543", + "last_server_session_id": "4a7b41c5-ed66-4dcb-a376-22673aebb469", + "operator_data": [], + "outputWidgetContext": [] + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/zero_to_hero_guide/quickstart.md b/docs/zero_to_hero_guide/quickstart.md index 3bc11285e..4b53812f9 100644 --- a/docs/zero_to_hero_guide/quickstart.md +++ b/docs/zero_to_hero_guide/quickstart.md @@ -2,6 +2,8 @@ This guide will walk you through setting up an end-to-end workflow with Llama Stack, enabling you to perform text generation using the `Llama3.2-11B-Vision-Instruct` model. Follow these steps to get started quickly. +If you're looking for more specific topics like tool calling or agent setup, we have a [Zero to Hero Guide](#next-steps) that covers everything from Tool Calling to Agents in detail. Feel free to skip to the end to explore the advanced topics you're interested in. + ## Table of Contents 1. [Prerequisite](#prerequisite) 2. [Installation](#installation) @@ -19,7 +21,6 @@ Ensure you have the following installed on your system: - **Conda**: A package, dependency, and environment management tool. - --- ## Installation @@ -52,7 +53,7 @@ llama download --model-id Llama3.2-11B-Vision-Instruct ### 1. Build the Llama Stack Distribution -We will default into building a `meta-reference-gpu` distribution, however you could read more about the different distriubtion [here](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/index.html). +We will default into building a `meta-reference-gpu` distribution, however you could read more about the different distriubtion [here](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#decide-your-inference-provider). ```bash llama stack build --template meta-reference-gpu --image-type conda @@ -156,9 +157,10 @@ With these steps, you should have a functional Llama Stack setup capable of gene ## Next Steps -- **Explore Other Guides**: Dive deeper into specific topics by following these guides: +**Explore Other Guides**: Dive deeper into specific topics by following these guides: +- [Understanding Distribution](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#decide-your-inference-provider) - [Inference 101](00_Inference101.ipynb) -- [Simple switch between local and cloud model](00_Local_Cloud_Inference101.ipynb) +- [Local and Cloud Model Toggling 101](00_Local_Cloud_Inference101.ipynb) - [Prompt Engineering](01_Prompt_Engineering101.ipynb) - [Chat with Image - LlamaStack Vision API](02_Image_Chat101.ipynb) - [Tool Calling: How to and Details](03_Tool_Calling101.ipynb) @@ -167,15 +169,15 @@ With these steps, you should have a functional Llama Stack setup capable of gene - [Agents API: Explain Components](06_Agents101.ipynb) -- **Explore Client SDKs**: Utilize our client SDKs for various languages to integrate Llama Stack into your applications: +**Explore Client SDKs**: Utilize our client SDKs for various languages to integrate Llama Stack into your applications: - [Python SDK](https://github.com/meta-llama/llama-stack-client-python) - [Node SDK](https://github.com/meta-llama/llama-stack-client-node) - [Swift SDK](https://github.com/meta-llama/llama-stack-client-swift) - [Kotlin SDK](https://github.com/meta-llama/llama-stack-client-kotlin) -- **Advanced Configuration**: Learn how to customize your Llama Stack distribution by referring to the [Building a Llama Stack Distribution](./building_distro.md) guide. +**Advanced Configuration**: Learn how to customize your Llama Stack distribution by referring to the [Building a Llama Stack Distribution](./building_distro.md) guide. -- **Explore Example Apps**: Check out [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) for example applications built using Llama Stack. +**Explore Example Apps**: Check out [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) for example applications built using Llama Stack. ---