diff --git a/docs/zero_to_hero_guide/01_Prompt_Engineering101.ipynb b/docs/zero_to_hero_guide/01_Prompt_Engineering101.ipynb
index 681c2b8a8..167a43b70 100644
--- a/docs/zero_to_hero_guide/01_Prompt_Engineering101.ipynb
+++ b/docs/zero_to_hero_guide/01_Prompt_Engineering101.ipynb
@@ -1,312 +1,233 @@
{
"cells": [
{
- "attachments": {},
"cell_type": "markdown",
+ "id": "cd96f85a",
"metadata": {},
"source": [
"
\n",
"\n",
- "# Prompt Engineering with Llama 3.1\n",
+ "# Prompt Engineering with Llama Stack\n",
"\n",
"Prompt engineering is using natural language to produce a desired response from a large language model (LLM).\n",
"\n",
- "This interactive guide covers prompt engineering & best practices with Llama 3.1."
+ "This interactive guide covers prompt engineering & best practices with Llama 3.1 and Llama Stack"
]
},
{
- "attachments": {},
"cell_type": "markdown",
+ "id": "3e1ef1c9",
"metadata": {},
"source": [
- "## Introduction"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Why now?\n",
+ "## Few-Shot Inference for LLMs\n",
"\n",
- "[Vaswani et al. (2017)](https://arxiv.org/abs/1706.03762) introduced the world to transformer neural networks (originally for machine translation). Transformers ushered an era of generative AI with diffusion models for image creation and large language models (`LLMs`) as **programmable deep learning networks**.\n",
+ "This guide provides instructions on how to use Llama Stack’s `chat_completion` API with a few-shot learning approach to enhance text generation. Few-shot examples enable the model to recognize patterns by providing labeled prompts, allowing it to complete tasks based on minimal prior examples.\n",
"\n",
- "Programming foundational LLMs is done with natural language – it doesn't require training/tuning like ML models of the past. This has opened the door to a massive amount of innovation and a paradigm shift in how technology can be deployed. The science/art of using natural language to program language models to accomplish a task is referred to as **Prompt Engineering**."
+ "### Overview\n",
+ "\n",
+ "Few-shot learning provides the model with multiple examples of input-output pairs. This is particularly useful for guiding the model's behavior in specific tasks, helping it understand the desired completion format and content based on a few sample interactions.\n",
+ "\n",
+ "### Implementation"
]
},
{
- "attachments": {},
"cell_type": "markdown",
+ "id": "a7a25a7e",
"metadata": {},
"source": [
- "## Prompting Techniques"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Explicit Instructions\n",
+ "#### 1. Initialize the Client\n",
"\n",
- "Detailed, explicit instructions produce better results than open-ended prompts:"
+ "Begin by setting up the `LlamaStackClient` to connect to the inference endpoint.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
+ "id": "c2a0e359",
"metadata": {},
"outputs": [],
"source": [
- "complete_and_print(prompt=\"Describe quantum physics in one short sentence of no more than 12 words\")\n",
- "# Returns a succinct explanation of quantum physics that mentions particles and states existing simultaneously."
+ "from llama_stack_client import LlamaStackClient\n",
+ "\n",
+ "client = LlamaStackClient(base_url='http://localhost:5000')"
]
},
{
- "attachments": {},
"cell_type": "markdown",
+ "id": "02cdf3f6",
"metadata": {},
"source": [
- "You can think about giving explicit instructions as using rules and restrictions to how Llama 3 responds to your prompt.\n",
+ "#### 2. Define Few-Shot Examples\n",
"\n",
- "- Stylization\n",
- " - `Explain this to me like a topic on a children's educational network show teaching elementary students.`\n",
- " - `I'm a software engineer using large language models for summarization. Summarize the following text in under 250 words:`\n",
- " - `Give your answer like an old timey private investigator hunting down a case step by step.`\n",
- "- Formatting\n",
- " - `Use bullet points.`\n",
- " - `Return as a JSON object.`\n",
- " - `Use less technical terms and help me apply it in my work in communications.`\n",
- "- Restrictions\n",
- " - `Only use academic papers.`\n",
- " - `Never give sources older than 2020.`\n",
- " - `If you don't know the answer, say that you don't know.`\n",
- "\n",
- "Here's an example of giving explicit instructions to give more specific results by limiting the responses to recently created sources."
+ "Construct a series of labeled `UserMessage` and `CompletionMessage` instances to demonstrate the task to the model. Each `UserMessage` represents an input prompt, and each `CompletionMessage` is the desired output. The model uses these examples to infer the appropriate response patterns.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
+ "id": "da140b33",
"metadata": {},
"outputs": [],
"source": [
- "complete_and_print(\"Explain the latest advances in large language models to me.\")\n",
- "# More likely to cite sources from 2017\n",
+ "from llama_stack_client.types import CompletionMessage, UserMessage\n",
"\n",
- "complete_and_print(\"Explain the latest advances in large language models to me. Always cite your sources. Never cite sources older than 2020.\")\n",
- "# Gives more specific advances and only cites sources from 2020"
+ "few_shot_examples = messages=[\n",
+ " UserMessage(content='Have shorter, spear-shaped ears.', role='user'),\n",
+ " CompletionMessage(\n",
+ " content=\"That's Alpaca!\",\n",
+ " role='assistant',\n",
+ " stop_reason='end_of_message',\n",
+ " tool_calls=[],\n",
+ " ),\n",
+ " UserMessage(\n",
+ " content='Known for their calm nature and used as pack animals in mountainous regions.',\n",
+ " role='user',\n",
+ " ),\n",
+ " CompletionMessage(\n",
+ " content=\"That's Llama!\",\n",
+ " role='assistant',\n",
+ " stop_reason='end_of_message',\n",
+ " tool_calls=[],\n",
+ " ),\n",
+ " UserMessage(\n",
+ " content='Has a straight, slender neck and is smaller in size compared to its relative.',\n",
+ " role='user',\n",
+ " ),\n",
+ " CompletionMessage(\n",
+ " content=\"That's Alpaca!\",\n",
+ " role='assistant',\n",
+ " stop_reason='end_of_message',\n",
+ " tool_calls=[],\n",
+ " ),\n",
+ " UserMessage(\n",
+ " content='Generally taller and more robust, commonly seen as guard animals.',\n",
+ " role='user',\n",
+ " ),\n",
+ "]"
]
},
{
- "attachments": {},
"cell_type": "markdown",
+ "id": "6eece9cc",
"metadata": {},
"source": [
- "### Example Prompting using Zero- and Few-Shot Learning\n",
+ "#### Note\n",
+ "- **Few-Shot Examples**: These examples show the model the correct responses for specific prompts.\n",
+ "- **CompletionMessage**: This defines the model's expected completion for each prompt.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5a0de6c7",
+ "metadata": {},
+ "source": [
+ "#### 3. Invoke `chat_completion` with Few-Shot Examples\n",
"\n",
- "A shot is an example or demonstration of what type of prompt and response you expect from a large language model. This term originates from training computer vision models on photographs, where one shot was one example or instance that the model used to classify an image ([Fei-Fei et al. (2006)](http://vision.stanford.edu/documents/Fei-FeiFergusPerona2006.pdf)).\n",
- "\n",
- "#### Zero-Shot Prompting\n",
- "\n",
- "Large language models like Llama 3 are unique because they are capable of following instructions and producing responses without having previously seen an example of a task. Prompting without examples is called \"zero-shot prompting\".\n",
- "\n",
- "Let's try using Llama 3 as a sentiment detector. You may notice that output format varies - we can improve this with better prompting."
+ "Use the few-shot examples as the message input for `chat_completion`. The model will use the examples to generate contextually appropriate responses, allowing it to infer and complete new queries in a similar format.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
+ "id": "8b321089",
"metadata": {},
"outputs": [],
"source": [
- "complete_and_print(\"Text: This was the best movie I've ever seen! \\n The sentiment of the text is: \")\n",
- "# Returns positive sentiment\n",
- "\n",
- "complete_and_print(\"Text: The director was trying too hard. \\n The sentiment of the text is: \")\n",
- "# Returns negative sentiment"
+ "response = client.inference.chat_completion(\n",
+ " messages=few_shot_examples, model='Llama3.2-11B-Vision-Instruct'\n",
+ ")"
]
},
{
- "attachments": {},
"cell_type": "markdown",
+ "id": "063265d2",
"metadata": {},
"source": [
+ "#### 4. Display the Model’s Response\n",
"\n",
- "#### Few-Shot Prompting\n",
- "\n",
- "Adding specific examples of your desired output generally results in more accurate, consistent output. This technique is called \"few-shot prompting\".\n",
- "\n",
- "In this example, the generated response follows our desired format that offers a more nuanced sentiment classifer that gives a positive, neutral, and negative response confidence percentage.\n",
- "\n",
- "See also: [Zhao et al. (2021)](https://arxiv.org/abs/2102.09690), [Liu et al. (2021)](https://arxiv.org/abs/2101.06804), [Su et al. (2022)](https://arxiv.org/abs/2209.01975), [Rubin et al. (2022)](https://arxiv.org/abs/2112.08633).\n",
- "\n"
+ "The `completion_message` contains the assistant’s generated content based on the few-shot examples provided. Output this content to see the model's response directly in the console.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
+ "id": "4ac1ac3e",
"metadata": {},
"outputs": [],
"source": [
- "def sentiment(text):\n",
- " response = chat_completion(messages=[\n",
- " user(\"You are a sentiment classifier. For each message, give the percentage of positive/netural/negative.\"),\n",
- " user(\"I liked it\"),\n",
- " assistant(\"70% positive 30% neutral 0% negative\"),\n",
- " user(\"It could be better\"),\n",
- " assistant(\"0% positive 50% neutral 50% negative\"),\n",
- " user(\"It's fine\"),\n",
- " assistant(\"25% positive 50% neutral 25% negative\"),\n",
- " user(text),\n",
- " ])\n",
- " return response\n",
+ "from termcolor import cprint\n",
"\n",
- "def print_sentiment(text):\n",
- " print(f'INPUT: {text}')\n",
- " print(sentiment(text))\n",
- "\n",
- "print_sentiment(\"I thought it was okay\")\n",
- "# More likely to return a balanced mix of positive, neutral, and negative\n",
- "print_sentiment(\"I loved it!\")\n",
- "# More likely to return 100% positive\n",
- "print_sentiment(\"Terrible service 0/10\")\n",
- "# More likely to return 100% negative"
+ "cprint(f'> Response: {response.completion_message.content}', 'cyan')"
]
},
{
- "attachments": {},
"cell_type": "markdown",
+ "id": "d936ab59",
"metadata": {},
"source": [
- "### Role Prompting\n",
- "\n",
- "Llama will often give more consistent responses when given a role ([Kong et al. (2023)](https://browse.arxiv.org/pdf/2308.07702.pdf)). Roles give context to the LLM on what type of answers are desired.\n",
- "\n",
- "Let's use Llama 3 to create a more focused, technical response for a question around the pros and cons of using PyTorch."
+ "### Complete code\n",
+ "Summing it up, here's the code for few-shot implementation with llama-stack:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
+ "id": "524189bd",
"metadata": {},
"outputs": [],
"source": [
- "complete_and_print(\"Explain the pros and cons of using PyTorch.\")\n",
- "# More likely to explain the pros and cons of PyTorch covers general areas like documentation, the PyTorch community, and mentions a steep learning curve\n",
+ "from llama_stack_client import LlamaStackClient\n",
+ "from llama_stack_client.types import CompletionMessage, UserMessage\n",
+ "from termcolor import cprint\n",
"\n",
- "complete_and_print(\"Your role is a machine learning expert who gives highly technical advice to senior engineers who work with complicated datasets. Explain the pros and cons of using PyTorch.\")\n",
- "# Often results in more technical benefits and drawbacks that provide more technical details on how model layers"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Chain-of-Thought\n",
+ "client = LlamaStackClient(base_url='http://localhost:5000')\n",
"\n",
- "Simply adding a phrase encouraging step-by-step thinking \"significantly improves the ability of large language models to perform complex reasoning\" ([Wei et al. (2022)](https://arxiv.org/abs/2201.11903)). This technique is called \"CoT\" or \"Chain-of-Thought\" prompting.\n",
+ "response = client.inference.chat_completion(\n",
+ " messages=[\n",
+ " UserMessage(content='Have shorter, spear-shaped ears.', role='user'),\n",
+ " CompletionMessage(\n",
+ " content=\"That's Alpaca!\",\n",
+ " role='assistant',\n",
+ " stop_reason='end_of_message',\n",
+ " tool_calls=[],\n",
+ " ),\n",
+ " UserMessage(\n",
+ " content='Known for their calm nature and used as pack animals in mountainous regions.',\n",
+ " role='user',\n",
+ " ),\n",
+ " CompletionMessage(\n",
+ " content=\"That's Llama!\",\n",
+ " role='assistant',\n",
+ " stop_reason='end_of_message',\n",
+ " tool_calls=[],\n",
+ " ),\n",
+ " UserMessage(\n",
+ " content='Has a straight, slender neck and is smaller in size compared to its relative.',\n",
+ " role='user',\n",
+ " ),\n",
+ " CompletionMessage(\n",
+ " content=\"That's Alpaca!\",\n",
+ " role='assistant',\n",
+ " stop_reason='end_of_message',\n",
+ " tool_calls=[],\n",
+ " ),\n",
+ " UserMessage(\n",
+ " content='Generally taller and more robust, commonly seen as guard animals.',\n",
+ " role='user',\n",
+ " ),\n",
+ " ],\n",
+ " model='Llama3.2-11B-Vision-Instruct',\n",
+ ")\n",
"\n",
- "Llama 3.1 now reasons step-by-step naturally without the addition of the phrase. This section remains for completeness."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "prompt = \"Who lived longer, Mozart or Elvis?\"\n",
- "\n",
- "complete_and_print(prompt)\n",
- "# Llama 2 would often give the incorrect answer of \"Mozart\"\n",
- "\n",
- "complete_and_print(f\"{prompt} Let's think through this carefully, step by step.\")\n",
- "# Gives the correct answer \"Elvis\""
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Self-Consistency\n",
- "\n",
- "LLMs are probablistic, so even with Chain-of-Thought, a single generation might produce incorrect results. Self-Consistency ([Wang et al. (2022)](https://arxiv.org/abs/2203.11171)) introduces enhanced accuracy by selecting the most frequent answer from multiple generations (at the cost of higher compute):"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import re\n",
- "from statistics import mode\n",
- "\n",
- "def gen_answer():\n",
- " response = completion(\n",
- " \"John found that the average of 15 numbers is 40.\"\n",
- " \"If 10 is added to each number then the mean of the numbers is?\"\n",
- " \"Report the answer surrounded by backticks (example: `123`)\",\n",
- " )\n",
- " match = re.search(r'`(\\d+)`', response)\n",
- " if match is None:\n",
- " return None\n",
- " return match.group(1)\n",
- "\n",
- "answers = [gen_answer() for i in range(5)]\n",
- "\n",
- "print(\n",
- " f\"Answers: {answers}\\n\",\n",
- " f\"Final answer: {mode(answers)}\",\n",
- " )\n",
- "\n",
- "# Sample runs of Llama-3-70B (all correct):\n",
- "# ['60', '50', '50', '50', '50'] -> 50\n",
- "# ['50', '50', '50', '60', '50'] -> 50\n",
- "# ['50', '50', '60', '50', '50'] -> 50"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Author & Contact\n",
- "\n",
- "Edited by [Dalton Flanagan](https://www.linkedin.com/in/daltonflanagan/) (dalton@meta.com) with contributions from Mohsen Agsen, Bryce Bortree, Ricardo Juan Palma Duran, Kaolin Fire, Thomas Scialom."
+ "cprint(f'> Response: {response.completion_message.content}', 'cyan')"
]
}
],
"metadata": {
- "captumWidgetMessage": [],
- "dataExplorerConfig": [],
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
"language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.10.14"
- },
- "last_base_url": "https://bento.edge.x2p.facebook.net/",
- "last_kernel_id": "161e2a7b-2d2b-4995-87f3-d1539860ecac",
- "last_msg_id": "4eab1242-d815b886ebe4f5b1966da982_543",
- "last_server_session_id": "4a7b41c5-ed66-4dcb-a376-22673aebb469",
- "operator_data": [],
- "outputWidgetContext": []
+ "name": "python"
+ }
},
"nbformat": 4,
- "nbformat_minor": 4
+ "nbformat_minor": 5
}
diff --git a/docs/zero_to_hero_guide/_archive_01_Prompt_Engineering101.ipynb b/docs/zero_to_hero_guide/_archive_01_Prompt_Engineering101.ipynb
new file mode 100644
index 000000000..681c2b8a8
--- /dev/null
+++ b/docs/zero_to_hero_guide/_archive_01_Prompt_Engineering101.ipynb
@@ -0,0 +1,312 @@
+{
+ "cells": [
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "
\n",
+ "\n",
+ "# Prompt Engineering with Llama 3.1\n",
+ "\n",
+ "Prompt engineering is using natural language to produce a desired response from a large language model (LLM).\n",
+ "\n",
+ "This interactive guide covers prompt engineering & best practices with Llama 3.1."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Introduction"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Why now?\n",
+ "\n",
+ "[Vaswani et al. (2017)](https://arxiv.org/abs/1706.03762) introduced the world to transformer neural networks (originally for machine translation). Transformers ushered an era of generative AI with diffusion models for image creation and large language models (`LLMs`) as **programmable deep learning networks**.\n",
+ "\n",
+ "Programming foundational LLMs is done with natural language – it doesn't require training/tuning like ML models of the past. This has opened the door to a massive amount of innovation and a paradigm shift in how technology can be deployed. The science/art of using natural language to program language models to accomplish a task is referred to as **Prompt Engineering**."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prompting Techniques"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Explicit Instructions\n",
+ "\n",
+ "Detailed, explicit instructions produce better results than open-ended prompts:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "complete_and_print(prompt=\"Describe quantum physics in one short sentence of no more than 12 words\")\n",
+ "# Returns a succinct explanation of quantum physics that mentions particles and states existing simultaneously."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can think about giving explicit instructions as using rules and restrictions to how Llama 3 responds to your prompt.\n",
+ "\n",
+ "- Stylization\n",
+ " - `Explain this to me like a topic on a children's educational network show teaching elementary students.`\n",
+ " - `I'm a software engineer using large language models for summarization. Summarize the following text in under 250 words:`\n",
+ " - `Give your answer like an old timey private investigator hunting down a case step by step.`\n",
+ "- Formatting\n",
+ " - `Use bullet points.`\n",
+ " - `Return as a JSON object.`\n",
+ " - `Use less technical terms and help me apply it in my work in communications.`\n",
+ "- Restrictions\n",
+ " - `Only use academic papers.`\n",
+ " - `Never give sources older than 2020.`\n",
+ " - `If you don't know the answer, say that you don't know.`\n",
+ "\n",
+ "Here's an example of giving explicit instructions to give more specific results by limiting the responses to recently created sources."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "complete_and_print(\"Explain the latest advances in large language models to me.\")\n",
+ "# More likely to cite sources from 2017\n",
+ "\n",
+ "complete_and_print(\"Explain the latest advances in large language models to me. Always cite your sources. Never cite sources older than 2020.\")\n",
+ "# Gives more specific advances and only cites sources from 2020"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Example Prompting using Zero- and Few-Shot Learning\n",
+ "\n",
+ "A shot is an example or demonstration of what type of prompt and response you expect from a large language model. This term originates from training computer vision models on photographs, where one shot was one example or instance that the model used to classify an image ([Fei-Fei et al. (2006)](http://vision.stanford.edu/documents/Fei-FeiFergusPerona2006.pdf)).\n",
+ "\n",
+ "#### Zero-Shot Prompting\n",
+ "\n",
+ "Large language models like Llama 3 are unique because they are capable of following instructions and producing responses without having previously seen an example of a task. Prompting without examples is called \"zero-shot prompting\".\n",
+ "\n",
+ "Let's try using Llama 3 as a sentiment detector. You may notice that output format varies - we can improve this with better prompting."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "complete_and_print(\"Text: This was the best movie I've ever seen! \\n The sentiment of the text is: \")\n",
+ "# Returns positive sentiment\n",
+ "\n",
+ "complete_and_print(\"Text: The director was trying too hard. \\n The sentiment of the text is: \")\n",
+ "# Returns negative sentiment"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "#### Few-Shot Prompting\n",
+ "\n",
+ "Adding specific examples of your desired output generally results in more accurate, consistent output. This technique is called \"few-shot prompting\".\n",
+ "\n",
+ "In this example, the generated response follows our desired format that offers a more nuanced sentiment classifer that gives a positive, neutral, and negative response confidence percentage.\n",
+ "\n",
+ "See also: [Zhao et al. (2021)](https://arxiv.org/abs/2102.09690), [Liu et al. (2021)](https://arxiv.org/abs/2101.06804), [Su et al. (2022)](https://arxiv.org/abs/2209.01975), [Rubin et al. (2022)](https://arxiv.org/abs/2112.08633).\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def sentiment(text):\n",
+ " response = chat_completion(messages=[\n",
+ " user(\"You are a sentiment classifier. For each message, give the percentage of positive/netural/negative.\"),\n",
+ " user(\"I liked it\"),\n",
+ " assistant(\"70% positive 30% neutral 0% negative\"),\n",
+ " user(\"It could be better\"),\n",
+ " assistant(\"0% positive 50% neutral 50% negative\"),\n",
+ " user(\"It's fine\"),\n",
+ " assistant(\"25% positive 50% neutral 25% negative\"),\n",
+ " user(text),\n",
+ " ])\n",
+ " return response\n",
+ "\n",
+ "def print_sentiment(text):\n",
+ " print(f'INPUT: {text}')\n",
+ " print(sentiment(text))\n",
+ "\n",
+ "print_sentiment(\"I thought it was okay\")\n",
+ "# More likely to return a balanced mix of positive, neutral, and negative\n",
+ "print_sentiment(\"I loved it!\")\n",
+ "# More likely to return 100% positive\n",
+ "print_sentiment(\"Terrible service 0/10\")\n",
+ "# More likely to return 100% negative"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Role Prompting\n",
+ "\n",
+ "Llama will often give more consistent responses when given a role ([Kong et al. (2023)](https://browse.arxiv.org/pdf/2308.07702.pdf)). Roles give context to the LLM on what type of answers are desired.\n",
+ "\n",
+ "Let's use Llama 3 to create a more focused, technical response for a question around the pros and cons of using PyTorch."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "complete_and_print(\"Explain the pros and cons of using PyTorch.\")\n",
+ "# More likely to explain the pros and cons of PyTorch covers general areas like documentation, the PyTorch community, and mentions a steep learning curve\n",
+ "\n",
+ "complete_and_print(\"Your role is a machine learning expert who gives highly technical advice to senior engineers who work with complicated datasets. Explain the pros and cons of using PyTorch.\")\n",
+ "# Often results in more technical benefits and drawbacks that provide more technical details on how model layers"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Chain-of-Thought\n",
+ "\n",
+ "Simply adding a phrase encouraging step-by-step thinking \"significantly improves the ability of large language models to perform complex reasoning\" ([Wei et al. (2022)](https://arxiv.org/abs/2201.11903)). This technique is called \"CoT\" or \"Chain-of-Thought\" prompting.\n",
+ "\n",
+ "Llama 3.1 now reasons step-by-step naturally without the addition of the phrase. This section remains for completeness."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prompt = \"Who lived longer, Mozart or Elvis?\"\n",
+ "\n",
+ "complete_and_print(prompt)\n",
+ "# Llama 2 would often give the incorrect answer of \"Mozart\"\n",
+ "\n",
+ "complete_and_print(f\"{prompt} Let's think through this carefully, step by step.\")\n",
+ "# Gives the correct answer \"Elvis\""
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Self-Consistency\n",
+ "\n",
+ "LLMs are probablistic, so even with Chain-of-Thought, a single generation might produce incorrect results. Self-Consistency ([Wang et al. (2022)](https://arxiv.org/abs/2203.11171)) introduces enhanced accuracy by selecting the most frequent answer from multiple generations (at the cost of higher compute):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import re\n",
+ "from statistics import mode\n",
+ "\n",
+ "def gen_answer():\n",
+ " response = completion(\n",
+ " \"John found that the average of 15 numbers is 40.\"\n",
+ " \"If 10 is added to each number then the mean of the numbers is?\"\n",
+ " \"Report the answer surrounded by backticks (example: `123`)\",\n",
+ " )\n",
+ " match = re.search(r'`(\\d+)`', response)\n",
+ " if match is None:\n",
+ " return None\n",
+ " return match.group(1)\n",
+ "\n",
+ "answers = [gen_answer() for i in range(5)]\n",
+ "\n",
+ "print(\n",
+ " f\"Answers: {answers}\\n\",\n",
+ " f\"Final answer: {mode(answers)}\",\n",
+ " )\n",
+ "\n",
+ "# Sample runs of Llama-3-70B (all correct):\n",
+ "# ['60', '50', '50', '50', '50'] -> 50\n",
+ "# ['50', '50', '50', '60', '50'] -> 50\n",
+ "# ['50', '50', '60', '50', '50'] -> 50"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Author & Contact\n",
+ "\n",
+ "Edited by [Dalton Flanagan](https://www.linkedin.com/in/daltonflanagan/) (dalton@meta.com) with contributions from Mohsen Agsen, Bryce Bortree, Ricardo Juan Palma Duran, Kaolin Fire, Thomas Scialom."
+ ]
+ }
+ ],
+ "metadata": {
+ "captumWidgetMessage": [],
+ "dataExplorerConfig": [],
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.14"
+ },
+ "last_base_url": "https://bento.edge.x2p.facebook.net/",
+ "last_kernel_id": "161e2a7b-2d2b-4995-87f3-d1539860ecac",
+ "last_msg_id": "4eab1242-d815b886ebe4f5b1966da982_543",
+ "last_server_session_id": "4a7b41c5-ed66-4dcb-a376-22673aebb469",
+ "operator_data": [],
+ "outputWidgetContext": []
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/docs/zero_to_hero_guide/quickstart.md b/docs/zero_to_hero_guide/quickstart.md
index 3bc11285e..4b53812f9 100644
--- a/docs/zero_to_hero_guide/quickstart.md
+++ b/docs/zero_to_hero_guide/quickstart.md
@@ -2,6 +2,8 @@
This guide will walk you through setting up an end-to-end workflow with Llama Stack, enabling you to perform text generation using the `Llama3.2-11B-Vision-Instruct` model. Follow these steps to get started quickly.
+If you're looking for more specific topics like tool calling or agent setup, we have a [Zero to Hero Guide](#next-steps) that covers everything from Tool Calling to Agents in detail. Feel free to skip to the end to explore the advanced topics you're interested in.
+
## Table of Contents
1. [Prerequisite](#prerequisite)
2. [Installation](#installation)
@@ -19,7 +21,6 @@ Ensure you have the following installed on your system:
- **Conda**: A package, dependency, and environment management tool.
-
---
## Installation
@@ -52,7 +53,7 @@ llama download --model-id Llama3.2-11B-Vision-Instruct
### 1. Build the Llama Stack Distribution
-We will default into building a `meta-reference-gpu` distribution, however you could read more about the different distriubtion [here](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/index.html).
+We will default to building the `meta-reference-gpu` distribution; however, you can read more about the different distributions [here](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#decide-your-inference-provider).
```bash
llama stack build --template meta-reference-gpu --image-type conda
@@ -156,9 +157,10 @@ With these steps, you should have a functional Llama Stack setup capable of gene
## Next Steps
-- **Explore Other Guides**: Dive deeper into specific topics by following these guides:
+**Explore Other Guides**: Dive deeper into specific topics by following these guides:
+- [Understanding Distribution](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#decide-your-inference-provider)
- [Inference 101](00_Inference101.ipynb)
-- [Simple switch between local and cloud model](00_Local_Cloud_Inference101.ipynb)
+- [Local and Cloud Model Toggling 101](00_Local_Cloud_Inference101.ipynb)
- [Prompt Engineering](01_Prompt_Engineering101.ipynb)
- [Chat with Image - LlamaStack Vision API](02_Image_Chat101.ipynb)
- [Tool Calling: How to and Details](03_Tool_Calling101.ipynb)
@@ -167,15 +169,15 @@ With these steps, you should have a functional Llama Stack setup capable of gene
- [Agents API: Explain Components](06_Agents101.ipynb)
-- **Explore Client SDKs**: Utilize our client SDKs for various languages to integrate Llama Stack into your applications:
+**Explore Client SDKs**: Utilize our client SDKs for various languages to integrate Llama Stack into your applications:
- [Python SDK](https://github.com/meta-llama/llama-stack-client-python)
- [Node SDK](https://github.com/meta-llama/llama-stack-client-node)
- [Swift SDK](https://github.com/meta-llama/llama-stack-client-swift)
- [Kotlin SDK](https://github.com/meta-llama/llama-stack-client-kotlin)
-- **Advanced Configuration**: Learn how to customize your Llama Stack distribution by referring to the [Building a Llama Stack Distribution](./building_distro.md) guide.
+**Advanced Configuration**: Learn how to customize your Llama Stack distribution by referring to the [Building a Llama Stack Distribution](./building_distro.md) guide.
-- **Explore Example Apps**: Check out [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) for example applications built using Llama Stack.
+**Explore Example Apps**: Check out [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) for example applications built using Llama Stack.
---