Support for Llama3.2 models and Swift SDK (#98)

2024-09-25 10:29:58 -07:00 · 2024-09-25 10:29:58 -07:00 · 56aed59eb4
commit 56aed59eb4
parent 95abbf576b
56 changed files with 3745 additions and 630 deletions
--- a/docs/cli_reference.md
+++ b/docs/cli_reference.md
@@ -37,50 +37,74 @@ llama model list
 You should see a table like this:

 <pre style="font-family: monospace;">
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Model Descriptor                      | HuggingFace Repo                            | Context Length | Hardware Requirements      |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Meta-Llama3.1-8B                      | meta-llama/Meta-Llama-3.1-8B                | 128K           | 1 GPU, each >= 20GB VRAM   |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Meta-Llama3.1-70B                     | meta-llama/Meta-Llama-3.1-70B               | 128K           | 8 GPUs, each >= 20GB VRAM  |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Meta-Llama3.1-405B:bf16-mp8           |                                             | 128K           | 8 GPUs, each >= 120GB VRAM |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Meta-Llama3.1-405B                    | meta-llama/Meta-Llama-3.1-405B-FP8          | 128K           | 8 GPUs, each >= 70GB VRAM  |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Meta-Llama3.1-405B:bf16-mp16          | meta-llama/Meta-Llama-3.1-405B              | 128K           | 16 GPUs, each >= 70GB VRAM |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Meta-Llama3.1-8B-Instruct             | meta-llama/Meta-Llama-3.1-8B-Instruct       | 128K           | 1 GPU, each >= 20GB VRAM   |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Meta-Llama3.1-70B-Instruct            | meta-llama/Meta-Llama-3.1-70B-Instruct      | 128K           | 8 GPUs, each >= 20GB VRAM  |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Meta-Llama3.1-405B-Instruct:bf16-mp8  |                                             | 128K           | 8 GPUs, each >= 120GB VRAM |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Meta-Llama3.1-405B-Instruct           | meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 | 128K           | 8 GPUs, each >= 70GB VRAM  |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Meta-Llama3.1-405B-Instruct:bf16-mp16 | meta-llama/Meta-Llama-3.1-405B-Instruct     | 128K           | 16 GPUs, each >= 70GB VRAM |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Llama-Guard-3-8B                      | meta-llama/Llama-Guard-3-8B                 | 128K           | 1 GPU, each >= 20GB VRAM   |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Llama-Guard-3-8B:int8-mp1             | meta-llama/Llama-Guard-3-8B-INT8            | 128K           | 1 GPU, each >= 10GB VRAM   |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
-| Prompt-Guard-86M                      | meta-llama/Prompt-Guard-86M                 | 128K           | 1 GPU, each >= 1GB VRAM    |
-+---------------------------------------+---------------------------------------------+----------------+----------------------------+
+----------------------------------+------------------------------------------+----------------+
+| Model Descriptor                 | HuggingFace Repo                         | Context Length |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.1-8B                      | meta-llama/Llama-3.1-8B                  | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.1-70B                     | meta-llama/Llama-3.1-70B                 | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.1-405B:bf16-mp8           | meta-llama/Llama-3.1-405B                | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.1-405B                    | meta-llama/Llama-3.1-405B-FP8            | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.1-405B:bf16-mp16          | meta-llama/Llama-3.1-405B                | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.1-8B-Instruct             | meta-llama/Llama-3.1-8B-Instruct         | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.1-70B-Instruct            | meta-llama/Llama-3.1-70B-Instruct        | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.1-405B-Instruct:bf16-mp8  | meta-llama/Llama-3.1-405B-Instruct       | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.1-405B-Instruct           | meta-llama/Llama-3.1-405B-Instruct-FP8   | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.1-405B-Instruct:bf16-mp16 | meta-llama/Llama-3.1-405B-Instruct       | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.2-1B                      | meta-llama/Llama-3.2-1B                  | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.2-3B                      | meta-llama/Llama-3.2-3B                  | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.2-11B-Vision              | meta-llama/Llama-3.2-11B-Vision          | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.2-90B-Vision              | meta-llama/Llama-3.2-90B-Vision          | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.2-1B-Instruct             | meta-llama/Llama-3.2-1B-Instruct         | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.2-3B-Instruct             | meta-llama/Llama-3.2-3B-Instruct         | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.2-11B-Vision-Instruct     | meta-llama/Llama-3.2-11B-Vision-Instruct | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama3.2-90B-Vision-Instruct     | meta-llama/Llama-3.2-90B-Vision-Instruct | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama-Guard-3-11B-Vision         | meta-llama/Llama-Guard-3-11B-Vision      | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama-Guard-3-1B:int4-mp1        | meta-llama/Llama-Guard-3-1B-INT4         | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama-Guard-3-1B                 | meta-llama/Llama-Guard-3-1B              | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama-Guard-3-8B                 | meta-llama/Llama-Guard-3-8B              | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama-Guard-3-8B:int8-mp1        | meta-llama/Llama-Guard-3-8B-INT8         | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Prompt-Guard-86M                 | meta-llama/Prompt-Guard-86M              | 128K           |
+----------------------------------+------------------------------------------+----------------+
+| Llama-Guard-2-8B                 | meta-llama/Llama-Guard-2-8B              | 4K             |
+----------------------------------+------------------------------------------+----------------+
 </pre>

 To download models, you can use the `llama download` command.

 #### Downloading from [Meta](https://llama.meta.com/llama-downloads/)

-Here is an example download command to get the 8B/70B Instruct model. You will need META_URL which can be obtained from [here](https://llama.meta.com/docs/getting_the_models/meta/)
+Here are example download commands for the 3B-Instruct and 11B-Vision-Instruct models. You will need a META_URL, which can be obtained from [here](https://llama.meta.com/docs/getting_the_models/meta/).

 Download the required checkpoints using the following commands:
 ```bash
 # download the 3B Instruct model
-llama download --source meta --model-id Meta-Llama3.1-8B-Instruct --meta-url META_URL
+llama download --source meta --model-id Llama3.2-3B-Instruct --meta-url META_URL

 # you can also get the 11B Vision Instruct model
-llama download --source meta --model-id Meta-Llama3.1-70B-Instruct --meta-url META_URL
+llama download --source meta --model-id Llama3.2-11B-Vision-Instruct --meta-url META_URL

 # llama-agents have safety enabled by default. For this, you will need
 # safety models -- Llama-Guard and Prompt-Guard
@@ -124,7 +148,7 @@ The `llama model` command helps you explore the model’s interface.
 ### 2.1 Subcommands
 1. `download`: Download the model from different sources. (meta, huggingface)
 2. `list`: Lists all the models available for download with hardware requirements to deploy the models.
-3. `template`: <TODO: What is a template?>
+3. `prompt-format`: Shows the message formats for Llama models.
 4. `describe`: Describes all the properties of the model.

 ### 2.2 Sample Usage
@@ -135,7 +159,7 @@ The `llama model` command helps you explore the model’s interface.
 llama model --help
 ```
 <pre style="font-family: monospace;">
-usage: llama model [-h] {download,list,template,describe} ...
+usage: llama model [-h] {download,list,prompt-format,describe} ...

 Work with llama models

@@ -143,124 +167,67 @@ options:
  -h, --help            show this help message and exit

 model_subcommands:
-  {download,list,template,describe}
+  {download,list,prompt-format,describe}
 </pre>

 You can use the `describe` command to learn more about a model:
 ```
-llama model describe -m Meta-Llama3.1-8B-Instruct
+llama model describe -m Llama3.2-3B-Instruct
 ```
 ### 2.3 Describe

 <pre style="font-family: monospace;">
-+-----------------------------+---------------------------------------+
-| Model                       | Meta-                                 |
-|                             | Llama3.1-8B-Instruct                  |
-+-----------------------------+---------------------------------------+
-| HuggingFace ID              | meta-llama/Meta-Llama-3.1-8B-Instruct |
-+-----------------------------+---------------------------------------+
-| Description                 | Llama 3.1 8b instruct model           |
-+-----------------------------+---------------------------------------+
-| Context Length              | 128K tokens                           |
-+-----------------------------+---------------------------------------+
-| Weights format              | bf16                                  |
-+-----------------------------+---------------------------------------+
-| Model params.json           | {                                     |
-|                             |     "dim": 4096,                      |
-|                             |     "n_layers": 32,                   |
-|                             |     "n_heads": 32,                    |
-|                             |     "n_kv_heads": 8,                  |
-|                             |     "vocab_size": 128256,             |
-|                             |     "ffn_dim_multiplier": 1.3,        |
-|                             |     "multiple_of": 1024,              |
-|                             |     "norm_eps": 1e-05,                |
-|                             |     "rope_theta": 500000.0,           |
-|                             |     "use_scaled_rope": true           |
-|                             | }                                     |
-+-----------------------------+---------------------------------------+
-| Recommended sampling params | {                                     |
-|                             |     "strategy": "top_p",              |
-|                             |     "temperature": 1.0,               |
-|                             |     "top_p": 0.9,                     |
-|                             |     "top_k": 0                        |
-|                             | }                                     |
-+-----------------------------+---------------------------------------+
+-----------------------------+----------------------------------+
+| Model                       | Llama3.2-3B-Instruct             |
+-----------------------------+----------------------------------+
+| HuggingFace ID              | meta-llama/Llama-3.2-3B-Instruct |
+-----------------------------+----------------------------------+
+| Description                 | Llama 3.2 3b instruct model      |
+-----------------------------+----------------------------------+
+| Context Length              | 128K tokens                      |
+-----------------------------+----------------------------------+
+| Weights format              | bf16                             |
+-----------------------------+----------------------------------+
+| Model params.json           | {                                |
+|                             |     "dim": 3072,                 |
+|                             |     "n_layers": 28,              |
+|                             |     "n_heads": 24,               |
+|                             |     "n_kv_heads": 8,             |
+|                             |     "vocab_size": 128256,        |
+|                             |     "ffn_dim_multiplier": 1.0,   |
+|                             |     "multiple_of": 256,          |
+|                             |     "norm_eps": 1e-05,           |
+|                             |     "rope_theta": 500000.0,      |
+|                             |     "use_scaled_rope": true      |
+|                             | }                                |
+-----------------------------+----------------------------------+
+| Recommended sampling params | {                                |
+|                             |     "strategy": "top_p",         |
+|                             |     "temperature": 1.0,          |
+|                             |     "top_p": 0.9,                |
+|                             |     "top_k": 0                   |
+|                             | }                                |
+-----------------------------+----------------------------------+
 </pre>
-### 2.4 Template
-You can even run `llama model template` see all of the templates and their tokens:
+### 2.4 Prompt Format
+You can also run `llama model prompt-format` to see all of the templates and their tokens:

 ```
-llama model template
+llama model prompt-format -m Llama3.2-3B-Instruct
 ```
+<p align="center">
+<img width="719" alt="image" src="https://github.com/user-attachments/assets/c5332026-8c0b-4edc-b438-ec60cd7ca554">
+</p>

-<pre style="font-family: monospace;">
-+-----------+---------------------------------+
-| Role      | Template Name                   |
-+-----------+---------------------------------+
-| user      | user-default                    |
-| assistant | assistant-builtin-tool-call     |
-| assistant | assistant-custom-tool-call      |
-| assistant | assistant-default               |
-| system    | system-builtin-and-custom-tools |
-| system    | system-builtin-tools-only       |
-| system    | system-custom-tools-only        |
-| system    | system-default                  |
-| tool      | tool-success                    |
-| tool      | tool-failure                    |
-+-----------+---------------------------------+
-</pre>

-And fetch an example by passing it to `--name`:
-```
-llama model template --name tool-success
-```
-
-<pre style="font-family: monospace;">
-+----------+----------------------------------------------------------------+
-| Name     | tool-success                                                   |
-+----------+----------------------------------------------------------------+
-| Template | <|start_header_id|>ipython<|end_header_id|>                    |
-|          |                                                                |
-|          | completed                                                      |
-|          | [stdout]{"results":["something                                 |
-|          | something"]}[/stdout]<|eot_id|>                                |
-|          |                                                                |
-+----------+----------------------------------------------------------------+
-| Notes    | Note ipython header and [stdout]                               |
-+----------+----------------------------------------------------------------+
-</pre>
-
-Or:
-```
-llama model template --name system-builtin-tools-only
-```
-
-<pre style="font-family: monospace;">
-+----------+--------------------------------------------+
-| Name     | system-builtin-tools-only                  |
-+----------+--------------------------------------------+
-| Template | <|start_header_id|>system<|end_header_id|> |
-|          |                                            |
-|          | Environment: ipython                       |
-|          | Tools: brave_search, wolfram_alpha         |
-|          |                                            |
-|          | Cutting Knowledge Date: December 2023      |
-|          | Today Date: 21 August 2024                 |
-|          | <|eot_id|>                                 |
-|          |                                            |
-+----------+--------------------------------------------+
-| Notes    |                                            |
-+----------+--------------------------------------------+
-</pre>
-
-These commands can help understand the model interface and how prompts / messages are formatted for various scenarios.
+You will be shown a Markdown-formatted description of the model interface and of how prompts and messages are formatted for various scenarios.

 **NOTE**: Terminal output is color-printed to highlight special tokens.


 ## Step 3: Building and Configuring Llama Stack Distributions

- Please see our [Getting Started](getting_started.md) guide for details.
+- Please see our [Getting Started](getting_started.md) guide for more details on how to build and start a Llama Stack distribution.

 ### Step 3.1 Build
 In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instruct` model. We will name our build `8b-instruct` to help us remember the config. We will start building our distribution (in the form of a Conda environment or a Docker image). In this step, we will specify: