llama-stack-mirror/docs/source/cli_reference/download_models.md
Xi Yan c810a4184d
[docs] update documentations (#356)
* move docs -> source

* Add files via upload

* mv image

* Add files via upload

* colocate iOS setup doc

* delete image

* Add files via upload

* fix

* delete image

* Add files via upload

* Update developer_cookbook.md

* toctree

* wip subfolder

* docs update

* subfolder

* updates

* name

* updates

* index

* updates

* refactor structure

* depth

* docs

* content

* docs

* getting started

* distributions

* fireworks

* fireworks

* update

* theme

* theme

* theme

* pdj theme

* pytorch theme

* css

* theme

* agents example

* format

* index

* headers

* copy button

* test tabs

* test tabs

* fix

* tabs

* tab

* tabs

* sphinx_design

* quick start commands

* size

* width

* css

* css

* download models

* asthetic fix

* tab format

* update

* css

* width

* css

* docs

* tab based

* tab

* tabs

* docs

* style

* image

* css

* color

* typo

* update docs

* missing links

* list templates

* links

* links update

* troubleshooting

* fix

* distributions

* docs

* fix table

* kill llamastack-local-gpu/cpu

* Update index.md

* Update index.md

* mv ios_setup.md

* Update ios_setup.md

* Add remote_or_local.gif

* Update ios_setup.md

* release notes

* typos

* Add ios_setup to index

* nav bar

* hide torctree

* ios image

* links update

* rename

* rename

* docs

* rename

* links

* distributions

* distributions

* distributions

* distributions

* remove release

* remote

---------

Co-authored-by: dltn <6599399+dltn@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2024-11-04 16:52:38 -08:00

8 KiB

Downloading Models

The llama CLI tool helps you setup and use the Llama Stack. It should be available on your path after installing the llama-stack package.

Installation

You have two ways to install Llama Stack:

  1. Install as a package: You can install the repository directly from PyPI by running the following command:

    pip install llama-stack
    
  2. Install from source: If you prefer to install from the source code, follow these steps:

     mkdir -p ~/local
     cd ~/local
     git clone git@github.com:meta-llama/llama-stack.git
    
     conda create -n myenv python=3.10
     conda activate myenv
    
     cd llama-stack
     $CONDA_PREFIX/bin/pip install -e .
    
    

Downloading models via CLI

You first need to have models downloaded locally.

To download any model you need the Model Descriptor. This can be obtained by running the command

llama model list

You should see a table like this:

+----------------------------------+------------------------------------------+----------------+
| Model Descriptor                 | Hugging Face Repo                        | Context Length |
+----------------------------------+------------------------------------------+----------------+
| Llama3.1-8B                      | meta-llama/Llama-3.1-8B                  | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.1-70B                     | meta-llama/Llama-3.1-70B                 | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.1-405B:bf16-mp8           | meta-llama/Llama-3.1-405B                | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.1-405B                    | meta-llama/Llama-3.1-405B-FP8            | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.1-405B:bf16-mp16          | meta-llama/Llama-3.1-405B                | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.1-8B-Instruct             | meta-llama/Llama-3.1-8B-Instruct         | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.1-70B-Instruct            | meta-llama/Llama-3.1-70B-Instruct        | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.1-405B-Instruct:bf16-mp8  | meta-llama/Llama-3.1-405B-Instruct       | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.1-405B-Instruct           | meta-llama/Llama-3.1-405B-Instruct-FP8   | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.1-405B-Instruct:bf16-mp16 | meta-llama/Llama-3.1-405B-Instruct       | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.2-1B                      | meta-llama/Llama-3.2-1B                  | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.2-3B                      | meta-llama/Llama-3.2-3B                  | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.2-11B-Vision              | meta-llama/Llama-3.2-11B-Vision          | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.2-90B-Vision              | meta-llama/Llama-3.2-90B-Vision          | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.2-1B-Instruct             | meta-llama/Llama-3.2-1B-Instruct         | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.2-3B-Instruct             | meta-llama/Llama-3.2-3B-Instruct         | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.2-11B-Vision-Instruct     | meta-llama/Llama-3.2-11B-Vision-Instruct | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama3.2-90B-Vision-Instruct     | meta-llama/Llama-3.2-90B-Vision-Instruct | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama-Guard-3-11B-Vision         | meta-llama/Llama-Guard-3-11B-Vision      | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama-Guard-3-1B:int4-mp1        | meta-llama/Llama-Guard-3-1B-INT4         | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama-Guard-3-1B                 | meta-llama/Llama-Guard-3-1B              | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama-Guard-3-8B                 | meta-llama/Llama-Guard-3-8B              | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama-Guard-3-8B:int8-mp1        | meta-llama/Llama-Guard-3-8B-INT8         | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Prompt-Guard-86M                 | meta-llama/Prompt-Guard-86M              | 128K           |
+----------------------------------+------------------------------------------+----------------+
| Llama-Guard-2-8B                 | meta-llama/Llama-Guard-2-8B              | 4K             |
+----------------------------------+------------------------------------------+----------------+

To download models, you can use the llama download command.

Downloading from Meta

Here is an example download command to get the 3B-Instruct/11B-Vision-Instruct model. You will need META_URL which can be obtained from here

Download the required checkpoints using the following commands:

# download the 8B model, this can be run on a single GPU
llama download --source meta --model-id Llama3.2-3B-Instruct --meta-url META_URL

# you can also get the 70B model, this will require 8 GPUs however
llama download --source meta --model-id Llama3.2-11B-Vision-Instruct --meta-url META_URL

# llama-agents have safety enabled by default. For this, you will need
# safety models -- Llama-Guard and Prompt-Guard
llama download --source meta --model-id Prompt-Guard-86M --meta-url META_URL
llama download --source meta --model-id Llama-Guard-3-1B --meta-url META_URL

Downloading from Hugging Face

Essentially, the same commands above work, just replace --source meta with --source huggingface.

llama download --source huggingface --model-id  Llama3.1-8B-Instruct --hf-token <HF_TOKEN>

llama download --source huggingface --model-id Llama3.1-70B-Instruct --hf-token <HF_TOKEN>

llama download --source huggingface --model-id Llama-Guard-3-1B --ignore-patterns *original*
llama download --source huggingface --model-id Prompt-Guard-86M --ignore-patterns *original*

Important: Set your environment variable HF_TOKEN or pass in --hf-token to the command to validate your access. You can find your token at https://huggingface.co/settings/tokens.

Tip: Default for llama download is to run with --ignore-patterns *.safetensors since we use the .pth files in the original folder. For Llama Guard and Prompt Guard, however, we need safetensors. Hence, please run with --ignore-patterns original so that safetensors are downloaded and .pth files are ignored.