
# Downloading Models

The `llama` CLI tool helps you set up and use the Llama Stack. It should be available on your PATH after installing the `llama-stack` package.

## Installation

You have two ways to install Llama Stack:

1. **Install as a package**: Install the package directly from PyPI by running the following command:

   ```bash
   pip install llama-stack
   ```
2. **Install from source**: If you prefer to install from the source code, follow these steps:

   ```bash
   mkdir -p ~/local
   cd ~/local
   git clone git@github.com:meta-llama/llama-stack.git

   uv venv myenv --python 3.12
   source myenv/bin/activate  # On Windows: myenv\Scripts\activate

   cd llama-stack
   uv pip install -e .  # environments created with `uv venv` don't ship pip, so use `uv pip`
   ```
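Either way, a quick sanity check confirms the CLI is on your PATH (a minimal sketch; the exact subcommands listed depend on your installed version):

```bash
# Verify that the llama CLI is installed and discoverable
llama --help
```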

## Downloading models via Hugging Face CLI

You first need the model weights available locally. We recommend using the Hugging Face CLI to download them.

### Install Hugging Face CLI

First, install the Hugging Face CLI (the quotes keep the extras specifier intact in shells like zsh):

```bash
pip install "huggingface_hub[cli]"
```
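To confirm the installation, you can print the installed version (output varies with your `huggingface_hub` release):

```bash
# Check that huggingface-cli is available and print its version
huggingface-cli version
```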

### Download models from Hugging Face

You can download models using the `huggingface-cli download` command. Here are some examples:

```bash
# Download Llama 3.2 3B Instruct model
huggingface-cli download meta-llama/Llama-3.2-3B-Instruct --local-dir ~/.llama/Llama-3.2-3B-Instruct

# Download Llama 3.2 1B Instruct model
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct --local-dir ~/.llama/Llama-3.2-1B-Instruct

# Download Llama Guard 3 1B model
huggingface-cli download meta-llama/Llama-Guard-3-1B --local-dir ~/.llama/Llama-Guard-3-1B

# Download Prompt Guard model
huggingface-cli download meta-llama/Prompt-Guard-86M --local-dir ~/.llama/Prompt-Guard-86M
```
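If you only need part of a repository, `huggingface-cli download` also accepts glob filters via `--include`/`--exclude`. A sketch, assuming the repository ships `.safetensors` weights:

```bash
# Download only the safetensors weights and JSON config files
huggingface-cli download meta-llama/Llama-3.2-3B-Instruct \
  --include "*.safetensors" "*.json" \
  --local-dir ~/.llama/Llama-3.2-3B-Instruct
```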

**Important:** You need to authenticate with Hugging Face to download models. You can do this by:

1. Getting your token from [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
2. Running `huggingface-cli login` and entering your token

Note that the `meta-llama` repositories are gated: you must also request access and accept the license on each model's Hugging Face page before downloads will succeed.
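For scripted or CI environments, the token can also be supplied non-interactively, either through the `HF_TOKEN` environment variable or the `--token` flag (the token below is a placeholder):

```bash
# Option A: export the token for the current shell session
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx

# Option B: pass the token directly to login
huggingface-cli login --token hf_xxxxxxxxxxxxxxxx
```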

### List the downloaded models

To list the downloaded models, you can use the Hugging Face CLI:

```bash
# List all downloaded models in your local cache
huggingface-cli scan-cache
```
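Note that `scan-cache` inspects the shared Hugging Face cache (by default `~/.cache/huggingface/hub`); with recent `huggingface_hub` releases, models fetched with `--local-dir`, as in the examples above, are written to the target directory instead, so you can list them with ordinary filesystem tools. To reclaim space in the shared cache, the CLI also offers an interactive `delete-cache` command:

```bash
# Models downloaded with --local-dir live under the target directory
ls ~/.llama

# Interactively select cached revisions to delete and free disk space
huggingface-cli delete-cache
```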