Downloading Models
The llama CLI tool helps you set up and use the Llama Stack. It should be available on your PATH after installing the llama-stack package.
Installation
You have two ways to install Llama Stack:
- Install as a package: You can install the package directly from PyPI by running the following command:
pip install llama-stack
- Install from source: If you prefer to install from the source code, follow these steps:
mkdir -p ~/local
cd ~/local
git clone git@github.com:meta-llama/llama-stack.git
uv venv myenv --python 3.12
source myenv/bin/activate  # On Windows: myenv\Scripts\activate
cd llama-stack
pip install -e .
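Either way, a quick sanity check confirms the CLI is installed and on your PATH:
# Print the available subcommands; the exact set depends on your installed version
llama --help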
Downloading models via Hugging Face CLI
You first need to have models downloaded locally. We recommend using the Hugging Face CLI to download models.
Install Hugging Face CLI
First, install the Hugging Face CLI:
pip install huggingface_hub[cli]
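Two small caveats: on shells such as zsh, the bracketed extra must be quoted, and you can verify the installation before moving on:
# On zsh, quote the extras specifier
pip install "huggingface_hub[cli]"
# Check that the CLI is available
huggingface-cli --help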
Download models from Hugging Face
You can download models using the huggingface-cli download command. Here are some examples:
# Download Llama 3.2 3B Instruct model
huggingface-cli download meta-llama/Llama-3.2-3B-Instruct --local-dir ~/.llama/Llama-3.2-3B-Instruct
# Download Llama 3.2 1B Instruct model
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct --local-dir ~/.llama/Llama-3.2-1B-Instruct
# Download Llama Guard 3 1B model
huggingface-cli download meta-llama/Llama-Guard-3-1B --local-dir ~/.llama/Llama-Guard-3-1B
# Download Prompt Guard model
huggingface-cli download meta-llama/Prompt-Guard-86M --local-dir ~/.llama/Prompt-Guard-86M
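You can also filter which files get downloaded. For example, some Meta repositories ship the original consolidated checkpoints in an original/ folder alongside the safetensors weights; a sketch of skipping them (assuming the repository has such a folder):
# Skip the original/ consolidated checkpoints to save disk space
huggingface-cli download meta-llama/Llama-3.2-3B-Instruct --exclude "original/*" --local-dir ~/.llama/Llama-3.2-3B-Instruct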
Important: You need to authenticate with Hugging Face to download models. You can do this by:
- Getting your token from https://huggingface.co/settings/tokens
- Running huggingface-cli login and entering your token (or passing it directly, as shown below)
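For non-interactive environments such as CI, you can avoid the prompt; both approaches below rely on the HF_TOKEN environment variable that the Hugging Face tooling reads (the token value shown is a placeholder):
# Export your token once (replace with your actual token)
export HF_TOKEN=hf_your_token_here
# Most commands, including huggingface-cli download, honor HF_TOKEN directly.
# Alternatively, log in without a prompt by passing the token explicitly:
huggingface-cli login --token "$HF_TOKEN"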
List the downloaded models
To list the downloaded models, you can use the Hugging Face CLI:
# List all downloaded models in your local cache
huggingface-cli scan-cache
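Note that scan-cache inspects the default Hugging Face cache (typically ~/.cache/huggingface/hub). Models fetched with --local-dir, as in the examples above, live in the directory you chose, so a plain directory listing covers those:
# Models downloaded with --local-dir live where you put them
ls ~/.llama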