diff --git a/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md b/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md index b183757db..b8d1b1714 100644 --- a/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md +++ b/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md @@ -41,12 +41,31 @@ The following environment variables can be configured: ## Prerequisite: Downloading Models -Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints. +Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints. 
``` -$ ls ~/.llama/checkpoints -Llama3.1-8B Llama3.2-11B-Vision-Instruct Llama3.2-1B-Instruct Llama3.2-90B-Vision-Instruct Llama-Guard-3-8B -Llama3.1-8B-Instruct Llama3.2-1B Llama3.2-3B-Instruct Llama-Guard-3-1B Prompt-Guard-86M +$ llama model list --downloaded +┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓ +┃ Model ┃ Size ┃ Modified Time ┃ +┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩ +│ Llama3.2-1B-Instruct:int4-qlora-eo8 │ 1.53 GB │ 2025-02-26 11:22:28 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-1B │ 2.31 GB │ 2025-02-18 21:48:52 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Prompt-Guard-86M │ 0.02 GB │ 2025-02-26 11:29:28 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB │ 2025-02-26 11:37:41 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-3B │ 5.99 GB │ 2025-02-18 21:51:26 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.1-8B │ 14.97 GB │ 2025-02-16 10:36:37 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB │ 2025-02-26 11:35:02 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama-Guard-3-1B │ 2.80 GB │ 2025-02-26 11:20:46 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama-Guard-3-1B:int4 │ 0.43 GB │ 2025-02-26 11:33:33 │ +└─────────────────────────────────────────┴──────────┴─────────────────────┘ ``` ## Running the Distribution diff --git a/docs/source/distributions/self_hosted_distro/meta-reference-quantized-gpu.md b/docs/source/distributions/self_hosted_distro/meta-reference-quantized-gpu.md index 9aeb7a88b..a49175e22 100644 --- 
a/docs/source/distributions/self_hosted_distro/meta-reference-quantized-gpu.md +++ b/docs/source/distributions/self_hosted_distro/meta-reference-quantized-gpu.md @@ -41,12 +41,31 @@ The following environment variables can be configured: ## Prerequisite: Downloading Models -Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints. +Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints. 
``` -$ ls ~/.llama/checkpoints -Llama3.1-8B Llama3.2-11B-Vision-Instruct Llama3.2-1B-Instruct Llama3.2-90B-Vision-Instruct Llama-Guard-3-8B -Llama3.1-8B-Instruct Llama3.2-1B Llama3.2-3B-Instruct Llama-Guard-3-1B Prompt-Guard-86M +$ llama model list --downloaded +┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓ +┃ Model ┃ Size ┃ Modified Time ┃ +┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩ +│ Llama3.2-1B-Instruct:int4-qlora-eo8 │ 1.53 GB │ 2025-02-26 11:22:28 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-1B │ 2.31 GB │ 2025-02-18 21:48:52 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Prompt-Guard-86M │ 0.02 GB │ 2025-02-26 11:29:28 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB │ 2025-02-26 11:37:41 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-3B │ 5.99 GB │ 2025-02-18 21:51:26 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.1-8B │ 14.97 GB │ 2025-02-16 10:36:37 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB │ 2025-02-26 11:35:02 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama-Guard-3-1B │ 2.80 GB │ 2025-02-26 11:20:46 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama-Guard-3-1B:int4 │ 0.43 GB │ 2025-02-26 11:33:33 │ +└─────────────────────────────────────────┴──────────┴─────────────────────┘ ``` ## Running the Distribution diff --git a/docs/source/references/llama_cli_reference/download_models.md b/docs/source/references/llama_cli_reference/download_models.md index 6c791bcb7..ca470f8c2 100644 --- a/docs/source/references/llama_cli_reference/download_models.md +++ 
b/docs/source/references/llama_cli_reference/download_models.md @@ -129,3 +129,35 @@ llama download --source huggingface --model-id Prompt-Guard-86M --ignore-pattern **Important:** Set your environment variable `HF_TOKEN` or pass in `--hf-token` to the command to validate your access. You can find your token at [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens). > **Tip:** Default for `llama download` is to run with `--ignore-patterns *.safetensors` since we use the `.pth` files in the `original` folder. For Llama Guard and Prompt Guard, however, we need safetensors. Hence, please run with `--ignore-patterns original` so that safetensors are downloaded and `.pth` files are ignored. + +## List the downloaded models + +List the downloaded models with the following command: +``` +llama model list --downloaded +``` + +You should see a table like this: +``` +┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓ +┃ Model ┃ Size ┃ Modified Time ┃ +┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩ +│ Llama3.2-1B-Instruct:int4-qlora-eo8 │ 1.53 GB │ 2025-02-26 11:22:28 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-1B │ 2.31 GB │ 2025-02-18 21:48:52 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Prompt-Guard-86M │ 0.02 GB │ 2025-02-26 11:29:28 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB │ 2025-02-26 11:37:41 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-3B │ 5.99 GB │ 2025-02-18 21:51:26 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.1-8B │ 14.97 GB │ 2025-02-16 10:36:37 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB │ 2025-02-26 11:35:02 │ 
+├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama-Guard-3-1B │ 2.80 GB │ 2025-02-26 11:20:46 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama-Guard-3-1B:int4 │ 0.43 GB │ 2025-02-26 11:33:33 │ +└─────────────────────────────────────────┴──────────┴─────────────────────┘ +``` diff --git a/docs/source/references/llama_cli_reference/index.md b/docs/source/references/llama_cli_reference/index.md index a43666963..8a38fc3ae 100644 --- a/docs/source/references/llama_cli_reference/index.md +++ b/docs/source/references/llama_cli_reference/index.md @@ -154,6 +154,38 @@ llama download --source huggingface --model-id Prompt-Guard-86M --ignore-pattern > **Tip:** Default for `llama download` is to run with `--ignore-patterns *.safetensors` since we use the `.pth` files in the `original` folder. For Llama Guard and Prompt Guard, however, we need safetensors. Hence, please run with `--ignore-patterns original` so that safetensors are downloaded and `.pth` files are ignored. 
+## List the downloaded models + +List the downloaded models with the following command: +``` +llama model list --downloaded +``` + +You should see a table like this: +``` +┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓ +┃ Model ┃ Size ┃ Modified Time ┃ +┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩ +│ Llama3.2-1B-Instruct:int4-qlora-eo8 │ 1.53 GB │ 2025-02-26 11:22:28 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-1B │ 2.31 GB │ 2025-02-18 21:48:52 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Prompt-Guard-86M │ 0.02 GB │ 2025-02-26 11:29:28 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB │ 2025-02-26 11:37:41 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-3B │ 5.99 GB │ 2025-02-18 21:51:26 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.1-8B │ 14.97 GB │ 2025-02-16 10:36:37 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB │ 2025-02-26 11:35:02 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama-Guard-3-1B │ 2.80 GB │ 2025-02-26 11:20:46 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama-Guard-3-1B:int4 │ 0.43 GB │ 2025-02-26 11:33:33 │ +└─────────────────────────────────────────┴──────────┴─────────────────────┘ +``` + ## Understand the models The `llama model` command helps you explore the model’s interface. 
diff --git a/llama_stack/templates/meta-reference-gpu/doc_template.md b/llama_stack/templates/meta-reference-gpu/doc_template.md index 60556a6f3..87438fb6d 100644 --- a/llama_stack/templates/meta-reference-gpu/doc_template.md +++ b/llama_stack/templates/meta-reference-gpu/doc_template.md @@ -29,12 +29,31 @@ The following environment variables can be configured: ## Prerequisite: Downloading Models -Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints. +Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints. 
``` -$ ls ~/.llama/checkpoints -Llama3.1-8B Llama3.2-11B-Vision-Instruct Llama3.2-1B-Instruct Llama3.2-90B-Vision-Instruct Llama-Guard-3-8B -Llama3.1-8B-Instruct Llama3.2-1B Llama3.2-3B-Instruct Llama-Guard-3-1B Prompt-Guard-86M +$ llama model list --downloaded +┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓ +┃ Model ┃ Size ┃ Modified Time ┃ +┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩ +│ Llama3.2-1B-Instruct:int4-qlora-eo8 │ 1.53 GB │ 2025-02-26 11:22:28 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-1B │ 2.31 GB │ 2025-02-18 21:48:52 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Prompt-Guard-86M │ 0.02 GB │ 2025-02-26 11:29:28 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB │ 2025-02-26 11:37:41 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-3B │ 5.99 GB │ 2025-02-18 21:51:26 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.1-8B │ 14.97 GB │ 2025-02-16 10:36:37 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB │ 2025-02-26 11:35:02 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama-Guard-3-1B │ 2.80 GB │ 2025-02-26 11:20:46 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama-Guard-3-1B:int4 │ 0.43 GB │ 2025-02-26 11:33:33 │ +└─────────────────────────────────────────┴──────────┴─────────────────────┘ ``` ## Running the Distribution diff --git a/llama_stack/templates/meta-reference-quantized-gpu/doc_template.md b/llama_stack/templates/meta-reference-quantized-gpu/doc_template.md index 2b117120c..e8dfaaf3c 100644 --- a/llama_stack/templates/meta-reference-quantized-gpu/doc_template.md +++ 
b/llama_stack/templates/meta-reference-quantized-gpu/doc_template.md @@ -31,12 +31,31 @@ The following environment variables can be configured: ## Prerequisite: Downloading Models -Please make sure you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints. +Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints. ``` -$ ls ~/.llama/checkpoints -Llama3.1-8B Llama3.2-11B-Vision-Instruct Llama3.2-1B-Instruct Llama3.2-90B-Vision-Instruct Llama-Guard-3-8B -Llama3.1-8B-Instruct Llama3.2-1B Llama3.2-3B-Instruct Llama-Guard-3-1B Prompt-Guard-86M +$ llama model list --downloaded +┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓ +┃ Model ┃ Size ┃ Modified Time ┃ +┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩ +│ Llama3.2-1B-Instruct:int4-qlora-eo8 │ 1.53 GB │ 2025-02-26 11:22:28 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-1B │ 2.31 GB │ 2025-02-18 21:48:52 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Prompt-Guard-86M │ 0.02 GB │ 2025-02-26 11:29:28 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB │ 2025-02-26 11:37:41 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-3B │ 5.99 GB │ 
2025-02-18 21:51:26 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.1-8B │ 14.97 GB │ 2025-02-16 10:36:37 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB │ 2025-02-26 11:35:02 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama-Guard-3-1B │ 2.80 GB │ 2025-02-26 11:20:46 │ +├─────────────────────────────────────────┼──────────┼─────────────────────┤ +│ Llama-Guard-3-1B:int4 │ 0.43 GB │ 2025-02-26 11:33:33 │ +└─────────────────────────────────────────┴──────────┴─────────────────────┘ ``` ## Running the Distribution