fix: update dangling references to llama download command (#3763)

## Summary
After the model management CLI was removed in #3700, this PR updates the
remaining references to the old `llama download` command to point to its
replacements: `huggingface-cli download`, or `llama-model download` for
native Llama checkpoints.

## Changes
- Updated error messages in `meta_reference/common.py` to recommend
`llama-model download --source meta` (from the `llama-models` package)
- Updated error messages in
`torchtune/recipes/lora_finetuning_single_device.py` to use
`huggingface-cli download`
- Updated post-training notebook to use `huggingface-cli download`
instead of `llama download`
- Fixed typo: "you model" -> "your model"

## Test Plan
- Verified error messages provide correct guidance for users
- Checked that notebook instructions are up-to-date with current tooling
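
A check like the one in the test plan could be automated along these lines. This is a minimal sketch, not part of the PR: the helper name `find_stale_references` and the sample strings are hypothetical, and the regex assumes the stale command always appears literally as `llama download`.

```python
import re

# Matches the removed `llama download` command, but not
# `llama-model download` or `ollama download`.
STALE_COMMAND = re.compile(r"(?<![\w-])llama download\b")


def find_stale_references(text: str) -> list[int]:
    """Return 1-based line numbers in `text` that still mention `llama download`."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if STALE_COMMAND.search(line)
    ]


# Hypothetical sample: one stale message and two updated ones.
sample = "\n".join(
    [
        "Please download model using `llama download --model-id X`",
        "Please download the model using `huggingface-cli download org/X`",
        "please download the model using `llama-model download --source meta`",
    ]
)
print(find_stale_references(sample))  # → [1]
```

Running such a scan over source files and notebooks would flag any dangling reference that a manual review missed.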
Ashwin Bharambe 2025-10-09 18:35:02 -07:00 committed by GitHub
parent 8fe4a216b5
commit ebae0385bb
GPG key ID: B5690EEEBB952194
3 changed files with 6369 additions and 6410 deletions

@@ -4236,24 +4236,7 @@
 "metadata": {
 "id": "RWa220T5sjbR"
 },
-"source": [
-"# 2. Start Post Training\n",
-"Currenty, Llama stack post training APIs support [Supervised Fine-tune](https://cameronrwolfe.substack.com/p/understanding-and-using-supervised) which is a straightfoard and effective way to boost model performance on specific tasks.\n",
-"\n",
-"We start from [LoRA finetune algorithm](https://pytorch.org/torchtune/main/tutorials/lora_finetune.html#what-is-lora) that can significantly reduce finetune GPU memory usage as well as needs less data\n",
-"\n",
-"\n",
-"#### 2.0. Download the base model\n",
-"Download the Llama model that will be used with [the downloading model CLI](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html).\n",
-"\n",
-"Since ollama takes huggingface safetensor format checkpoint, we need to output the finetuned checkpoint in hugging face format. We download the model checkpoint from huggingface source.\n",
-"\n",
-"> You need to get a huggingface token from [here](https://huggingface.co/) and replace the \"HF_TOKEN\"\n",
-"\n",
-"\n",
-"\n",
-"\n"
-]
+"source": "# 2. Start Post Training\nCurrently, Llama stack post training APIs support [Supervised Fine-tune](https://cameronrwolfe.substack.com/p/understanding-and-using-supervised) which is a straightforward and effective way to boost model performance on specific tasks.\n\nWe start from [LoRA finetune algorithm](https://pytorch.org/torchtune/main/tutorials/lora_finetune.html#what-is-lora) that can significantly reduce finetune GPU memory usage as well as needs less data\n\n\n#### 2.0. Download the base model\nDownload the Llama model using the [Hugging Face CLI](https://huggingface.co/docs/huggingface_hub/guides/cli).\n\nSince ollama takes huggingface safetensor format checkpoint, we need to output the finetuned checkpoint in hugging face format. We download the model checkpoint from huggingface source.\n\n> You need to authenticate with Hugging Face by getting your token from [here](https://huggingface.co/settings/tokens) and running `huggingface-cli login`"
 },
 {
 "cell_type": "code",
@@ -4266,33 +4249,8 @@
 "id": "yF50MtwcsogU",
 "outputId": "92ba3b3a-63a0-4ab8-c8cd-5437365128fc"
 },
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-".gitattributes: 100% 1.52k/1.52k [00:00<00:00, 12.1MB/s]\n",
-"LICENSE.txt: 100% 7.71k/7.71k [00:00<00:00, 33.3MB/s]\n",
-"README.md: 100% 41.7k/41.7k [00:00<00:00, 56.9MB/s]\n",
-"USE_POLICY.md: 100% 6.02k/6.02k [00:00<00:00, 32.4MB/s]\n",
-"config.json: 100% 878/878 [00:00<00:00, 6.94MB/s]\n",
-"generation_config.json: 100% 189/189 [00:00<00:00, 1.71MB/s]\n",
-"model.safetensors.index.json: 100% 20.9k/20.9k [00:00<00:00, 87.0MB/s]\n",
-"consolidated.00.pth: 100% 6.43G/6.43G [00:18<00:00, 353MB/s]\n",
-"original%2Forig_params.json: 100% 220/220 [00:00<00:00, 1.69MB/s]\n",
-"original%2Fparams.json: 100% 220/220 [00:00<00:00, 1.64MB/s]\n",
-"tokenizer.model: 100% 2.18M/2.18M [00:00<00:00, 44.8MB/s]\n",
-"special_tokens_map.json: 100% 296/296 [00:00<00:00, 2.69MB/s]\n",
-"tokenizer.json: 100% 9.09M/9.09M [00:01<00:00, 8.57MB/s]\n",
-"tokenizer_config.json: 100% 54.5k/54.5k [00:00<00:00, 172MB/s]\n",
-"\n",
-"Successfully downloaded model to /root/.llama/checkpoints/Llama3.2-3B-Instruct\n"
-]
-}
-],
-"source": [
-"!llama download --source huggingface --model-id Llama3.2-3B-Instruct --hf-token \"HF_TOKEN\""
-]
+"outputs": [],
+"source": "!huggingface-cli download meta-llama/Llama-3.2-3B-Instruct --local-dir ~/.llama/Llama-3.2-3B-Instruct"
 },
 {
 "cell_type": "markdown",

@@ -18,7 +18,7 @@ def model_checkpoint_dir(model_id) -> str:
 assert checkpoint_dir.exists(), (
 f"Could not find checkpoints in: {model_local_dir(model_id)}. "
-f"If you try to use the native llama model, Please download model using `llama download --model-id {model_id}`"
-f"Otherwise, please save you model checkpoint under {model_local_dir(model_id)}"
+f"If you try to use the native llama model, please download the model using `llama-model download --source meta --model-id {model_id}` (see https://github.com/meta-llama/llama-models). "
+f"Otherwise, please save your model checkpoint under {model_local_dir(model_id)}"
 )
 return str(checkpoint_dir)

@@ -104,9 +104,10 @@ class LoraFinetuningSingleDevice:
 if not any(p.exists() for p in paths):
 checkpoint_dir = checkpoint_dir / "original"
+hf_repo = model.huggingface_repo or f"meta-llama/{model.descriptor()}"
 assert checkpoint_dir.exists(), (
 f"Could not find checkpoints in: {model_local_dir(model.descriptor())}. "
-f"Please download model using `llama download --model-id {model.descriptor()}`"
+f"Please download the model using `huggingface-cli download {hf_repo} --local-dir ~/.llama/{model.descriptor()}`"
 )
 return str(checkpoint_dir)
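
The repository fallback in the last hunk can be illustrated in isolation. The sketch below is an assumption-laden stand-in rather than the recipe's actual code: `ModelStub` and `download_hint` are hypothetical names, and only the `huggingface_repo` attribute, the `descriptor()` method, and the message format are taken from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelStub:
    """Hypothetical stand-in for the model object used by the recipe."""

    name: str
    huggingface_repo: Optional[str] = None

    def descriptor(self) -> str:
        return self.name


def download_hint(model: ModelStub) -> str:
    # Prefer the explicit repo; otherwise fall back to the meta-llama org,
    # mirroring the `hf_repo = ... or ...` line added in the diff.
    hf_repo = model.huggingface_repo or f"meta-llama/{model.descriptor()}"
    return (
        f"huggingface-cli download {hf_repo} "
        f"--local-dir ~/.llama/{model.descriptor()}"
    )


with_repo = ModelStub("Llama3.2-3B-Instruct", "meta-llama/Llama-3.2-3B-Instruct")
without_repo = ModelStub("Llama3.2-3B-Instruct")
print(download_hint(with_repo))
print(download_hint(without_repo))
```

If the descriptor differs from the Hugging Face repository name, as the notebook's `Llama3.2-3B-Instruct` versus `meta-llama/Llama-3.2-3B-Instruct` suggests it can, the fallback branch may suggest a repository that does not exist, so the hint is most reliable when `huggingface_repo` is set.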