Commit graph

15 commits

Author SHA1 Message Date
Reid
3a002f6cf1
chore: update download error message (#1217)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Actually, the incorrect token also will hit `RepositoryNotFoundError`,
e.g.
```
$ llama model download --source huggingface --model-id Llama3.2-1B-Instruct:int4-qlora-eo8 --hf-token xx  ### xx is incorrect token
----RepositoryNotFoundError--->
usage: llama model download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID]
                            [--hf-token HF_TOKEN] [--meta-url META_URL]
                            [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS]
                            [--manifest-file MANIFEST_FILE]
llama model download: error: Repository 'meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8' not found on the Hugging Face Hub.

so update to:
 llama model download --source huggingface --model-id Llama3.2-1B-Instruct:int4-qlora-eo8 --hf-token xx
----RepositoryNotFoundError--->
usage: llama model download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID]
                            [--hf-token HF_TOKEN] [--meta-url META_URL]
                            [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS]
                            [--manifest-file MANIFEST_FILE]
llama model download: error: Repository 'meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8' not found on the Hugging Face Hub or incorrect Hugging Face token.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-25 21:38:10 -08:00
Reid
d9f5beb15a
style: update download help text (#1135)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Based on the cade:

6b1773d530/llama_stack/cli/download.py (L454)
and the test, it can use comma to specify multiple model ids. So update
the usage.
```
$ llama model download --source meta --model-id Llama3.2-1B,Llama3.2-3B
Please provide the signed URL for model Llama3.2-1B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):

Downloading checklist.chk            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  156/156 bytes   -  0:00:00
Downloading tokenizer.model          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.2/2.2 MB      -  0:00:00
Downloading params.json              ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  220/220 bytes   -  0:00:00
Downloading consolidated.00.pth      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.5/2.5 GB      -  0:00:00

Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B

[Optionally] To run MD5 checksums, use the following command: llama model verify-download --model-id Llama3.2-1B
Please provide the signed URL for model Llama3.2-3B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):
Downloading checklist.chk            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  156/156 bytes   -  0:00:00
Downloading tokenizer.model          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.2/2.2 MB      -  0:00:00
Downloading params.json              ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  220/220 bytes   -  0:00:00
Downloading consolidated.00.pth      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  6.4/6.4 GB      -  0:00:00

Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-3B


$ llama model download --source huggingface --model-id Llama3.2-1B,Llama3.2-3B
original%2Fparams.json: 100%|██████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 564kB/
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B
...
tokenizer.json: 100%|█████████████████████████████████████████████████████████████| 9.09M/9.09M [00:00<00:00, 9.18MB/s]
Successfully downloaded model to /Users/xxx/.llama/checkpoints/Llama3.2-3B


before:
$ llama model download --help
 --model-id MODEL_ID   See `llama model list` or `llama model list --show-all` for the list of available models

after:
$ llama model download --help
  --model-id MODEL_ID   See `llama model list` or `llama model list --show-all` for the list of available models. Specify multiple model IDs with commas, e.g. --model-id Llama3.2-1B,Llama3.2-3B
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:24:31 -08:00
Reid
3d88b81ccf
fix: remove the empty line (#1097)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Remove the empty line from help
```
before:
$ llama model download --help
  --max-parallel MAX_PARALLEL
                        Maximum number of concurrent downloads
  --ignore-patterns IGNORE_PATTERNS
                      <<<<<<<<<empty line>>>>>>>>>>
                        For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring
                        safetensors files to avoid downloading duplicate weights.

after:
$ llama model download --help
  --max-parallel MAX_PARALLEL
                        Maximum number of concurrent downloads
  --ignore-patterns IGNORE_PATTERNS
                        For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring
                        safetensors files to avoid downloading duplicate weights.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-14 09:33:20 -08:00
Ashwin Bharambe
314ee09ae3
chore: move all Llama Stack types from llama-models to llama-stack (#1098)
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.

This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279

## Test Plan

Ensure all `llama` CLI `model` sub-commands work:

```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```

Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```

Create a fresh venv `uv venv && source .venv/bin/activate` and run
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv` <-- the server runs

Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.

```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
2025-02-14 09:10:59 -08:00
Sébastien Han
e4a1579e63
build: format codebase imports using ruff linter (#1028)
# What does this PR do?

- Configured ruff linter to automatically fix import sorting issues.
- Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are
applied.
- Enabled the 'I' selection to focus on import-related linting rules.
- Ran the linter, and formatted all codebase imports accordingly.
- Removed the black dep from the "dev" group since we use ruff

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-13 10:06:21 -08:00
Yuan Tang
34ab7a3b6c
Fix precommit check after moving to ruff (#927)
Lint check in main branch is failing. This fixes the lint check after we
moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fixing and ignoring some
additional checks.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-02 06:46:45 -08:00
varunfb
08be023290
Added optional md5 validate command once download is completed (#486)
# What does this PR do?

Adds description at the end of successful download the optionally run
the verify md5 checksums command.

## Test Plan
<img width="2004" alt="Screenshot 2024-11-19 at 12 11 37 PM"
src="https://github.com/user-attachments/assets/8d617aef-99f5-4c3b-b93c-eff3e68289ea">

## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.

---------

Co-authored-by: varunfb <vontimitta@devgpu004.eag5.facebook.com>
2024-11-19 17:42:43 -08:00
Matthew Farrellee
fcc2132e6f
remove pydantic namespace warnings using model_config (#470)
# What does this PR do?

remove another model_ pydantic namespace warning and convert old-style
'class Config' to new-style 'model_config' workaround.

also a whitespace change to get past -


flake8...................................................................Failed
llama_stack/cli/download.py:296:85: E226 missing whitespace around
arithmetic operator
llama_stack/cli/download.py:297:54: E226 missing whitespace around
arithmetic operator


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
2024-11-18 19:24:14 -08:00
Ashwin Bharambe
0713607b68
Support parallel downloads for llama model download (#448)
# What does this PR do?

Enables parallel downloads for `llama model download` CLI command. It is
rather necessary for folks having high bandwidth connections to the
Internet in order to download checkpoints quickly.

## Test Plan


![image](https://github.com/user-attachments/assets/f5df69e2-ec4f-4360-bf84-91273d8cee22)
2024-11-14 09:56:22 -08:00
Tam
a07dfffbbf
initial changes (#261)
Update the parsing logic for comma-separated list and download function
2024-10-16 23:15:59 -07:00
Russell Bryant
a4e775c465
download: improve help text (#204) 2024-10-07 08:40:04 -07:00
AshleyT3
734f59d3b8
Check that the model is found before use. (#182) 2024-10-03 23:24:47 -07:00
Ashwin Bharambe
cc5029a716 Add special case for prompt guard 2024-10-02 08:43:12 -07:00
Ashwin Bharambe
56aed59eb4
Support for Llama3.2 models and Swift SDK (#98) 2024-09-25 10:29:58 -07:00
Ashwin Bharambe
9487ad8294
API Updates (#73)
* API Keys passed from Client instead of distro configuration

* delete distribution registry

* Rename the "package" word away

* Introduce a "Router" layer for providers

Some providers need to be factorized and considered as thin routing
layers on top of other providers. Consider two examples:

- The inference API should be a routing layer over inference providers,
  routed using the "model" key
- The memory banks API is another instance where various memory bank
  types will be provided by independent providers (e.g., a vector store
  is served by Chroma while a keyvalue memory can be served by Redis or
  PGVector)

This commit introduces a generalized routing layer for this purpose.

* update `apis_to_serve`

* llama_toolchain -> llama_stack

* Codemod from llama_toolchain -> llama_stack

- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- from llama_stack.apis.<api> import foo should work now
- update imports to do llama_stack.apis.<api>
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent

* Moved some stuff out of common/; re-generated OpenAPI spec

* llama-toolchain -> llama-stack (hyphens)

* add control plane API

* add redis adapter + sqlite provider

* move core -> distribution

* Some more toolchain -> stack changes

* small naming shenanigans

* Removing custom tool and agent utilities and moving them client side

* Move control plane to distribution server for now

* Remove control plane from API list

* no codeshield dependency randomly plzzzzz

* Add "fire" as a dependency

* add back event loggers

* stack configure fixes

* use brave instead of bing in the example client

* add init file so it gets packaged

* add init files so it gets packaged

* Update MANIFEST

* bug fix

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
2024-09-17 19:51:35 -07:00
Renamed from llama_toolchain/cli/download.py (Browse further)