# What does this PR do?
The goal of this PR is codebase modernization.
Schema reflection code needed a minor adjustment to handle `types.UnionType`
(the `X | Y` union syntax) and `collections.abc.AsyncIterator`. (Both are
preferred in recent Python releases.)
Note to reviewers: almost all changes here are automatically generated
by pyupgrade. Some additional unused imports were cleaned up. The only
change worth noting can be found under `docs/openapi_generator` and
`llama_stack/strong_typing/schema.py` where reflection code was updated
to deal with "newer" types.
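For reviewers unfamiliar with the newer types, here is a minimal sketch, not the actual `schema.py` change, of how reflection code can treat both union spellings and both `AsyncIterator` spellings uniformly; the helper names are illustrative only.
```python
# Illustrative helpers only; the real reflection code lives in
# llama_stack/strong_typing/schema.py.
import collections.abc
import types
import typing


def union_members(tp):
    """Return the member types if tp is a union in either spelling, else None."""
    origin = typing.get_origin(tp)
    if origin is typing.Union or origin is types.UnionType:  # Optional[X] or X | Y
        return typing.get_args(tp)
    return None


def is_async_iterator(tp) -> bool:
    """True for typing.AsyncIterator[...] and collections.abc.AsyncIterator[...]."""
    return typing.get_origin(tp) is collections.abc.AsyncIterator


print(union_members(int | None))                              # (int, NoneType)
print(is_async_iterator(collections.abc.AsyncIterator[str]))  # True
```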
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Move around bits. This makes the copies from llama-models _much_ easier
to maintain and ensures we don't entangle meta-reference specific
tidbits into llama-models code even by accident.
Also, kills the meta-reference-quantized-gpu distro and rolls
quantization deps into meta-reference-gpu.
## Test Plan
```
LLAMA_MODELS_DEBUG=1 \
with-proxy llama stack run meta-reference-gpu \
--env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
--env INFERENCE_CHECKPOINT_DIR=<DIR> \
--env MODEL_PARALLEL_SIZE=4 \
--env QUANTIZATION_TYPE=fp8_mixed
```
Start a server with and without quantization. Point integration tests to
it using:
```
pytest -s -v tests/integration/inference/test_text_inference.py \
--stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
# What does this PR do?
Updated all instances of datetime.now() to use timezone.utc for
consistency in handling time across different systems. This ensures that
timestamps are always in Coordinated Universal Time (UTC), avoiding
issues with time zone discrepancies and promoting uniformity in
time-related data.
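For illustration, the pattern this change standardizes on (a generic example, not a snippet from the diff):
```python
from datetime import datetime, timezone

naive_local = datetime.now()            # naive; depends on the host's local timezone
aware_utc = datetime.now(timezone.utc)  # timezone-aware UTC timestamp

print(aware_utc.isoformat())            # e.g. 2025-01-01T12:00:00.000000+00:00
```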
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
An incorrect token will also hit `RepositoryNotFoundError`, e.g.:
```
$ llama model download --source huggingface --model-id Llama3.2-1B-Instruct:int4-qlora-eo8 --hf-token xx   ### xx is an incorrect token
----RepositoryNotFoundError--->
usage: llama model download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID]
                            [--hf-token HF_TOKEN] [--meta-url META_URL]
                            [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS]
                            [--manifest-file MANIFEST_FILE]
llama model download: error: Repository 'meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8' not found on the Hugging Face Hub.
```
so the error message is updated to:
```
$ llama model download --source huggingface --model-id Llama3.2-1B-Instruct:int4-qlora-eo8 --hf-token xx
----RepositoryNotFoundError--->
usage: llama model download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID]
                            [--hf-token HF_TOKEN] [--meta-url META_URL]
                            [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS]
                            [--manifest-file MANIFEST_FILE]
llama model download: error: Repository 'meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8' not found on the Hugging Face Hub or incorrect Hugging Face token.
```
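A rough sketch of the error-handling shape this implies; the function and wiring are illustrative assumptions, not the exact `download.py` code:
```python
# Hypothetical sketch: an invalid token and a missing repo both surface as
# RepositoryNotFoundError, so the CLI message should mention both causes.
from huggingface_hub import snapshot_download
from huggingface_hub.utils import RepositoryNotFoundError


def download_from_hub(repo_id: str, hf_token: str | None, local_dir: str) -> None:
    try:
        snapshot_download(repo_id, token=hf_token, local_dir=local_dir)
    except RepositoryNotFoundError:
        raise SystemExit(
            f"Repository '{repo_id}' not found on the Hugging Face Hub "
            "or incorrect Hugging Face token."
        )
```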
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Based on the code:
6b1773d530/llama_stack/cli/download.py (L454)
and the tests, a comma can be used to specify multiple model IDs, so the
usage text is updated accordingly.
```
$ llama model download --source meta --model-id Llama3.2-1B,Llama3.2-3B
Please provide the signed URL for model Llama3.2-1B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):
Downloading checklist.chk ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 156/156 bytes - 0:00:00
Downloading tokenizer.model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.2/2.2 MB - 0:00:00
Downloading params.json ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 220/220 bytes - 0:00:00
Downloading consolidated.00.pth ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.5/2.5 GB - 0:00:00
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B
[Optionally] To run MD5 checksums, use the following command: llama model verify-download --model-id Llama3.2-1B
Please provide the signed URL for model Llama3.2-3B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):
Downloading checklist.chk ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 156/156 bytes - 0:00:00
Downloading tokenizer.model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.2/2.2 MB - 0:00:00
Downloading params.json ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 220/220 bytes - 0:00:00
Downloading consolidated.00.pth ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 6.4/6.4 GB - 0:00:00
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-3B
$ llama model download --source huggingface --model-id Llama3.2-1B,Llama3.2-3B
original%2Fparams.json: 100%|██████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 564kB/
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B
...
tokenizer.json: 100%|█████████████████████████████████████████████████████████████| 9.09M/9.09M [00:00<00:00, 9.18MB/s]
Successfully downloaded model to /Users/xxx/.llama/checkpoints/Llama3.2-3B
before:
$ llama model download --help
--model-id MODEL_ID See `llama model list` or `llama model list --show-all` for the list of available models
after:
$ llama model download --help
--model-id MODEL_ID See `llama model list` or `llama model list --show-all` for the list of available models. Specify multiple model IDs with commas, e.g. --model-id Llama3.2-1B,Llama3.2-3B
```
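A minimal sketch of the comma-splitting behavior, using hypothetical names rather than the actual `download.py` code:
```python
import argparse

parser = argparse.ArgumentParser(prog="llama model download")
parser.add_argument(
    "--model-id",
    help=(
        "See `llama model list` or `llama model list --show-all` for the list of "
        "available models. Specify multiple model IDs with commas, "
        "e.g. --model-id Llama3.2-1B,Llama3.2-3B"
    ),
)
args = parser.parse_args(["--model-id", "Llama3.2-1B,Llama3.2-3B"])

# One download per requested model.
for model_id in [m.strip() for m in args.model_id.split(",") if m.strip()]:
    print(f"downloading {model_id} ...")
```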
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Removes the empty line from the help output:
```
before:
$ llama model download --help
--max-parallel MAX_PARALLEL
Maximum number of concurrent downloads
--ignore-patterns IGNORE_PATTERNS
<<<<<<<<<empty line>>>>>>>>>>
For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring
safetensors files to avoid downloading duplicate weights.
after:
$ llama model download --help
--max-parallel MAX_PARALLEL
Maximum number of concurrent downloads
--ignore-patterns IGNORE_PATTERNS
For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring
safetensors files to avoid downloading duplicate weights.
```
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.
This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279
## Test Plan
Ensure all `llama` CLI `model` sub-commands work:
```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```
Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```
Created a fresh venv (`uv venv && source .venv/bin/activate`) and ran
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv`; the server runs.
Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.
```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
# What does this PR do?
- Configured the ruff linter to automatically fix import sorting issues.
- Set --exit-non-zero-on-fix to ensure a non-zero exit code when fixes are
applied.
- Enabled the 'I' selection to focus on import-related linting rules.
- Ran the linter and formatted all codebase imports accordingly.
- Removed the black dependency from the "dev" group since we use ruff.
Signed-off-by: Sébastien Han <seb@redhat.com>
The lint check in the main branch is failing. This fixes the lint check after
we moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fix and ignore some
additional checks.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Adds a description at the end of a successful download explaining how to
optionally run the `llama model verify-download` command to verify MD5 checksums.
## Test Plan
<img width="2004" alt="Screenshot 2024-11-19 at 12 11 37 PM"
src="https://github.com/user-attachments/assets/8d617aef-99f5-4c3b-b93c-eff3e68289ea">
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
---------
Co-authored-by: varunfb <vontimitta@devgpu004.eag5.facebook.com>
# What does this PR do?
Removes another `model_` Pydantic namespace warning and converts the
old-style `class Config` to the new-style `model_config` workaround.
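For reference, a generic before/after of that Pydantic workaround (an illustrative model, not the one in `download.py`):
```python
from pydantic import BaseModel, ConfigDict


class DownloadRequest(BaseModel):
    # Old style (Pydantic v1):
    #     class Config:
    #         protected_namespaces = ()
    # New style (Pydantic v2): model_config silences the "model_"
    # protected-namespace warning for fields such as model_id.
    model_config = ConfigDict(protected_namespaces=())

    model_id: str


print(DownloadRequest(model_id="Llama3.2-1B"))
```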
Also includes a whitespace change to get past:
flake8...................................................................Failed
llama_stack/cli/download.py:296:85: E226 missing whitespace around
arithmetic operator
llama_stack/cli/download.py:297:54: E226 missing whitespace around
arithmetic operator
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
# What does this PR do?
Enables parallel downloads for the `llama model download` CLI command. This
is particularly useful for folks with high-bandwidth connections who want to
download checkpoints quickly.
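A rough sketch of the idea, assuming a semaphore-capped set of concurrent download tasks; the names and structure are illustrative, not the actual implementation:
```python
import asyncio


async def download_one(name: str, sem: asyncio.Semaphore) -> None:
    async with sem:
        await asyncio.sleep(0.1)  # real code streams the file to disk here
        print(f"downloaded {name}")


async def download_all(names: list[str], max_parallel: int = 3) -> None:
    # Cap concurrency (cf. the --max-parallel flag) while downloading in parallel.
    sem = asyncio.Semaphore(max_parallel)
    await asyncio.gather(*(download_one(n, sem) for n in names))


asyncio.run(download_all(["checklist.chk", "tokenizer.model", "params.json", "consolidated.00.pth"]))
```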
## Test Plan

* API Keys passed from Client instead of distro configuration
* delete distribution registry
* Rename the "package" word away
* Introduce a "Router" layer for providers
Some providers need to be factorized and considered as thin routing
layers on top of other providers. Consider two examples:
- The inference API should be a routing layer over inference providers,
routed using the "model" key
- The memory banks API is another instance where various memory bank
types will be provided by independent providers (e.g., a vector store
is served by Chroma while a keyvalue memory can be served by Redis or
PGVector)
This commit introduces a generalized routing layer for this purpose (a
minimal sketch of the idea follows this commit list).
* update `apis_to_serve`
* llama_toolchain -> llama_stack
* Codemod from llama_toolchain -> llama_stack
- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- from llama_stack.apis.<api> import foo should work now
- update imports to do llama_stack.apis.<api>
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent
* Moved some stuff out of common/; re-generated OpenAPI spec
* llama-toolchain -> llama-stack (hyphens)
* add control plane API
* add redis adapter + sqlite provider
* move core -> distribution
* Some more toolchain -> stack changes
* small naming shenanigans
* Removing custom tool and agent utilities and moving them client side
* Move control plane to distribution server for now
* Remove control plane from API list
* no codeshield dependency randomly plzzzzz
* Add "fire" as a dependency
* add back event loggers
* stack configure fixes
* use brave instead of bing in the example client
* add init file so it gets packaged
* add init files so it gets packaged
* Update MANIFEST
* bug fix
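As referenced in the Router commit above, a minimal, hypothetical sketch of the routing idea; the class and method names are illustrative, not the actual llama_stack interfaces:
```python
from typing import Protocol


class InferenceProvider(Protocol):
    def completion(self, model: str, prompt: str) -> str: ...


class InferenceRouter:
    """Thin routing layer: dispatch each request to the provider registered
    for its routing key (here, the model name)."""

    def __init__(self, routes: dict[str, InferenceProvider]):
        self.routes = routes

    def completion(self, model: str, prompt: str) -> str:
        provider = self.routes.get(model)
        if provider is None:
            raise ValueError(f"no provider registered for model '{model}'")
        return provider.completion(model, prompt)
```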
---------
Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
2024-09-17 19:51:35 -07:00
Renamed from llama_toolchain/cli/download.py