# What does this PR do?
Move around bits. This makes the copies from llama-models _much_ easier
to maintain and ensures we don't entangle meta-reference-specific
tidbits into llama-models code even by accident.
Also, kills the meta-reference-quantized-gpu distro and rolls
quantization deps into meta-reference-gpu.
## Test Plan
```
LLAMA_MODELS_DEBUG=1 \
with-proxy llama stack run meta-reference-gpu \
--env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
--env INFERENCE_CHECKPOINT_DIR=<DIR> \
--env MODEL_PARALLEL_SIZE=4 \
--env QUANTIZATION_TYPE=fp8_mixed
```
Start a server with and without quantization. Point integration tests to
it using:
```
pytest -s -v tests/integration/inference/test_text_inference.py \
--stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
# What does this PR do?
Updates the help text for the `llama model prompt-format` command to
clarify that users should provide a specific model name (e.g.,
Llama3.1-8B, Llama3.2-11B-Vision), not a model family. Removes the
default value and field for `--model-name` to prevent users from
mistakenly thinking a model family name is acceptable. Adds guidance to
run `llama model list` to view valid model names.
## Test Plan
Output of `llama model prompt-format -h` Before:
```
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Model Family (llama3_1, llama3_X, etc.)
Example:
llama model prompt-format <options>
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format --model-name llama3_1
usage: llama model prompt-format [-h] [-m MODEL_NAME]
llama model prompt-format: error: llama3_1 is not a valid Model. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct
```
Output of `llama model prompt-format -h` After:
```
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Example: Llama3.1-8B or Llama3.2-11B-Vision, etc
(Run `llama model list` to see a list of valid model names)
Example:
llama model prompt-format <options>
```
Signed-off-by: Alina Ryan <aliryan@redhat.com>
Summary:
+ llama model prompt-format -m Llama3.2-11B-Vision-Instruct
Traceback (most recent call last):
  File "/tmp/tmp.gCwyyCcjoA/.venv/bin/llama", line 10, in <module>
    sys.exit(main())
  File "/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 50, in main
    parser.run(args)
  File "/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 44, in run
    args.func(args)
  File "/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/model/prompt_format.py", line 59, in _run_model_template_cmd
    if args.list:
AttributeError: 'Namespace' object has no attribute 'list'
Test Plan:
llama model prompt-format -m Llama3.2-11B-Vision-Instruct
# What does this PR do?
The method "dict" in class "BaseModel" is deprecated we should use
model_dump instead.
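For illustration, a minimal sketch of the replacement, assuming Pydantic v2 (the `Example` model is hypothetical, not from the codebase):
```python
from pydantic import BaseModel


class Example(BaseModel):
    name: str
    value: int


item = Example(name="llama", value=1)

# Deprecated since Pydantic v2 (emits a DeprecationWarning):
# item.dict()

# Preferred replacement:
print(item.model_dump())       # {'name': 'llama', 'value': 1}
print(item.model_dump_json())  # JSON string; likewise replaces the deprecated .json()
```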
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
19ae4b35d9/llama_stack/cli/model/prompt_format.py (L47)
Based on the comment `Only Llama 3.1 and 3.2 are supported`, not even all
3.1 and 3.2 models can be shown with `prompt-format`, so we cannot simply
point users at `llama model list`; today the valid choices only appear after
entering an invalid model. It would be nice to help users check the valid
models directly:
```
llama model prompt-format -m Llama3.1-405B-Instruct:bf16-mp8
usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l]
llama model prompt-format: error: Llama3.1-405B-Instruct:bf16-mp8 is not a valid Model <<<<---. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct
before:
$ llama model prompt-format --help
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Model Family (llama3_1, llama3_X, etc.)
Example:
llama model prompt-format <options>
after:
$ llama model prompt-format --help
usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Model Family (llama3_1, llama3_X, etc.)
-l, --list List the valid supported models
Example:
llama model prompt-format <options>
$ llama model prompt-format -l
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Llama3.1-8B │
├──────────────────────────────┤
│ Llama3.1-70B │
├──────────────────────────────┤
│ Llama3.1-405B │
├──────────────────────────────┤
│ Llama3.1-8B-Instruct │
├──────────────────────────────┤
│ Llama3.1-70B-Instruct │
├──────────────────────────────┤
│ Llama3.1-405B-Instruct │
├──────────────────────────────┤
│ Llama3.2-1B │
├──────────────────────────────┤
│ Llama3.2-3B │
├──────────────────────────────┤
│ Llama3.2-1B-Instruct │
├──────────────────────────────┤
│ Llama3.2-3B-Instruct │
├──────────────────────────────┤
│ Llama3.2-11B-Vision │
├──────────────────────────────┤
│ Llama3.2-90B-Vision │
├──────────────────────────────┤
│ Llama3.2-11B-Vision-Instruct │
├──────────────────────────────┤
│ Llama3.2-90B-Vision-Instruct │
└──────────────────────────────┘
```
## Test Plan
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
This fixes the release build failure:
3796356500
```
+ llama model prompt-format -m Llama3.2-11B-Vision-Instruct
Traceback (most recent call last):
File "/tmp/tmp.PXMDlmD0x5/.venv/bin/llama", line 4, in <module>
from llama_stack.cli.llama import main
File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 10, in <module>
from .model import ModelParser
File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/model/__init__.py", line 7, in <module>
from .model import ModelParser # noqa
File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/model/model.py", line 16, in <module>
from llama_stack.cli.utils import print_subcommand_description
ModuleNotFoundError: No module named 'llama_stack.cli.utils'
```
## Test Plan
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
```
before:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...
Welcome to the Llama CLI
options:
-h, --help show this help message and exit
subcommands:
{model,stack,download,verify-download}
$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...
Work with llama models
options:
-h, --help show this help message and exit
model_subcommands:
{download,list,prompt-format,describe,verify-download,remove}
$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...
Operations for the Llama Stack / Distributions
options:
-h, --help show this help message and exit
--version show program's version number and exit
stack_subcommands:
{build,list-apis,list-providers,run}
===================
after:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...
Welcome to the Llama CLI
options:
-h, --help show this help message and exit
subcommands:
{model,stack,download,verify-download}
model Work with llama models
stack Operations for the Llama Stack / Distributions
download Download a model from llama.meta.com or Hugging Face Hub
verify-download Verify integrity of downloaded model files
$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...
Work with llama models
options:
-h, --help show this help message and exit
model_subcommands:
{download,list,prompt-format,describe,verify-download,remove}
download Download a model from llama.meta.com or Hugging Face Hub
list Show available llama models
prompt-format Show llama model message formats
describe Show details about a llama model
verify-download Verify the downloaded checkpoints' checksums for models downloaded from Meta
remove Remove the downloaded llama model
$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...
Operations for the Llama Stack / Distributions
options:
-h, --help show this help message and exit
--version show program's version number and exit
stack_subcommands:
{build,list-apis,list-providers,run}
build Build a Llama stack container
list-apis List APIs part of the Llama Stack implementation
list-providers Show available Llama Stack Providers for an API
run Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
```
## Test Plan
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
If `headers` is not passed, the first row is displayed empty and the second
row can also break; use the `Model` row as the `headers` instead.
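A minimal sketch of the idea, assuming a `rich`-style table (the real CLI may use its own table helper; the function below is illustrative):
```python
from rich.console import Console
from rich.table import Table


def print_model_details(model_name: str, rows: list[tuple[str, str]]) -> None:
    # Promote ("Model", <model name>) to the header row instead of leaving it empty.
    table = Table("Model", model_name)
    for key, value in rows:
        table.add_row(key, value)
    Console().print(table)


print_model_details(
    "Llama3.1-70B",
    [
        ("Hugging Face ID", "meta-llama/Llama-3.1-70B"),
        ("Description", "Llama 3.1 70b model"),
    ],
)
```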
```
Before:
$ llama model describe -m Llama3.1-70B
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ┃ ┃ <<<---------
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Model │ Llama3.1-70B │ <<<---------
├─────────────────────────────┼────────────────────────────────┤
│ Hugging Face ID │ meta-llama/Llama-3.1-70B │
├─────────────────────────────┼────────────────────────────────┤
│ Description │ Llama 3.1 70b model │
├─────────────────────────────┼────────────────────────────────┤
......
after:
$ llama model describe -m Llama3.1-70B
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model ┃ Llama3.1-70B ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Hugging Face ID │ meta-llama/Llama-3.1-70B │
├─────────────────────────────┼────────────────────────────────┤
│ Description │ Llama 3.1 70b model │
├─────────────────────────────┼────────────────────────────────┤
......
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
`llama model list` and `llama model list --show-all` list many or all of the
models, so add a `--search` option to narrow the output.
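Under the hood the filter only needs to be a case-insensitive substring match over the model descriptors; a rough sketch (the function and sample data are illustrative, not the actual implementation):
```python
def filter_models(descriptors: list[str], search: str | None) -> list[str]:
    """Keep only descriptors containing the search string (case-insensitive)."""
    if not search:
        return descriptors
    needle = search.lower()
    return [d for d in descriptors if needle in d.lower()]


models = ["Llama3.1-8B", "Llama3.1-70B", "Llama3.1-70B-Instruct", "Prompt-Guard-86M"]
print(filter_models(models, "70b"))  # ['Llama3.1-70B', 'Llama3.1-70B-Instruct']
```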
```
$ llama model list --help
usage: llama model list [-h] [--show-all] [-s SEARCH]
Show available llama models
options:
-h, --help show this help message and exit
--show-all Show all models (not just defaults)
-s SEARCH, --search SEARCH
Search for the input string as a substring in the model descriptor(ID)
$ llama model list -s 70b
+-----------------------+-----------------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo | Context Length |
+-----------------------+-----------------------------------+----------------+
| Llama3.1-70B | meta-llama/Llama-3.1-70B | 128K |
+-----------------------+-----------------------------------+----------------+
| Llama3.1-70B-Instruct | meta-llama/Llama-3.1-70B-Instruct | 128K |
+-----------------------+-----------------------------------+----------------+
| Llama3.3-70B-Instruct | meta-llama/Llama-3.3-70B-Instruct | 128K |
+-----------------------+-----------------------------------+----------------+
$ llama model list -s 3.1-8b
+----------------------+----------------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo | Context Length |
+----------------------+----------------------------------+----------------+
| Llama3.1-8B | meta-llama/Llama-3.1-8B | 128K |
+----------------------+----------------------------------+----------------+
| Llama3.1-8B-Instruct | meta-llama/Llama-3.1-8B-Instruct | 128K |
+----------------------+----------------------------------+----------------+
$ llama model list --show-all -s pro
+----------------------+-----------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo | Context Length |
+----------------------+-----------------------------+----------------+
| Prompt-Guard-86M | meta-llama/Prompt-Guard-86M | 2K |
+----------------------+-----------------------------+----------------+
$ llama model list -s k
Not found for search.
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Currently, the `Model` column in `--downloaded` output just uses the
directory name (where `:` has already been replaced), so convert it back to
the model descriptor to keep it consistent with `llama model list`; the
`remove` command also uses the descriptor.
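A rough sketch of the mapping back from directory name to descriptor, assuming the checkpoint directory is the descriptor with `:` replaced (the helper is illustrative, not the actual code):
```python
def dir_to_descriptor(dir_name: str, known_descriptors: list[str]) -> str:
    """Map a checkpoint directory name back to its model descriptor."""
    for descriptor in known_descriptors:
        if descriptor.replace(":", "-") == dir_name:
            return descriptor
    return dir_name  # fall back to the raw directory name if nothing matches


known = ["Llama3.2-1B-Instruct:int4-qlora-eo8", "Llama3.2-1B"]
print(dir_to_descriptor("Llama3.2-1B-Instruct-int4-qlora-eo8", known))
# -> Llama3.2-1B-Instruct:int4-qlora-eo8
```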
```
before:
$ llama model list --downloaded
+-------------------------------------+----------+---------------------+
| Model | Size | Modified Time |
+-------------------------------------+----------+---------------------+
| Llama3.2-1B-Instruct-int4-qlora-eo8 | 1.53 GB | 2025-02-20 16:32:49 |
+-------------------------------------+----------+---------------------+
after:
$ llama model list --downloaded
+-------------------------------------+----------+---------------------+
| Model | Size | Modified Time |
+-------------------------------------+----------+---------------------+
| Llama3.2-1B-Instruct:int4-qlora-eo8 | 1.53 GB | 2025-02-20 16:32:49 |
+-------------------------------------+----------+---------------------+
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Add a subcommand to help clean up models that are no longer needed:
```
$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...
Work with llama models
options:
-h, --help show this help message and exit
$ llama model remove --help
usage: llama model remove [-h] -m MODEL [-f]
Remove the downloaded llama model
options:
-h, --help show this help message and exit
-m MODEL, --model MODEL
Specify the llama downloaded model name
-f, --force Used to forcefully remove the llama model from the storage without further confirmation
$ llama model remove -m Llama3.2-1B-Instruct:int4-qlora-eo8
Are you sure you want to remove Llama3.2-1B-Instruct:int4-qlora-eo8? (y/n): n
Removal aborted.
$ llama model remove -m Llama3.2-1B-Instruct:int4-qlora-eo8 -f
Llama3.2-1B-Instruct:int4-qlora-eo8 removed.
```
## Test Plan
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
```
$ llama model list --help
usage: llama model list [-h] [--show-all] [--downloaded]
Show available llama models
options:
-h, --help show this help message and exit
--show-all Show all models (not just defaults)
--downloaded List the downloaded models
$ llama model list --downloaded
+-------------+----------+---------------------+
| Model | Size | Modified Time |
+-------------+----------+---------------------+
| Llama3.2-1B | 2.31 GB | 2025-02-16 13:38:04 |
+-------------+----------+---------------------+
| Llama3.1-8B | 14.97 GB | 2025-02-16 10:36:37 |
+-------------+----------+---------------------+
```
## Test Plan
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Re-checking against the doc, the download `--model-id` is actually the model
descriptor (also without the `meta-llama/` prefix).
https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html
```
$ llama download --source huggingface --model-id Llama-Guard-3-1B:int4 --hf-token xxx # model descriptor
Fetching 8 files: 0%| | 0/8 [00:00<?, ?it/s]
LICENSE.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.71k/7.71k [00:00<00:00, 10.5MB/s]
$ llama download --source huggingface --model-id Llama-Guard-3-1B-INT4 --hf-token xxxx # hugging face repo without meta-llama/
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
[--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-Guard-3-1B-INT4 not found <<<<---
$ llama download --source meta --model-id Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
[--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8 not found
$ llama download --source meta --model-id Llama3.2-3B-Instruct:int4-spinquant-eo8
Please provide the signed URL for model Llama3.2-3B-Instruct:int4-spinquant-eo8 you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...): ^CTraceback (most recent call last):
$ llama download --source meta --model-id meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL]
[--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8 not found
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Based on the code at
6b1773d530/llama_stack/cli/download.py (L379)
and testing, `verify-download` should only be used for models downloaded from Meta.
```
test: no checklist.chk file for hf download
$ llama model download --source meta --model-id Llama3.2-1B
Downloading checklist.chk ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 156/156 bytes - 0:00:00
Downloading tokenizer.model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.2/2.2 MB - 0:00:00
Downloading params.json ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 220/220 bytes - 0:00:00
Downloading consolidated.00.pth ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.5/2.5 GB - 0:00:00
before:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID
Verify the downloaded checkpoints' checksums
options:
-h, --help show this help message and exit
--model-id MODEL_ID Model ID to verify
after:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID
Verify the downloaded checkpoints' checksums for models downloaded from Meta
options:
-h, --help show this help message and exit
--model-id MODEL_ID Model ID to verify (only for models downloaded from Meta)
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.
This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279
## Test Plan
Ensure all `llama` CLI `model` sub-commands work:
```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```
Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```
Create a fresh venv `uv venv && source .venv/bin/activate` and run
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv` <-- the server runs
Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.
```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
# What does this PR do?
- Configured ruff linter to automatically fix import sorting issues.
- Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are
applied.
- Enabled the 'I' selection to focus on import-related linting rules.
- Ran the linter, and formatted all codebase imports accordingly.
- Removed the black dep from the "dev" group since we use ruff
## Test Plan
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Since the subcommands use `MODEL_ID`, it would be better to use the same
term in `model list` as well, making it easy to find.
```
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID <<
$ llama model describe --help
usage: llama model describe [-h] -m MODEL_ID <<
$ llama download --help
--model-id MODEL_ID See `llama model list` or `llama model list --show-all` for the list of available models
before:
$ llama model list
+-----------------------------------------+-----------------------------------------------------+----------------+
| Model Descriptor | Hugging Face Repo | Context Length |
+-----------------------------------------+-----------------------------------------------------+----------------+
after:
$ llama model list
+-----------------------------------------+-----------------------------------------------------+----------------+
| Model Descriptor | Model ID | Context Length |
+-----------------------------------------+-----------------------------------------------------+----------------+
| Llama3.1-8B | meta-llama/Llama-3.1-8B | 128K |
+-----------------------------------------+-----------------------------------------------------+----------------+
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
When executing a command group like `llama model` on its own, the wrong help
text, sub-commands, and flags are displayed. Each command group needs to call
`.set_defaults` so this info is displayed properly.
before:
```
llama model
usage: llama [-h] {model,stack,download,verify-download} ...
Welcome to the Llama CLI
options:
-h, --help show this help message and exit
subcommands:
{model,stack,download,verify-download}
```
after:
```
llama model
usage: llama model [-h] {download,list,prompt-format,describe,verify-download} ...
Work with llama models
options:
-h, --help show this help message and exit
model_subcommands:
{download,list,prompt-format,describe,verify-download}
```
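A minimal sketch of the `.set_defaults` pattern described above (parser names are illustrative, not the actual CLI code):
```python
import argparse

parser = argparse.ArgumentParser(prog="llama", description="Welcome to the Llama CLI")
subparsers = parser.add_subparsers(title="subcommands")

model_parser = subparsers.add_parser("model", description="Work with llama models")
model_subparsers = model_parser.add_subparsers(title="model_subcommands")
list_parser = model_subparsers.add_parser("list", description="Show available llama models")
# Each leaf command registers its own handler the same way:
list_parser.set_defaults(func=lambda args: print("...list models..."))

# Without this, `llama model` falls through to the top-level parser's help.
model_parser.set_defaults(func=lambda args: model_parser.print_help())

args = parser.parse_args()
if hasattr(args, "func"):
    args.func(args)
else:
    parser.print_help()
```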
Signed-off-by: Charlie Doern <cdoern@redhat.com>
The lint check on the main branch is failing. This fixes the lint check after
we moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fix and ignore some additional
checks.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Cleans up how we provide sampling params. Earlier, strategy was an enum
and all params (top_p, temperature, top_k) across all strategies were
grouped. We now have a strategy union object with each strategy (greedy,
top_p, top_k) having its corresponding params.
Earlier,
```
class SamplingParams:
    strategy: SamplingStrategy  # enum: greedy / top_p / top_k
    # params for every strategy grouped together on one object
    temperature: float
    top_p: float
    top_k: int
    # ... other params
```
However, the `strategy` field was not actually used by any provider, which
made it confusing to infer the exact sampling behavior purely from the params:
you could pass temperature, top_p, and top_k together, and it was unclear how
a provider would interpret them.
Hence we introduced a union where each strategy and its relevant params are
grouped together, to avoid this confusion.
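Roughly, the new shape is a tagged union of strategy objects, each carrying only its own params (a Pydantic sketch; the exact class and field names in the codebase may differ):
```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class GreedySamplingStrategy(BaseModel):
    type: Literal["greedy"] = "greedy"


class TopPSamplingStrategy(BaseModel):
    type: Literal["top_p"] = "top_p"
    temperature: float = 1.0
    top_p: float = 0.95


class TopKSamplingStrategy(BaseModel):
    type: Literal["top_k"] = "top_k"
    top_k: int = 40


SamplingStrategy = Annotated[
    Union[GreedySamplingStrategy, TopPSamplingStrategy, TopKSamplingStrategy],
    Field(discriminator="type"),
]


class SamplingParams(BaseModel):
    strategy: SamplingStrategy = Field(default_factory=GreedySamplingStrategy)
    max_tokens: int = 0
    repetition_penalty: float = 1.0


# A request now states its sampling intent explicitly:
params = SamplingParams(strategy=TopPSamplingStrategy(temperature=0.7, top_p=0.9))
```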
Updated all providers, tests, notebooks, the README, and other places where
sampling params were used to use the new format.
## Test Plan
`pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py`
// inference on ollama, fireworks and together
`with-proxy pytest -v -s -k "ollama"
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/inference/test_text_inference.py `
// agents on fireworks
`pytest -v -s -k 'fireworks and create_agent'
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/agents/test_agents.py
--safety-shield="meta-llama/Llama-Guard-3-8B"`
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [X] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [X] Updated relevant documentation.
- [X] Wrote necessary unit or integration tests.
---------
Co-authored-by: Hardik Shah <hjshah@fb.com>
# What does this PR do?
It is important to verify large checkpoints downloaded via `llama model
download` because subtle corruptions can easily happen with large file
system writes. This PR adds a `verify-download` subcommand. Note that
verification itself is a very time-consuming process (and will take
several **minutes** for the 405B model), hence this is a separate
subcommand (and not part of the download which can already be
time-consuming) and there are spinners and a bit of a "show" around it
in the implementation.
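For context, a rough sketch of what the verification amounts to, assuming the Meta download ships a `checklist.chk` file with `md5sum`-style lines (the helper below is illustrative, not the actual implementation):
```python
import hashlib
from pathlib import Path


def verify_checkpoint(model_dir: Path) -> bool:
    """Compare each downloaded file's MD5 against the shipped checklist.chk."""
    ok = True
    for line in (model_dir / "checklist.chk").read_text().splitlines():
        expected_md5, filename = line.split()
        digest = hashlib.md5()
        with open(model_dir / filename, "rb") as f:
            # Stream in chunks: consolidated checkpoints can be hundreds of GB.
            for chunk in iter(lambda: f.read(8 * 1024 * 1024), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected_md5:
            print(f"MISMATCH: {filename}")
            ok = False
    return ok
```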
## Test Plan
<img width="1012" alt="image"
src="https://github.com/user-attachments/assets/f82b0d42-2a15-4917-b85e-6d3cd7d31e55">
* API Keys passed from Client instead of distro configuration
* delete distribution registry
* Rename the "package" word away
* Introduce a "Router" layer for providers
Some providers need to be factorized and considered as thin routing
layers on top of other providers. Consider two examples:
- The inference API should be a routing layer over inference providers,
routed using the "model" key
- The memory banks API is another instance where various memory bank
types will be provided by independent providers (e.g., a vector store
is served by Chroma while a keyvalue memory can be served by Redis or
PGVector)
This commit introduces a generalized routing layer for this purpose.
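Conceptually, such a router is a thin provider that owns a routing table and forwards each call to the backing provider chosen by the routing key; a minimal sketch (class and method names are illustrative):
```python
from typing import Any, Dict


class InferenceRouter:
    """Routes inference requests to the provider registered for each model."""

    def __init__(self, providers_by_model: Dict[str, Any]) -> None:
        self.providers_by_model = providers_by_model

    async def chat_completion(self, model: str, messages: list, **kwargs: Any) -> Any:
        provider = self.providers_by_model.get(model)
        if provider is None:
            raise ValueError(f"No provider registered for model {model}")
        # The router adds no behavior of its own; it just forwards the call.
        return await provider.chat_completion(model=model, messages=messages, **kwargs)
```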
* update `apis_to_serve`
* llama_toolchain -> llama_stack
* Codemod from llama_toolchain -> llama_stack
- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- from llama_stack.apis.<api> import foo should work now
- update imports to do llama_stack.apis.<api>
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent
* Moved some stuff out of common/; re-generated OpenAPI spec
* llama-toolchain -> llama-stack (hyphens)
* add control plane API
* add redis adapter + sqlite provider
* move core -> distribution
* Some more toolchain -> stack changes
* small naming shenanigans
* Removing custom tool and agent utilities and moving them client side
* Move control plane to distribution server for now
* Remove control plane from API list
* no codeshield dependency randomly plzzzzz
* Add "fire" as a dependency
* add back event loggers
* stack configure fixes
* use brave instead of bing in the example client
* add init file so it gets packaged
* add init files so it gets packaged
* Update MANIFEST
* bug fix
---------
Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>