# What does this PR do?
Move around bits. This makes the copies from llama-models _much_ easier
to maintain and ensures we don't entangle meta-reference-specific
tidbits into llama-models code even by accident.
Also, kills the meta-reference-quantized-gpu distro and rolls
quantization deps into meta-reference-gpu.
## Test Plan
```
LLAMA_MODELS_DEBUG=1 \
with-proxy llama stack run meta-reference-gpu \
--env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
--env INFERENCE_CHECKPOINT_DIR=<DIR> \
--env MODEL_PARALLEL_SIZE=4 \
--env QUANTIZATION_TYPE=fp8_mixed
```
Start a server with and without quantization. Point integration tests to
it using:
```
pytest -s -v tests/integration/inference/test_text_inference.py \
--stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
# What does this PR do?
Updates the help text for the `llama model prompt-format` command to
clarify that users should provide a specific model name (e.g.,
Llama3.1-8B, Llama3.2-11B-Vision), not a model family. Removes the
default value and field for `--model-name` to prevent users from
mistakenly thinking a model family name is acceptable. Adds guidance to
run `llama model list` to view valid model names.
## Test Plan
Output of `llama model prompt-format -h` Before:
```
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Model Family (llama3_1, llama3_X, etc.)
Example:
llama model prompt-format <options>
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format --model-name llama3_1
usage: llama model prompt-format [-h] [-m MODEL_NAME]
llama model prompt-format: error: llama3_1 is not a valid Model. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct
```
Output of `llama model prompt-format -h` After:
```
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Example: Llama3.1-8B or Llama3.2-11B-Vision, etc
(Run `llama model list` to see a list of valid model names)
Example:
llama model prompt-format <options>
```
Signed-off-by: Alina Ryan <aliryan@redhat.com>
Summary:
+ llama model prompt-format -m Llama3.2-11B-Vision-Instruct
Traceback (most recent call last):
  File "/tmp/tmp.gCwyyCcjoA/.venv/bin/llama", line 10, in <module>
    sys.exit(main())
  File "/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 50, in main
    parser.run(args)
  File "/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 44, in run
    args.func(args)
  File "/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/model/prompt_format.py", line 59, in _run_model_template_cmd
    if args.list:
AttributeError: 'Namespace' object has no attribute 'list'
Test Plan:
llama model prompt-format -m Llama3.2-11B-Vision-Instruct
# What does this PR do?
The method "dict" in class "BaseModel" is deprecated we should use
model_dump instead.
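For illustration, a minimal sketch of the replacement, assuming Pydantic v2 (the `Example` model is hypothetical, not from the codebase):
```python
from pydantic import BaseModel


class Example(BaseModel):
    name: str
    value: int


item = Example(name="llama", value=1)

# Deprecated since Pydantic v2 (emits a DeprecationWarning):
# item.dict()

# Preferred replacement:
print(item.model_dump())       # {'name': 'llama', 'value': 1}
print(item.model_dump_json())  # JSON string; likewise replaces the deprecated .json()
```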
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
19ae4b35d9/llama_stack/cli/model/prompt_format.py (L47)
Based on the comment `Only Llama 3.1 and 3.2 are supported`, not even all
3.1 and 3.2 models can be shown with `prompt-format`, so we cannot simply
point users at `llama model list`; today the valid choices only appear after
entering an invalid model. It would be nice to help users check the valid
models directly:
```
llama model prompt-format -m Llama3.1-405B-Instruct:bf16-mp8
usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l]
llama model prompt-format: error: Llama3.1-405B-Instruct:bf16-mp8 is not a valid Model <<<<---. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct
before:
$ llama model prompt-format --help
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Model Family (llama3_1, llama3_X, etc.)
Example:
llama model prompt-format <options>
after:
$ llama model prompt-format --help
usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Model Family (llama3_1, llama3_X, etc.)
-l, --list List the valid supported models
Example:
llama model prompt-format <options>
$ llama model prompt-format -l
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Llama3.1-8B │
├──────────────────────────────┤
│ Llama3.1-70B │
├──────────────────────────────┤
│ Llama3.1-405B │
├──────────────────────────────┤
│ Llama3.1-8B-Instruct │
├──────────────────────────────┤
│ Llama3.1-70B-Instruct │
├──────────────────────────────┤
│ Llama3.1-405B-Instruct │
├──────────────────────────────┤
│ Llama3.2-1B │
├──────────────────────────────┤
│ Llama3.2-3B │
├──────────────────────────────┤
│ Llama3.2-1B-Instruct │
├──────────────────────────────┤
│ Llama3.2-3B-Instruct │
├──────────────────────────────┤
│ Llama3.2-11B-Vision │
├──────────────────────────────┤
│ Llama3.2-90B-Vision │
├──────────────────────────────┤
│ Llama3.2-11B-Vision-Instruct │
├──────────────────────────────┤
│ Llama3.2-90B-Vision-Instruct │
└──────────────────────────────┘
```
## Test Plan
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
This fixes the release build failure:
3796356500
```
+ llama model prompt-format -m Llama3.2-11B-Vision-Instruct
Traceback (most recent call last):
File "/tmp/tmp.PXMDlmD0x5/.venv/bin/llama", line 4, in <module>
from llama_stack.cli.llama import main
File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 10, in <module>
from .model import ModelParser
File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/model/__init__.py", line 7, in <module>
from .model import ModelParser # noqa
File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/model/model.py", line 16, in <module>
from llama_stack.cli.utils import print_subcommand_description
ModuleNotFoundError: No module named 'llama_stack.cli.utils'
```
## Test Plan
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
```
before:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...
Welcome to the Llama CLI
options:
-h, --help show this help message and exit
subcommands:
{model,stack,download,verify-download}
$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...
Work with llama models
options:
-h, --help show this help message and exit
model_subcommands:
{download,list,prompt-format,describe,verify-download,remove}
$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...
Operations for the Llama Stack / Distributions
options:
-h, --help show this help message and exit
--version show program's version number and exit
stack_subcommands:
{build,list-apis,list-providers,run}
===================
after:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...
Welcome to the Llama CLI
options:
-h, --help show this help message and exit
subcommands:
{model,stack,download,verify-download}
model Work with llama models
stack Operations for the Llama Stack / Distributions
download Download a model from llama.meta.com or Hugging Face Hub
verify-download Verify integrity of downloaded model files
$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...
Work with llama models
options:
-h, --help show this help message and exit
model_subcommands:
{download,list,prompt-format,describe,verify-download,remove}
download Download a model from llama.meta.com or Hugging Face Hub
list Show available llama models
prompt-format Show llama model message formats
describe Show details about a llama model
verify-download Verify the downloaded checkpoints' checksums for models downloaded from Meta
remove Remove the downloaded llama model
$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...
Operations for the Llama Stack / Distributions
options:
-h, --help show this help message and exit
--version show program's version number and exit
stack_subcommands:
{build,list-apis,list-providers,run}
build Build a Llama stack container
list-apis List APIs part of the Llama Stack implementation
list-providers Show available Llama Stack Providers for an API
run Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
```
## Test Plan
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
If `headers` is not passed, the first row is displayed empty and the second
row can also break; use the `Model` row as the `headers` instead.
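A minimal sketch of the idea, assuming a `rich`-style table (the real CLI may use its own table helper; the function below is illustrative):
```python
from rich.console import Console
from rich.table import Table


def print_model_details(model_name: str, rows: list[tuple[str, str]]) -> None:
    # Promote ("Model", <model name>) to the header row instead of leaving it empty.
    table = Table("Model", model_name)
    for key, value in rows:
        table.add_row(key, value)
    Console().print(table)


print_model_details(
    "Llama3.1-70B",
    [
        ("Hugging Face ID", "meta-llama/Llama-3.1-70B"),
        ("Description", "Llama 3.1 70b model"),
    ],
)
```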
```
Before:
$ llama model describe -m Llama3.1-70B
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ┃ ┃ <<<---------
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Model │ Llama3.1-70B │ <<<---------
├─────────────────────────────┼────────────────────────────────┤
│ Hugging Face ID │ meta-llama/Llama-3.1-70B │
├─────────────────────────────┼────────────────────────────────┤
│ Description │ Llama 3.1 70b model │
├─────────────────────────────┼────────────────────────────────┤
......
after:
$ llama model describe -m Llama3.1-70B
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model ┃ Llama3.1-70B ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Hugging Face ID │ meta-llama/Llama-3.1-70B │
├─────────────────────────────┼────────────────────────────────┤
│ Description │ Llama 3.1 70b model │
├─────────────────────────────┼────────────────────────────────┤
......
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
`llama model list` and `llama model list --show-all` list many or all of the
models, so add a `--search` option to narrow the output.
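Under the hood the filter only needs to be a case-insensitive substring match over the model descriptors; a rough sketch (the function and sample data are illustrative, not the actual implementation):
```python
def filter_models(descriptors: list[str], search: str | None) -> list[str]:
    """Keep only descriptors containing the search string (case-insensitive)."""
    if not search:
        return descriptors
    needle = search.lower()
    return [d for d in descriptors if needle in d.lower()]


models = ["Llama3.1-8B", "Llama3.1-70B", "Llama3.1-70B-Instruct", "Prompt-Guard-86M"]
print(filter_models(models, "70b"))  # ['Llama3.1-70B', 'Llama3.1-70B-Instruct']
```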
```
$ llama model list --help
usage: llama model list [-h] [--show-all] [-s SEARCH]
Show available llama models
options:
-h, --help show this help message and exit
--show-all Show all models (not just defaults)
-s SEARCH, --search SEARCH
Search for the input string as a substring in the model descriptor(ID)
$ llama model list -s 70b
+-----------------------+-----------------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo | Context Length |
+-----------------------+-----------------------------------+----------------+
| Llama3.1-70B | meta-llama/Llama-3.1-70B | 128K |
+-----------------------+-----------------------------------+----------------+
| Llama3.1-70B-Instruct | meta-llama/Llama-3.1-70B-Instruct | 128K |
+-----------------------+-----------------------------------+----------------+
| Llama3.3-70B-Instruct | meta-llama/Llama-3.3-70B-Instruct | 128K |
+-----------------------+-----------------------------------+----------------+
$ llama model list -s 3.1-8b
+----------------------+----------------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo | Context Length |
+----------------------+----------------------------------+----------------+
| Llama3.1-8B | meta-llama/Llama-3.1-8B | 128K |
+----------------------+----------------------------------+----------------+
| Llama3.1-8B-Instruct | meta-llama/Llama-3.1-8B-Instruct | 128K |
+----------------------+----------------------------------+----------------+
$ llama model list --show-all -s pro
+----------------------+-----------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo | Context Length |
+----------------------+-----------------------------+----------------+
| Prompt-Guard-86M | meta-llama/Prompt-Guard-86M | 2K |
+----------------------+-----------------------------+----------------+
$ llama model list -s k
Not found for search.
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Currently, the `Model` column in `--downloaded` output just uses the
directory name (where `:` has already been replaced), so convert it back to
the model descriptor to keep it consistent with `llama model list`; the
`remove` command also uses the descriptor.
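A rough sketch of the mapping back from directory name to descriptor, assuming the checkpoint directory is the descriptor with `:` replaced (the helper is illustrative, not the actual code):
```python
def dir_to_descriptor(dir_name: str, known_descriptors: list[str]) -> str:
    """Map a checkpoint directory name back to its model descriptor."""
    for descriptor in known_descriptors:
        if descriptor.replace(":", "-") == dir_name:
            return descriptor
    return dir_name  # fall back to the raw directory name if nothing matches


known = ["Llama3.2-1B-Instruct:int4-qlora-eo8", "Llama3.2-1B"]
print(dir_to_descriptor("Llama3.2-1B-Instruct-int4-qlora-eo8", known))
# -> Llama3.2-1B-Instruct:int4-qlora-eo8
```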
```
before:
$ llama model list --downloaded
+-------------------------------------+----------+---------------------+
| Model | Size | Modified Time |
+-------------------------------------+----------+---------------------+
| Llama3.2-1B-Instruct-int4-qlora-eo8 | 1.53 GB | 2025-02-20 16:32:49 |
+-------------------------------------+----------+---------------------+
after:
$ llama model list --downloaded
+-------------------------------------+----------+---------------------+
| Model | Size | Modified Time |
+-------------------------------------+----------+---------------------+
| Llama3.2-1B-Instruct:int4-qlora-eo8 | 1.53 GB | 2025-02-20 16:32:49 |
+-------------------------------------+----------+---------------------+
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Add a subcommand to help clean up models that are no longer needed:
```
$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...
Work with llama models
options:
-h, --help show this help message and exit
$ llama model remove --help
usage: llama model remove [-h] -m MODEL [-f]
Remove the downloaded llama model
options:
-h, --help show this help message and exit
-m MODEL, --model MODEL
Specify the llama downloaded model name
-f, --force Used to forcefully remove the llama model from the storage without further confirmation
$ llama model remove -m Llama3.2-1B-Instruct:int4-qlora-eo8
Are you sure you want to remove Llama3.2-1B-Instruct:int4-qlora-eo8? (y/n): n
Removal aborted.
$ llama model remove -m Llama3.2-1B-Instruct:int4-qlora-eo8 -f
Llama3.2-1B-Instruct:int4-qlora-eo8 removed.
```
## Test Plan
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
```
$ llama model list --help
usage: llama model list [-h] [--show-all] [--downloaded]
Show available llama models
options:
-h, --help show this help message and exit
--show-all Show all models (not just defaults)
--downloaded List the downloaded models
$ llama model list --downloaded
+-------------+----------+---------------------+
| Model | Size | Modified Time |
+-------------+----------+---------------------+
| Llama3.2-1B | 2.31 GB | 2025-02-16 13:38:04 |
+-------------+----------+---------------------+
| Llama3.1-8B | 14.97 GB | 2025-02-16 10:36:37 |
+-------------+----------+---------------------+
```
## Test Plan
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Re-checking against the doc, the download `--model-id` is actually the model
descriptor (also without the `meta-llama/` prefix).
https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html
```
$ llama download --source huggingface --model-id Llama-Guard-3-1B:int4 --hf-token xxx # model descriptor
Fetching 8 files: 0%| | 0/8 [00:00<?, ?it/s]
LICENSE.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.71k/7.71k [00:00<00:00, 10.5MB/s]
$ llama download --source huggingface --model-id Llama-Guard-3-1B-INT4 --hf-token xxxx # hugging face repo without meta-llama/
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
[--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-Guard-3-1B-INT4 not found <<<<---
$ llama download --source meta --model-id Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
[--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8 not found
$ llama download --source meta --model-id Llama3.2-3B-Instruct:int4-spinquant-eo8
Please provide the signed URL for model Llama3.2-3B-Instruct:int4-spinquant-eo8 you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...): ^CTraceback (most recent call last):
$ llama download --source meta --model-id meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL]
[--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8 not found
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Based on the code at
6b1773d530/llama_stack/cli/download.py (L379)
and testing, `verify-download` should only be used for models downloaded from Meta.
```
test: no checklist.chk file for hf download
$ llama model download --source meta --model-id Llama3.2-1B
Downloading checklist.chk ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 156/156 bytes - 0:00:00
Downloading tokenizer.model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.2/2.2 MB - 0:00:00
Downloading params.json ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 220/220 bytes - 0:00:00
Downloading consolidated.00.pth ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.5/2.5 GB - 0:00:00
before:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID
Verify the downloaded checkpoints' checksums
options:
-h, --help show this help message and exit
--model-id MODEL_ID Model ID to verify
after:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID
Verify the downloaded checkpoints' checksums for models downloaded from Meta
options:
-h, --help show this help message and exit
--model-id MODEL_ID Model ID to verify (only for models downloaded from Meta)
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.
This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279
## Test Plan
Ensure all `llama` CLI `model` sub-commands work:
```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```
Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```
Create a fresh venv `uv venv && source .venv/bin/activate` and run
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv` <-- the server runs
Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.
```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
# What does this PR do?
- Configured ruff linter to automatically fix import sorting issues.
- Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are
applied.
- Enabled the 'I' selection to focus on import-related linting rules.
- Ran the linter, and formatted all codebase imports accordingly.
- Removed the black dep from the "dev" group since we use ruff
## Test Plan
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Since the subcommands use `MODEL_ID`, it would be better to use the same
term in `model list` as well, making it easy to find.
```
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID <<
$ llama model describe --help
usage: llama model describe [-h] -m MODEL_ID <<
$ llama download --help
--model-id MODEL_ID See `llama model list` or `llama model list --show-all` for the list of available models
before:
$ llama model list
+-----------------------------------------+-----------------------------------------------------+----------------+
| Model Descriptor | Hugging Face Repo | Context Length |
+-----------------------------------------+-----------------------------------------------------+----------------+
after:
$ llama model list
+-----------------------------------------+-----------------------------------------------------+----------------+
| Model Descriptor | Model ID | Context Length |
+-----------------------------------------+-----------------------------------------------------+----------------+
| Llama3.1-8B | meta-llama/Llama-3.1-8B | 128K |
+-----------------------------------------+-----------------------------------------------------+----------------+
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
When executing a command group like `llama model` on its own, the wrong help
text, sub-commands, and flags are displayed. Each command group needs to call
`.set_defaults` so this info is displayed properly.
before:
```
llama model
usage: llama [-h] {model,stack,download,verify-download} ...
Welcome to the Llama CLI
options:
-h, --help show this help message and exit
subcommands:
{model,stack,download,verify-download}
```
after:
```
llama model
usage: llama model [-h] {download,list,prompt-format,describe,verify-download} ...
Work with llama models
options:
-h, --help show this help message and exit
model_subcommands:
{download,list,prompt-format,describe,verify-download}
```
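A minimal sketch of the `.set_defaults` pattern described above (parser names are illustrative, not the actual CLI code):
```python
import argparse

parser = argparse.ArgumentParser(prog="llama", description="Welcome to the Llama CLI")
subparsers = parser.add_subparsers(title="subcommands")

model_parser = subparsers.add_parser("model", description="Work with llama models")
model_subparsers = model_parser.add_subparsers(title="model_subcommands")
list_parser = model_subparsers.add_parser("list", description="Show available llama models")
# Each leaf command registers its own handler the same way:
list_parser.set_defaults(func=lambda args: print("...list models..."))

# Without this, `llama model` falls through to the top-level parser's help.
model_parser.set_defaults(func=lambda args: model_parser.print_help())

args = parser.parse_args()
if hasattr(args, "func"):
    args.func(args)
else:
    parser.print_help()
```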
Signed-off-by: Charlie Doern <cdoern@redhat.com>
The lint check on the main branch is failing. This fixes the lint check after
we moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fix and ignore some additional
checks.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Cleans up how we provide sampling params. Earlier, strategy was an enum
and all params (top_p, temperature, top_k) across all strategies were
grouped. We now have a strategy union object with each strategy (greedy,
top_p, top_k) having its corresponding params.
Earlier,
```
class SamplingParams:
    strategy: SamplingStrategy  # enum: greedy / top_p / top_k
    # params for every strategy grouped together on one object
    temperature: float
    top_p: float
    top_k: int
    # ... other params
```
However, the `strategy` field was not actually used by any provider, which
made it confusing to infer the exact sampling behavior purely from the params:
you could pass temperature, top_p, and top_k together, and it was unclear how
a provider would interpret them.
Hence we introduced a union where each strategy and its relevant params are
grouped together, to avoid this confusion.
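Roughly, the new shape is a tagged union of strategy objects, each carrying only its own params (a Pydantic sketch; the exact class and field names in the codebase may differ):
```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class GreedySamplingStrategy(BaseModel):
    type: Literal["greedy"] = "greedy"


class TopPSamplingStrategy(BaseModel):
    type: Literal["top_p"] = "top_p"
    temperature: float = 1.0
    top_p: float = 0.95


class TopKSamplingStrategy(BaseModel):
    type: Literal["top_k"] = "top_k"
    top_k: int = 40


SamplingStrategy = Annotated[
    Union[GreedySamplingStrategy, TopPSamplingStrategy, TopKSamplingStrategy],
    Field(discriminator="type"),
]


class SamplingParams(BaseModel):
    strategy: SamplingStrategy = Field(default_factory=GreedySamplingStrategy)
    max_tokens: int = 0
    repetition_penalty: float = 1.0


# A request now states its sampling intent explicitly:
params = SamplingParams(strategy=TopPSamplingStrategy(temperature=0.7, top_p=0.9))
```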
Updated all providers, tests, notebooks, the README, and other places where
sampling params were used to use the new format.
## Test Plan
`pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py`
// inference on ollama, fireworks and together
`with-proxy pytest -v -s -k "ollama"
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/inference/test_text_inference.py `
// agents on fireworks
`pytest -v -s -k 'fireworks and create_agent'
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/agents/test_agents.py
--safety-shield="meta-llama/Llama-Guard-3-8B"`
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [X] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [X] Updated relevant documentation.
- [X] Wrote necessary unit or integration tests.
---------
Co-authored-by: Hardik Shah <hjshah@fb.com>
# What does this PR do?
It is important to verify large checkpoints downloaded via `llama model
download` because subtle corruptions can easily happen with large file
system writes. This PR adds a `verify-download` subcommand. Note that
verification itself is a very time-consuming process (and will take
several **minutes** for the 405B model), hence this is a separate
subcommand (and not part of the download which can already be
time-consuming) and there are spinners and a bit of a "show" around it
in the implementation.
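For context, a rough sketch of what the verification amounts to, assuming the Meta download ships a `checklist.chk` file with `md5sum`-style lines (the helper below is illustrative, not the actual implementation):
```python
import hashlib
from pathlib import Path


def verify_checkpoint(model_dir: Path) -> bool:
    """Compare each downloaded file's MD5 against the shipped checklist.chk."""
    ok = True
    for line in (model_dir / "checklist.chk").read_text().splitlines():
        expected_md5, filename = line.split()
        digest = hashlib.md5()
        with open(model_dir / filename, "rb") as f:
            # Stream in chunks: consolidated checkpoints can be hundreds of GB.
            for chunk in iter(lambda: f.read(8 * 1024 * 1024), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected_md5:
            print(f"MISMATCH: {filename}")
            ok = False
    return ok
```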
## Test Plan
<img width="1012" alt="image"
src="https://github.com/user-attachments/assets/f82b0d42-2a15-4917-b85e-6d3cd7d31e55">
* API Keys passed from Client instead of distro configuration
* delete distribution registry
* Rename the "package" word away
* Introduce a "Router" layer for providers
Some providers need to be factorized and considered as thin routing
layers on top of other providers. Consider two examples:
- The inference API should be a routing layer over inference providers,
routed using the "model" key
- The memory banks API is another instance where various memory bank
types will be provided by independent providers (e.g., a vector store
is served by Chroma while a keyvalue memory can be served by Redis or
PGVector)
This commit introduces a generalized routing layer for this purpose.
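Conceptually, such a router is a thin provider that owns a routing table and forwards each call to the backing provider chosen by the routing key; a minimal sketch (class and method names are illustrative):
```python
from typing import Any, Dict


class InferenceRouter:
    """Routes inference requests to the provider registered for each model."""

    def __init__(self, providers_by_model: Dict[str, Any]) -> None:
        self.providers_by_model = providers_by_model

    async def chat_completion(self, model: str, messages: list, **kwargs: Any) -> Any:
        provider = self.providers_by_model.get(model)
        if provider is None:
            raise ValueError(f"No provider registered for model {model}")
        # The router adds no behavior of its own; it just forwards the call.
        return await provider.chat_completion(model=model, messages=messages, **kwargs)
```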
* update `apis_to_serve`
* llama_toolchain -> llama_stack
* Codemod from llama_toolchain -> llama_stack
- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- from llama_stack.apis.<api> import foo should work now
- update imports to do llama_stack.apis.<api>
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent
* Moved some stuff out of common/; re-generated OpenAPI spec
* llama-toolchain -> llama-stack (hyphens)
* add control plane API
* add redis adapter + sqlite provider
* move core -> distribution
* Some more toolchain -> stack changes
* small naming shenanigans
* Removing custom tool and agent utilities and moving them client side
* Move control plane to distribution server for now
* Remove control plane from API list
* no codeshield dependency randomly plzzzzz
* Add "fire" as a dependency
* add back event loggers
* stack configure fixes
* use brave instead of bing in the example client
* add init file so it gets packaged
* add init files so it gets packaged
* Update MANIFEST
* bug fix
---------
Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>