# What does this PR do?
Move around bits. This makes the copies from llama-models _much_ easier
to maintain and ensures we don't entangle meta-reference-specific
tidbits into llama-models code even by accident.
It also kills the meta-reference-quantized-gpu distro and rolls the
quantization deps into meta-reference-gpu.
## Test Plan
```
LLAMA_MODELS_DEBUG=1 \
with-proxy llama stack run meta-reference-gpu \
--env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
--env INFERENCE_CHECKPOINT_DIR=<DIR> \
--env MODEL_PARALLEL_SIZE=4 \
--env QUANTIZATION_TYPE=fp8_mixed
```
Start a server with and without quantization, then point the integration
tests at it using:
```
pytest -s -v tests/integration/inference/test_text_inference.py \
--stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
# What does this PR do?
Updates the help text for the `llama model prompt-format` command to
clarify that users should provide a specific model name (e.g.,
Llama3.1-8B, Llama3.2-11B-Vision), not a model family. Removes the
default value and field for `--model-name` to prevent users from
mistakenly thinking a model family name is acceptable. Adds guidance to
run `llama model list` to view valid model names.
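For context, the change boils down to the argparse definition for
`--model-name`; a minimal sketch, not the actual source (the real command
lives in `llama_stack/cli/model/prompt_format.py`):
```python
import argparse

parser = argparse.ArgumentParser(
    prog="llama model prompt-format",
    description="Show llama model message formats",
)
parser.add_argument(
    "-m",
    "--model-name",
    type=str,
    # No default value: a family name like "llama3_1" is not a valid model,
    # so nothing is suggested implicitly anymore.
    help=(
        "Example: Llama3.1-8B or Llama3.2-11B-Vision, etc "
        "(Run `llama model list` to see a list of valid model names)"
    ),
)
args = parser.parse_args()
```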
## Test Plan
Output of `llama model prompt-format -h` Before:
```
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Model Family (llama3_1, llama3_X, etc.)
Example:
llama model prompt-format <options>
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format --model-name llama3_1
usage: llama model prompt-format [-h] [-m MODEL_NAME]
llama model prompt-format: error: llama3_1 is not a valid Model. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct
```
Output of `llama model prompt-format -h` After:
```
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Example: Llama3.1-8B or Llama3.2-11B-Vision, etc
(Run `llama model list` to see a list of valid model names)
Example:
llama model prompt-format <options>
```
Signed-off-by: Alina Ryan <aliryan@redhat.com>
Summary:
```
+ llama model prompt-format -m Llama3.2-11B-Vision-Instruct
Traceback (most recent call last):
  File "/tmp/tmp.gCwyyCcjoA/.venv/bin/llama", line 10, in <module>
    sys.exit(main())
  File "/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 50, in main
    parser.run(args)
  File "/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 44, in run
    args.func(args)
  File "/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/model/prompt_format.py", line 59, in _run_model_template_cmd
    if args.list:
AttributeError: 'Namespace' object has no attribute 'list'
```
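The crash happens because `_run_model_template_cmd` reads `args.list`
when the parser never defined a `--list` flag. A minimal sketch of the
kind of guard that avoids it (simplified; not necessarily the actual
patch):
```python
# A sketch of a defensive handler (simplified; not necessarily the actual patch).
def _run_model_template_cmd(args) -> None:
    # getattr() returns False when the parser never defined a --list flag,
    # avoiding the AttributeError seen in the traceback above.
    if getattr(args, "list", False):
        for model in ("Llama3.1-8B", "Llama3.2-11B-Vision"):  # abridged list
            print(model)
        return
    print(f"Showing prompt format for {args.model_name}")  # placeholder body
```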
Test Plan:
```
llama model prompt-format -m Llama3.2-11B-Vision-Instruct
```
# What does this PR do?
19ae4b35d9/llama_stack/cli/model/prompt_format.py (L47)
Based on the comment there (`Only Llama 3.1 and 3.2 are supported`), not
every model shown by `llama model list` works with `prompt-format`, so
the help text cannot simply refer users to `llama model list`. Today the
valid models are only listed after entering an invalid one, so it would
be nice to add a direct way to check them (a sketch of the wiring
follows the output below):
```
llama model prompt-format -m Llama3.1-405B-Instruct:bf16-mp8
usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l]
llama model prompt-format: error: Llama3.1-405B-Instruct:bf16-mp8 is not a valid Model <<<<---. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct
```
before:
```
$ llama model prompt-format --help
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Model Family (llama3_1, llama3_X, etc.)
Example:
llama model prompt-format <options>
```
after:
```
$ llama model prompt-format --help
usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Model Family (llama3_1, llama3_X, etc.)
-l, --list List the valid supported models
Example:
llama model prompt-format <options>
$ llama model prompt-format -l
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Llama3.1-8B │
├──────────────────────────────┤
│ Llama3.1-70B │
├──────────────────────────────┤
│ Llama3.1-405B │
├──────────────────────────────┤
│ Llama3.1-8B-Instruct │
├──────────────────────────────┤
│ Llama3.1-70B-Instruct │
├──────────────────────────────┤
│ Llama3.1-405B-Instruct │
├──────────────────────────────┤
│ Llama3.2-1B │
├──────────────────────────────┤
│ Llama3.2-3B │
├──────────────────────────────┤
│ Llama3.2-1B-Instruct │
├──────────────────────────────┤
│ Llama3.2-3B-Instruct │
├──────────────────────────────┤
│ Llama3.2-11B-Vision │
├──────────────────────────────┤
│ Llama3.2-90B-Vision │
├──────────────────────────────┤
│ Llama3.2-11B-Vision-Instruct │
├──────────────────────────────┤
│ Llama3.2-90B-Vision-Instruct │
└──────────────────────────────┘
```
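For reference, a minimal sketch of how the new flag and table can be
wired up with argparse and `rich` (simplified structure with an abridged
model list; not the actual diff):
```python
import argparse

from rich.console import Console
from rich.table import Table

SUPPORTED_MODELS = [
    "Llama3.1-8B",
    "Llama3.1-70B",
    "Llama3.2-11B-Vision-Instruct",
    # ... remaining Llama 3.1 / 3.2 variants from the table above
]

parser = argparse.ArgumentParser(
    prog="llama model prompt-format",
    description="Show llama model message formats",
)
parser.add_argument("-m", "--model-name", help="Model Family (llama3_1, llama3_X, etc.)")
parser.add_argument("-l", "--list", action="store_true", help="List the valid supported models")
args = parser.parse_args()

if args.list:
    table = Table(show_lines=True)  # draws the row separators seen above
    table.add_column("Model")
    for model in SUPPORTED_MODELS:
        table.add_row(model)
    Console().print(table)
```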
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.
This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279
## Test Plan
Ensure all `llama` CLI `model` sub-commands work:
```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```
Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```
Created a fresh venv with `uv venv && source .venv/bin/activate`, ran
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv`, and confirmed the server
comes up.
Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.
```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
# What does this PR do?
- Configured the ruff linter to automatically fix import sorting issues.
- Set `--exit-non-zero-on-fix` to ensure a non-zero exit code when fixes
are applied.
- Enabled the `I` selection to focus on import-related linting rules
(see the sketch after this list).
- Ran the linter and formatted all codebase imports accordingly.
- Removed the `black` dependency from the "dev" group since we use ruff.
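For illustration, the `I` selection enables ruff's isort-style rules, so
a fix pass leaves imports grouped and alphabetized; a small sketch with
hypothetical imports:
```python
# After `ruff check --select I --fix --exit-non-zero-on-fix .`, imports are
# grouped as standard library, then third-party, then first-party, and
# sorted alphabetically within each group:
import json
import os

import pytest

from llama_stack.cli.llama import main  # hypothetical first-party import
```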
Signed-off-by: Sébastien Han <seb@redhat.com>
The lint check on the main branch is failing. This fixes the lint check
after we moved to ruff in
https://github.com/meta-llama/llama-stack/pull/921. We need to move to a
`ruff.toml` file as well as fix and ignore some additional checks.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>