Commit graph

147 commits

Author SHA1 Message Date
Ashwin Bharambe
530d4bdfe1
refactor: move all llama code to models/llama out of meta reference (#1887)
# What does this PR do?

Move around bits. This makes the copies from llama-models _much_ easier
to maintain and ensures we don't entangle meta-reference specific
tidbits into llama-models code even by accident.

Also, kills the meta-reference-quantized-gpu distro and rolls
quantization deps into meta-reference-gpu.

## Test Plan

```
LLAMA_MODELS_DEBUG=1 \
  with-proxy llama stack run meta-reference-gpu \
  --env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
   --env INFERENCE_CHECKPOINT_DIR=<DIR> \
   --env MODEL_PARALLEL_SIZE=4 \
   --env QUANTIZATION_TYPE=fp8_mixed
```

Start a server with and without quantization. Point integration tests to
it using:

```
pytest -s -v  tests/integration/inference/test_text_inference.py \
   --stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
2025-04-07 15:03:58 -07:00
Ashwin Bharambe
b8f1561956
feat: introduce llama4 support (#1877)
As title says. Details in README, elsewhere.
2025-04-05 11:53:35 -07:00
Matthew Farrellee
a4c086cee0
fix: skip apis with no providers during llama stack build (#1835)
# What does this PR do?
closes #1834 

## Test Plan
`llama stack build` successfully
2025-03-29 08:39:35 -07:00
Ihar Hrachyshka
18bac27d4e
fix: Use CONDA_DEFAULT_ENV presence as a flag to use conda mode (#1555)
# What does this PR do?

This is the second attempt to switch to system packages by default. Now
with a hack to detect conda environment - in which case conda image-type
is used.

Note: Conda will only be used when --image-name is unset *and*
CONDA_DEFAULT_ENV is set. This means that users without conda will
correctly fall back to using system packages when no --image-* arguments
are passed at all.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Uses virtualenv:

```
$ llama stack build --template ollama --image-type venv
$ llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Using virtual environment: /home/ec2-user/src/llama-stack/schedule/.local
[...]
```

Uses system packages (virtualenv already initialized):

```
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
INFO     2025-03-27 20:46:22,882 llama_stack.cli.stack.run:142 server: No image type or image name provided. Assuming environment packages.
[...]
```

Attempt to run from environment packages without necessary packages
installed:
```
$ python -m venv barebones
$ . ./barebones/bin/activate
$ pip install -e . # to install llama command
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
ModuleNotFoundError: No module named 'fastapi'
```

^ failed as expected because the environment doesn't have necessary
packages installed.

Now install some packages in the new environment:

```
$ pip install fastapi opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp aiosqlite ollama openai datasets faiss-cpu mcp autoevals
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```

Now see if setting CONDA_DEFAULT_ENV will change what happens by
default:

```
$ export CONDA_DEFAULT_ENV=base
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Using conda environment: base
Conda environment base does not exist.
[...]
```

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-27 17:13:22 -04:00
Hardik Shah
cb2a9784ab
fix: multiple issues with getting_started notebook (#1795)
Fixes multiple issues 

1. llama stack build of dependencies was breaking with incompatible
numpy / pandas when importing datasets

Moved the notebook to start a local server instead of using library as a
client. This way the setup is cleaner since its all contained and by
using `uv run --with` we can test both the server setup process too in
CI and release time.

2. The change to [1] surfaced some other issues 
- running `llama stack run` was defaulting to conda env name 
- provider data was not being managed properly 
- Some notebook cells (telemetry for evals) were not updated with latest
changes

Fixed all the issues and update the notebook. 

### Test 

1. Manually run it all in local env 
2. `pytest -v -s --nbval-lax docs/getting_started.ipynb`
2025-03-26 10:59:12 -07:00
Ashwin Bharambe
cb7b9dda6c fix: compare timezones correctly in download script 2025-03-21 11:46:57 -07:00
Sébastien Han
24fd06879e
refactor: simplify command execution and remove PTY handling (#1641)
# What does this PR do?

A PTY is unnecessary for interactive mode since `subprocess.run()`
already inherits the calling terminal’s stdin, stdout, and stderr,
allowing natural interaction. Using a PTY can introduce unwanted side
effects like buffering issues and inconsistent signal handling. Standard
input/output is sufficient for most interactive programs.

This commit simplifies the command execution by:

1. Removing PTY-based execution in favor of direct subprocess handling
2. Consolidating command execution into a single run_command function
3. Improving error handling with specific subprocess error types
4. Adding proper type hints and documentation
5. Maintaining Ctrl+C handling for graceful interruption

## Test Plan

```
llama stack run
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-17 15:03:14 -07:00
Yuan Tang
ca0cbf4338
fix: Fix pre-commit check (#1628)
# What does this PR do?

Fixes pre-commit check failure after merging
https://github.com/meta-llama/llama-stack/pull/1010:
3874877097

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-13 18:57:42 -07:00
Alina Ryan
c02464b635
fix: Clarify llama model prompt-format help text (#1010)
# What does this PR do?
Updates the help text for the `llama model prompt-format` command to
clarify that users should provide a specific model name (e.g.,
Llama3.1-8B, Llama3.2-11B-Vision), not a model family. Removes the
default value and field for `--model-name` to prevent users from
mistakenly thinking a model family name is acceptable. Adds guidance to
run `llama model list` to view valid model names.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Output of `llama model prompt-format -h` Before:
```
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h
usage: llama model prompt-format [-h] [-m MODEL_NAME]

Show llama model message formats

options:
  -h, --help            show this help message and exit
  -m MODEL_NAME, --model-name MODEL_NAME
                        Model Family (llama3_1, llama3_X, etc.)

Example:
    llama model prompt-format <options>
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format --model-name llama3_1
usage: llama model prompt-format [-h] [-m MODEL_NAME]
llama model prompt-format: error: llama3_1 is not a valid Model. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct
```

Output of `llama model prompt-format -h` After:
```
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h
usage: llama model prompt-format [-h] [-m MODEL_NAME]

Show llama model message formats

options:
  -h, --help            show this help message and exit
  -m MODEL_NAME, --model-name MODEL_NAME
                        Example: Llama3.1-8B or Llama3.2-11B-Vision, etc
                        (Run `llama model list` to see a list of valid model names)

Example:
    llama model prompt-format <options>

```

Signed-off-by: Alina Ryan <aliryan@redhat.com>
2025-03-13 20:47:09 -04:00
Sébastien Han
98b1b15e0f
refactor: move all datetime.now() calls to UTC (#1589)
# What does this PR do?

Updated all instances of datetime.now() to use timezone.utc for
consistency in handling time across different systems. This ensures that
timestamps are always in Coordinated Universal Time (UTC), avoiding
issues with time zone discrepancies and promoting uniformity in
time-related data.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-13 15:34:53 -07:00
Ashwin Bharambe
e13c92f269
revert: feat(server): Use system packages for execution (#1551)
Reverts meta-llama/llama-stack#1252

The above PR breaks the following invocation:
```bash
llama stack run ~/.llama/distributions/together/together-run.yaml
```
2025-03-11 09:58:25 -07:00
Sébastien Han
21e39633d8
feat(server): Use system packages for execution (#1252)
# What does this PR do?

Users prefer to rely on the main CLI rather than invoking the server
through a Python module. Users interact with a high-level CLI rather
than needing to know internal module structures.

Now, when running llama stack run <path-to-config>, the server will
attempt to use the system package or a virtual environment if one is
active.

This also eliminates the current process dependency chain when running
from a virtual environment:

-> llama stack run
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> start_env.sh

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-> python -m server...

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Run:

```
ollama run llama3.2:3b-instruct-fp16 --keepalive=2m &
llama stack run ./llama_stack/templates/ollama/run.yaml --disable-ipv6
```

Notice that the server starts and shutdowns normally.

[//]: # (## Documentation)

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-03-10 16:01:03 -07:00
James Kunstle
735892cbd2
refactor: ImageType to LlamaStackImageType (#1500)
This disambiguates "Image" term from "container image" alternative usage
and allows for:

```python

if image_type == LlamaStackImagetype.venv:
	...

```

accesses rather than `ImageType.venv.value`

# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Changes enum use to comply with semantic python styling and naming
conventions.

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

Refactor was automated and small so simple run-through of creating
images was done.

Signed-off-by: James Kunstle <jkunstle@redhat.com>
2025-03-10 17:12:53 -04:00
ehhuang
256448c14e
fix(cli): llama model prompt-format (#1481)
Summary:

+ llama model prompt-format -m Llama3.2-11B-Vision-Instruct
Traceback (most recent call last):
  File "/tmp/tmp.gCwyyCcjoA/.venv/bin/llama", line 10, in <module>
    sys.exit(main())
File
"/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py",
line 50, in main
    parser.run(args)
File
"/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py",
line 44, in run
    args.func(args)
File
"/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/model/prompt_format.py",
line 59, in _run_model_template_cmd
    if args.list:
AttributeError: 'Namespace' object has no attribute 'list'

Test Plan:
llama model prompt-format -m Llama3.2-11B-Vision-Instruct
2025-03-07 11:45:54 -08:00
Sébastien Han
7cf1e24c4e
feat(logging): implement category-based logging (#1362)
# What does this PR do?

This commit introduces a new logging system that allows loggers to be
assigned
a category while retaining the logger name based on the file name. The
log
format includes both the logger name and the category, producing output
like:

```
INFO     2025-03-03 21:44:11,323 llama_stack.distribution.stack:103 [core]: Tool_groups: builtin::websearch served by
         tavily-search
```

Key features include:

- Category-based logging: Loggers can be assigned a category (e.g.,
  "core", "server") when programming. The logger can be loaded like
  this: `logger = get_logger(name=__name__, category="server")`
- Environment variable control: Log levels can be configured
per-category using the
  `LLAMA_STACK_LOGGING` environment variable. For example:
`LLAMA_STACK_LOGGING="server=DEBUG;core=debug"` enables DEBUG level for
the "server"
    and "core" categories.
- `LLAMA_STACK_LOGGING="all=debug"` sets DEBUG level globally for all
categories and
    third-party libraries.

This provides fine-grained control over logging levels while maintaining
a clean and
informative log format.

The formatter uses the rich library which provides nice colors better
stack traces like so:

```
ERROR    2025-03-03 21:49:37,124 asyncio:1758 [uncategorized]: unhandled exception during asyncio.run() shutdown
         task: <Task finished name='Task-16' coro=<handle_signal.<locals>.shutdown() done, defined at
         /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:146>
         exception=UnboundLocalError("local variable 'loop' referenced before assignment")>
         ╭────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────╮
         │ /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:178 in shutdown                │
         │                                                                                                                │
         │   175 │   │   except asyncio.CancelledError:                                                                   │
         │   176 │   │   │   pass                                                                                         │
         │   177 │   │   finally:                                                                                         │
         │ ❱ 178 │   │   │   loop.stop()                                                                                  │
         │   179 │                                                                                                        │
         │   180 │   loop = asyncio.get_running_loop()                                                                    │
         │   181 │   loop.create_task(shutdown())                                                                         │
         ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         UnboundLocalError: local variable 'loop' referenced before assignment
```

Co-authored-by: Ashwin Bharambe <@ashwinb>
Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

```
python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml
INFO     2025-03-03 21:55:35,918 __main__:365 [server]: Using config file: llama_stack/templates/ollama/run.yaml           
INFO     2025-03-03 21:55:35,925 __main__:378 [server]: Run configuration:                                                 
INFO     2025-03-03 21:55:35,928 __main__:380 [server]: apis:                                                              
         - agents                                                     
``` 
[//]: # (## Documentation)

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-03-07 11:34:30 -08:00
Charlie Doern
1097912054
refactor: display defaults in help text (#1480)
# What does this PR do?

using `formatter_class=argparse.ArgumentDefaultsHelpFormatter` displays
(default: DEFAULT_VALUE) for each flag. add this formatter class to
build and run to show users some default values like `conda`, `8321`,
etc

## Test Plan

ran locally with following output: 

before: 
```
llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE]
                       [--image-type {conda,container,venv}]
                       config

Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.

positional arguments:
  config                Path to config file to use for the run

options:
  -h, --help            show this help message and exit
  --port PORT           Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. Defaults to 8321
  --image-name IMAGE_NAME
                        Name of the image to run. Defaults to the current conda environment
  --disable-ipv6        Disable IPv6 support
  --env KEY=VALUE       Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times.
  --tls-keyfile TLS_KEYFILE
                        Path to TLS key file for HTTPS
  --tls-certfile TLS_CERTFILE
                        Path to TLS certificate file for HTTPS
  --image-type {conda,container,venv}
                        Image Type used during the build. This can be either conda or container or venv.
```

after:
```
llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE]
                       [--image-type {conda,container,venv}]
                       config

Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.

positional arguments:
  config                Path to config file to use for the run

options:
  -h, --help            show this help message and exit
  --port PORT           Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. (default: 8321)
  --image-name IMAGE_NAME
                        Name of the image to run. Defaults to the current conda environment (default: None)
  --disable-ipv6        Disable IPv6 support (default: False)
  --env KEY=VALUE       Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: [])
  --tls-keyfile TLS_KEYFILE
                        Path to TLS key file for HTTPS (default: None)
  --tls-certfile TLS_CERTFILE
                        Path to TLS certificate file for HTTPS (default: None)
  --image-type {conda,container,venv}
                        Image Type used during the build. This can be either conda or container or venv. (default: conda)
```

[//]: # (## Documentation)

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-03-07 11:05:58 -08:00
Sébastien Han
4bbb4ddeae
fix: resolve pydantic warning on .dict() usage (#1445)
# What does this PR do?

The method "dict" in class "BaseModel" is deprecated we should use
model_dump instead.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-06 11:27:47 -08:00
Reid
a0d6b165b0
chore: remove unused build dir (#1379)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

- From old PR, it use `BUILDS_BASE_DIR` in
`llama_stack/cli/stack/configure.py`(removed).
https://github.com/meta-llama/llama-stack/pull/371/files
- Based on the current `build` code, it should only use
`DISTRIBS_BASE_DIR` to save it.

46b0a404e8/llama_stack/cli/stack/_build.py (L298)

46b0a404e8/llama_stack/cli/stack/_build.py (L301)
Pls correct me if I am understand incorrectly.
So it should no need to use in `run` now.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-03-05 15:40:00 -08:00
Ashwin Bharambe
dd0db8038b
refactor(test): unify vector_io tests and make them configurable (#1398)
## Test Plan


`LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=sqlite-vec
pytest -s -v test_vector_io.py --embedding-model all-miniLM-L6-V2
--inference-model='' --vision-inference-model=''`

```
test_vector_io.py::test_vector_db_retrieve[txt=:vis=:emb=all-miniLM-L6-V2] PASSED
test_vector_io.py::test_vector_db_register[txt=:vis=:emb=all-miniLM-L6-V2] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case0] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case1] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case2] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case3] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case4] PASSED
```

Same thing with:
- LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=faiss
- LLAMA_STACK_CONFIG=fireworks

(Note that ergonomics will soon be improved re: cmd-line options and env
variables)
2025-03-04 13:37:45 -08:00
Ashwin Bharambe
55668d3c5b refactor: move a few tests to top-level tests/ directory 2025-03-03 17:33:39 -08:00
Reid
5c9d12a206
chore: improve --port help text (#1346)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

It would be better to tell user env var usage in help text.
```
before:
$ llama stack run --help
  --port PORT           Port to run the server on. Defaults to 8321

after
$ llama stack run --help
  --port PORT           Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. Defaults to 8321
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-03-03 16:49:03 -08:00
Ashwin Bharambe
46b0a404e8
chore: remove straggler references to llama-models (#1345)
Straggler references cleanup
2025-03-01 14:26:03 -08:00
Reid
dc069025f5
chore: fix typo (#1343)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]


21ec67356c/distributions

It should missed the `s`.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-03-01 10:36:04 -08:00
Sébastien Han
6fa257b475
chore(lint): update Ruff ignores for project conventions and maintainability (#1184)
- Added new ignores from flake8-bugbear (`B007`, `B008`)
- Ignored `C901` (high function complexity) for now, pending review
- Maintained PyTorch conventions (`N812`, `N817`)
- Allowed `E731` (lambda assignments) for flexibility
- Consolidated existing ignores (`E402`, `E501`, `F405`, `C408`, `N812`)
- Documented rationale for each ignored rule

This keeps our linting aligned with project needs while tracking
potential fixes.

Signed-off-by: Sébastien Han <seb@redhat.com>

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-28 09:36:49 -08:00
Reid
3b57d8ee88
feat: add prompt-format list (#1222)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]


19ae4b35d9/llama_stack/cli/model/prompt_format.py (L47)

Based on the comment: `Only Llama 3.1 and 3.2 are supported`, even 3.1,
3.2 are not all models can show it with `prompt-format`, so cannot refer
to `llama model list`,
only refer to list when enter a invalid model, so it would be nice to
help to check the valid models:
```
 llama model prompt-format -m Llama3.1-405B-Instruct:bf16-mp8
usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l]
llama model prompt-format: error: Llama3.1-405B-Instruct:bf16-mp8 is not a valid Model <<<<---. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct

before:
$ llama model prompt-format --help
usage: llama model prompt-format [-h] [-m MODEL_NAME]

Show llama model message formats

options:
  -h, --help            show this help message and exit
  -m MODEL_NAME, --model-name MODEL_NAME
                        Model Family (llama3_1, llama3_X, etc.)

Example:
    llama model prompt-format <options>


after:
$ llama model prompt-format --help
usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l]

Show llama model message formats

options:
  -h, --help            show this help message and exit
  -m MODEL_NAME, --model-name MODEL_NAME
                        Model Family (llama3_1, llama3_X, etc.)
  -l, --list            List the valid supported models

Example:
    llama model prompt-format <options>

$ llama model prompt-format -l
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model                        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Llama3.1-8B                  │
├──────────────────────────────┤
│ Llama3.1-70B                 │
├──────────────────────────────┤
│ Llama3.1-405B                │
├──────────────────────────────┤
│ Llama3.1-8B-Instruct         │
├──────────────────────────────┤
│ Llama3.1-70B-Instruct        │
├──────────────────────────────┤
│ Llama3.1-405B-Instruct       │
├──────────────────────────────┤
│ Llama3.2-1B                  │
├──────────────────────────────┤
│ Llama3.2-3B                  │
├──────────────────────────────┤
│ Llama3.2-1B-Instruct         │
├──────────────────────────────┤
│ Llama3.2-3B-Instruct         │
├──────────────────────────────┤
│ Llama3.2-11B-Vision          │
├──────────────────────────────┤
│ Llama3.2-90B-Vision          │
├──────────────────────────────┤
│ Llama3.2-11B-Vision-Instruct │
├──────────────────────────────┤
│ Llama3.2-90B-Vision-Instruct │
└──────────────────────────────┘
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-28 09:27:22 -08:00
Yuan Tang
a9f5c5bfca
fix: Incorrect import path for print_subcommand_description() (#1315)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-27 18:50:41 -08:00
Yuan Tang
f4df3a76d9
fix: Incorrect import path for print_subcommand_description() (#1314)
# What does this PR do?

Missed this one additional import in
https://github.com/meta-llama/llama-stack/pull/1313

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-27 18:35:49 -08:00
Yuan Tang
3567274183
fix: Incorrect import path for print_subcommand_description() (#1313)
# What does this PR do?

This fixes release build failure:
3796356500

```
+ llama model prompt-format -m Llama3.2-11B-Vision-Instruct
Traceback (most recent call last):
  File "/tmp/tmp.PXMDlmD0x5/.venv/bin/llama", line 4, in <module>
    from llama_stack.cli.llama import main
  File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 10, in <module>
    from .model import ModelParser
  File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/model/__init__.py", line 7, in <module>
    from .model import ModelParser  # noqa
  File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/model/model.py", line 16, in <module>
    from llama_stack.cli.utils import print_subcommand_description
ModuleNotFoundError: No module named 'llama_stack.cli.utils'
```

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-27 21:24:01 -05:00
Reid
94e2186bb8
chore: add subcommands description in help (#1219)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

```
before:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...

Welcome to the Llama CLI

options:
  -h, --help            show this help message and exit

subcommands:
  {model,stack,download,verify-download}

$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...

Work with llama models

options:
  -h, --help            show this help message and exit

model_subcommands:
  {download,list,prompt-format,describe,verify-download,remove}

$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...

Operations for the Llama Stack / Distributions

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

stack_subcommands:
  {build,list-apis,list-providers,run}

===================
after:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...

Welcome to the Llama CLI

options:
  -h, --help            show this help message and exit

subcommands:
  {model,stack,download,verify-download}

  model                 Work with llama models
  stack                 Operations for the Llama Stack / Distributions
  download              Download a model from llama.meta.com or Hugging Face Hub
  verify-download       Verify integrity of downloaded model files

$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...

Work with llama models

options:
  -h, --help            show this help message and exit

model_subcommands:
  {download,list,prompt-format,describe,verify-download,remove}

  download              Download a model from llama.meta.com or Hugging Face Hub
  list                  Show available llama models
  prompt-format         Show llama model message formats
  describe              Show details about a llama model
  verify-download       Verify the downloaded checkpoints' checksums for models downloaded from Meta
  remove                Remove the downloaded llama model

$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...

Operations for the Llama Stack / Distributions

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

stack_subcommands:
  {build,list-apis,list-providers,run}

  build                 Build a Llama stack container
  list-apis             List APIs part of the Llama Stack implementation
  list-providers        Show available Llama Stack Providers for an API
  run                   Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-27 17:00:27 -08:00
Ashwin Bharambe
c54164556a
fix: update notebooks to avoid using the nutsy --image-name __system__ thing (#1308)
The `--image-name __system__` thing was a hack and a bad one at that.
The actual intent was to somehow automatically detect the notebook
environment so we could avoid unnecessarily confusing things in the
llama stack build cmd-line. But I failed which led us to use the backup
`__system__` thing.

Let's just do the simple thing.

Note that `build_venv.sh` I haven't changed for now (so it still honors
the __system__ special name just that no new user should use it.)

## Test Plan

Open the notebooks from this branch in Colab (see example url below) and
ensure the builds work.


https://colab.research.google.com/github/meta-llama/llama-stack/blob/foo/docs/getting_started.ipynb

In the notebook, install llama-stack from this branch directly using:

```
!pip install -U https://github.com/meta-llama/llama-stack/archive/refs/heads/foo.zip
```

Verify that `!UV_SYSTEM_PYTHON=1 llama stack build --template together
--image-type venv` afterwards succeeds and the library client
initialization also works.
2025-02-27 16:39:04 -08:00
Yuan Tang
2ed2c0bd26
fix(cli): Missing default for --image-type in stack run command (#1274)
# What does this PR do?

I think this got accidentally removed as part of
https://github.com/meta-llama/llama-stack/pull/1250. cc @leseb

## Test Plan

After the change, this arg is no longer required.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-26 12:23:44 -08:00
Reid
3a002f6cf1
chore: update download error message (#1217)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Actually, the incorrect token also will hit `RepositoryNotFoundError`,
e.g.
```
$ llama model download --source huggingface --model-id Llama3.2-1B-Instruct:int4-qlora-eo8 --hf-token xx  ### xx is incorrect token
----RepositoryNotFoundError--->
usage: llama model download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID]
                            [--hf-token HF_TOKEN] [--meta-url META_URL]
                            [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS]
                            [--manifest-file MANIFEST_FILE]
llama model download: error: Repository 'meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8' not found on the Hugging Face Hub.

so update to:
 llama model download --source huggingface --model-id Llama3.2-1B-Instruct:int4-qlora-eo8 --hf-token xx
----RepositoryNotFoundError--->
usage: llama model download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID]
                            [--hf-token HF_TOKEN] [--meta-url META_URL]
                            [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS]
                            [--manifest-file MANIFEST_FILE]
llama model download: error: Repository 'meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8' not found on the Hugging Face Hub or incorrect Hugging Face token.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-25 21:38:10 -08:00
Reid
56c1a50b86
fix: fix the describe table display issue (#1221)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

If not passed the `headers`, it will display empty for the first row,
also might break the second row, make the `Model` row as `headers`.
```
Before:
$ llama model describe -m Llama3.1-70B
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                             ┃                                ┃ <<<---------
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Model             │ Llama3.1-70B         │   <<<---------
├─────────────────────────────┼────────────────────────────────┤
│ Hugging Face ID             │ meta-llama/Llama-3.1-70B       │
├─────────────────────────────┼────────────────────────────────┤
│ Description                 │ Llama 3.1 70b model            │
├─────────────────────────────┼────────────────────────────────┤
......

after:
$ llama model describe -m Llama3.1-70B
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model                       ┃ Llama3.1-70B                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Hugging Face ID             │ meta-llama/Llama-3.1-70B       │
├─────────────────────────────┼────────────────────────────────┤
│ Description                 │ Llama 3.1 70b model            │
├─────────────────────────────┼────────────────────────────────┤
......
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-25 21:34:53 -08:00
Sébastien Han
929c5f0842
refactor(server): replace print statements with logger (#1250)
# What does this PR do?

- Introduced logging in `StackRun` to replace print-based messages
- Improved error handling for config file loading and parsing
- Replaced `cprint` with `logger.error` for consistent error messaging
- Ensured logging is used in `server.py` for startup, shutdown, and
runtime messages
- Added missing exception handling for invalid providers

Signed-off-by: Sébastien Han <seb@redhat.com>

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-25 21:31:37 -08:00
Sébastien Han
c4987bc349
fix: avoid failure when no special pip deps and better exit (#1228)
# What does this PR do?

When building providers in a virtual environment or containers, special
pip dependencies may not always be provided (e.g., for Ollama). The
check should only fail if the required number of arguments is missing.
Currently, two arguments are mandatory:

1. Environment name
2. Pip dependencies

Additionally, return statements were replaced with sys.exit(1) in error
conditions to ensure immediate termination on critical failures. Error
handling in the stack build process was also improved to guarantee the
program exits with status 1 when facing configuration issues or build
failures.

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

This command shouldn't fail:

```
llama stack build --template ollama --image-type venv
```

[//]: # (## Documentation)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-24 13:18:52 -05:00
Charlie Doern
34e3faa4e8
feat: add --run to llama stack build (#1156)
# What does this PR do?

--run runs the stack that was just build using the same arguments during
the build process (image-name, type, etc)

This simplifies the workflow a lot and makes the UX better for most
local users trying to get started rather than having to match the flags
of the two commands (build and then run)

Also, moved `ImageType` to distribution.utils since there were circular
import errors with its old location

## Test Plan

tested locally using the following command: 

`llama stack build --run --template ollama --image-type venv`

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-23 22:06:09 -05:00
Ashwin Bharambe
6227e1e3b9
fix: update virtualenv building so llamastack- prefix is not added, make notebook experience easier (#1225)
Make sure venv behaves like conda (no prefix is added to image_name) and
`--image-type venv` inside a notebook "just works" without any fiddling
2025-02-23 16:57:11 -08:00
Reid
187524d4ae
feat: add substring search for model list (#1099)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

`llama model list` or `llama model list --show-all` will list more or
all for the models, so add the `search` option to simplify the output.
```
$ llama model list --help
usage: llama model list [-h] [--show-all] [-s SEARCH]

Show available llama models

options:
  -h, --help            show this help message and exit
  --show-all            Show all models (not just defaults)
  -s SEARCH, --search SEARCH
                        Search for the input string as a substring in the model descriptor(ID)

$ llama model list -s 70b
+-----------------------+-----------------------------------+----------------+
| Model Descriptor(ID)  | Hugging Face Repo                 | Context Length |
+-----------------------+-----------------------------------+----------------+
| Llama3.1-70B          | meta-llama/Llama-3.1-70B          | 128K           |
+-----------------------+-----------------------------------+----------------+
| Llama3.1-70B-Instruct | meta-llama/Llama-3.1-70B-Instruct | 128K           |
+-----------------------+-----------------------------------+----------------+
| Llama3.3-70B-Instruct | meta-llama/Llama-3.3-70B-Instruct | 128K           |
+-----------------------+-----------------------------------+----------------+

$ llama model list -s 3.1-8b
+----------------------+----------------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo                | Context Length |
+----------------------+----------------------------------+----------------+
| Llama3.1-8B          | meta-llama/Llama-3.1-8B          | 128K           |
+----------------------+----------------------------------+----------------+
| Llama3.1-8B-Instruct | meta-llama/Llama-3.1-8B-Instruct | 128K           |
+----------------------+----------------------------------+----------------+

$ llama model list --show-all -s pro
+----------------------+-----------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo           | Context Length |
+----------------------+-----------------------------+----------------+
| Prompt-Guard-86M     | meta-llama/Prompt-Guard-86M | 2K             |
+----------------------+-----------------------------+----------------+

$ llama model list -s k
Not found for search.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-21 16:38:10 -08:00
Ashwin Bharambe
992f865b2e
chore: move embedding deps to RAG tool where they are needed (#1210)
`EMBEDDING_DEPS` were wrongly associated with `vector_io` providers.
They are needed by
https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/utils/memory/vector_store.py#L142
and related code and is used by the RAG tool and as such should only be
needed by the `inline::rag-runtime` provider.
2025-02-21 11:33:41 -08:00
Reid
9898589f12
fix: convert back to model descriptor for model in list --downloaded (#1201)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Currently , `model` in `--downloaded` just use the directory(already
replace `:`), so covert back to descriptor keep the same with ` llama
model list`, and remove command also use `descriptor`.
```
before:
$ llama model list --downloaded
+-------------------------------------+----------+---------------------+
| Model                               | Size     | Modified Time       |
+-------------------------------------+----------+---------------------+
| Llama3.2-1B-Instruct-int4-qlora-eo8 | 1.53 GB  | 2025-02-20 16:32:49 |
+-------------------------------------+----------+---------------------+

after:
$ llama model list --downloaded
+-------------------------------------+----------+---------------------+
| Model                               | Size     | Modified Time       |
+-------------------------------------+----------+---------------------+
| Llama3.2-1B-Instruct:int4-qlora-eo8 | 1.53 GB  | 2025-02-20 16:32:49 |
+-------------------------------------+----------+---------------------+
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-21 08:10:34 -08:00
Reid
d2701b0d6a
chore: remove configure subcommand (#1202)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

When tried to use `configure`, and found it `DEPRECATED`, and found pr
https://github.com/meta-llama/llama-stack/pull/371 to remove it, not
sure why not remove the `configure.py`?
```
$ llama stack configure /tmp/test.yaml
usage: llama stack configure [-h] [--output-dir OUTPUT_DIR] config
llama stack configure: error:
    DEPRECATED! llama stack configure has been deprecated.
    Please use llama stack run <path/to/run.yaml> instead.
    Please see example run.yaml in /distributions folder.
```

It would better better to tell when user check it how to use with
`--help` first:

```
before:
$ llama stack configure --help
usage: llama stack configure [-h] [--output-dir OUTPUT_DIR] config

Configure a llama stack distribution

positional arguments:

after:
$ llama stack configure --help
usage: llama stack configure [-h] [--output-dir OUTPUT_DIR] config

Configure a llama stack distribution

    DEPRECATED! llama stack configure has been deprecated.
    Please use llama stack run <path/to/run.yaml> instead.
    Please see example run.yaml in /distributions folder.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-21 08:06:25 -08:00
Reid
c9c4a3c921
feat: model remove cmd (#1128)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

add a subcommand, help to clean the unneeded model:
```
$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...

Work with llama models

options:
  -h, --help            show this help message and exit

$ llama model remove --help
usage: llama model remove [-h] -m MODEL [-f]

Remove the downloaded llama model

options:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Specify the llama downloaded model name
  -f, --force           Used to forcefully remove the llama model from the storage without further confirmation

$ llama model remove -m Llama3.2-1B-Instruct:int4-qlora-eo8
Are you sure you want to remove Llama3.2-1B-Instruct:int4-qlora-eo8? (y/n): n
Removal aborted.

$ llama model remove -mLlama3.2-1B-Instruct:int4-qlora-eo8-f
Llama3.2-1B-Instruct:int4-qlora-eo8 removed.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-21 08:05:12 -08:00
Reid
af377e844d
feat: add a option to list the downloaded models (#1127)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

```
$ llama model list --help
usage: llama model list [-h] [--show-all] [--downloaded]

Show available llama models

options:
  -h, --help            show this help message and exit
  --show-all            Show all models (not just defaults)
  --downloaded          List the downloaded models

$ llama model list --downloaded
+-------------+----------+---------------------+
| Model       | Size     | Modified Time       |
+-------------+----------+---------------------+
| Llama3.2-1B | 2.31 GB  | 2025-02-16 13:38:04 |
+-------------+----------+---------------------+
| Llama3.1-8B | 14.97 GB | 2025-02-16 10:36:37 |
+-------------+----------+---------------------+
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-19 22:17:39 -08:00
Xi Yan
37cf60b732
style: remove prints in codebase (#1146)
# What does this PR do?
- replace prints in codebase with logger
- update print_table to use rich Table

## Test Plan
- library client script in
https://github.com/meta-llama/llama-stack/pull/1145

```
llama stack list-providers
```
<img width="1407" alt="image"
src="https://github.com/user-attachments/assets/906b4f54-9e42-4e55-8968-7e3aa45525b2"
/>


[//]: # (## Documentation)
2025-02-18 19:41:37 -08:00
Reid
4e76d312fa
fix: modify the model id title for model list (#1095)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Re-check and based on the doc, the download model id, actually is model
descriptor(also without `meta-llama/`).


https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html
```
$ llama download --source huggingface --model-id  Llama-Guard-3-1B:int4 --hf-token xxx  # model descriptor
Fetching 8 files:   0%|                                                                                                                   | 0/8 [00:00<?, ?it/s]
LICENSE.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.71k/7.71k [00:00<00:00, 10.5MB/s]

$ llama download --source huggingface --model-id  Llama-Guard-3-1B-INT4 --hf-token xxxx  # hugging face repo without meta-llama/
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
                      [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-Guard-3-1B-INT4 not found <<<<---


$ llama download --source meta --model-id Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
                      [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8 not found

$ llama download --source meta --model-id Llama3.2-3B-Instruct:int4-spinquant-eo8
Please provide the signed URL for model Llama3.2-3B-Instruct:int4-spinquant-eo8 you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...): ^CTraceback (most recent call last):

$ llama download --source meta --model-id meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL]
                      [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8 not found
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:26:41 -08:00
Reid
d9f5beb15a
style: update download help text (#1135)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Based on the cade:

6b1773d530/llama_stack/cli/download.py (L454)
and the test, it can use comma to specify multiple model ids. So update
the usage.
```
$ llama model download --source meta --model-id Llama3.2-1B,Llama3.2-3B
Please provide the signed URL for model Llama3.2-1B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):

Downloading checklist.chk            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  156/156 bytes   -  0:00:00
Downloading tokenizer.model          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.2/2.2 MB      -  0:00:00
Downloading params.json              ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  220/220 bytes   -  0:00:00
Downloading consolidated.00.pth      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.5/2.5 GB      -  0:00:00

Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B

[Optionally] To run MD5 checksums, use the following command: llama model verify-download --model-id Llama3.2-1B
Please provide the signed URL for model Llama3.2-3B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):
Downloading checklist.chk            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  156/156 bytes   -  0:00:00
Downloading tokenizer.model          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.2/2.2 MB      -  0:00:00
Downloading params.json              ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  220/220 bytes   -  0:00:00
Downloading consolidated.00.pth      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  6.4/6.4 GB      -  0:00:00

Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-3B


$ llama model download --source huggingface --model-id Llama3.2-1B,Llama3.2-3B
original%2Fparams.json: 100%|██████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 564kB/
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B
...
tokenizer.json: 100%|█████████████████████████████████████████████████████████████| 9.09M/9.09M [00:00<00:00, 9.18MB/s]
Successfully downloaded model to /Users/xxx/.llama/checkpoints/Llama3.2-3B


before:
$ llama model download --help
 --model-id MODEL_ID   See `llama model list` or `llama model list --show-all` for the list of available models

after:
$ llama model download --help
  --model-id MODEL_ID   See `llama model list` or `llama model list --show-all` for the list of available models. Specify multiple model IDs with commas, e.g. --model-id Llama3.2-1B,Llama3.2-3B
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:24:31 -08:00
Reid
92aefec191
style: update verify-download help text (#1134)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Based on the code
6b1773d530/llama_stack/cli/download.py (L379)
and test, `verify-download` should only use in `downloaded from Meta`.

```
test: no checklist.chk  file for hf download
$ llama model download --source meta --model-id Llama3.2-1B
Downloading checklist.chk            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  156/156 bytes   -  0:00:00
Downloading tokenizer.model          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.2/2.2 MB      -  0:00:00
Downloading params.json              ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  220/220 bytes   -  0:00:00
Downloading consolidated.00.pth      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.5/2.5 GB      -  0:00:00



before:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID

Verify the downloaded checkpoints' checksums

options:
  -h, --help           show this help message and exit
  --model-id MODEL_ID  Model ID to verify


after:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID

Verify the downloaded checkpoints' checksums for models downloaded from Meta

options:
  -h, --help           show this help message and exit
  --model-id MODEL_ID  Model ID to verify (only for models downloaded from Meta)
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:15:26 -08:00
Reid
89d37687dd
chore: remove --no-list-templates option (#1121)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

From the code and the usage, seems cannot see that need to use
`--no-list-templates` to handle, and also make the user confused from
the help text, so try to remove it.
```
$ llama stack build --no-list-templates
> Enter a name for your Llama Stack (e.g. my-local-stack):

$ llama stack build
> Enter a name for your Llama Stack (e.g. my-local-stack):

before:
$ llama stack build --help
  --list-templates, --no-list-templates
                        Show the available templates for building a Llama Stack distribution (default: False)

after:
  --list-templates      Show the available templates for building a Llama Stack distribution
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:13:46 -08:00
Reid
8dc1cac333
style: fix the capitalization issue (#1117)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

```
before:
$ llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE]
                       [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}]
                       config

start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.

After:
$ llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE]
                       [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}]
                       config

Start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-14 17:16:26 -08:00
Reid
3d88b81ccf
fix: remove the empty line (#1097)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Remove the empty line from help
```
before:
$ llama model download --help
  --max-parallel MAX_PARALLEL
                        Maximum number of concurrent downloads
  --ignore-patterns IGNORE_PATTERNS
                      <<<<<<<<<empty line>>>>>>>>>>
                        For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring
                        safetensors files to avoid downloading duplicate weights.

after:
$ llama model download --help
  --max-parallel MAX_PARALLEL
                        Maximum number of concurrent downloads
  --ignore-patterns IGNORE_PATTERNS
                        For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring
                        safetensors files to avoid downloading duplicate weights.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-14 09:33:20 -08:00