llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
Matthew Farrellee	8f9964f46b	fix: update llama stack build --run to use new start_stack.sh signature (#2191 ) # What does this PR do? fixes #2188 ## Test Plan `INFERENCE_MODEL=meta-llama/Llama-3.3-70B-Instruct llama stack build --image-name ollama --image-type conda --template ollama --run` without error	2025-05-16 14:32:02 -07:00
Charlie Doern	e46de23be6	feat: refactor external providers dir (#2049 ) # What does this PR do? Currently the "default" dir for external providers is `/etc/llama-stack/providers.d`. This dir is neither used anywhere nor created. Switch to the friendlier `~/.llama/providers.d/`. This allows external providers to actually create this dir and/or populate it upon installation, since `pip` cannot create directories in `/etc`. If a user does not specify a dir, default to this one. See https://github.com/containers/ramalama-stack/issues/36 Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-05-15 20:17:03 +02:00
Sébastien Han	6371bb1b33	chore(refact)!: simplify config management (#1105 ) # What does this PR do? We are dropping configuration via CLI flag almost entirely. If any server configuration has to be tweaked, it must be done through the server section in the run.yaml. This is unfortunately a breaking change for whoever was using: * `--tls-` `--disable_ipv6` `--port` stays around and gets special treatment since we believe it's common for users and devs to change the port for quick experiments. Closes: https://github.com/meta-llama/llama-stack/issues/1076 ## Test Plan Simply do `llama stack run <config>` nothing should break :) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-07 09:18:12 -07:00
Sébastien Han	1a529705da	chore: more mypy fixes (#2029 ) # What does this PR do? Mainly tried to cover the entire llama_stack/apis directory, we only have one left. Some excludes were just noop. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-06 09:52:31 -07:00
Ihar Hrachyshka	9e6561a1ec	chore: enable pyupgrade fixes (#1806 ) # What does this PR do? The goal of this PR is code base modernization. Schema reflection code needed a minor adjustment to handle UnionTypes and collections.abc.AsyncIterator. (Both are preferred in the latest Python releases.) Note to reviewers: almost all changes here are automatically generated by pyupgrade. Some additional unused imports were cleaned up. The only changes worth noting are under `docs/openapi_generator` and `llama_stack/strong_typing/schema.py`, where reflection code was updated to deal with "newer" types. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-05-01 14:23:50 -07:00
Nathan Weinberg	d897313e0b	feat: add additional logging to llama stack build (#1689 ) # What does this PR do? Partial revert of `fa68ded07c` this commit ensures users know where their new templates are generated and how to run the newly built distro locally discussion on Discord: `1351652390` ## Test Plan Did a local run - let me know if we want any unit testing covering this ![Screenshot from 2025-03-18 22-38-18](https://github.com/user-attachments/assets/6d5dac52-edad-4a84-992f-a3c23cda10c8) ## Documentation Updated "Zero to Hero" guide with new output --------- Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-04-30 11:06:24 -07:00
Sébastien Han	2c7aba4158	fix: enforce stricter ASCII rules lint rules in Ruff (#2062 ) # What does this PR do? - Added new Ruff lint rules to detect ambiguous or non-ASCII characters: - Added per-file ignores where Unicode usage is still required. - Fixed whatever had to be fixed Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-30 18:05:27 +02:00
Ashwin Bharambe	4d0bfbf984	feat: add api.llama provider, llama-guard-4 model (#2058 ) This PR adds a llama-stack inference provider for `api.llama.com`, as well as adds entries for Llama-Guard-4 and updated Prompt-Guard models.	2025-04-29 10:07:41 -07:00
Roland Huß	121c73c2f5	feat(cli): add interactive tab completion for image type selection (#2027 ) # What does this PR do? Enhances the user experience in the `llama stack build` command by adding interactive TAB completion for image type selection. This ensures the UX consistency with other parts of the CLI that already support tab completion, such as provider selection, providing a more intuitive and discoverable interface for users. <img width="1531" alt="image" src="https://github.com/user-attachments/assets/12161d45-451d-4820-b34d-7ea4decf810f" />	2025-04-25 16:57:42 +02:00
Sébastien Han	14e60e3c02	feat: include run.yaml in the container image (#2005 ) As part of the build process, we now include the generated run.yaml (based on the provided build configuration file) in the container. We updated the entrypoint to use this run configuration as well. Given this simple distribution configuration: ``` # build.yaml version: '2' distribution_spec: description: Use (an external) Ollama server for running LLM inference providers: inference: - remote::ollama vector_io: - inline::faiss safety: - inline::llama-guard agents: - inline::meta-reference telemetry: - inline::meta-reference eval: - inline::meta-reference datasetio: - remote::huggingface - inline::localfs scoring: - inline::basic - inline::llm-as-judge - inline::braintrust tool_runtime: - remote::brave-search - remote::tavily-search - inline::code-interpreter - inline::rag-runtime - remote::model-context-protocol - remote::wolfram-alpha container_image: "registry.access.redhat.com/ubi9" image_type: container image_name: test ``` Build it: ``` llama stack build --config build.yaml ``` Run it: ``` podman run --rm \ -p 8321:8321 \ -e OLLAMA_URL=http://host.containers.internal:11434 \ --name llama-stack-server \ localhost/leseb-test:0.2.2 ``` Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-24 11:29:53 +02:00
Sébastien Han	94f83382eb	feat: allow building distro with external providers (#1967 ) # What does this PR do? We can now build a distribution that includes external providers. Closes: https://github.com/meta-llama/llama-stack/issues/1948 ## Test Plan Build a distro with an external provider following the doc instructions. [//]: # (## Documentation) Added. Rendered: ![Screenshot 2025-04-18 at 11 26 39](https://github.com/user-attachments/assets/afcf3d50-8d30-48c3-8d24-06a4b3662881) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-18 17:18:28 +02:00
Alexey Rybak	8f57b08f2c	fix(build): always pass path when no template/config provided (#1982 ) # What does this PR do? Fixes a crash that occurred when building a stack as a container image via the interactive wizard without supplying --template or --config. - Root cause: template_or_config was None; only the container path relies on that parameter, which later reaches subprocess.run() and triggers `TypeError: expected str, bytes or os.PathLike object, not NoneType.` - Change: in `_run_stack_build_command_from_build_config` we now fall back to the freshly‑written build‑spec file whenever both optional sources are missing. Also adds a spy‑based unit test that asserts a valid string path is passed to build_image() for container builds. ### Closes #1976 ## Test Plan - New unit test: test_build_path.py. Monkey‑patches build_image, captures the fourth argument, and verifies it is a real path - Manual smoke test: ``` llama stack build --image-type container # answer wizard prompts ``` Build proceeds into Docker without raising the previous TypeError. ## Future Work Harmonise `build_image` arguments so every image type receives the same inputs, eliminating this asymmetric special‑case.	2025-04-17 10:20:43 +02:00
Sébastien Han	6ed92e03bc	fix: print traceback on build failure (#1966 ) # What does this PR do? Build failures are hard to read, sometimes we get errors like: ``` Error building stack: 'key' ``` Which are difficult to debug without a proper trace. ## Test Plan If `llama stack build` fails you get a traceback now. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-17 09:45:21 +02:00
Charlie Doern	83b5523e2d	feat: add `--providers` to llama stack build (#1718 ) # What does this PR do? allow users to specify only the providers they want in the llama stack build command. If a user wants a non-interactive build, but doesn't want to use a template, `--providers` allows someone to specify something like `--providers inference=remote::ollama` for a distro with JUST ollama ## Test Plan `llama stack build --providers inference=remote::ollama --image-type venv` <img width="1084" alt="Screenshot 2025-03-20 at 9 34 14 AM" src="https://github.com/user-attachments/assets/502b5fa2-edab-4267-a595-4f987204a6a9" /> `llama stack run --image-type venv /Users/charliedoern/projects/Documents/llama-stack/venv-run.yaml` <img width="1149" alt="Screenshot 2025-03-20 at 9 35 19 AM" src="https://github.com/user-attachments/assets/433765f3-6b7f-4383-9241-dad085b69228" /> --------- Signed-off-by: Charlie Doern <cdoern@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Sébastien Han <seb@redhat.com>	2025-04-15 14:17:03 +02:00
Nathan Weinberg	854c2ad264	fix: misleading help text for 'llama stack build' and 'llama stack run' (#1910 ) # What does this PR do? The current help text for 'llama stack build' and 'llama stack run' says that if no argument is passed to '--image-name', the active Conda environment will be used. In reality, the active environment is used whether it is from conda, virtualenv, etc. ## Test Plan N/A ## Documentation N/A Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-04-12 01:19:11 -07:00
Ashwin Bharambe	530d4bdfe1	refactor: move all llama code to models/llama out of meta reference (#1887 ) # What does this PR do? Move around bits. This makes the copies from llama-models _much_ easier to maintain and ensures we don't entangle meta-reference specific tidbits into llama-models code even by accident. Also, kills the meta-reference-quantized-gpu distro and rolls quantization deps into meta-reference-gpu. ## Test Plan ``` LLAMA_MODELS_DEBUG=1 \ with-proxy llama stack run meta-reference-gpu \ --env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \ --env INFERENCE_CHECKPOINT_DIR=<DIR> \ --env MODEL_PARALLEL_SIZE=4 \ --env QUANTIZATION_TYPE=fp8_mixed ``` Start a server with and without quantization. Point integration tests to it using: ``` pytest -s -v tests/integration/inference/test_text_inference.py \ --stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct ```	2025-04-07 15:03:58 -07:00
Ashwin Bharambe	b8f1561956	feat: introduce llama4 support (#1877 ) As title says. Details in README, elsewhere.	2025-04-05 11:53:35 -07:00
Matthew Farrellee	a4c086cee0	fix: skip apis with no providers during `llama stack build` (#1835 ) # What does this PR do? closes #1834 ## Test Plan `llama stack build` successfully	2025-03-29 08:39:35 -07:00
Ihar Hrachyshka	18bac27d4e	fix: Use CONDA_DEFAULT_ENV presence as a flag to use conda mode (#1555 ) # What does this PR do? This is the second attempt to switch to system packages by default. Now with a hack to detect conda environment - in which case conda image-type is used. Note: Conda will only be used when --image-name is unset and CONDA_DEFAULT_ENV is set. This means that users without conda will correctly fall back to using system packages when no --image-* arguments are passed at all. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Uses virtualenv: ``` $ llama stack build --template ollama --image-type venv $ llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml [...] Using virtual environment: /home/ec2-user/src/llama-stack/schedule/.local [...] ``` Uses system packages (virtualenv already initialized): ``` $ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml [...] INFO 2025-03-27 20:46:22,882 llama_stack.cli.stack.run:142 server: No image type or image name provided. Assuming environment packages. [...] ``` Attempt to run from environment packages without necessary packages installed: ``` $ python -m venv barebones $ . ./barebones/bin/activate $ pip install -e . # to install llama command $ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml [...] ModuleNotFoundError: No module named 'fastapi' ``` ^ failed as expected because the environment doesn't have necessary packages installed. Now install some packages in the new environment: ``` $ pip install fastapi opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp aiosqlite ollama openai datasets faiss-cpu mcp autoevals $ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml [...] 
Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit) ``` Now see if setting CONDA_DEFAULT_ENV will change what happens by default: ``` $ export CONDA_DEFAULT_ENV=base $ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml [...] Using conda environment: base Conda environment base does not exist. [...] ``` --------- Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-03-27 17:13:22 -04:00
Hardik Shah	cb2a9784ab	fix: multiple issues with getting_started notebook (#1795 ) Fixes multiple issues: 1. `llama stack build` of dependencies was breaking with incompatible numpy / pandas when importing datasets. Moved the notebook to start a local server instead of using the library as a client. This way the setup is cleaner since it's all contained, and by using `uv run --with` we can also test the server setup process in CI and at release time. 2. The change in [1] surfaced some other issues: - running `llama stack run` was defaulting to conda env name - provider data was not being managed properly - some notebook cells (telemetry for evals) were not updated with latest changes. Fixed all the issues and updated the notebook. ### Test 1. Manually run it all in local env 2. `pytest -v -s --nbval-lax docs/getting_started.ipynb`	2025-03-26 10:59:12 -07:00
Ashwin Bharambe	cb7b9dda6c	fix: compare timezones correctly in download script	2025-03-21 11:46:57 -07:00
Sébastien Han	24fd06879e	refactor: simplify command execution and remove PTY handling (#1641 ) # What does this PR do? A PTY is unnecessary for interactive mode since `subprocess.run()` already inherits the calling terminal’s stdin, stdout, and stderr, allowing natural interaction. Using a PTY can introduce unwanted side effects like buffering issues and inconsistent signal handling. Standard input/output is sufficient for most interactive programs. This commit simplifies the command execution by: 1. Removing PTY-based execution in favor of direct subprocess handling 2. Consolidating command execution into a single run_command function 3. Improving error handling with specific subprocess error types 4. Adding proper type hints and documentation 5. Maintaining Ctrl+C handling for graceful interruption ## Test Plan ``` llama stack run ``` Signed-off-by: Sébastien Han <seb@redhat.com>	2025-03-17 15:03:14 -07:00
Yuan Tang	ca0cbf4338	fix: Fix pre-commit check (#1628 ) # What does this PR do? Fixes pre-commit check failure after merging https://github.com/meta-llama/llama-stack/pull/1010: `3874877097` Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-03-13 18:57:42 -07:00
Alina Ryan	c02464b635	fix: Clarify `llama model prompt-format` help text (#1010 ) # What does this PR do? Updates the help text for the `llama model prompt-format` command to clarify that users should provide a specific model name (e.g., Llama3.1-8B, Llama3.2-11B-Vision), not a model family. Removes the default value and field for `--model-name` to prevent users from mistakenly thinking a model family name is acceptable. Adds guidance to run `llama model list` to view valid model names. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Output of `llama model prompt-format -h` Before: ``` (venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h usage: llama model prompt-format [-h] [-m MODEL_NAME] Show llama model message formats options: -h, --help show this help message and exit -m MODEL_NAME, --model-name MODEL_NAME Model Family (llama3_1, llama3_X, etc.) Example: llama model prompt-format <options> (venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format --model-name llama3_1 usage: llama model prompt-format [-h] [-m MODEL_NAME] llama model prompt-format: error: llama3_1 is not a valid Model. 
Choose one from -- Llama3.1-8B Llama3.1-70B Llama3.1-405B Llama3.1-8B-Instruct Llama3.1-70B-Instruct Llama3.1-405B-Instruct Llama3.2-1B Llama3.2-3B Llama3.2-1B-Instruct Llama3.2-3B-Instruct Llama3.2-11B-Vision Llama3.2-90B-Vision Llama3.2-11B-Vision-Instruct Llama3.2-90B-Vision-Instruct ``` Output of `llama model prompt-format -h` After: ``` (venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h usage: llama model prompt-format [-h] [-m MODEL_NAME] Show llama model message formats options: -h, --help show this help message and exit -m MODEL_NAME, --model-name MODEL_NAME Example: Llama3.1-8B or Llama3.2-11B-Vision, etc (Run `llama model list` to see a list of valid model names) Example: llama model prompt-format <options> ``` Signed-off-by: Alina Ryan <aliryan@redhat.com>	2025-03-13 20:47:09 -04:00
Sébastien Han	98b1b15e0f	refactor: move all datetime.now() calls to UTC (#1589 ) # What does this PR do? Updated all instances of datetime.now() to use timezone.utc for consistency in handling time across different systems. This ensures that timestamps are always in Coordinated Universal Time (UTC), avoiding issues with time zone discrepancies and promoting uniformity in time-related data. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-03-13 15:34:53 -07:00
Ashwin Bharambe	e13c92f269	revert: feat(server): Use system packages for execution (#1551 ) Reverts meta-llama/llama-stack#1252 The above PR breaks the following invocation: ```bash llama stack run ~/.llama/distributions/together/together-run.yaml ```	2025-03-11 09:58:25 -07:00
Sébastien Han	21e39633d8	feat(server): Use system packages for execution (#1252 ) # What does this PR do? Users prefer to rely on the main CLI rather than invoking the server through a Python module. Users interact with a high-level CLI rather than needing to know internal module structures. Now, when running `llama stack run <path-to-config>`, the server will attempt to use the system package or a virtual environment if one is active. This also eliminates the current process dependency chain when running from a virtual environment: -> llama stack run -> start_env.sh -> python -m server... Signed-off-by: Sébastien Han <seb@redhat.com> ## Test Plan Run: ``` ollama run llama3.2:3b-instruct-fp16 --keepalive=2m & llama stack run ./llama_stack/templates/ollama/run.yaml --disable-ipv6 ``` Notice that the server starts and shuts down normally. --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-03-10 16:01:03 -07:00
James Kunstle	735892cbd2	refactor: `ImageType` to `LlamaStackImageType` (#1500 ) This disambiguates the term "Image" from its alternative "container image" usage and allows for: ```python if image_type == LlamaStackImageType.venv: ... ``` accesses rather than `ImageType.venv.value` # What does this PR do? Changes enum use to comply with semantic python styling and naming conventions. ## Test Plan Refactor was automated and small so simple run-through of creating images was done. Signed-off-by: James Kunstle <jkunstle@redhat.com>	2025-03-10 17:12:53 -04:00
ehhuang	256448c14e	fix(cli): llama model prompt-format (#1481 ) Summary: + llama model prompt-format -m Llama3.2-11B-Vision-Instruct Traceback (most recent call last): File "/tmp/tmp.gCwyyCcjoA/.venv/bin/llama", line 10, in <module> sys.exit(main()) File "/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 50, in main parser.run(args) File "/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 44, in run args.func(args) File "/tmp/tmp.gCwyyCcjoA/.venv/lib/python3.10/site-packages/llama_stack/cli/model/prompt_format.py", line 59, in _run_model_template_cmd if args.list: AttributeError: 'Namespace' object has no attribute 'list' Test Plan: llama model prompt-format -m Llama3.2-11B-Vision-Instruct	2025-03-07 11:45:54 -08:00
Sébastien Han	7cf1e24c4e	feat(logging): implement category-based logging (#1362 ) # What does this PR do? This commit introduces a new logging system that allows loggers to be assigned a category while retaining the logger name based on the file name. The log format includes both the logger name and the category, producing output like: ``` INFO 2025-03-03 21:44:11,323 llama_stack.distribution.stack:103 [core]: Tool_groups: builtin::websearch served by tavily-search ``` Key features include: - Category-based logging: Loggers can be assigned a category (e.g., "core", "server") when programming. The logger can be loaded like this: `logger = get_logger(name=__name__, category="server")` - Environment variable control: Log levels can be configured per-category using the `LLAMA_STACK_LOGGING` environment variable. For example: `LLAMA_STACK_LOGGING="server=DEBUG;core=debug"` enables DEBUG level for the "server" and "core" categories. - `LLAMA_STACK_LOGGING="all=debug"` sets DEBUG level globally for all categories and third-party libraries. This provides fine-grained control over logging levels while maintaining a clean and informative log format. 
The formatter uses the rich library which provides nice colors better stack traces like so: ``` ERROR 2025-03-03 21:49:37,124 asyncio:1758 [uncategorized]: unhandled exception during asyncio.run() shutdown task: <Task finished name='Task-16' coro=<handle_signal.<locals>.shutdown() done, defined at /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:146> exception=UnboundLocalError("local variable 'loop' referenced before assignment")> ╭────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────╮ │ /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:178 in shutdown │ │ │ │ 175 │ │ except asyncio.CancelledError: │ │ 176 │ │ │ pass │ │ 177 │ │ finally: │ │ ❱ 178 │ │ │ loop.stop() │ │ 179 │ │ │ 180 │ loop = asyncio.get_running_loop() │ │ 181 │ loop.create_task(shutdown()) │ ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ UnboundLocalError: local variable 'loop' referenced before assignment ``` Co-authored-by: Ashwin Bharambe <@ashwinb> Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml INFO 2025-03-03 21:55:35,918 __main__:365 [server]: Using config file: llama_stack/templates/ollama/run.yaml INFO 2025-03-03 21:55:35,925 __main__:378 [server]: Run configuration: INFO 2025-03-03 21:55:35,928 __main__:380 [server]: apis: - agents ``` [//]: # (## Documentation) --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-03-07 11:34:30 -08:00
Charlie Doern	1097912054	refactor: display defaults in help text (#1480 ) # What does this PR do? using `formatter_class=argparse.ArgumentDefaultsHelpFormatter` displays (default: DEFAULT_VALUE) for each flag. add this formatter class to build and run to show users some default values like `conda`, `8321`, etc ## Test Plan ran locally with following output: before: ``` llama stack run --help usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}] config Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution. positional arguments: config Path to config file to use for the run options: -h, --help show this help message and exit --port PORT Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. Defaults to 8321 --image-name IMAGE_NAME Name of the image to run. Defaults to the current conda environment --disable-ipv6 Disable IPv6 support --env KEY=VALUE Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. --tls-keyfile TLS_KEYFILE Path to TLS key file for HTTPS --tls-certfile TLS_CERTFILE Path to TLS certificate file for HTTPS --image-type {conda,container,venv} Image Type used during the build. This can be either conda or container or venv. ``` after: ``` llama stack run --help usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}] config Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution. positional arguments: config Path to config file to use for the run options: -h, --help show this help message and exit --port PORT Port to run the server on. 
It can also be passed via the env var LLAMA_STACK_PORT. (default: 8321) --image-name IMAGE_NAME Name of the image to run. Defaults to the current conda environment (default: None) --disable-ipv6 Disable IPv6 support (default: False) --env KEY=VALUE Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: []) --tls-keyfile TLS_KEYFILE Path to TLS key file for HTTPS (default: None) --tls-certfile TLS_CERTFILE Path to TLS certificate file for HTTPS (default: None) --image-type {conda,container,venv} Image Type used during the build. This can be either conda or container or venv. (default: conda) ``` [//]: # (## Documentation) Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-03-07 11:05:58 -08:00
Sébastien Han	4bbb4ddeae	fix: resolve pydantic warning on .dict() usage (#1445 ) # What does this PR do? The `dict` method of `BaseModel` is deprecated; we should use `model_dump` instead. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-03-06 11:27:47 -08:00
Reid	a0d6b165b0	chore: remove unused build dir (#1379 ) # What does this PR do? - An old PR used `BUILDS_BASE_DIR` in `llama_stack/cli/stack/configure.py` (since removed): https://github.com/meta-llama/llama-stack/pull/371/files - Based on the current `build` code, only `DISTRIBS_BASE_DIR` should be used to save it. `46b0a404e8/llama_stack/cli/stack/_build.py (L298)` `46b0a404e8/llama_stack/cli/stack/_build.py (L301)` Please correct me if I have misunderstood. So `run` should no longer need it. Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-03-05 15:40:00 -08:00
Ashwin Bharambe	dd0db8038b	refactor(test): unify vector_io tests and make them configurable (#1398 ) ## Test Plan `LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=sqlite-vec pytest -s -v test_vector_io.py --embedding-model all-miniLM-L6-V2 --inference-model='' --vision-inference-model=''` ``` test_vector_io.py::test_vector_db_retrieve[txt=:vis=:emb=all-miniLM-L6-V2] PASSED test_vector_io.py::test_vector_db_register[txt=:vis=:emb=all-miniLM-L6-V2] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case0] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case1] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case2] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case3] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case4] PASSED ``` Same thing with: - LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=faiss - LLAMA_STACK_CONFIG=fireworks (Note that ergonomics will soon be improved re: cmd-line options and env variables)	2025-03-04 13:37:45 -08:00
Ashwin Bharambe	55668d3c5b	refactor: move a few tests to top-level tests/ directory	2025-03-03 17:33:39 -08:00
Reid	5c9d12a206	chore: improve --port help text (#1346 ) # What does this PR do? It would be better to mention the env var usage in the help text. ``` before: $ llama stack run --help --port PORT Port to run the server on. Defaults to 8321 after: $ llama stack run --help --port PORT Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. Defaults to 8321 ``` Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-03-03 16:49:03 -08:00
Ashwin Bharambe	46b0a404e8	chore: remove straggler references to llama-models (#1345 ) Straggler references cleanup	2025-03-01 14:26:03 -08:00
Reid	dc069025f5	chore: fix typo (#1343 ) # What does this PR do? `21ec67356c/distributions` It was missing the `s`. Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-03-01 10:36:04 -08:00
Sébastien Han	6fa257b475	chore(lint): update Ruff ignores for project conventions and maintainability (#1184 ) - Added new ignores from flake8-bugbear (`B007`, `B008`) - Ignored `C901` (high function complexity) for now, pending review - Maintained PyTorch conventions (`N812`, `N817`) - Allowed `E731` (lambda assignments) for flexibility - Consolidated existing ignores (`E402`, `E501`, `F405`, `C408`, `N812`) - Documented rationale for each ignored rule This keeps our linting aligned with project needs while tracking potential fixes. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-28 09:36:49 -08:00
Reid	3b57d8ee88	feat: add prompt-format list (#1222 ) # What does this PR do? `19ae4b35d9/llama_stack/cli/model/prompt_format.py (L47)` Per the comment `Only Llama 3.1 and 3.2 are supported`, and even within 3.1 and 3.2 not every model works with `prompt-format`, so users cannot rely on `llama model list`; the valid list is only shown after entering an invalid model. It would be nice to offer a way to check the valid models: ``` llama model prompt-format -m Llama3.1-405B-Instruct:bf16-mp8 usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l] llama model prompt-format: error: Llama3.1-405B-Instruct:bf16-mp8 is not a valid Model <<<<---. Choose one from -- Llama3.1-8B Llama3.1-70B Llama3.1-405B Llama3.1-8B-Instruct Llama3.1-70B-Instruct Llama3.1-405B-Instruct Llama3.2-1B Llama3.2-3B Llama3.2-1B-Instruct Llama3.2-3B-Instruct Llama3.2-11B-Vision Llama3.2-90B-Vision Llama3.2-11B-Vision-Instruct Llama3.2-90B-Vision-Instruct before: $ llama model prompt-format --help usage: llama model prompt-format [-h] [-m MODEL_NAME] Show llama model message formats options: -h, --help show this help message and exit -m MODEL_NAME, --model-name MODEL_NAME Model Family (llama3_1, llama3_X, etc.) Example: llama model prompt-format <options> after: $ llama model prompt-format --help usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l] Show llama model message formats options: -h, --help show this help message and exit -m MODEL_NAME, --model-name MODEL_NAME Model Family (llama3_1, llama3_X, etc.) 
-l, --list List the valid supported models Example: llama model prompt-format <options> $ llama model prompt-format -l ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Model ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ Llama3.1-8B │ ├──────────────────────────────┤ │ Llama3.1-70B │ ├──────────────────────────────┤ │ Llama3.1-405B │ ├──────────────────────────────┤ │ Llama3.1-8B-Instruct │ ├──────────────────────────────┤ │ Llama3.1-70B-Instruct │ ├──────────────────────────────┤ │ Llama3.1-405B-Instruct │ ├──────────────────────────────┤ │ Llama3.2-1B │ ├──────────────────────────────┤ │ Llama3.2-3B │ ├──────────────────────────────┤ │ Llama3.2-1B-Instruct │ ├──────────────────────────────┤ │ Llama3.2-3B-Instruct │ ├──────────────────────────────┤ │ Llama3.2-11B-Vision │ ├──────────────────────────────┤ │ Llama3.2-90B-Vision │ ├──────────────────────────────┤ │ Llama3.2-11B-Vision-Instruct │ ├──────────────────────────────┤ │ Llama3.2-90B-Vision-Instruct │ └──────────────────────────────┘ ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) --------- Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-28 09:27:22 -08:00
Yuan Tang	a9f5c5bfca	fix: Incorrect import path for print_subcommand_description() (#1315 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-27 18:50:41 -08:00
Yuan Tang	f4df3a76d9	fix: Incorrect import path for print_subcommand_description() (#1314 ) # What does this PR do? Fixes one additional import that was missed in https://github.com/meta-llama/llama-stack/pull/1313 Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-27 18:35:49 -08:00
Yuan Tang	3567274183	fix: Incorrect import path for print_subcommand_description() (#1313 ) # What does this PR do? This fixes the release build failure: `3796356500` ``` + llama model prompt-format -m Llama3.2-11B-Vision-Instruct Traceback (most recent call last): File "/tmp/tmp.PXMDlmD0x5/.venv/bin/llama", line 4, in <module> from llama_stack.cli.llama import main File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 10, in <module> from .model import ModelParser File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/model/__init__.py", line 7, in <module> from .model import ModelParser # noqa File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/model/model.py", line 16, in <module> from llama_stack.cli.utils import print_subcommand_description ModuleNotFoundError: No module named 'llama_stack.cli.utils' ``` Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-27 21:24:01 -05:00
Reid	94e2186bb8	chore: add subcommands description in help (#1219 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] ``` before: $ llama usage: llama [-h] {model,stack,download,verify-download} ... Welcome to the Llama CLI options: -h, --help show this help message and exit subcommands: {model,stack,download,verify-download} $ llama model --help usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ... Work with llama models options: -h, --help show this help message and exit model_subcommands: {download,list,prompt-format,describe,verify-download,remove} $ llama stack --help usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ... Operations for the Llama Stack / Distributions options: -h, --help show this help message and exit --version show program's version number and exit stack_subcommands: {build,list-apis,list-providers,run} =================== after: $ llama usage: llama [-h] {model,stack,download,verify-download} ... Welcome to the Llama CLI options: -h, --help show this help message and exit subcommands: {model,stack,download,verify-download} model Work with llama models stack Operations for the Llama Stack / Distributions download Download a model from llama.meta.com or Hugging Face Hub verify-download Verify integrity of downloaded model files $ llama model --help usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ... 
Work with llama models options: -h, --help show this help message and exit model_subcommands: {download,list,prompt-format,describe,verify-download,remove} download Download a model from llama.meta.com or Hugging Face Hub list Show available llama models prompt-format Show llama model message formats describe Show details about a llama model verify-download Verify the downloaded checkpoints' checksums for models downloaded from Meta remove Remove the downloaded llama model $ llama stack --help usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ... Operations for the Llama Stack / Distributions options: -h, --help show this help message and exit --version show program's version number and exit stack_subcommands: {build,list-apis,list-providers,run} build Build a Llama stack container list-apis List APIs part of the Llama Stack implementation list-providers Show available Llama Stack Providers for an API run Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution. ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) --------- Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-27 17:00:27 -08:00
Ashwin Bharambe	c54164556a	fix: update notebooks to avoid using the nutsy --image-name __system__ thing (#1308 ) The `--image-name __system__` thing was a hack and a bad one at that. The actual intent was to somehow automatically detect the notebook environment so we could avoid unnecessarily confusing things in the llama stack build cmd-line. But I failed which led us to use the backup `__system__` thing. Let's just do the simple thing. Note that `build_venv.sh` I haven't changed for now (so it still honors the __system__ special name just that no new user should use it.) ## Test Plan Open the notebooks from this branch in Colab (see example url below) and ensure the builds work. https://colab.research.google.com/github/meta-llama/llama-stack/blob/foo/docs/getting_started.ipynb In the notebook, install llama-stack from this branch directly using: ``` !pip install -U https://github.com/meta-llama/llama-stack/archive/refs/heads/foo.zip ``` Verify that `!UV_SYSTEM_PYTHON=1 llama stack build --template together --image-type venv` afterwards succeeds and the library client initialization also works.	2025-02-27 16:39:04 -08:00
Yuan Tang	2ed2c0bd26	fix(cli): Missing default for --image-type in stack run command (#1274 ) # What does this PR do? I think this got accidentally removed as part of https://github.com/meta-llama/llama-stack/pull/1250. cc @leseb ## Test Plan After the change, this arg is no longer required. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-26 12:23:44 -08:00
Reid	3a002f6cf1	chore: update download error message (#1217 ) # What does this PR do? An incorrect Hugging Face token also triggers `RepositoryNotFoundError`, so the error message now mentions that possibility, e.g. ``` $ llama model download --source huggingface --model-id Llama3.2-1B-Instruct:int4-qlora-eo8 --hf-token xx ### xx is incorrect token ----RepositoryNotFoundError---> usage: llama model download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE] llama model download: error: Repository 'meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8' not found on the Hugging Face Hub. so update to: llama model download --source huggingface --model-id Llama3.2-1B-Instruct:int4-qlora-eo8 --hf-token xx ----RepositoryNotFoundError---> usage: llama model download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE] llama model download: error: Repository 'meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8' not found on the Hugging Face Hub or incorrect Hugging Face token. ``` Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-25 21:38:10 -08:00
Reid	56c1a50b86	fix: fix the describe table display issue (#1221 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] If not passed the `headers`, it will display empty for the first row, also might break the second row, make the `Model` row as `headers`. ``` Before: $ llama model describe -m Llama3.1-70B ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ ┃ ┃ <<<--------- ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ Model │ Llama3.1-70B │ <<<--------- ├─────────────────────────────┼────────────────────────────────┤ │ Hugging Face ID │ meta-llama/Llama-3.1-70B │ ├─────────────────────────────┼────────────────────────────────┤ │ Description │ Llama 3.1 70b model │ ├─────────────────────────────┼────────────────────────────────┤ ...... after: $ llama model describe -m Llama3.1-70B ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Model ┃ Llama3.1-70B ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ Hugging Face ID │ meta-llama/Llama-3.1-70B │ ├─────────────────────────────┼────────────────────────────────┤ │ Description │ Llama 3.1 70b model │ ├─────────────────────────────┼────────────────────────────────┤ ...... ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-25 21:34:53 -08:00
Sébastien Han	929c5f0842	refactor(server): replace print statements with logger (#1250 ) # What does this PR do? - Introduced logging in `StackRun` to replace print-based messages - Improved error handling for config file loading and parsing - Replaced `cprint` with `logger.error` for consistent error messaging - Ensured logging is used in `server.py` for startup, shutdown, and runtime messages - Added missing exception handling for invalid providers Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-25 21:31:37 -08:00
Sébastien Han	c4987bc349	fix: avoid failure when no special pip deps and better exit (#1228 ) # What does this PR do? When building providers in a virtual environment or containers, special pip dependencies may not always be provided (e.g., for Ollama). The check should only fail if the required number of arguments is missing. Currently, two arguments are mandatory: 1. Environment name 2. Pip dependencies Additionally, return statements were replaced with sys.exit(1) in error conditions to ensure immediate termination on critical failures. Error handling in the stack build process was also improved to guarantee the program exits with status 1 when facing configuration issues or build failures. ## Test Plan This command shouldn't fail: ``` llama stack build --template ollama --image-type venv ``` Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-24 13:18:52 -05:00


162 commits