llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
Hardik Shah	cb2a9784ab	fix: multiple issues with getting_started notebook (#1795 ) Fixes multiple issues: 1. The llama stack build of dependencies was breaking with incompatible numpy / pandas versions when importing datasets. Moved the notebook to start a local server instead of using the library as a client. This way the setup is cleaner since it's all contained, and by using `uv run --with` we can also test the server setup process in CI and at release time. 2. The change in [1] surfaced some other issues: - running `llama stack run` was defaulting to the conda env name - provider data was not being managed properly - some notebook cells (telemetry for evals) were not updated with the latest changes. Fixed all the issues and updated the notebook. ### Test 1. Manually ran it all in a local env 2. `pytest -v -s --nbval-lax docs/getting_started.ipynb`	2025-03-26 10:59:12 -07:00
Sébastien Han	24fd06879e	refactor: simplify command execution and remove PTY handling (#1641 ) # What does this PR do? A PTY is unnecessary for interactive mode since `subprocess.run()` already inherits the calling terminal’s stdin, stdout, and stderr, allowing natural interaction. Using a PTY can introduce unwanted side effects like buffering issues and inconsistent signal handling. Standard input/output is sufficient for most interactive programs. This commit simplifies the command execution by: 1. Removing PTY-based execution in favor of direct subprocess handling 2. Consolidating command execution into a single run_command function 3. Improving error handling with specific subprocess error types 4. Adding proper type hints and documentation 5. Maintaining Ctrl+C handling for graceful interruption ## Test Plan ``` llama stack run ``` Signed-off-by: Sébastien Han <seb@redhat.com>	2025-03-17 15:03:14 -07:00
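The approach described in #1641 can be sketched roughly as follows. This is a minimal illustration, not the project's actual `run_command`: the signature, the exit codes returned here (130 for an interrupt, 127 for a missing executable), and the messages are assumptions made for the example.

```python
import subprocess
import sys


def run_command(command: list[str]) -> int:
    """Run a command with inherited stdio and return its exit code.

    Hypothetical sketch: no PTY is allocated, so the child process simply
    shares this process's stdin, stdout, and stderr.
    """
    try:
        result = subprocess.run(command, check=False)
        return result.returncode
    except KeyboardInterrupt:
        # Ctrl+C: report it and return a conventional "interrupted" code.
        print("Interrupted by user", file=sys.stderr)
        return 130
    except FileNotFoundError:
        # The executable does not exist on this system.
        print(f"Command not found: {command[0]}", file=sys.stderr)
        return 127
    except subprocess.SubprocessError as exc:
        # Any other subprocess-specific failure.
        print(f"Subprocess error: {exc}", file=sys.stderr)
        return 1
```

Because the child inherits the terminal directly, an interactive program started this way reads from and writes to the same terminal as the caller.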
Ashwin Bharambe	e13c92f269	revert: feat(server): Use system packages for execution (#1551 ) Reverts meta-llama/llama-stack#1252 The above PR breaks the following invocation: ```bash llama stack run ~/.llama/distributions/together/together-run.yaml ```	2025-03-11 09:58:25 -07:00
Sébastien Han	21e39633d8	feat(server): Use system packages for execution (#1252 ) # What does this PR do? Users prefer to rely on the main CLI rather than invoking the server through a Python module. Users interact with a high-level CLI rather than needing to know internal module structures. Now, when running `llama stack run <path-to-config>`, the server will attempt to use the system package or a virtual environment if one is active. This also eliminates the current process dependency chain when running from a virtual environment: -> llama stack run -> start_env.sh -> python -m server... Signed-off-by: Sébastien Han <seb@redhat.com> ## Test Plan Run: ``` ollama run llama3.2:3b-instruct-fp16 --keepalive=2m & llama stack run ./llama_stack/templates/ollama/run.yaml --disable-ipv6 ``` Notice that the server starts and shuts down normally. --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-03-10 16:01:03 -07:00
James Kunstle	735892cbd2	refactor: `ImageType` to `LlamaStackImageType` (#1500 ) This disambiguates the term "Image" from its alternative "container image" usage and allows for: ```python if image_type == LlamaStackImageType.venv: ... ``` accesses rather than `ImageType.venv.value`. # What does this PR do? Changes enum use to comply with semantic Python styling and naming conventions. ## Test Plan The refactor was automated and small, so a simple run-through of creating images was done. Signed-off-by: James Kunstle <jkunstle@redhat.com>	2025-03-10 17:12:53 -04:00
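A small sketch of the comparison style that #1500 describes. The enum below is a hypothetical stand-in whose member names follow the snippet in the commit message; it is not copied from the project, and `needs_virtualenv` is an illustrative helper.

```python
from enum import Enum


class LlamaStackImageType(Enum):
    # Hypothetical stand-in for the project's enum.
    container = "container"
    conda = "conda"
    venv = "venv"


def needs_virtualenv(raw_image_type: str) -> bool:
    """Parse the CLI string once, then compare enum members directly."""
    # Enum lookup by value raises ValueError for unknown strings.
    image_type = LlamaStackImageType(raw_image_type)
    return image_type == LlamaStackImageType.venv
```

Converting the string at the boundary means the rest of the code compares members instead of reaching for `.value` at every call site.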
Sébastien Han	7cf1e24c4e	feat(logging): implement category-based logging (#1362 ) # What does this PR do? This commit introduces a new logging system that allows loggers to be assigned a category while retaining the logger name based on the file name. The log format includes both the logger name and the category, producing output like: ``` INFO 2025-03-03 21:44:11,323 llama_stack.distribution.stack:103 [core]: Tool_groups: builtin::websearch served by tavily-search ``` Key features include: - Category-based logging: Loggers can be assigned a category (e.g., "core", "server") when programming. The logger can be loaded like this: `logger = get_logger(name=__name__, category="server")` - Environment variable control: Log levels can be configured per-category using the `LLAMA_STACK_LOGGING` environment variable. For example: `LLAMA_STACK_LOGGING="server=DEBUG;core=debug"` enables DEBUG level for the "server" and "core" categories. - `LLAMA_STACK_LOGGING="all=debug"` sets DEBUG level globally for all categories and third-party libraries. This provides fine-grained control over logging levels while maintaining a clean and informative log format. 
The formatter uses the rich library, which provides nice colors and better stack traces, like so: ``` ERROR 2025-03-03 21:49:37,124 asyncio:1758 [uncategorized]: unhandled exception during asyncio.run() shutdown task: <Task finished name='Task-16' coro=<handle_signal.<locals>.shutdown() done, defined at /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:146> exception=UnboundLocalError("local variable 'loop' referenced before assignment")> ╭────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────╮ │ /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:178 in shutdown │ │ │ │ 175 │ │ except asyncio.CancelledError: │ │ 176 │ │ │ pass │ │ 177 │ │ finally: │ │ ❱ 178 │ │ │ loop.stop() │ │ 179 │ │ │ 180 │ loop = asyncio.get_running_loop() │ │ 181 │ loop.create_task(shutdown()) │ ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ UnboundLocalError: local variable 'loop' referenced before assignment ``` Co-authored-by: Ashwin Bharambe <@ashwinb> Signed-off-by: Sébastien Han <seb@redhat.com> ## Test Plan ``` python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml INFO 2025-03-03 21:55:35,918 __main__:365 [server]: Using config file: llama_stack/templates/ollama/run.yaml INFO 2025-03-03 21:55:35,925 __main__:378 [server]: Run configuration: INFO 2025-03-03 21:55:35,928 __main__:380 [server]: apis: - agents ``` --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-03-07 11:34:30 -08:00
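To make the `LLAMA_STACK_LOGGING` format from #1362 concrete, here is a hedged sketch of how a `category=level;category=level` string could be parsed. The function name `parse_category_levels` and the decision to silently skip malformed or unknown entries are assumptions for illustration; the project's real parser may behave differently.

```python
import logging

# Level names accepted by this sketch, matched case-insensitively.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def parse_category_levels(spec: str) -> dict[str, int]:
    """Turn 'server=DEBUG;core=debug' into {'server': 10, 'core': 10}."""
    levels: dict[str, int] = {}
    for pair in spec.split(";"):
        category, sep, level_name = pair.strip().partition("=")
        if not sep:
            continue  # no '=' present: skip the malformed entry
        level = _LEVELS.get(level_name.strip().upper())
        if level is not None:
            levels[category.strip().lower()] = level
    return levels
```

A caller could then apply each level to the loggers of the matching category, treating a key such as `all` as the global setting.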
Charlie Doern	1097912054	refactor: display defaults in help text (#1480 ) # What does this PR do? using `formatter_class=argparse.ArgumentDefaultsHelpFormatter` displays (default: DEFAULT_VALUE) for each flag. add this formatter class to build and run to show users some default values like `conda`, `8321`, etc ## Test Plan ran locally with following output: before: ``` llama stack run --help usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}] config Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution. positional arguments: config Path to config file to use for the run options: -h, --help show this help message and exit --port PORT Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. Defaults to 8321 --image-name IMAGE_NAME Name of the image to run. Defaults to the current conda environment --disable-ipv6 Disable IPv6 support --env KEY=VALUE Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. --tls-keyfile TLS_KEYFILE Path to TLS key file for HTTPS --tls-certfile TLS_CERTFILE Path to TLS certificate file for HTTPS --image-type {conda,container,venv} Image Type used during the build. This can be either conda or container or venv. ``` after: ``` llama stack run --help usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}] config Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution. positional arguments: config Path to config file to use for the run options: -h, --help show this help message and exit --port PORT Port to run the server on. 
It can also be passed via the env var LLAMA_STACK_PORT. (default: 8321) --image-name IMAGE_NAME Name of the image to run. Defaults to the current conda environment (default: None) --disable-ipv6 Disable IPv6 support (default: False) --env KEY=VALUE Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: []) --tls-keyfile TLS_KEYFILE Path to TLS key file for HTTPS (default: None) --tls-certfile TLS_CERTFILE Path to TLS certificate file for HTTPS (default: None) --image-type {conda,container,venv} Image Type used during the build. This can be either conda or container or venv. (default: conda) ``` [//]: # (## Documentation) Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-03-07 11:05:58 -08:00
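The effect of `argparse.ArgumentDefaultsHelpFormatter` in #1480 can be reproduced with a stripped-down parser. The parser below is an illustrative stand-in with only two of the flags shown above, not the project's actual `run` command definition.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny parser whose help output shows default values."""
    parser = argparse.ArgumentParser(
        prog="llama stack run",
        description="Start the server for a Llama Stack Distribution.",
        # Appends "(default: ...)" to every argument that has help text.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "--port", type=int, default=8321, help="Port to run the server on."
    )
    parser.add_argument(
        "--image-type",
        choices=["conda", "container", "venv"],
        default="conda",
        help="Image Type used during the build.",
    )
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

The formatter only annotates arguments that define `help=`, which is why every flag in the "after" output carries a default.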
Reid	a0d6b165b0	chore: remove unused build dir (#1379 ) # What does this PR do? - An old PR used `BUILDS_BASE_DIR` in `llama_stack/cli/stack/configure.py` (since removed): https://github.com/meta-llama/llama-stack/pull/371/files - Based on the current `build` code, only `DISTRIBS_BASE_DIR` should be used to save the build: `46b0a404e8/llama_stack/cli/stack/_build.py (L298)` `46b0a404e8/llama_stack/cli/stack/_build.py (L301)` Please correct me if I have misunderstood. So `BUILDS_BASE_DIR` should no longer be needed in `run`. Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-03-05 15:40:00 -08:00
Ashwin Bharambe	dd0db8038b	refactor(test): unify vector_io tests and make them configurable (#1398 ) ## Test Plan `LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=sqlite-vec pytest -s -v test_vector_io.py --embedding-model all-miniLM-L6-V2 --inference-model='' --vision-inference-model=''` ``` test_vector_io.py::test_vector_db_retrieve[txt=:vis=:emb=all-miniLM-L6-V2] PASSED test_vector_io.py::test_vector_db_register[txt=:vis=:emb=all-miniLM-L6-V2] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case0] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case1] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case2] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case3] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case4] PASSED ``` Same thing with: - LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=faiss - LLAMA_STACK_CONFIG=fireworks (Note that ergonomics will soon be improved re: cmd-line options and env variables)	2025-03-04 13:37:45 -08:00
Reid	5c9d12a206	chore: improve --port help text (#1346 ) # What does this PR do? It is better to mention the env var usage in the help text. ``` before: $ llama stack run --help --port PORT Port to run the server on. Defaults to 8321 after: $ llama stack run --help --port PORT Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. Defaults to 8321 ``` Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-03-03 16:49:03 -08:00
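One way the port lookup described in #1346 could work is sketched below. The helper `resolve_port` and the precedence it applies (explicit flag first, then `LLAMA_STACK_PORT`, then 8321) are assumptions for illustration; the commit message only states that the env var is accepted and that the default is 8321.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 8321


def resolve_port(
    cli_port: Optional[int], env: Mapping[str, str] = os.environ
) -> int:
    """Pick the server port from the flag, the env var, or the default."""
    if cli_port is not None:
        return cli_port  # assumed: an explicit --port wins
    raw = env.get("LLAMA_STACK_PORT")
    if raw:
        return int(raw)  # raises ValueError if the value is not an integer
    return DEFAULT_PORT
```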
Reid	dc069025f5	chore: fix typo (#1343 ) # What does this PR do? `21ec67356c/distributions` The name was missing the `s`. Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-03-01 10:36:04 -08:00
Sébastien Han	6fa257b475	chore(lint): update Ruff ignores for project conventions and maintainability (#1184 ) - Added new ignores from flake8-bugbear (`B007`, `B008`) - Ignored `C901` (high function complexity) for now, pending review - Maintained PyTorch conventions (`N812`, `N817`) - Allowed `E731` (lambda assignments) for flexibility - Consolidated existing ignores (`E402`, `E501`, `F405`, `C408`, `N812`) - Documented rationale for each ignored rule This keeps our linting aligned with project needs while tracking potential fixes. Signed-off-by: Sébastien Han <seb@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-28 09:36:49 -08:00
Yuan Tang	f4df3a76d9	fix: Incorrect import path for print_subcommand_description() (#1314 ) # What does this PR do? Missed this one additional import in https://github.com/meta-llama/llama-stack/pull/1313 ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-27 18:35:49 -08:00
Reid	94e2186bb8	chore: add subcommands description in help (#1219 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] ``` before: $ llama usage: llama [-h] {model,stack,download,verify-download} ... Welcome to the Llama CLI options: -h, --help show this help message and exit subcommands: {model,stack,download,verify-download} $ llama model --help usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ... Work with llama models options: -h, --help show this help message and exit model_subcommands: {download,list,prompt-format,describe,verify-download,remove} $ llama stack --help usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ... Operations for the Llama Stack / Distributions options: -h, --help show this help message and exit --version show program's version number and exit stack_subcommands: {build,list-apis,list-providers,run} =================== after: $ llama usage: llama [-h] {model,stack,download,verify-download} ... Welcome to the Llama CLI options: -h, --help show this help message and exit subcommands: {model,stack,download,verify-download} model Work with llama models stack Operations for the Llama Stack / Distributions download Download a model from llama.meta.com or Hugging Face Hub verify-download Verify integrity of downloaded model files $ llama model --help usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ... 
Work with llama models options: -h, --help show this help message and exit model_subcommands: {download,list,prompt-format,describe,verify-download,remove} download Download a model from llama.meta.com or Hugging Face Hub list Show available llama models prompt-format Show llama model message formats describe Show details about a llama model verify-download Verify the downloaded checkpoints' checksums for models downloaded from Meta remove Remove the downloaded llama model $ llama stack --help usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ... Operations for the Llama Stack / Distributions options: -h, --help show this help message and exit --version show program's version number and exit stack_subcommands: {build,list-apis,list-providers,run} build Build a Llama stack container list-apis List APIs part of the Llama Stack implementation list-providers Show available Llama Stack Providers for an API run Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution. ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) --------- Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-27 17:00:27 -08:00
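The "after" output in #1219 lists a description next to each subcommand. With plain `argparse` this can be achieved by passing `help=` to `add_parser`, as in the sketch below. This shows the general technique only; the PR itself may use a different mechanism (a later commit mentions a `print_subcommand_description()` helper), and `build_cli` is a hypothetical name.

```python
import argparse


def build_cli() -> argparse.ArgumentParser:
    """Build a top-level parser whose help lists subcommand descriptions."""
    parser = argparse.ArgumentParser(
        prog="llama", description="Welcome to the Llama CLI"
    )
    subparsers = parser.add_subparsers(title="subcommands", dest="command")
    # The help= text is what appears next to each name in `llama --help`.
    subparsers.add_parser("model", help="Work with llama models")
    subparsers.add_parser(
        "stack", help="Operations for the Llama Stack / Distributions"
    )
    return parser


if __name__ == "__main__":
    build_cli().print_help()
```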
Ashwin Bharambe	c54164556a	fix: update notebooks to avoid using the nutsy --image-name __system__ thing (#1308 ) The `--image-name __system__` thing was a hack and a bad one at that. The actual intent was to somehow automatically detect the notebook environment so we could avoid unnecessarily confusing things in the llama stack build cmd-line. But I failed which led us to use the backup `__system__` thing. Let's just do the simple thing. Note that `build_venv.sh` I haven't changed for now (so it still honors the __system__ special name just that no new user should use it.) ## Test Plan Open the notebooks from this branch in Colab (see example url below) and ensure the builds work. https://colab.research.google.com/github/meta-llama/llama-stack/blob/foo/docs/getting_started.ipynb In the notebook, install llama-stack from this branch directly using: ``` !pip install -U https://github.com/meta-llama/llama-stack/archive/refs/heads/foo.zip ``` Verify that `!UV_SYSTEM_PYTHON=1 llama stack build --template together --image-type venv` afterwards succeeds and the library client initialization also works.	2025-02-27 16:39:04 -08:00
Yuan Tang	2ed2c0bd26	fix(cli): Missing default for --image-type in stack run command (#1274 ) # What does this PR do? I think this got accidentally removed as part of https://github.com/meta-llama/llama-stack/pull/1250. cc @leseb ## Test Plan After the change, this arg is no longer required. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-26 12:23:44 -08:00
Sébastien Han	929c5f0842	refactor(server): replace print statements with logger (#1250 ) # What does this PR do? - Introduced logging in `StackRun` to replace print-based messages - Improved error handling for config file loading and parsing - Replaced `cprint` with `logger.error` for consistent error messaging - Ensured logging is used in `server.py` for startup, shutdown, and runtime messages - Added missing exception handling for invalid providers Signed-off-by: Sébastien Han <seb@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-25 21:31:37 -08:00
Sébastien Han	c4987bc349	fix: avoid failure when no special pip deps and better exit (#1228 ) # What does this PR do? When building providers in a virtual environment or containers, special pip dependencies may not always be provided (e.g., for Ollama). The check should only fail if the required number of arguments is missing. Currently, two arguments are mandatory: 1. Environment name 2. Pip dependencies Additionally, return statements were replaced with sys.exit(1) in error conditions to ensure immediate termination on critical failures. Error handling in the stack build process was also improved to guarantee the program exits with status 1 when facing configuration issues or build failures. Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan This command shouldn't fail: ``` llama stack build --template ollama --image-type venv ``` [//]: # (## Documentation) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-24 13:18:52 -05:00
Charlie Doern	34e3faa4e8	feat: add --run to llama stack build (#1156 ) # What does this PR do? `--run` runs the stack that was just built, using the same arguments as the build process (image-name, type, etc.). This simplifies the workflow a lot and improves the UX for most local users trying to get started, who no longer have to match the flags of the two commands (build and then run). Also moved `ImageType` to distribution.utils, since there were circular import errors with its old location. ## Test Plan Tested locally using the following command: `llama stack build --run --template ollama --image-type venv` Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-02-23 22:06:09 -05:00
Ashwin Bharambe	6227e1e3b9	fix: update virtualenv building so llamastack- prefix is not added, make notebook experience easier (#1225 ) Make sure venv behaves like conda (no prefix is added to image_name) and `--image-type venv` inside a notebook "just works" without any fiddling	2025-02-23 16:57:11 -08:00
Ashwin Bharambe	992f865b2e	chore: move embedding deps to RAG tool where they are needed (#1210 ) `EMBEDDING_DEPS` were wrongly associated with `vector_io` providers. They are needed by https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/utils/memory/vector_store.py#L142 and related code and is used by the RAG tool and as such should only be needed by the `inline::rag-runtime` provider.	2025-02-21 11:33:41 -08:00
Reid	d2701b0d6a	chore: remove configure subcommand (#1202 ) # What does this PR do? When I tried to use `configure`, I found it was `DEPRECATED`, and found PR https://github.com/meta-llama/llama-stack/pull/371, which removed it. I am not sure why `configure.py` was not removed as well. ``` $ llama stack configure /tmp/test.yaml usage: llama stack configure [-h] [--output-dir OUTPUT_DIR] config llama stack configure: error: DEPRECATED! llama stack configure has been deprecated. Please use llama stack run <path/to/run.yaml> instead. Please see example run.yaml in /distributions folder. ``` It would be better to tell users how to proceed when they check `--help` first: ``` before: $ llama stack configure --help usage: llama stack configure [-h] [--output-dir OUTPUT_DIR] config Configure a llama stack distribution positional arguments: after: $ llama stack configure --help usage: llama stack configure [-h] [--output-dir OUTPUT_DIR] config Configure a llama stack distribution DEPRECATED! llama stack configure has been deprecated. Please use llama stack run <path/to/run.yaml> instead. Please see example run.yaml in /distributions folder. ``` --------- Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-21 08:06:25 -08:00
Reid	89d37687dd	chore: remove --no-list-templates option (#1121 ) # What does this PR do? From the code and its usage, there seems to be no need for `--no-list-templates`, and it also confuses users reading the help text, so this PR removes it. ``` $ llama stack build --no-list-templates > Enter a name for your Llama Stack (e.g. my-local-stack): $ llama stack build > Enter a name for your Llama Stack (e.g. my-local-stack): before: $ llama stack build --help --list-templates, --no-list-templates Show the available templates for building a Llama Stack distribution (default: False) after: --list-templates Show the available templates for building a Llama Stack distribution ``` Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-18 10:13:46 -08:00
Reid	8dc1cac333	style: fix the capitalization issue (#1117 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] ``` before: $ llama stack run --help usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}] config start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution. After: $ llama stack run --help usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}] config Start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution. ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-14 17:16:26 -08:00
Sébastien Han	369cc513cb	fix: improve stack build on venv (#980 ) # What does this PR do? Added a pre_run_checks function to ensure a smooth environment setup by verifying prerequisites. It checks for an existing virtual environment, ensures uv is installed, and deactivates any active environment if necessary. Run the full build inside a venv created by 'uv'. Improved string handling in printf statements and added shellcheck suppressions for expected word splitting in pip commands. These enhancements improve robustness, prevent conflicts, and ensure a seamless setup process. Signed-off-by: Sébastien Han <seb@redhat.com> - [ ] Addresses issue (#issue) ## Test Plan Run the following command on either Linux or MacOS: ``` llama stack build --template ollama --image-type venv --image-name foo + build_name=foo + env_name=llamastack-foo + pip_dependencies='datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn' + RED='\033[0;31m' + NC='\033[0m' + ENVNAME= +++ readlink -f /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh ++ dirname /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh + SCRIPT_DIR=/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution + source /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/common.sh + pre_run_checks llamastack-foo + local env_name=llamastack-foo + is_command_available uv + command -v uv + '[' -d llamastack-foo ']' + run llamastack-foo 'datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite 
fastapi fire httpx uvicorn' 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu' + local env_name=llamastack-foo + local 'pip_dependencies=datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn' + local 'special_pip_deps=sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu' + echo 'Creating new virtual environment llamastack-foo' Creating new virtual environment llamastack-foo + uv venv llamastack-foo Using CPython 3.13.1 interpreter at: /opt/homebrew/opt/python@3.13/bin/python3.13 Creating virtual environment at: llamastack-foo Activate with: source llamastack-foo/bin/activate + source llamastack-foo/bin/activate ++ '[' -n x ']' ++ SCRIPT_PATH=llamastack-foo/bin/activate ++ '[' llamastack-foo/bin/activate = /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh ']' ++ deactivate nondestructive ++ unset -f pydoc ++ '[' -z '' ']' ++ '[' -z '' ']' ++ hash -r ++ '[' -z '' ']' ++ unset VIRTUAL_ENV ++ unset VIRTUAL_ENV_PROMPT ++ '[' '!' 
nondestructive = nondestructive ']' ++ VIRTUAL_ENV=/Users/leseb/Documents/AI/llama-stack/llamastack-foo ++ '[' darwin24 = cygwin ']' ++ '[' darwin24 = msys ']' ++ export VIRTUAL_ENV ++ _OLD_VIRTUAL_PATH='/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand' ++ PATH='/Users/leseb/Documents/AI/llama-stack/llamastack-foo/bin:/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand' ++ export PATH ++ '[' x '!=' x ']' +++ basename /Users/leseb/Documents/AI/llama-stack/llamastack-foo ++ 
VIRTUAL_ENV_PROMPT='(llamastack-foo) ' ++ export VIRTUAL_ENV_PROMPT ++ '[' -z '' ']' ++ '[' -z '' ']' ++ _OLD_VIRTUAL_PS1= ++ PS1='(llamastack-foo) ' ++ export PS1 ++ alias pydoc ++ true ++ hash -r + '[' -n '' ']' + '[' -n '' ']' + uv pip install --no-cache-dir llama-stack Using Python 3.13.1 environment at: llamastack-foo Resolved 50 packages in 1.25s Built fire==0.7.0 Prepared 50 packages in 1.22s Installed 50 packages in 126ms + annotated-types==0.7.0 + anyio==4.8.0 + blobfile==3.0.0 + certifi==2025.1.31 + charset-normalizer==3.4.1 + click==8.1.8 + distro==1.9.0 + filelock==3.17.0 + fire==0.7.0 + fsspec==2025.2.0 + h11==0.14.0 + httpcore==1.0.7 + httpx==0.28.1 + huggingface-hub==0.28.1 + idna==3.10 + jinja2==3.1.5 + llama-models==0.1.2 + llama-stack==0.1.2 + llama-stack-client==0.1.2 + lxml==5.3.1 + markdown-it-py==3.0.0 + markupsafe==3.0.2 + mdurl==0.1.2 + numpy==2.2.2 + packaging==24.2 + pandas==2.2.3 + pillow==11.1.0 + prompt-toolkit==3.0.50 + pyaml==25.1.0 + pycryptodomex==3.21.0 + pydantic==2.10.6 + pydantic-core==2.27.2 + pygments==2.19.1 + python-dateutil==2.9.0.post0 + python-dotenv==1.0.1 + pytz==2025.1 + pyyaml==6.0.2 + regex==2024.11.6 + requests==2.32.3 + rich==13.9.4 + setuptools==75.8.0 + six==1.17.0 + sniffio==1.3.1 + termcolor==2.5.0 + tiktoken==0.8.0 + tqdm==4.67.1 + typing-extensions==4.12.2 + tzdata==2025.1 + urllib3==2.3.0 + wcwidth==0.2.13 + '[' -n '' ']' + printf 'Installing pip dependencies\n' Installing pip dependencies + uv pip install datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn Using Python 3.13.1 environment at: llamastack-foo Resolved 105 packages in 37ms Uninstalled 2 packages in 65ms Installed 72 packages in 195ms + aiohappyeyeballs==2.4.6 + aiohttp==3.11.12 + 
aiosignal==1.3.2 + aiosqlite==0.21.0 + attrs==25.1.0 + autoevals==0.0.119 + backoff==2.2.1 + braintrust-core==0.0.58 + chardet==5.2.0 + chevron==0.14.0 + chromadb-client==0.6.3 + contourpy==1.3.1 + cycler==0.12.1 + datasets==3.2.0 + deprecated==1.2.18 + dill==0.3.8 + faiss-cpu==1.10.0 + fastapi==0.115.8 + fonttools==4.56.0 + frozenlist==1.5.0 - fsspec==2025.2.0 + fsspec==2024.9.0 + googleapis-common-protos==1.66.0 + grpcio==1.70.0 + importlib-metadata==8.5.0 + jiter==0.8.2 + joblib==1.4.2 + jsonschema==4.23.0 + jsonschema-specifications==2024.10.1 + kiwisolver==1.4.8 + levenshtein==0.26.1 + matplotlib==3.10.0 + monotonic==1.6 + multidict==6.1.0 + multiprocess==0.70.16 + nltk==3.9.1 - numpy==2.2.2 + numpy==1.26.4 + ollama==0.4.7 + openai==1.61.1 + opentelemetry-api==1.30.0 + opentelemetry-exporter-otlp-proto-common==1.30.0 + opentelemetry-exporter-otlp-proto-grpc==1.30.0 + opentelemetry-exporter-otlp-proto-http==1.30.0 + opentelemetry-proto==1.30.0 + opentelemetry-sdk==1.30.0 + opentelemetry-semantic-conventions==0.51b0 + orjson==3.10.15 + overrides==7.7.0 + posthog==3.12.0 + propcache==0.2.1 + protobuf==5.29.3 + psycopg2-binary==2.9.10 + pyarrow==19.0.0 + pyparsing==3.2.1 + pypdf==5.3.0 + rapidfuzz==3.12.1 + redis==5.2.1 + referencing==0.36.2 + rpds-py==0.22.3 + safetensors==0.5.2 + scikit-learn==1.6.1 + scipy==1.15.1 + sentencepiece==0.2.0 + starlette==0.45.3 + tenacity==9.0.0 + threadpoolctl==3.5.0 + tokenizers==0.21.0 + transformers==4.48.3 + uvicorn==0.34.0 + wrapt==1.17.2 + xxhash==3.5.0 + yarl==1.18.3 + zipp==3.21.0 + '[' -n 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu' ']' + IFS='#' + read -ra parts + for part in '"${parts[@]}"' + echo 'sentence-transformers --no-deps' sentence-transformers --no-deps + uv pip install sentence-transformers --no-deps Using Python 3.13.1 environment at: llamastack-foo Resolved 1 package in 141ms Installed 1 package in 6ms + sentence-transformers==3.4.1 + for part in 
'"${parts[@]}"' + echo 'torch torchvision --index-url https://download.pytorch.org/whl/cpu' torch torchvision --index-url https://download.pytorch.org/whl/cpu + uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu Using Python 3.13.1 environment at: llamastack-foo Resolved 13 packages in 2.15s Installed 5 packages in 324ms + mpmath==1.3.0 + networkx==3.3 + sympy==1.13.1 + torch==2.6.0 + torchvision==0.21.0 Build Successful! ``` Run: ``` $ source llamastack-foo/bin/activate $ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" OLLAMA_INFERENCE_MODEL="llama3.2:3b-instruct-fp16" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml --port 5001 Using config file: llama_stack/templates/ollama/run.yaml Run configuration: apis: - agents - datasetio - eval - inference - safety - scoring - telemetry - tool_runtime - vector_io container_image: null datasets: [] eval_tasks: [] image_name: ollama metadata_store: db_path: /Users/leseb/.llama/distributions/ollama/registry.db namespace: null type: sqlite models: - metadata: {} model_id: meta-llama/Llama-3.2-3B-Instruct model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType - llm provider_id: ollama provider_model_id: null - metadata: embedding_dimension: 384 model_id: all-MiniLM-L6-v2 model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType - embedding provider_id: sentence-transformers provider_model_id: null providers: agents: - config: persistence_store: db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db namespace: null type: sqlite provider_id: meta-reference provider_type: inline::meta-reference datasetio: - config: {} provider_id: huggingface provider_type: remote::huggingface - config: {} provider_id: localfs provider_type: inline::localfs eval: - config: {} provider_id: meta-reference provider_type: inline::meta-reference inference: - config: url: http://localhost:11434 provider_id: ollama 
provider_type: remote::ollama - config: {} provider_id: sentence-transformers provider_type: inline::sentence-transformers safety: - config: {} provider_id: llama-guard provider_type: inline::llama-guard scoring: - config: {} provider_id: basic provider_type: inline::basic - config: {} provider_id: llm-as-judge provider_type: inline::llm-as-judge - config: openai_api_key: '******' provider_id: braintrust provider_type: inline::braintrust telemetry: - config: service_name: llama-stack sinks: console,sqlite sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db provider_id: meta-reference provider_type: inline::meta-reference tool_runtime: - config: api_key: '****' max_results: 3 provider_id: brave-search provider_type: remote::brave-search - config: api_key: '******' max_results: 3 provider_id: tavily-search provider_type: remote::tavily-search - config: {} provider_id: code-interpreter provider_type: inline::code-interpreter - config: {} provider_id: rag-runtime provider_type: inline::rag-runtime vector_io: - config: kvstore: db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db namespace: null type: sqlite provider_id: faiss provider_type: inline::faiss scoring_fns: [] server: port: 8321 tls_certfile: null tls_keyfile: null shields: [] tool_groups: - args: null mcp_endpoint: null provider_id: tavily-search toolgroup_id: builtin::websearch - args: null mcp_endpoint: null provider_id: rag-runtime toolgroup_id: builtin::rag - args: null mcp_endpoint: null provider_id: code-interpreter toolgroup_id: builtin::code_interpreter vector_dbs: [] version: '2' Warning: `bwrap` is not available. Code interpreter tool will not work correctly. 
modules.json: 100%\|███████████████████████████████████████████████████████████\| 349/349 [00:00<00:00, 485kB/s] config_sentence_transformers.json: 100%\|██████████████████████████████████████\| 116/116 [00:00<00:00, 498kB/s] README.md: 100%\|█████████████████████████████████████████████████████████\| 10.7k/10.7k [00:00<00:00, 20.5MB/s] sentence_bert_config.json: 100%\|████████████████████████████████████████████\| 53.0/53.0 [00:00<00:00, 583kB/s] config.json: 100%\|███████████████████████████████████████████████████████████\| 612/612 [00:00<00:00, 4.63MB/s] model.safetensors: 100%\|█████████████████████████████████████████████████\| 90.9M/90.9M [00:02<00:00, 36.6MB/s] tokenizer_config.json: 100%\|█████████████████████████████████████████████████\| 350/350 [00:00<00:00, 4.27MB/s] vocab.txt: 100%\|███████████████████████████████████████████████████████████\| 232k/232k [00:00<00:00, 1.90MB/s] tokenizer.json: 100%\|██████████████████████████████████████████████████████\| 466k/466k [00:00<00:00, 2.23MB/s] special_tokens_map.json: 100%\|███████████████████████████████████████████████\| 112/112 [00:00<00:00, 1.47MB/s] 1_Pooling/config.json: 100%\|██████████████████████████████████████████████████\| 190/190 [00:00<00:00, 841kB/s] Serving API tool_groups GET /v1/tools/{tool_name} GET /v1/toolgroups/{toolgroup_id} GET /v1/toolgroups GET /v1/tools POST /v1/toolgroups DELETE /v1/toolgroups/{toolgroup_id} Serving API tool_runtime POST /v1/tool-runtime/invoke GET /v1/tool-runtime/list-tools POST /v1/tool-runtime/rag-tool/insert POST /v1/tool-runtime/rag-tool/query Serving API vector_io POST /v1/vector-io/insert POST /v1/vector-io/query Serving API telemetry GET /v1/telemetry/traces/{trace_id}/spans/{span_id} GET /v1/telemetry/spans/{span_id}/tree GET /v1/telemetry/traces/{trace_id} POST /v1/telemetry/events GET /v1/telemetry/spans GET /v1/telemetry/traces POST /v1/telemetry/spans/export Serving API models GET /v1/models/{model_id} GET /v1/models POST /v1/models DELETE 
/v1/models/{model_id} Serving API eval POST /v1/eval/tasks/{task_id}/evaluations DELETE /v1/eval/tasks/{task_id}/jobs/{job_id} GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result GET /v1/eval/tasks/{task_id}/jobs/{job_id} POST /v1/eval/tasks/{task_id}/jobs Serving API datasets GET /v1/datasets/{dataset_id} GET /v1/datasets POST /v1/datasets DELETE /v1/datasets/{dataset_id} Serving API scoring_functions GET /v1/scoring-functions/{scoring_fn_id} GET /v1/scoring-functions POST /v1/scoring-functions Serving API inspect GET /v1/health GET /v1/inspect/providers GET /v1/inspect/routes GET /v1/version Serving API scoring POST /v1/scoring/score POST /v1/scoring/score-batch Serving API shields GET /v1/shields/{identifier} GET /v1/shields POST /v1/shields Serving API vector_dbs GET /v1/vector-dbs/{vector_db_id} GET /v1/vector-dbs POST /v1/vector-dbs DELETE /v1/vector-dbs/{vector_db_id} Serving API eval_tasks GET /v1/eval-tasks/{eval_task_id} GET /v1/eval-tasks POST /v1/eval-tasks Serving API agents POST /v1/agents POST /v1/agents/{agent_id}/session POST /v1/agents/{agent_id}/session/{session_id}/turn DELETE /v1/agents/{agent_id} DELETE /v1/agents/{agent_id}/session/{session_id} GET /v1/agents/{agent_id}/session/{session_id} GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id} GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id} Serving API inference POST /v1/inference/chat-completion POST /v1/inference/completion POST /v1/inference/embeddings Serving API datasetio POST /v1/datasetio/rows GET /v1/datasetio/rows Serving API safety POST /v1/safety/run-shield Listening on ['::', '0.0.0.0']:5001 INFO: Started server process [39145] INFO: Waiting for application startup. INFO: ASGI 'lifespan' protocol appears unsupported. INFO: Application startup complete. INFO: Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit) ``` ## Sources Please link relevant resources if necessary. 
## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-14 09:22:03 -08:00
Sébastien Han	e4a1579e63	build: format codebase imports using ruff linter (#1028 ) # What does this PR do? - Configured ruff linter to automatically fix import sorting issues. - Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are applied. - Enabled the 'I' selection to focus on import-related linting rules. - Ran the linter, and formatted all codebase imports accordingly. - Removed the black dep from the "dev" group since we use ruff Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-13 10:06:21 -08:00
Ihar Hrachyshka	cc700b2f68	feat: support listing all for `llama stack list-providers` (#1056 ) # What does this PR do? Support listing all for `llama stack list-providers`. For ease of reading, sort the output rows by type. Before the change. ```  llama stack list-providers usage: llama stack list-providers [-h] {inference,safety,agents,vector_io,datasetio,scoring,eval,post_training,tool_runtime,telemetry} llama stack list-providers: error: the following arguments are required: api ``` After the change. ``` +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| API Type \| Provider Type \| PIP Package Dependencies \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| agents \| inline::meta-reference \| matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| datasetio \| inline::localfs \| pandas \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| datasetio \| remote::huggingface \| datasets \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| eval \| inline::meta-reference \| \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| inline::meta-reference \| accelerate,blobfile,fairscale,torch,torchvision,transformers,zmq,lm-format- \| \| \| \| enforcer,sentence-transformers \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| inline::meta-reference-quantized \| 
accelerate,blobfile,fairscale,torch,torchvision,transformers,zmq,lm-format- \| \| \| \| enforcer,sentence-transformers,fbgemm-gpu,torchao==0.5.0 \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| inline::sentence-transformers \| sentence-transformers \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| inline::vllm \| vllm \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::bedrock \| boto3 \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::cerebras \| cerebras_cloud_sdk \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::databricks \| openai \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::fireworks \| fireworks-ai \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::groq \| groq \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::hf::endpoint \| huggingface_hub,aiohttp \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::hf::serverless \| huggingface_hub,aiohttp \| 
+---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::nvidia \| openai \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::ollama \| ollama,aiohttp \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::runpod \| openai \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::sambanova \| openai \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::tgi \| huggingface_hub,aiohttp \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::together \| together \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| inference \| remote::vllm \| openai \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| post_training \| inline::torchtune \| torch,torchtune==0.5.0,torchao==0.8.0,numpy \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| safety \| inline::code-scanner \| codeshield \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| safety \| inline::llama-guard \| \| 
+---------------+----------------------------------+----------------------------------------------------------------------------------+ \| safety \| inline::meta-reference \| transformers,torch --index-url https://download.pytorch.org/whl/cpu \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| safety \| inline::prompt-guard \| transformers,torch --index-url https://download.pytorch.org/whl/cpu \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| safety \| remote::bedrock \| boto3 \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| scoring \| inline::basic \| \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| scoring \| inline::braintrust \| autoevals,openai \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| scoring \| inline::llm-as-judge \| \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| telemetry \| inline::meta-reference \| opentelemetry-sdk,opentelemetry-exporter-otlp-proto-http \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| tool_runtime \| inline::code-interpreter \| \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| tool_runtime \| inline::rag-runtime \| \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| tool_runtime \| remote::bing-search \| requests 
\| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| tool_runtime \| remote::brave-search \| requests \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| tool_runtime \| remote::model-context-protocol \| mcp \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| tool_runtime \| remote::tavily-search \| requests \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| tool_runtime \| remote::wolfram-alpha \| requests \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| vector_io \| inline::chromadb \| blobfile,chardet,pypdf,tqdm,numpy,scikit- \| \| \| \| learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url \| \| \| \| https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,chromadb \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| vector_io \| inline::faiss \| blobfile,chardet,pypdf,tqdm,numpy,scikit- \| \| \| \| learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url \| \| \| \| https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,faiss-cpu \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| vector_io \| inline::meta-reference \| blobfile,chardet,pypdf,tqdm,numpy,scikit- \| \| \| \| learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url \| \| \| \| https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,faiss-cpu \| 
+---------------+----------------------------------+----------------------------------------------------------------------------------+ \| vector_io \| remote::chromadb \| blobfile,chardet,pypdf,tqdm,numpy,scikit- \| \| \| \| learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url \| \| \| \| https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,chromadb- \| \| \| \| client \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| vector_io \| remote::pgvector \| blobfile,chardet,pypdf,tqdm,numpy,scikit- \| \| \| \| learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url \| \| \| \| https://download.pytorch.org/whl/cpu,sentence-transformers --no- \| \| \| \| deps,psycopg2-binary \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| vector_io \| remote::qdrant \| blobfile,chardet,pypdf,tqdm,numpy,scikit- \| \| \| \| learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url \| \| \| \| https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,qdrant- \| \| \| \| client \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ \| vector_io \| remote::weaviate \| blobfile,chardet,pypdf,tqdm,numpy,scikit- \| \| \| \| learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url \| \| \| \| https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,weaviate- \| \| \| \| client \| +---------------+----------------------------------+----------------------------------------------------------------------------------+ ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Manually. 
[//]: # (## Documentation) Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-02-12 22:03:28 -08:00
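The behaviour described in #1056 (an optional `api` argument, with rows sorted by type when everything is listed) could be sketched roughly as follows. This is a minimal illustration, not the project's code: `REGISTRY`, `build_parser`, and `list_rows` are hypothetical names, and the registry holds only a few rows copied from the table in that commit message.

```python
import argparse
from typing import List, Optional, Tuple

# Hypothetical registry: API name -> list of (provider_type, pip_packages).
# The data structure is an assumption made for this sketch.
REGISTRY = {
    "safety": [("remote::bedrock", ["boto3"]), ("inline::llama-guard", [])],
    "datasetio": [("remote::huggingface", ["datasets"]), ("inline::localfs", ["pandas"])],
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="list-providers")
    # nargs="?" makes the positional optional; None means "list every API".
    parser.add_argument("api", nargs="?", default=None)
    return parser


def list_rows(api: Optional[str]) -> List[Tuple[str, str, str]]:
    apis = [api] if api else list(REGISTRY)
    rows = [
        (name, provider_type, ",".join(packages))
        for name in apis
        for provider_type, packages in REGISTRY[name]
    ]
    # Tuples sort by API type first, then by provider type.
    return sorted(rows)
```

Calling `list_rows(build_parser().parse_args([]).api)` returns every row, while passing an API name restricts the listing to that API.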
Charlie Doern	025f615868	feat: add support for running in a venv (#1018 ) # What does this PR do? Adds `--image-type` to `llama stack run`, which takes `conda`, `container`, or `venv`. Also adds `start_venv.sh`, which starts the stack using a venv. Resolves #1007 ## Test Plan Running locally: `llama stack build --template ollama --image-type venv` `llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml` ... ``` llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml Using run configuration: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml + python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321 Using config file: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml Run configuration: apis: - agents - datasetio ... ``` Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-02-12 11:13:04 -05:00
Charlie Doern	5f88ff0b6a	fix: show proper help text (#1065 ) # What does this PR do? When executing a sub-command like `llama model`, the wrong help text, sub-commands, and flags are displayed. Each command group needs to call `.set_defaults` to display this info properly. Before: ``` llama model usage: llama [-h] {model,stack,download,verify-download} ... Welcome to the Llama CLI options: -h, --help show this help message and exit subcommands: {model,stack,download,verify-download} ``` After: ``` llama model usage: llama model [-h] {download,list,prompt-format,describe,verify-download} ... Work with llama models options: -h, --help show this help message and exit model_subcommands: {download,list,prompt-format,describe,verify-download} ``` Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-02-12 06:38:25 -08:00
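The `.set_defaults` pattern mentioned in #1065 can be illustrated with plain `argparse`. The sketch below assumes the CLI dispatches through a `func` default, which is a hypothetical choice for this example; only the program name, the descriptions, and the sub-parser titles come from the help output quoted in that commit message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="llama", description="Welcome to the Llama CLI")
    # Fallback when no sub-command is given: print the top-level help.
    parser.set_defaults(func=lambda args: parser.print_help())
    subparsers = parser.add_subparsers(title="subcommands")

    model = subparsers.add_parser("model", description="Work with llama models")
    # The command group sets its own default, so `llama model` prints the
    # group's help. Without this call the top-level `func` would be used.
    model.set_defaults(func=lambda args: model.print_help())
    model_subparsers = model.add_subparsers(title="model_subcommands")
    model_subparsers.add_parser("list")
    return parser
```

Running `args = build_parser().parse_args(["model"])` followed by `args.func(args)` prints the usage for `llama model` instead of the top-level usage.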
Ihar Hrachyshka	24385cfd03	fix: filter out remote::sample providers when listing (#1057 ) # What does this PR do? Before: ```  llama stack list-providers agents +------------------------+-----------------------------------------------------------------------+ \| Provider Type \| PIP Package Dependencies \| +------------------------+-----------------------------------------------------------------------+ \| inline::meta-reference \| matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis \| +------------------------+-----------------------------------------------------------------------+ \| remote::sample \| \| +------------------------+-----------------------------------------------------------------------+ ``` After: ```  llama stack list-providers agents +------------------------+-----------------------------------------------------------------------+ \| Provider Type \| PIP Package Dependencies \| +------------------------+-----------------------------------------------------------------------+ \| inline::meta-reference \| matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis \| +------------------------+-----------------------------------------------------------------------+ ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Manually. [//]: # (## Documentation) Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-02-11 16:12:46 -08:00
Ashwin Bharambe	f8f2f7f9bb	feat: Add HTTPS serving option (#1000 ) # What does this PR do? Enables HTTPS option for Llama Stack. While doing so, introduces a `ServerConfig` sub-structure to house all server related configuration (port, ssl, etc.) Also simplified the `start_container.sh` entrypoint to simply be `python` instead of a complex bash command line. ## Test Plan Conda: Run: ```bash $ llama stack build --template together $ llama stack run --port 8322 # ensure server starts $ llama-stack-client configure --endpoint http://localhost:8322 $ llama-stack-client models list ``` Create a self-signed SSL key / cert pair. Then, using a local checkout of `llama-stack-client-python`, change https://github.com/meta-llama/llama-stack-client-python/blob/main/src/llama_stack_client/_base_client.py#L759 to add `kwargs.setdefault("verify", False)` so SSL verification is disabled. Then: ```bash $ llama stack run --port 8322 --tls-keyfile <KEYFILE> --tls-certfile <CERTFILE> $ llama-stack-client configure --endpoint https://localhost:8322 # notice the `https` $ llama-stack-client models list ``` Also tested with containers (but of course one needs to make sure the cert and key files are appropriately provided to the container.)	2025-02-07 09:39:08 -08:00
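As a rough picture of the `ServerConfig` sub-structure introduced in #1000, the sketch below uses a dataclass with the three fields that appear in the run configuration dumped earlier in this log (`port`, `tls_certfile`, `tls_keyfile`). The validation rule and the `scheme` and `uvicorn_kwargs` helpers are assumptions added for illustration; the project's real class may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ServerConfig:
    """Illustrative stand-in for the server configuration block."""

    port: int = 8321
    tls_certfile: Optional[str] = None
    tls_keyfile: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: a key without a certificate (or vice versa) is a mistake.
        if bool(self.tls_certfile) != bool(self.tls_keyfile):
            raise ValueError("tls_certfile and tls_keyfile must be set together")

    @property
    def scheme(self) -> str:
        return "https" if self.tls_certfile else "http"

    def uvicorn_kwargs(self) -> Dict[str, Any]:
        # `ssl_keyfile` and `ssl_certfile` are keyword arguments of uvicorn.run().
        kwargs: Dict[str, Any] = {"port": self.port}
        if self.tls_certfile:
            kwargs["ssl_keyfile"] = self.tls_keyfile
            kwargs["ssl_certfile"] = self.tls_certfile
        return kwargs
```

With both TLS files set, `scheme` becomes `https`, which matches the client endpoint change noted in the test plan.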
Yuan Tang	3f9764d50c	fix: List providers command prints out non-existing APIs from registry. Fixes #966 (#969 ) Fixes #966. Verified that: 1. The correct list of APIs is printed out when running `llama stack list-providers` 2. `llama stack list-providers <api>` works as expected. --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-07 09:02:15 -08:00
Charlie Doern	f5e4bf2edf	chore: remove unused argument (#987 ) # What does this PR do? Very small fix. I noticed some unused arguments, but this seems like the easiest one to remove since it's passed in explicitly. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-02-06 10:05:35 -08:00
Yuan Tang	34ab7a3b6c	Fix precommit check after moving to ruff (#927 ) Lint check in main branch is failing. This fixes the lint check after we moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We need to move to a `ruff.toml` file as well as fixing and ignoring some additional checks. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-02 06:46:45 -08:00
Ashwin Bharambe	5b1e69e58e	Use `uv pip install` instead of `pip install` (#921 ) ## What does this PR do? See issue: #747 -- `uv` is just plain better. This PR does the bare minimum of replacing `pip install` by `uv pip install` and ensuring `uv` exists in the environment. ## Test Plan First: create new conda, `uv pip install -e .` on `llama-stack` -- all is good. Next: run `llama stack build --template together` followed by `llama stack run together` -- all good Next: run `llama stack build --template together --image-name yoyo` followed by `llama stack run together --image-name yoyo` -- all good Next: fresh conda and `uv pip install -e .` and `llama stack build --template together --image-type venv` -- all good. Docker: `llama stack build --template together --image-type container` works!	2025-01-31 22:29:41 -08:00
Ashwin Bharambe	216cde5ee8	Add --print-deps-only for computing dependencies	2025-01-31 14:33:51 -08:00
Dmitry Rogozhkin	80f2032485	Fix running stack built with base conda environment (#903 ) Fixes: #902 To test, verified that llama stack can run if built: * With the default "base" conda environment * With a new custom conda environment using the `--image-name XXX` option In both cases llama stack starts fine (it was failing with "base" before this patch). CC: @ashwinb Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2025-01-29 21:24:22 -08:00
Yuan Tang	53721e91ad	Fix validator of "container" image type (#901 ) This was missed in https://github.com/meta-llama/llama-stack/pull/802 somehow.	2025-01-29 09:36:52 -08:00
Ashwin Bharambe	aee6237685	Small refactor for run_with_pty	2025-01-28 09:32:33 -08:00
Vladislav Bronzov	8332ea23ad	Add run win command for stack (#890 ) # What does this PR do? Add a Windows platform run command for the stack. ## Sources https://github.com/meta-llama/llama-stack/pull/889 ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-28 08:04:28 -08:00
Ashwin Bharambe	891bf704eb	Ensure llama stack build --config <> --image-type <> works (#879 ) Fix the issues brought up in https://github.com/meta-llama/llama-stack/issues/870 Test all combinations of (conda, container) vs. (template, config) combos.	2025-01-25 11:13:36 -08:00
Yuan Tang	6da3053c0e	More generic image type for OCI-compliant container technologies (#802 ) It's a more generic term and applicable to alternatives of Docker, such as Podman or other OCI-compliant technologies. --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-01-17 16:37:42 -08:00
Ashwin Bharambe	03ac84a829	Update default port from 5000 -> 8321	2025-01-16 15:26:48 -08:00
Ashwin Bharambe	cee3816609	Make llama stack build not create a new conda by default (#788 ) ## What does this PR do? So far `llama stack build` has always created a separate conda environment for packaging the dependencies of a distribution. The main reason to do so is isolation -- distributions are composed of providers which can have a variety of potentially conflicting dependencies. That said, this has created significant annoyance for new users since it is not at all transparent. The fact that `llama stack run` is actually running the code in some other conda is very surprising. This PR tries to make things better. - Both `llama stack build` and `llama stack run` now accept an `--image-name` argument which represents the (conda, docker, virtualenv) image you want to operate upon. - For the default (conda) mode, the script checks if a current conda environment exists. If one exists, it uses it. - If `--image-name` is provided, that option is used. In this case, an environment is created if needed. - There is no automatic `llamastack-` prefixing of the environment names done anymore. ## Test Plan Start in a conda environment, run `llama stack build --template fireworks`; verify that it successfully built into the current environment and stored the build file at `$CONDA_PREFIX/llamastack-build.yaml`. Run `llama stack run fireworks` which started correctly in the current environment. Ran the same build command outside of conda. It failed asking for `--image-name`. Ran it with `llama stack build --template fireworks --image-name foo`. This successfully created a conda environment called `foo` and installed deps. Ran `llama stack run fireworks` outside conda which failed. Activated a different conda, ran again, it failed saying it did not find the `llamastack-build.yaml` file. Then used `--image-name foo` option and it ran successfully.	2025-01-16 13:44:53 -08:00
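The environment selection rules listed in #788 can be summarized in a few lines. The sketch below is an assumption-laden illustration: `resolve_conda_env` is a hypothetical helper, and reading `CONDA_DEFAULT_ENV` (the variable conda sets to the active environment's name) is a guess at how "a current conda environment exists" might be detected.

```python
import os
from typing import Mapping, Optional


def resolve_conda_env(
    image_name: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Pick the conda environment to build into or run from."""
    if image_name:
        # An explicit --image-name wins; no `llamastack-` prefix is added.
        return image_name
    # Otherwise fall back to the currently active conda environment, if any.
    current = environ.get("CONDA_DEFAULT_ENV")
    if current:
        return current
    raise ValueError("No active conda environment; pass --image-name")
```

This mirrors the test plan: building outside conda fails until `--image-name` is supplied, and an explicit name takes precedence over whichever environment is active.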
Xi Yan	32d3abe964	[CICD] Github workflow for publishing Docker images (#764 ) # What does this PR do? - Add Github workflow for publishing docker images. - Manual Inputs - We can use a (1) TestPyPi version / (2) build via released PyPi version Notes - Keep this workflow manually triggered as we don't want to publish nightly docker images Additional Changes - Resolve issue with running llama stack build in non-terminal device ``` File "/home/runner/.local/lib/python3.12/site-packages/llama_stack/distribution/utils/exec.py", line 25, in run_with_pty old_settings = termios.tcgetattr(sys.stdin) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ termios.error: (25, 'Inappropriate ioctl for device') ``` - Modified build_container.sh to work in non-terminal environment ## Test Plan - Triggered workflow: `3562217878` - Tested published docker image - /tools API endpoints are served so that docker is correctly using the TestPyPi package - Published tagged images: https://hub.docker.com/repositories/llamastack	2025-01-15 09:01:33 -08:00
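The `termios.error` in the traceback above is what `termios.tcgetattr` raises when stdin is not attached to a terminal, as is common in CI runners. One way to guard against it, shown here only as a sketch, is to check for a TTY before touching terminal settings. `stdin_is_terminal` is a hypothetical helper, not a function from the repository, and the commit does not say this is how the fix was made.

```python
import sys


def stdin_is_terminal(stream=None) -> bool:
    """Return True only if the stream is attached to a terminal (hypothetical helper)."""
    stream = sys.stdin if stream is None else stream
    try:
        return stream.isatty()
    except (AttributeError, ValueError):
        # Missing isatty() or a closed stream: treat as non-interactive.
        return False
```

A caller could skip any PTY or `termios` setup when this returns `False` and fall back to plain subprocess execution.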
Dinesh Yeduguru	a174938fbd	Fix telemetry to work on reinstantiating new lib cli (#761 ) # What does this PR do? Since we maintain global state in our telemetry pipeline, reinstantiating lib cli will cause us to add duplicate span processors, causing sqlite to lock out because of constraint violations since we now have two span processors writing to sqlite. This PR changes the telemetry adapter for otel to only instantiate the provider once and add the span processors only once. Also fixes an issue with llama stack build ## Test Plan tested with notebook at https://colab.research.google.com/drive/1ck7hXQxRl6UvT-ijNRZ-gMZxH1G3cN2d#scrollTo=9496f75c	2025-01-14 11:31:50 -08:00
Yuan Tang	9ec54dcbe7	Switch to use importlib instead of deprecated pkg_resources (#678 ) `pkg_resources` has been deprecated. This PR switches to use `importlib.resources`. --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-01-13 20:20:02 -08:00
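For readers unfamiliar with the replacement API, the following sketch shows how a packaged file can be read with `importlib.resources.files` (available since Python 3.9) instead of `pkg_resources`. `read_package_text` is a hypothetical helper, and the package and resource names in the usage line are stand-ins, not paths from this repository.

```python
from importlib.resources import files


def read_package_text(package: str, resource: str) -> str:
    """Read a text resource bundled inside an importable package (hypothetical helper)."""
    # files() returns a Traversable rooted at the package; joinpath() selects the resource.
    return files(package).joinpath(resource).read_text(encoding="utf-8")


# Example with a standard-library package, used here only as a stand-in.
source = read_package_text("json", "__init__.py")
```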
raghotham	ff182ff6de	rename LLAMASTACK_PORT to LLAMA_STACK_PORT for consistency with other env vars (#744 ) # What does this PR do? Rename environment var for consistency ## Test Plan No regressions ## Sources ## Before submitting - [X] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [X] Ran pre-commit to handle lint / formatting issues. - [X] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [X] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>	2025-01-10 11:09:49 -08:00
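Taken together with the default-port change listed above (5000 -> 8321), the renamed variable suggests a lookup along these lines. This is a sketch under assumptions: `resolve_port` is a hypothetical helper, and the log does not show how the server actually reads `LLAMA_STACK_PORT`.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 8321  # default port per commit 03ac84a829


def resolve_port(environ: Optional[Mapping[str, str]] = None) -> int:
    """Return the port from LLAMA_STACK_PORT, or the default if unset (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return int(env.get("LLAMA_STACK_PORT", DEFAULT_PORT))
```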
Yuan Tang	24fa1adc2f	Expose LLAMASTACK_PORT in cli.stack.run (#722 ) This was missed in https://github.com/meta-llama/llama-stack/pull/706. I tested `llama_stack.distribution.server.server` but didn't test `llama stack run`. cc @ashwinb Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-01-10 09:13:49 -08:00
Xi Yan	596afc6497	add --version to llama stack CLI & /version endpoint (#732 ) # What does this PR do? - add --version to llama stack CLI - add /version endpoint - run OpenAPI generator for the new endpoint ## Test Plan CLI and endpoint (screenshots in the original PR)	2025-01-08 16:30:06 -08:00

99 commits