Commit graph

97 commits

Author SHA1 Message Date
Ashwin Bharambe
e13c92f269
revert: feat(server): Use system packages for execution (#1551)
Reverts meta-llama/llama-stack#1252

The above PR breaks the following invocation:
```bash
llama stack run ~/.llama/distributions/together/together-run.yaml
```
2025-03-11 09:58:25 -07:00
Sébastien Han
21e39633d8
feat(server): Use system packages for execution (#1252)
# What does this PR do?

Users prefer to rely on the main CLI rather than invoking the server
through a Python module. Users interact with a high-level CLI rather
than needing to know internal module structures.

Now, when running llama stack run <path-to-config>, the server will
attempt to use the system package or a virtual environment if one is
active.

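As a rough illustration (not the actual implementation), detecting an active virtual environment before falling back to system packages can be as simple as:

```python
import os
import sys


def in_virtualenv() -> bool:
    # An activated venv exports VIRTUAL_ENV and makes sys.prefix differ
    # from sys.base_prefix.
    return os.environ.get("VIRTUAL_ENV") is not None or sys.prefix != sys.base_prefix


# Hypothetical selection logic: prefer the active venv, else system packages.
env_kind = "venv" if in_virtualenv() else "system"
print(f"Running the server with {env_kind} packages")
```
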
This also eliminates the current process dependency chain when running
from a virtual environment:

-> llama stack run
    -> start_env.sh
        -> python -m server...

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Run:

```
ollama run llama3.2:3b-instruct-fp16 --keepalive=2m &
llama stack run ./llama_stack/templates/ollama/run.yaml --disable-ipv6
```

Notice that the server starts and shuts down normally.

[//]: # (## Documentation)

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-03-10 16:01:03 -07:00
James Kunstle
735892cbd2
refactor: ImageType to LlamaStackImageType (#1500)
This disambiguates the "Image" term from the alternative "container image"
usage and allows for:

```python
if image_type == LlamaStackImageType.venv:
    ...
```

accesses rather than `ImageType.venv.value`.

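For context, a minimal sketch of what such an enum could look like (member names and values are assumptions, not the actual definition):

```python
from enum import Enum


class LlamaStackImageType(Enum):
    # Hypothetical members mirroring the image types used by the CLI.
    CONDA = "conda"
    CONTAINER = "container"
    VENV = "venv"


# Comparing against the enum member reads more naturally than ImageType.venv.value.
print(LlamaStackImageType.VENV == LlamaStackImageType("venv"))  # True
```
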
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Changes enum use to comply with semantic python styling and naming
conventions.

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

Refactor was automated and small so simple run-through of creating
images was done.

Signed-off-by: James Kunstle <jkunstle@redhat.com>
2025-03-10 17:12:53 -04:00
Sébastien Han
7cf1e24c4e
feat(logging): implement category-based logging (#1362)
# What does this PR do?

This commit introduces a new logging system that allows loggers to be
assigned a category while retaining the logger name based on the file name.
The log format includes both the logger name and the category, producing
output like:

```
INFO     2025-03-03 21:44:11,323 llama_stack.distribution.stack:103 [core]: Tool_groups: builtin::websearch served by
         tavily-search
```

Key features include:

- Category-based logging: Loggers can be assigned a category (e.g.,
  "core", "server") when they are created. The logger can be loaded like
  this: `logger = get_logger(name=__name__, category="server")`
- Environment variable control: Log levels can be configured
per-category using the
  `LLAMA_STACK_LOGGING` environment variable. For example:
`LLAMA_STACK_LOGGING="server=DEBUG;core=debug"` enables DEBUG level for
the "server"
    and "core" categories.
- `LLAMA_STACK_LOGGING="all=debug"` sets DEBUG level globally for all
categories and
    third-party libraries.

This provides fine-grained control over logging levels while maintaining
a clean and
informative log format.

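A rough sketch of how the `LLAMA_STACK_LOGGING` value described above could be parsed into per-category levels (the helper name and structure are assumptions, not the project's actual code):

```python
import logging
import os


def parse_category_levels(spec: str) -> dict[str, int]:
    # "server=DEBUG;core=debug" -> {"server": logging.DEBUG, "core": logging.DEBUG}
    levels: dict[str, int] = {}
    for entry in spec.split(";"):
        if "=" not in entry:
            continue
        category, _, level = entry.partition("=")
        levels[category.strip()] = getattr(logging, level.strip().upper(), logging.INFO)
    return levels


levels = parse_category_levels(os.environ.get("LLAMA_STACK_LOGGING", "all=debug"))
print(levels)  # e.g. {'all': 10}
```
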
The formatter uses the rich library, which provides nice colors and better
stack traces, like so:

```
ERROR    2025-03-03 21:49:37,124 asyncio:1758 [uncategorized]: unhandled exception during asyncio.run() shutdown
         task: <Task finished name='Task-16' coro=<handle_signal.<locals>.shutdown() done, defined at
         /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:146>
         exception=UnboundLocalError("local variable 'loop' referenced before assignment")>
         ╭────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────╮
         │ /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:178 in shutdown                │
         │                                                                                                                │
         │   175 │   │   except asyncio.CancelledError:                                                                   │
         │   176 │   │   │   pass                                                                                         │
         │   177 │   │   finally:                                                                                         │
         │ ❱ 178 │   │   │   loop.stop()                                                                                  │
         │   179 │                                                                                                        │
         │   180 │   loop = asyncio.get_running_loop()                                                                    │
         │   181 │   loop.create_task(shutdown())                                                                         │
         ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         UnboundLocalError: local variable 'loop' referenced before assignment
```

Co-authored-by: Ashwin Bharambe <@ashwinb>
Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

```
python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml
INFO     2025-03-03 21:55:35,918 __main__:365 [server]: Using config file: llama_stack/templates/ollama/run.yaml           
INFO     2025-03-03 21:55:35,925 __main__:378 [server]: Run configuration:                                                 
INFO     2025-03-03 21:55:35,928 __main__:380 [server]: apis:                                                              
         - agents                                                     
``` 
[//]: # (## Documentation)

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-03-07 11:34:30 -08:00
Charlie Doern
1097912054
refactor: display defaults in help text (#1480)
# What does this PR do?

Using `formatter_class=argparse.ArgumentDefaultsHelpFormatter` displays
`(default: DEFAULT_VALUE)` for each flag. Add this formatter class to
`build` and `run` to show users some default values like `conda`, `8321`,
etc.

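A minimal, standalone illustration of the formatter (flag names copied from the help text below; not the project's actual parser setup):

```python
import argparse

parser = argparse.ArgumentParser(
    description="Start the server for a Llama Stack Distribution.",
    formatter_class=argparse.ArgumentDefaultsHelpFormatter,
)
parser.add_argument("--port", type=int, default=8321, help="Port to run the server on.")
parser.add_argument("--image-type", choices=["conda", "container", "venv"], default="conda",
                    help="Image Type used during the build.")

# With ArgumentDefaultsHelpFormatter, each option's help line ends with
# "(default: 8321)", "(default: conda)", and so on.
parser.print_help()
```
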
## Test Plan

Ran locally with the following output:

before: 
```
llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE]
                       [--image-type {conda,container,venv}]
                       config

Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.

positional arguments:
  config                Path to config file to use for the run

options:
  -h, --help            show this help message and exit
  --port PORT           Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. Defaults to 8321
  --image-name IMAGE_NAME
                        Name of the image to run. Defaults to the current conda environment
  --disable-ipv6        Disable IPv6 support
  --env KEY=VALUE       Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times.
  --tls-keyfile TLS_KEYFILE
                        Path to TLS key file for HTTPS
  --tls-certfile TLS_CERTFILE
                        Path to TLS certificate file for HTTPS
  --image-type {conda,container,venv}
                        Image Type used during the build. This can be either conda or container or venv.
```

after:
```
llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE]
                       [--image-type {conda,container,venv}]
                       config

Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.

positional arguments:
  config                Path to config file to use for the run

options:
  -h, --help            show this help message and exit
  --port PORT           Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. (default: 8321)
  --image-name IMAGE_NAME
                        Name of the image to run. Defaults to the current conda environment (default: None)
  --disable-ipv6        Disable IPv6 support (default: False)
  --env KEY=VALUE       Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: [])
  --tls-keyfile TLS_KEYFILE
                        Path to TLS key file for HTTPS (default: None)
  --tls-certfile TLS_CERTFILE
                        Path to TLS certificate file for HTTPS (default: None)
  --image-type {conda,container,venv}
                        Image Type used during the build. This can be either conda or container or venv. (default: conda)
```

[//]: # (## Documentation)

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-03-07 11:05:58 -08:00
Reid
a0d6b165b0
chore: remove unused build dir (#1379)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

- The old PR used `BUILDS_BASE_DIR` in
`llama_stack/cli/stack/configure.py` (now removed):
https://github.com/meta-llama/llama-stack/pull/371/files
- Based on the current `build` code, it should only use
`DISTRIBS_BASE_DIR` to save it:

46b0a404e8/llama_stack/cli/stack/_build.py (L298)

46b0a404e8/llama_stack/cli/stack/_build.py (L301)

Please correct me if I have misunderstood.
So there should be no need to use it in `run` now.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-03-05 15:40:00 -08:00
Ashwin Bharambe
dd0db8038b
refactor(test): unify vector_io tests and make them configurable (#1398)
## Test Plan


`LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=sqlite-vec
pytest -s -v test_vector_io.py --embedding-model all-miniLM-L6-V2
--inference-model='' --vision-inference-model=''`

```
test_vector_io.py::test_vector_db_retrieve[txt=:vis=:emb=all-miniLM-L6-V2] PASSED
test_vector_io.py::test_vector_db_register[txt=:vis=:emb=all-miniLM-L6-V2] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case0] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case1] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case2] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case3] PASSED
test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case4] PASSED
```

Same thing with:
- LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=faiss
- LLAMA_STACK_CONFIG=fireworks

(Note that ergonomics will soon be improved re: cmd-line options and env
variables)
2025-03-04 13:37:45 -08:00
Reid
5c9d12a206
chore: improve --port help text (#1346)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

It would be better to tell the user about the env var usage in the help text.
```
before:
$ llama stack run --help
  --port PORT           Port to run the server on. Defaults to 8321

after
$ llama stack run --help
  --port PORT           Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. Defaults to 8321
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-03-03 16:49:03 -08:00
Reid
dc069025f5
chore: fix typo (#1343)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]


21ec67356c/distributions

It was missing the `s`.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-03-01 10:36:04 -08:00
Sébastien Han
6fa257b475
chore(lint): update Ruff ignores for project conventions and maintainability (#1184)
- Added new ignores from flake8-bugbear (`B007`, `B008`)
- Ignored `C901` (high function complexity) for now, pending review
- Maintained PyTorch conventions (`N812`, `N817`)
- Allowed `E731` (lambda assignments) for flexibility
- Consolidated existing ignores (`E402`, `E501`, `F405`, `C408`, `N812`)
- Documented rationale for each ignored rule

This keeps our linting aligned with project needs while tracking
potential fixes.

Signed-off-by: Sébastien Han <seb@redhat.com>

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-28 09:36:49 -08:00
Yuan Tang
f4df3a76d9
fix: Incorrect import path for print_subcommand_description() (#1314)
# What does this PR do?

Missed this one additional import in
https://github.com/meta-llama/llama-stack/pull/1313

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-27 18:35:49 -08:00
Reid
94e2186bb8
chore: add subcommands description in help (#1219)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

```
before:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...

Welcome to the Llama CLI

options:
  -h, --help            show this help message and exit

subcommands:
  {model,stack,download,verify-download}

$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...

Work with llama models

options:
  -h, --help            show this help message and exit

model_subcommands:
  {download,list,prompt-format,describe,verify-download,remove}

$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...

Operations for the Llama Stack / Distributions

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

stack_subcommands:
  {build,list-apis,list-providers,run}

===================
after:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...

Welcome to the Llama CLI

options:
  -h, --help            show this help message and exit

subcommands:
  {model,stack,download,verify-download}

  model                 Work with llama models
  stack                 Operations for the Llama Stack / Distributions
  download              Download a model from llama.meta.com or Hugging Face Hub
  verify-download       Verify integrity of downloaded model files

$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...

Work with llama models

options:
  -h, --help            show this help message and exit

model_subcommands:
  {download,list,prompt-format,describe,verify-download,remove}

  download              Download a model from llama.meta.com or Hugging Face Hub
  list                  Show available llama models
  prompt-format         Show llama model message formats
  describe              Show details about a llama model
  verify-download       Verify the downloaded checkpoints' checksums for models downloaded from Meta
  remove                Remove the downloaded llama model

$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...

Operations for the Llama Stack / Distributions

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

stack_subcommands:
  {build,list-apis,list-providers,run}

  build                 Build a Llama stack container
  list-apis             List APIs part of the Llama Stack implementation
  list-providers        Show available Llama Stack Providers for an API
  run                   Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
```

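For reference, a minimal sketch of how such one-line descriptions end up in the help output (the `help=` argument to `add_parser` is what populates them; command names copied from above):

```python
import argparse

parser = argparse.ArgumentParser(prog="llama", description="Welcome to the Llama CLI")
subparsers = parser.add_subparsers(
    title="subcommands", metavar="{model,stack,download,verify-download}"
)

# Passing help= makes the short description show up under "subcommands:".
subparsers.add_parser("model", help="Work with llama models")
subparsers.add_parser("stack", help="Operations for the Llama Stack / Distributions")
subparsers.add_parser("download", help="Download a model from llama.meta.com or Hugging Face Hub")
subparsers.add_parser("verify-download", help="Verify integrity of downloaded model files")

parser.print_help()
```
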
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-27 17:00:27 -08:00
Ashwin Bharambe
c54164556a
fix: update notebooks to avoid using the nutsy --image-name __system__ thing (#1308)
The `--image-name __system__` thing was a hack and a bad one at that.
The actual intent was to somehow automatically detect the notebook
environment so we could avoid unnecessarily confusing things in the
llama stack build cmd-line. But I failed, which led us to use the backup
`__system__` thing.

Let's just do the simple thing.

Note that I haven't changed `build_venv.sh` for now (so it still honors
the `__system__` special name, just that no new user should use it).

## Test Plan

Open the notebooks from this branch in Colab (see example url below) and
ensure the builds work.


https://colab.research.google.com/github/meta-llama/llama-stack/blob/foo/docs/getting_started.ipynb

In the notebook, install llama-stack from this branch directly using:

```
!pip install -U https://github.com/meta-llama/llama-stack/archive/refs/heads/foo.zip
```

Verify that `!UV_SYSTEM_PYTHON=1 llama stack build --template together
--image-type venv` afterwards succeeds and the library client
initialization also works.
2025-02-27 16:39:04 -08:00
Yuan Tang
2ed2c0bd26
fix(cli): Missing default for --image-type in stack run command (#1274)
# What does this PR do?

I think this got accidentally removed as part of
https://github.com/meta-llama/llama-stack/pull/1250. cc @leseb

## Test Plan

After the change, this arg is no longer required.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-26 12:23:44 -08:00
Sébastien Han
929c5f0842
refactor(server): replace print statements with logger (#1250)
# What does this PR do?

- Introduced logging in `StackRun` to replace print-based messages
- Improved error handling for config file loading and parsing
- Replaced `cprint` with `logger.error` for consistent error messaging
- Ensured logging is used in `server.py` for startup, shutdown, and
runtime messages
- Added missing exception handling for invalid providers

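As a small illustration of the `cprint` → `logger.error` change described above (the logger setup here is generic; the project wires up its own logger):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

config_file = "run.yaml"
try:
    with open(config_file) as f:
        contents = f.read()
except FileNotFoundError:
    # Previously a cprint()-style message; logger.error keeps error output
    # consistent and filterable by log level.
    logger.error("Config file %s does not exist", config_file)
```
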
Signed-off-by: Sébastien Han <seb@redhat.com>

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-25 21:31:37 -08:00
Sébastien Han
c4987bc349
fix: avoid failure when no special pip deps and better exit (#1228)
# What does this PR do?

When building providers in a virtual environment or containers, special
pip dependencies may not always be provided (e.g., for Ollama). The
check should only fail if the required number of arguments is missing.
Currently, two arguments are mandatory:

1. Environment name
2. Pip dependencies

Additionally, return statements were replaced with sys.exit(1) in error
conditions to ensure immediate termination on critical failures. Error
handling in the stack build process was also improved to guarantee the
program exits with status 1 when facing configuration issues or build
failures.

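An illustrative (hypothetical) example of the return-versus-exit change for critical failures:

```python
import sys


def build_stack(config_path: str) -> None:
    # Before: a bare `return` let the process finish with exit code 0 even on
    # a critical failure. Exiting with status 1 surfaces the error to callers.
    try:
        with open(config_path) as f:
            f.read()
    except FileNotFoundError:
        print(f"Could not read build config: {config_path}", file=sys.stderr)
        sys.exit(1)
```
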
Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

This command shouldn't fail:

```
llama stack build --template ollama --image-type venv
```

[//]: # (## Documentation)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-24 13:18:52 -05:00
Charlie Doern
34e3faa4e8
feat: add --run to llama stack build (#1156)
# What does this PR do?

`--run` runs the stack that was just built, using the same arguments as the
build process (image-name, type, etc.).

This simplifies the workflow a lot and makes the UX better for most
local users trying to get started, rather than having to match the flags
of the two commands (build and then run).

Also, moved `ImageType` to distribution.utils since there were circular
import errors with its old location

## Test Plan

tested locally using the following command: 

`llama stack build --run --template ollama --image-type venv`

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-23 22:06:09 -05:00
Ashwin Bharambe
6227e1e3b9
fix: update virtualenv building so llamastack- prefix is not added, make notebook experience easier (#1225)
Make sure venv behaves like conda (no prefix is added to image_name) and
`--image-type venv` inside a notebook "just works" without any fiddling
2025-02-23 16:57:11 -08:00
Ashwin Bharambe
992f865b2e
chore: move embedding deps to RAG tool where they are needed (#1210)
`EMBEDDING_DEPS` were wrongly associated with `vector_io` providers.
They are needed by
https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/utils/memory/vector_store.py#L142
and related code, which is used by the RAG tool, and as such should only be
needed by the `inline::rag-runtime` provider.
2025-02-21 11:33:41 -08:00
Reid
d2701b0d6a
chore: remove configure subcommand (#1202)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

When I tried to use `configure`, I found it is `DEPRECATED`, and found that
PR https://github.com/meta-llama/llama-stack/pull/371 deprecated it, so I am
not sure why `configure.py` was not removed.
```
$ llama stack configure /tmp/test.yaml
usage: llama stack configure [-h] [--output-dir OUTPUT_DIR] config
llama stack configure: error:
    DEPRECATED! llama stack configure has been deprecated.
    Please use llama stack run <path/to/run.yaml> instead.
    Please see example run.yaml in /distributions folder.
```

It would be better to tell the user this when they first check how to use it
with `--help`:

```
before:
$ llama stack configure --help
usage: llama stack configure [-h] [--output-dir OUTPUT_DIR] config

Configure a llama stack distribution

positional arguments:

after:
$ llama stack configure --help
usage: llama stack configure [-h] [--output-dir OUTPUT_DIR] config

Configure a llama stack distribution

    DEPRECATED! llama stack configure has been deprecated.
    Please use llama stack run <path/to/run.yaml> instead.
    Please see example run.yaml in /distributions folder.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-21 08:06:25 -08:00
Reid
89d37687dd
chore: remove --no-list-templates option (#1121)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

From the code and the usage, there seems to be no need for
`--no-list-templates`, and it also confuses the user in the help text, so
remove it.
```
$ llama stack build --no-list-templates
> Enter a name for your Llama Stack (e.g. my-local-stack):

$ llama stack build
> Enter a name for your Llama Stack (e.g. my-local-stack):

before:
$ llama stack build --help
  --list-templates, --no-list-templates
                        Show the available templates for building a Llama Stack distribution (default: False)

after:
  --list-templates      Show the available templates for building a Llama Stack distribution
```

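For illustration, the difference likely comes down to the argparse action used for the flag (assuming the old flag was defined with a boolean-optional action; names copied from the help text above):

```python
import argparse

parser = argparse.ArgumentParser(prog="llama stack build")

# Before (hypothetically): action=argparse.BooleanOptionalAction, which makes
# argparse advertise both --list-templates and --no-list-templates in help.
# parser.add_argument("--list-templates", action=argparse.BooleanOptionalAction, default=False)

# After: a plain flag, shown in help as just --list-templates.
parser.add_argument(
    "--list-templates",
    action="store_true",
    help="Show the available templates for building a Llama Stack distribution",
)
parser.print_help()
```
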
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:13:46 -08:00
Reid
8dc1cac333
style: fix the capitalization issue (#1117)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

```
before:
$ llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE]
                       [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}]
                       config

start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.

After:
$ llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE]
                       [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}]
                       config

Start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-14 17:16:26 -08:00
Sébastien Han
369cc513cb
fix: improve stack build on venv (#980)
# What does this PR do?

Added a pre_run_checks function to ensure a smooth environment setup by
verifying prerequisites. It checks for an existing virtual environment,
ensures uv is installed, and deactivates any active environment if
necessary.

Run the full build inside a venv created by 'uv'.

Improved string handling in printf statements and added shellcheck
suppressions for expected word splitting in pip commands.

These enhancements improve robustness, prevent
conflicts, and ensure a seamless setup process.

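The change itself lives in the build shell script, but expressed in Python purely for illustration, the prerequisite checks amount to roughly:

```python
import os
import shutil
import sys


def pre_run_checks(env_name: str) -> None:
    # Illustrative rendering of the shell-side checks: uv must be on PATH,
    # and an existing environment directory is detected before building.
    if shutil.which("uv") is None:
        print("uv is required but was not found on PATH", file=sys.stderr)
        sys.exit(1)
    if os.path.isdir(env_name):
        print(f"Virtual environment directory {env_name} already exists")


pre_run_checks("llamastack-foo")
```
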
Signed-off-by: Sébastien Han <seb@redhat.com>

- [ ] Addresses issue (#issue)


## Test Plan

Run the following command on either Linux or MacOS:

```
llama stack build --template ollama --image-type venv --image-name foo
+ build_name=foo
+ env_name=llamastack-foo
+ pip_dependencies='datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn'
+ RED='\033[0;31m'
+ NC='\033[0m'
+ ENVNAME=
+++ readlink -f /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh
++ dirname /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh
+ SCRIPT_DIR=/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution
+ source /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/common.sh
+ pre_run_checks llamastack-foo
+ local env_name=llamastack-foo
+ is_command_available uv
+ command -v uv
+ '[' -d llamastack-foo ']'
+ run llamastack-foo 'datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn' 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu'
+ local env_name=llamastack-foo
+ local 'pip_dependencies=datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn'
+ local 'special_pip_deps=sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu'
+ echo 'Creating new virtual environment llamastack-foo'
Creating new virtual environment llamastack-foo
+ uv venv llamastack-foo
Using CPython 3.13.1 interpreter at: /opt/homebrew/opt/python@3.13/bin/python3.13
Creating virtual environment at: llamastack-foo
Activate with: source llamastack-foo/bin/activate
+ source llamastack-foo/bin/activate
++ '[' -n x ']'
++ SCRIPT_PATH=llamastack-foo/bin/activate
++ '[' llamastack-foo/bin/activate = /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh ']'
++ deactivate nondestructive
++ unset -f pydoc
++ '[' -z '' ']'
++ '[' -z '' ']'
++ hash -r
++ '[' -z '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ VIRTUAL_ENV=/Users/leseb/Documents/AI/llama-stack/llamastack-foo
++ '[' darwin24 = cygwin ']'
++ '[' darwin24 = msys ']'
++ export VIRTUAL_ENV
++ _OLD_VIRTUAL_PATH='/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand'
++ PATH='/Users/leseb/Documents/AI/llama-stack/llamastack-foo/bin:/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand'
++ export PATH
++ '[' x '!=' x ']'
+++ basename /Users/leseb/Documents/AI/llama-stack/llamastack-foo
++ VIRTUAL_ENV_PROMPT='(llamastack-foo) '
++ export VIRTUAL_ENV_PROMPT
++ '[' -z '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(llamastack-foo) '
++ export PS1
++ alias pydoc
++ true
++ hash -r
+ '[' -n '' ']'
+ '[' -n '' ']'
+ uv pip install --no-cache-dir llama-stack
Using Python 3.13.1 environment at: llamastack-foo
Resolved 50 packages in 1.25s
   Built fire==0.7.0
Prepared 50 packages in 1.22s
Installed 50 packages in 126ms
 + annotated-types==0.7.0
 + anyio==4.8.0
 + blobfile==3.0.0
 + certifi==2025.1.31
 + charset-normalizer==3.4.1
 + click==8.1.8
 + distro==1.9.0
 + filelock==3.17.0
 + fire==0.7.0
 + fsspec==2025.2.0
 + h11==0.14.0
 + httpcore==1.0.7
 + httpx==0.28.1
 + huggingface-hub==0.28.1
 + idna==3.10
 + jinja2==3.1.5
 + llama-models==0.1.2
 + llama-stack==0.1.2
 + llama-stack-client==0.1.2
 + lxml==5.3.1
 + markdown-it-py==3.0.0
 + markupsafe==3.0.2
 + mdurl==0.1.2
 + numpy==2.2.2
 + packaging==24.2
 + pandas==2.2.3
 + pillow==11.1.0
 + prompt-toolkit==3.0.50
 + pyaml==25.1.0
 + pycryptodomex==3.21.0
 + pydantic==2.10.6
 + pydantic-core==2.27.2
 + pygments==2.19.1
 + python-dateutil==2.9.0.post0
 + python-dotenv==1.0.1
 + pytz==2025.1
 + pyyaml==6.0.2
 + regex==2024.11.6
 + requests==2.32.3
 + rich==13.9.4
 + setuptools==75.8.0
 + six==1.17.0
 + sniffio==1.3.1
 + termcolor==2.5.0
 + tiktoken==0.8.0
 + tqdm==4.67.1
 + typing-extensions==4.12.2
 + tzdata==2025.1
 + urllib3==2.3.0
 + wcwidth==0.2.13
+ '[' -n '' ']'
+ printf 'Installing pip dependencies\n'
Installing pip dependencies
+ uv pip install datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn
Using Python 3.13.1 environment at: llamastack-foo
Resolved 105 packages in 37ms
Uninstalled 2 packages in 65ms
Installed 72 packages in 195ms
 + aiohappyeyeballs==2.4.6
 + aiohttp==3.11.12
 + aiosignal==1.3.2
 + aiosqlite==0.21.0
 + attrs==25.1.0
 + autoevals==0.0.119
 + backoff==2.2.1
 + braintrust-core==0.0.58
 + chardet==5.2.0
 + chevron==0.14.0
 + chromadb-client==0.6.3
 + contourpy==1.3.1
 + cycler==0.12.1
 + datasets==3.2.0
 + deprecated==1.2.18
 + dill==0.3.8
 + faiss-cpu==1.10.0
 + fastapi==0.115.8
 + fonttools==4.56.0
 + frozenlist==1.5.0
 - fsspec==2025.2.0
 + fsspec==2024.9.0
 + googleapis-common-protos==1.66.0
 + grpcio==1.70.0
 + importlib-metadata==8.5.0
 + jiter==0.8.2
 + joblib==1.4.2
 + jsonschema==4.23.0
 + jsonschema-specifications==2024.10.1
 + kiwisolver==1.4.8
 + levenshtein==0.26.1
 + matplotlib==3.10.0
 + monotonic==1.6
 + multidict==6.1.0
 + multiprocess==0.70.16
 + nltk==3.9.1
 - numpy==2.2.2
 + numpy==1.26.4
 + ollama==0.4.7
 + openai==1.61.1
 + opentelemetry-api==1.30.0
 + opentelemetry-exporter-otlp-proto-common==1.30.0
 + opentelemetry-exporter-otlp-proto-grpc==1.30.0
 + opentelemetry-exporter-otlp-proto-http==1.30.0
 + opentelemetry-proto==1.30.0
 + opentelemetry-sdk==1.30.0
 + opentelemetry-semantic-conventions==0.51b0
 + orjson==3.10.15
 + overrides==7.7.0
 + posthog==3.12.0
 + propcache==0.2.1
 + protobuf==5.29.3
 + psycopg2-binary==2.9.10
 + pyarrow==19.0.0
 + pyparsing==3.2.1
 + pypdf==5.3.0
 + rapidfuzz==3.12.1
 + redis==5.2.1
 + referencing==0.36.2
 + rpds-py==0.22.3
 + safetensors==0.5.2
 + scikit-learn==1.6.1
 + scipy==1.15.1
 + sentencepiece==0.2.0
 + starlette==0.45.3
 + tenacity==9.0.0
 + threadpoolctl==3.5.0
 + tokenizers==0.21.0
 + transformers==4.48.3
 + uvicorn==0.34.0
 + wrapt==1.17.2
 + xxhash==3.5.0
 + yarl==1.18.3
 + zipp==3.21.0
+ '[' -n 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu' ']'
+ IFS='#'
+ read -ra parts
+ for part in '"${parts[@]}"'
+ echo 'sentence-transformers --no-deps'
sentence-transformers --no-deps
+ uv pip install sentence-transformers --no-deps
Using Python 3.13.1 environment at: llamastack-foo
Resolved 1 package in 141ms
Installed 1 package in 6ms
 + sentence-transformers==3.4.1
+ for part in '"${parts[@]}"'
+ echo 'torch torchvision --index-url https://download.pytorch.org/whl/cpu'
torch torchvision --index-url https://download.pytorch.org/whl/cpu
+ uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
Using Python 3.13.1 environment at: llamastack-foo
Resolved 13 packages in 2.15s
Installed 5 packages in 324ms
 + mpmath==1.3.0
 + networkx==3.3
 + sympy==1.13.1
 + torch==2.6.0
 + torchvision==0.21.0
Build Successful!
```

Run:

```
$ source llamastack-foo/bin/activate
$ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" OLLAMA_INFERENCE_MODEL="llama3.2:3b-instruct-fp16" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml --port 5001 
Using config file: llama_stack/templates/ollama/run.yaml
Run configuration:
apis:
- agents
- datasetio
- eval
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
container_image: null
datasets: []
eval_tasks: []
image_name: ollama
metadata_store:
  db_path: /Users/leseb/.llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite
models:
- metadata: {}
  model_id: meta-llama/Llama-3.2-3B-Instruct
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - llm
  provider_id: ollama
  provider_model_id: null
- metadata:
    embedding_dimension: 384
  model_id: all-MiniLM-L6-v2
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - embedding
  provider_id: sentence-transformers
  provider_model_id: null
providers:
  agents:
  - config:
      persistence_store:
        db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db
        namespace: null
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  datasetio:
  - config: {}
    provider_id: huggingface
    provider_type: remote::huggingface
  - config: {}
    provider_id: localfs
    provider_type: inline::localfs
  eval:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - config:
      url: http://localhost:11434
    provider_id: ollama
    provider_type: remote::ollama
  - config: {}
    provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
  safety:
  - config: {}
    provider_id: llama-guard
    provider_type: inline::llama-guard
  scoring:
  - config: {}
    provider_id: basic
    provider_type: inline::basic
  - config: {}
    provider_id: llm-as-judge
    provider_type: inline::llm-as-judge
  - config:
      openai_api_key: '********'
    provider_id: braintrust
    provider_type: inline::braintrust
  telemetry:
  - config:
      service_name: llama-stack
      sinks: console,sqlite
      sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db
    provider_id: meta-reference
    provider_type: inline::meta-reference
  tool_runtime:
  - config:
      api_key: '********'
      max_results: 3
    provider_id: brave-search
    provider_type: remote::brave-search
  - config:
      api_key: '********'
      max_results: 3
    provider_id: tavily-search
    provider_type: remote::tavily-search
  - config: {}
    provider_id: code-interpreter
    provider_type: inline::code-interpreter
  - config: {}
    provider_id: rag-runtime
    provider_type: inline::rag-runtime
  vector_io:
  - config:
      kvstore:
        db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db
        namespace: null
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
scoring_fns: []
server:
  port: 8321
  tls_certfile: null
  tls_keyfile: null
shields: []
tool_groups:
- args: null
  mcp_endpoint: null
  provider_id: tavily-search
  toolgroup_id: builtin::websearch
- args: null
  mcp_endpoint: null
  provider_id: rag-runtime
  toolgroup_id: builtin::rag
- args: null
  mcp_endpoint: null
  provider_id: code-interpreter
  toolgroup_id: builtin::code_interpreter
vector_dbs: []
version: '2'

Warning: `bwrap` is not available. Code interpreter tool will not work correctly.
modules.json: 100%|███████████████████████████████████████████████████████████| 349/349 [00:00<00:00, 485kB/s]
config_sentence_transformers.json: 100%|██████████████████████████████████████| 116/116 [00:00<00:00, 498kB/s]
README.md: 100%|█████████████████████████████████████████████████████████| 10.7k/10.7k [00:00<00:00, 20.5MB/s]
sentence_bert_config.json: 100%|████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 583kB/s]
config.json: 100%|███████████████████████████████████████████████████████████| 612/612 [00:00<00:00, 4.63MB/s]
model.safetensors: 100%|█████████████████████████████████████████████████| 90.9M/90.9M [00:02<00:00, 36.6MB/s]
tokenizer_config.json: 100%|█████████████████████████████████████████████████| 350/350 [00:00<00:00, 4.27MB/s]
vocab.txt: 100%|███████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 1.90MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████| 466k/466k [00:00<00:00, 2.23MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████| 112/112 [00:00<00:00, 1.47MB/s]
1_Pooling/config.json: 100%|██████████████████████████████████████████████████| 190/190 [00:00<00:00, 841kB/s]
Serving API tool_groups
 GET /v1/tools/{tool_name}
 GET /v1/toolgroups/{toolgroup_id}
 GET /v1/toolgroups
 GET /v1/tools
 POST /v1/toolgroups
 DELETE /v1/toolgroups/{toolgroup_id}
Serving API tool_runtime
 POST /v1/tool-runtime/invoke
 GET /v1/tool-runtime/list-tools
 POST /v1/tool-runtime/rag-tool/insert
 POST /v1/tool-runtime/rag-tool/query
Serving API vector_io
 POST /v1/vector-io/insert
 POST /v1/vector-io/query
Serving API telemetry
 GET /v1/telemetry/traces/{trace_id}/spans/{span_id}
 GET /v1/telemetry/spans/{span_id}/tree
 GET /v1/telemetry/traces/{trace_id}
 POST /v1/telemetry/events
 GET /v1/telemetry/spans
 GET /v1/telemetry/traces
 POST /v1/telemetry/spans/export
Serving API models
 GET /v1/models/{model_id}
 GET /v1/models
 POST /v1/models
 DELETE /v1/models/{model_id}
Serving API eval
 POST /v1/eval/tasks/{task_id}/evaluations
 DELETE /v1/eval/tasks/{task_id}/jobs/{job_id}
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}
 POST /v1/eval/tasks/{task_id}/jobs
Serving API datasets
 GET /v1/datasets/{dataset_id}
 GET /v1/datasets
 POST /v1/datasets
 DELETE /v1/datasets/{dataset_id}
Serving API scoring_functions
 GET /v1/scoring-functions/{scoring_fn_id}
 GET /v1/scoring-functions
 POST /v1/scoring-functions
Serving API inspect
 GET /v1/health
 GET /v1/inspect/providers
 GET /v1/inspect/routes
 GET /v1/version
Serving API scoring
 POST /v1/scoring/score
 POST /v1/scoring/score-batch
Serving API shields
 GET /v1/shields/{identifier}
 GET /v1/shields
 POST /v1/shields
Serving API vector_dbs
 GET /v1/vector-dbs/{vector_db_id}
 GET /v1/vector-dbs
 POST /v1/vector-dbs
 DELETE /v1/vector-dbs/{vector_db_id}
Serving API eval_tasks
 GET /v1/eval-tasks/{eval_task_id}
 GET /v1/eval-tasks
 POST /v1/eval-tasks
Serving API agents
 POST /v1/agents
 POST /v1/agents/{agent_id}/session
 POST /v1/agents/{agent_id}/session/{session_id}/turn
 DELETE /v1/agents/{agent_id}
 DELETE /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}
Serving API inference
 POST /v1/inference/chat-completion
 POST /v1/inference/completion
 POST /v1/inference/embeddings
Serving API datasetio
 POST /v1/datasetio/rows
 GET /v1/datasetio/rows
Serving API safety
 POST /v1/safety/run-shield

Listening on ['::', '0.0.0.0']:5001
INFO:     Started server process [39145]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-14 09:22:03 -08:00
Sébastien Han
e4a1579e63
build: format codebase imports using ruff linter (#1028)
# What does this PR do?

- Configured ruff linter to automatically fix import sorting issues.
- Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are
applied.
- Enabled the 'I' selection to focus on import-related linting rules.
- Ran the linter, and formatted all codebase imports accordingly.
- Removed the black dep from the "dev" group since we use ruff

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-13 10:06:21 -08:00
Ihar Hrachyshka
cc700b2f68
feat: support listing all for llama stack list-providers (#1056)
# What does this PR do?
Support listing all for `llama stack list-providers`.

For ease of reading, sort the output rows by type.

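A rough sketch of the idea: make the positional `api` argument optional and, when it is omitted, print every provider sorted by API type (the registry below is hypothetical; the real data comes from the stack's provider registry):

```python
import argparse

parser = argparse.ArgumentParser(prog="llama stack list-providers")
# nargs="?" makes the api argument optional instead of required.
parser.add_argument("api", nargs="?", help="Only list providers for this API")
args = parser.parse_args([])

# Hypothetical, abbreviated registry for illustration.
registry = {
    "inference": ["remote::ollama", "remote::vllm"],
    "safety": ["inline::llama-guard"],
    "agents": ["inline::meta-reference"],
}

apis = [args.api] if args.api else sorted(registry)
for api in apis:
    for provider in registry[api]:
        print(f"{api:<12} {provider}")
```
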
Before the change.

```
 llama stack list-providers
usage: llama stack list-providers [-h] {inference,safety,agents,vector_io,datasetio,scoring,eval,post_training,tool_runtime,telemetry}
llama stack list-providers: error: the following arguments are required: api
```

After the change.

```
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| API Type      | Provider Type                    | PIP Package Dependencies                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| agents        | inline::meta-reference           | matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis            |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| datasetio     | inline::localfs                  | pandas                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| datasetio     | remote::huggingface              | datasets                                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| eval          | inline::meta-reference           |                                                                                  |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | inline::meta-reference           | accelerate,blobfile,fairscale,torch,torchvision,transformers,zmq,lm-format-      |
|               |                                  | enforcer,sentence-transformers                                                   |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | inline::meta-reference-quantized | accelerate,blobfile,fairscale,torch,torchvision,transformers,zmq,lm-format-      |
|               |                                  | enforcer,sentence-transformers,fbgemm-gpu,torchao==0.5.0                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | inline::sentence-transformers    | sentence-transformers                                                            |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | inline::vllm                     | vllm                                                                             |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::bedrock                  | boto3                                                                            |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::cerebras                 | cerebras_cloud_sdk                                                               |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::databricks               | openai                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::fireworks                | fireworks-ai                                                                     |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::groq                     | groq                                                                             |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::hf::endpoint             | huggingface_hub,aiohttp                                                          |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::hf::serverless           | huggingface_hub,aiohttp                                                          |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::nvidia                   | openai                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::ollama                   | ollama,aiohttp                                                                   |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::runpod                   | openai                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::sambanova                | openai                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::tgi                      | huggingface_hub,aiohttp                                                          |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::together                 | together                                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::vllm                     | openai                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| post_training | inline::torchtune                | torch,torchtune==0.5.0,torchao==0.8.0,numpy                                      |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| safety        | inline::code-scanner             | codeshield                                                                       |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| safety        | inline::llama-guard              |                                                                                  |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| safety        | inline::meta-reference           | transformers,torch --index-url https://download.pytorch.org/whl/cpu              |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| safety        | inline::prompt-guard             | transformers,torch --index-url https://download.pytorch.org/whl/cpu              |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| safety        | remote::bedrock                  | boto3                                                                            |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| scoring       | inline::basic                    |                                                                                  |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| scoring       | inline::braintrust               | autoevals,openai                                                                 |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| scoring       | inline::llm-as-judge             |                                                                                  |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| telemetry     | inline::meta-reference           | opentelemetry-sdk,opentelemetry-exporter-otlp-proto-http                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | inline::code-interpreter         |                                                                                  |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | inline::rag-runtime              |                                                                                  |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | remote::bing-search              | requests                                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | remote::brave-search             | requests                                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | remote::model-context-protocol   | mcp                                                                              |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | remote::tavily-search            | requests                                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | remote::wolfram-alpha            | requests                                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | inline::chromadb                 | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,chromadb    |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | inline::faiss                    | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,faiss-cpu   |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | inline::meta-reference           | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,faiss-cpu   |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | remote::chromadb                 | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,chromadb-   |
|               |                                  | client                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | remote::pgvector                 | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-                 |
|               |                                  | deps,psycopg2-binary                                                             |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | remote::qdrant                   | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,qdrant-     |
|               |                                  | client                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | remote::weaviate                 | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,weaviate-   |
|               |                                  | client                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Manually.

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-12 22:03:28 -08:00
Charlie Doern
025f615868
feat: add support for running in a venv (#1018)
# What does this PR do?

add `--image-type` to `llama stack run`, which takes conda, container, or
venv. Also add `start_venv.sh`, which starts the stack using a venv.

resolves #1007

## Test Plan

running locally:

`llama stack build --template ollama --image-type venv`
`llama stack run --image-type venv
~/.llama/distributions/ollama/ollama-run.yaml`
...
```
llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
Using run configuration: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml
+ python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321
Using config file: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml
Run configuration:
apis:
- agents
- datasetio
...
```
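
For orientation, here is a minimal Python sketch of what the venv-based launch
amounts to; the helper function, path handling, and default port are
illustrative assumptions rather than the actual `start_venv.sh` logic (only the
`python -m llama_stack.distribution.server.server` invocation mirrors the
output above):

```python
# Sketch only: launch the server with the interpreter from a virtualenv.
# Function name, path handling, and defaults are assumptions for illustration.
import os
import subprocess
import sys


def run_in_venv(venv_path: str, run_config: str, port: int = 8321) -> int:
    python = os.path.join(venv_path, "bin", "python")
    if not os.path.exists(python):
        sys.exit(f"No interpreter found at {python}; is this a virtualenv?")
    cmd = [
        python,
        "-m",
        "llama_stack.distribution.server.server",
        "--yaml-config",
        run_config,
        "--port",
        str(port),
    ]
    return subprocess.call(cmd)


# Example: run_in_venv(".venv", os.path.expanduser("~/.llama/distributions/ollama/ollama-run.yaml"))
```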

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-12 11:13:04 -05:00
Charlie Doern
5f88ff0b6a
fix: show proper help text (#1065)
# What does this PR do?
when executing a sub-command like `llama model`, the wrong help text,
sub-commands, and flags are displayed. Each command group needs to call
`.set_defaults` to display this info properly

before: 

```
llama model
usage: llama [-h] {model,stack,download,verify-download} ...

Welcome to the Llama CLI

options:
  -h, --help            show this help message and exit

subcommands:
  {model,stack,download,verify-download}
```

after:

```
llama model
usage: llama model [-h] {download,list,prompt-format,describe,verify-download} ...

Work with llama models

options:
  -h, --help            show this help message and exit

model_subcommands:
  {download,list,prompt-format,describe,verify-download}
```
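
As a rough illustration of the `.set_defaults` pattern (a generic argparse
sketch with made-up subcommands, not the project's actual CLI code):

```python
# Generic argparse sketch: give each command group a default handler so that
# invoking the group alone prints its own help instead of the top-level help.
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(prog="llama", description="Welcome to the Llama CLI")
    subparsers = parser.add_subparsers(title="subcommands")

    model = subparsers.add_parser("model", help="Work with llama models")
    model_sub = model.add_subparsers(title="model_subcommands")
    model_sub.add_parser("list", help="List models")

    # Without this, `llama model` falls through to the top-level parser's help.
    model.set_defaults(func=lambda args: model.print_help())

    args = parser.parse_args()
    if hasattr(args, "func"):
        args.func(args)
    else:
        parser.print_help()


if __name__ == "__main__":
    main()
```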

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-12 06:38:25 -08:00
Ihar Hrachyshka
24385cfd03
fix: filter out remote::sample providers when listing (#1057)
# What does this PR do?

Before:

```
 llama stack list-providers agents
+------------------------+-----------------------------------------------------------------------+
| Provider Type          | PIP Package Dependencies                                              |
+------------------------+-----------------------------------------------------------------------+
| inline::meta-reference | matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis |
+------------------------+-----------------------------------------------------------------------+
| remote::sample         |                                                                       |
+------------------------+-----------------------------------------------------------------------+
```

After:

```
 llama stack list-providers agents
+------------------------+-----------------------------------------------------------------------+
| Provider Type          | PIP Package Dependencies                                              |
+------------------------+-----------------------------------------------------------------------+
| inline::meta-reference | matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis |
+------------------------+-----------------------------------------------------------------------+
```
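
The change boils down to filtering placeholder providers out of the listing; a
hedged sketch of the idea (the registry shape and attribute names here are
assumptions, not llama-stack's internal API):

```python
# Illustrative only: hide placeholder providers such as remote::sample from
# list output. The ProviderSpec structure is an assumption for this example.
from dataclasses import dataclass


@dataclass
class ProviderSpec:
    provider_type: str
    pip_packages: list[str]


def visible_providers(specs: list[ProviderSpec]) -> list[ProviderSpec]:
    return [s for s in specs if not s.provider_type.endswith("::sample")]


specs = [
    ProviderSpec("inline::meta-reference", ["matplotlib", "pillow", "pandas"]),
    ProviderSpec("remote::sample", []),
]
print([s.provider_type for s in visible_providers(specs)])
# ['inline::meta-reference']
```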

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Manually.

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-11 16:12:46 -08:00
Ashwin Bharambe
f8f2f7f9bb
feat: Add HTTPS serving option (#1000)
# What does this PR do?

Enables HTTPS option for Llama Stack. 

While doing so, introduces a `ServerConfig` sub-structure to house all
server-related configuration (port, SSL, etc.)

Also simplified the `start_container.sh` entrypoint to just `python`
instead of a complex bash command line.

## Test Plan

Conda: 

Run:
```bash
$ llama stack build --template together
$ llama stack run --port 8322        # ensure server starts 

$ llama-stack-client configure --endpoint http://localhost:8322
$ llama-stack-client models list
```

Create a self-signed SSL key / cert pair. Then, using a local checkout
of `llama-stack-client-python`, change
https://github.com/meta-llama/llama-stack-client-python/blob/main/src/llama_stack_client/_base_client.py#L759
to add `kwargs.setdefault("verify", False)` so SSL verification is
disabled. Then:

```bash
$ llama stack run --port 8322 --tls-keyfile <KEYFILE> --tls-certfile <CERTFILE>
$ llama-stack-client configure --endpoint https://localhost:8322  # notice the `https`
$ llama-stack-client models list
```

Also tested with containers (but of course one needs to make sure the
cert and key files are appropriately provided to the container).
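
For orientation, a minimal sketch of what a server config sub-structure with
optional TLS can look like and how such fields map onto uvicorn; the field
names and wiring are assumptions for illustration, not the actual
`ServerConfig` definition:

```python
# Sketch only: a server config with optional TLS passed through to uvicorn.
# Field names and defaults are illustrative assumptions.
from typing import Optional

import uvicorn
from pydantic import BaseModel


class ServerConfig(BaseModel):
    port: int = 8321
    tls_keyfile: Optional[str] = None
    tls_certfile: Optional[str] = None


def serve(app, config: ServerConfig) -> None:
    uvicorn.run(
        app,
        host="0.0.0.0",
        port=config.port,
        ssl_keyfile=config.tls_keyfile,    # HTTPS is enabled when both the
        ssl_certfile=config.tls_certfile,  # key and the cert are provided
    )
```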
2025-02-07 09:39:08 -08:00
Yuan Tang
3f9764d50c
fix: List providers command prints out non-existing APIs from registry. Fixes #966 (#969)
Fixes #966.

Verified that:
1. The correct list of APIs is printed out when running `llama stack
list-providers`.
2. `llama stack list-providers <api>` works as expected.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-07 09:02:15 -08:00
Charlie Doern
f5e4bf2edf
chore: remove unused argument (#987)
# What does this PR do?

very small fix. I noticed some unused arguments, but this seems like the
easiest one to remove since it's passed in explicitly.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-06 10:05:35 -08:00
Yuan Tang
34ab7a3b6c
Fix precommit check after moving to ruff (#927)
The lint check in the main branch is failing. This fixes the lint check after
we moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fix and ignore some
additional checks.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-02 06:46:45 -08:00
Ashwin Bharambe
5b1e69e58e
Use uv pip install instead of pip install (#921)
## What does this PR do? 

See issue: #747 -- `uv` is just plain better. This PR does the bare
minimum of replacing `pip install` with `uv pip install` and ensuring `uv`
exists in the environment.

## Test Plan 

First: create new conda, `uv pip install -e .` on `llama-stack` -- all
is good.
Next: run `llama stack build --template together` followed by `llama
stack run together` -- all good
Next: run `llama stack build --template together --image-name yoyo`
followed by `llama stack run together --image-name yoyo` -- all good
Next: fresh conda and `uv pip install -e .` and `llama stack build
--template together --image-type venv` -- all good.

Docker: `llama stack build --template together --image-type container`
works!
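
As a hedged sketch of the "ensure `uv` exists" part (the bootstrap fallback
below is an assumption, not the actual build-script behaviour):

```python
# Illustrative only: prefer `uv pip install`, bootstrapping uv with plain pip
# if it is missing. Not the real llama stack build logic.
import shutil
import subprocess
import sys


def pip_install(packages: list[str]) -> None:
    if shutil.which("uv") is None:
        subprocess.check_call([sys.executable, "-m", "pip", "install", "uv"])
    subprocess.check_call(["uv", "pip", "install", *packages])


# Example: pip_install(["fastapi", "uvicorn"])
```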
2025-01-31 22:29:41 -08:00
Ashwin Bharambe
216cde5ee8 Add --print-deps-only for computing dependencies 2025-01-31 14:33:51 -08:00
Dmitry Rogozhkin
80f2032485
Fix running stack built with base conda environment (#903)
Fixes: #902

For the test, verified that llama stack can run if built:
* With the default "base" conda environment
* With a new custom conda environment using the `--image-name XXX` option
In both cases llama stack starts fine (it was failing with "base" before
this patch).

CC: @ashwinb

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-01-29 21:24:22 -08:00
Yuan Tang
53721e91ad
Fix validator of "container" image type (#901)
This was missed in https://github.com/meta-llama/llama-stack/pull/802
somehow.
2025-01-29 09:36:52 -08:00
Ashwin Bharambe
aee6237685 Small refactor for run_with_pty 2025-01-28 09:32:33 -08:00
Vladislav Bronzov
8332ea23ad
Add run win command for stack (#890)
# What does this PR do?

Add win platform run command for stack

- [x] Addresses issue (#issue)


## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.
https://github.com/meta-llama/llama-stack/pull/889


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-28 08:04:28 -08:00
Ashwin Bharambe
891bf704eb
Ensure llama stack build --config <> --image-type <> works (#879)
Fix the issues brought up in
https://github.com/meta-llama/llama-stack/issues/870

Test all combinations of (conda, container) vs. (template, config).
2025-01-25 11:13:36 -08:00
Yuan Tang
6da3053c0e
More generic image type for OCI-compliant container technologies (#802)
It's a more generic term, applicable to alternatives to Docker such as
Podman or other OCI-compliant technologies.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-17 16:37:42 -08:00
Ashwin Bharambe
03ac84a829 Update default port from 5000 -> 8321 2025-01-16 15:26:48 -08:00
Ashwin Bharambe
cee3816609
Make llama stack build not create a new conda by default (#788)
## What does this PR do?

So far `llama stack build` has always created a separate conda
environment for packaging the dependencies of a distribution. The main
reason to do so is isolation -- distributions are composed of providers
which can have a variety of potentially conflicting dependencies. That
said, this has created significant annoyance for new users since it is
not at all transparent. The fact that `llama stack run` is actually
running the code in some other conda is very surprising.

This PR tries to make things better. 

- Both `llama stack build` and `llama stack run` now accept an
`--image-name` argument which represents the (conda, docker, virtualenv)
image you want to operate upon.
- For the default (conda) mode, the script checks if a current conda
environment exists. If one exists, it uses it.
- If `--image-name` is provided, that option is used. In this case, an
environment is created if needed.
- There is no automatic `llamastack-` prefixing of the environment names
done anymore.
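
Roughly, the selection logic described above resolves an environment like this
(a sketch under assumptions: the conda environment variables are standard, but
the function itself is illustrative, not the actual CLI code):

```python
# Illustrative sketch of the image-name resolution described above.
import os
from typing import Optional


def resolve_conda_env(image_name: Optional[str]) -> str:
    if image_name:
        # An explicit --image-name wins; an environment is created if needed.
        return image_name
    current = os.environ.get("CONDA_DEFAULT_ENV")
    if current:
        # Reuse whatever conda environment is currently active.
        return current
    raise SystemExit("Not inside a conda environment; please pass --image-name.")
```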


## Test Plan

Start in a conda environment, run `llama stack build --template
fireworks`; verify that it successfully built into the current
environment and stored the build file at
`$CONDA_PREFIX/llamastack-build.yaml`. Run `llama stack run fireworks`
which started correctly in the current environment.

Ran the same build command outside of conda. It failed asking for
`--image-name`. Ran it with `llama stack build --template fireworks
--image-name foo`. This successfully created a conda environment called
`foo` and installed deps. Ran `llama stack run fireworks` outside conda
which failed. Activated a different conda, ran again, it failed saying
it did not find the `llamastack-build.yaml` file. Then used
`--image-name foo` option and it ran successfully.
2025-01-16 13:44:53 -08:00
Xi Yan
32d3abe964
[CICD] Github workflow for publishing Docker images (#764)
# What does this PR do?

- Add GitHub workflow for publishing Docker images.
- Manual inputs: we can either (1) use a TestPyPI version or (2) build via a
released PyPI version.

**Notes**
- Keep this workflow manually triggered as we don't want to publish
nightly docker images

**Additional Changes**
- Resolve issue with running `llama stack build` on a non-terminal device
```
  File "/home/runner/.local/lib/python3.12/site-packages/llama_stack/distribution/utils/exec.py", line 25, in run_with_pty
    old_settings = termios.tcgetattr(sys.stdin)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
termios.error: (25, 'Inappropriate ioctl for device')
```
- Modified `build_container.sh` to work in a non-terminal environment
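
The usual way around that `termios` failure is to check for a TTY before
touching terminal settings; a hedged sketch of the idea (not the actual
`run_with_pty` / `build_container.sh` changes):

```python
# Sketch only: fall back to a plain subprocess when stdin is not a terminal,
# since termios.tcgetattr raises "Inappropriate ioctl for device" there.
import subprocess
import sys


def run_command(cmd: list[str]) -> int:
    if sys.stdin.isatty():
        # Interactive terminal: a PTY-based runner could be used here.
        return subprocess.call(cmd)
    # CI / non-terminal environment: avoid termios entirely.
    return subprocess.call(cmd, stdin=subprocess.DEVNULL)


run_command(["echo", "hello"])
```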


## Test Plan

- Triggered workflow:
3562217878
<img width="1076" alt="image"
src="https://github.com/user-attachments/assets/f1b5cef6-05ab-49c7-b405-53abc9264734"
/>


- Tested published docker image
<img width="702" alt="image"
src="https://github.com/user-attachments/assets/e7135189-65c8-45d8-86f9-9f3be70e380b"
/>


- `/tools` API endpoints are served, confirming that the Docker image is
correctly using the TestPyPI package
<img width="296" alt="image"
src="https://github.com/user-attachments/assets/bbcaa7fe-c0a4-4d22-b600-90e3c254bbfd"
/>

- Published tagged images:
https://hub.docker.com/repositories/llamastack
<img width="947" alt="image"
src="https://github.com/user-attachments/assets/2a0a0494-4d45-4643-bc29-72154ecc54a5"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-15 09:01:33 -08:00
Dinesh Yeduguru
a174938fbd
Fix telemetry to work on reinstantiating new lib cli (#761)
# What does this PR do?

Since we maintain global state in our telemetry pipeline, reinstantiating
the lib CLI will add duplicate span processors, causing SQLite to lock up
due to constraint violations now that two span processors are writing to
SQLite. This PR changes the OTel telemetry adapter to instantiate the
provider only once and add the span processors only once.

Also fixes an issue with `llama stack build`.
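
In spirit, the fix is a module-level guard so repeated initialization does not
register extra span processors; a hedged OpenTelemetry sketch (the exporter
choice and wiring are simplified assumptions, not the adapter code):

```python
# Sketch of "instantiate the provider once, add span processors once".
from typing import Optional

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

_PROVIDER: Optional[TracerProvider] = None


def get_provider() -> TracerProvider:
    global _PROVIDER
    if _PROVIDER is None:
        _PROVIDER = TracerProvider()
        _PROVIDER.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
        trace.set_tracer_provider(_PROVIDER)
    return _PROVIDER
```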


## Test Plan

tested with notebook at
https://colab.research.google.com/drive/1ck7hXQxRl6UvT-ijNRZ-gMZxH1G3cN2d#scrollTo=9496f75c
2025-01-14 11:31:50 -08:00
Yuan Tang
9ec54dcbe7
Switch to use importlib instead of deprecated pkg_resources (#678)
`pkg_resources` has been deprecated. This PR switches to
`importlib.resources`.
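
For illustration, the before/after shape of such a migration (the package and
resource names below are placeholders, not the actual call sites):

```python
# Illustrative migration from pkg_resources to importlib.resources.
# Deprecated style:
#   import pkg_resources
#   path = pkg_resources.resource_filename("llama_stack.templates", "run.yaml")

# importlib-based style:
import importlib.resources


def read_template() -> str:
    ref = importlib.resources.files("llama_stack.templates") / "run.yaml"
    return ref.read_text()
```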

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-13 20:20:02 -08:00
raghotham
ff182ff6de
rename LLAMASTACK_PORT to LLAMA_STACK_PORT for consistency with other env vars (#744)
# What does this PR do?

Rename environment var for consistency

## Test Plan

No regressions

## Sources

## Before submitting

- [X] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [X] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [X] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-10 11:09:49 -08:00
Yuan Tang
24fa1adc2f
Expose LLAMASTACK_PORT in cli.stack.run (#722)
This was missed in https://github.com/meta-llama/llama-stack/pull/706. I
tested `llama_stack.distribution.server.server` but didn't test `llama
stack run`. cc @ashwinb

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-10 09:13:49 -08:00
Xi Yan
596afc6497
add --version to llama stack CLI & /version endpoint (#732)
# What does this PR do?

- add --version to llama stack CLI 
- add /version endpoint
- run OpenAPI generator for the new endpoint
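
As a rough sketch of the two pieces (generic argparse and FastAPI patterns; the
version string and route shape are illustrative, not the generated OpenAPI
types):

```python
# Sketch only: a --version CLI flag plus a /version endpoint.
import argparse

from fastapi import FastAPI

__version__ = "0.0.0"  # placeholder version string

# CLI side: argparse's built-in "version" action prints the string and exits.
parser = argparse.ArgumentParser(prog="llama")
parser.add_argument("--version", action="version", version=f"llama {__version__}")

# Server side: a trivial endpoint reporting the same string.
app = FastAPI()


@app.get("/version")
def version() -> dict[str, str]:
    return {"version": __version__}
```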

## Test Plan

**CLI**
<img width="184" alt="image"
src="https://github.com/user-attachments/assets/3acb1d22-453e-4b79-baf6-e98e88d0671c"
/>



**endpoint**
<img width="430" alt="image"
src="https://github.com/user-attachments/assets/79cdd670-493b-40cf-8f9e-28a4ac0988ac"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-08 16:30:06 -08:00
Xi Yan
3c72c034e6
[remove import *] clean up import *'s (#689)
# What does this PR do?

- as title, cleaning up `import *`'s
- upgrade tests to make them more robust to bad model outputs
- remove import *'s in llama_stack/apis/* (skip __init__ modules)
<img width="465" alt="image"
src="https://github.com/user-attachments/assets/d8339c13-3b40-4ba5-9c53-0d2329726ee2"
/>

- run `sh run_openapi_generator.sh`; no types get affected
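
The cleanup itself is the standard wildcard-import replacement; for example
(using a stdlib module as a stand-in for the affected ones):

```python
# Illustrative only: replace `from module import *` with explicit names so the
# imported surface is visible and linters can check usage.
from os.path import exists, join  # instead of `from os.path import *`

print(exists(join("/tmp", "example.txt")))
```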

## Test Plan

### Providers Tests

**agents**
```
pytest -v -s llama_stack/providers/tests/agents/test_agents.py -m "together" --safety-shield meta-llama/Llama-Guard-3-8B --inference-model meta-llama/Llama-3.1-405B-Instruct-FP8
```

**inference**
```bash
# meta-reference
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

# together
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

pytest ./llama_stack/providers/tests/inference/test_prompt_adapter.py 
```

**safety**
```
pytest -v -s llama_stack/providers/tests/safety/test_safety.py -m together --safety-shield meta-llama/Llama-Guard-3-8B
```

**memory**
```
pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "sentence_transformers" --env EMBEDDING_DIMENSION=384
```

**scoring**
```
pytest -v -s -m llm_as_judge_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
```


**datasetio**
```
pytest -v -s -m localfs llama_stack/providers/tests/datasetio/test_datasetio.py
pytest -v -s -m huggingface llama_stack/providers/tests/datasetio/test_datasetio.py
```


**eval**
```
pytest -v -s -m meta_reference_eval_together_inference llama_stack/providers/tests/eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio llama_stack/providers/tests/eval/test_eval.py
```

### Client-SDK Tests
```
LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk
```

### llama-stack-apps
```
PORT=5000
LOCALHOST=localhost

python -m examples.agents.hello $LOCALHOST $PORT
python -m examples.agents.inflation $LOCALHOST $PORT
python -m examples.agents.podcast_transcript $LOCALHOST $PORT
python -m examples.agents.rag_as_attachments $LOCALHOST $PORT
python -m examples.agents.rag_with_memory_bank $LOCALHOST $PORT
python -m examples.safety.llama_guard_demo_mm $LOCALHOST $PORT
python -m examples.agents.e2e_loop_with_custom_tools $LOCALHOST $PORT

# Vision model
python -m examples.interior_design_assistant.app
python -m examples.agent_store.app $LOCALHOST $PORT
```

### CLI
```
which llama
llama model prompt-format -m Llama3.2-11B-Vision-Instruct
llama model list
llama stack list-apis
llama stack list-providers inference

llama stack build --template ollama --image-type conda
```

### Distributions Tests
**ollama**
```
llama stack build --template ollama --image-type conda
ollama run llama3.2:1b-instruct-fp16
llama stack run ./llama_stack/templates/ollama/run.yaml --env INFERENCE_MODEL=meta-llama/Llama-3.2-1B-Instruct
```

**fireworks**
```
llama stack build --template fireworks --image-type conda
llama stack run ./llama_stack/templates/fireworks/run.yaml
```

**together**
```
llama stack build --template together --image-type conda
llama stack run ./llama_stack/templates/together/run.yaml
```

**tgi**
```
llama stack run ./llama_stack/templates/tgi/run.yaml --env TGI_URL=http://0.0.0.0:5009 --env INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-27 15:45:44 -08:00
Yuan Tang
987e651755
Add missing venv option in --image-type (#677)
"venv" option is supported but not mentioned in the prompt.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-12-21 21:10:13 -08:00