Commit graph

504 commits

Author SHA1 Message Date
Kai Wu
4e19f15bca add mcp 2025-08-02 15:54:52 -07:00
Kai Wu
3c24be8273 kind of working 2025-07-31 15:19:46 -07:00
Kai Wu
31a15332c4 NIM not working yet
Some checks failed
Installer CI / smoke-test-on-dev (push) Failing after 5s
Installer CI / lint (push) Failing after 9s
2025-07-29 14:26:58 -07:00
Ashwin Bharambe
08b4a1deb3
feat(tests): introduce inference record/replay to increase test reliability (#2941)
Implements a comprehensive recording and replay system for inference API
calls that eliminates dependency on online inference providers during
testing. The system treats inference as deterministic by recording real
API responses and replaying them in subsequent test runs. Applies to
OpenAI clients (which should cover many inference requests) as well as
Ollama AsyncClient.

For storing, we use a hybrid system: Sqlite for fast lookups and JSON
files for easy greppability / debuggability.

As expected, tests become much much faster (more than 3x in just
inference testing.)

```bash
LLAMA_STACK_TEST_INFERENCE_MODE=record LLAMA_STACK_TEST_RECORDING_DIR=<...> \
  uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

```bash
LLAMA_STACK_TEST_INFERENCE_MODE=replay LLAMA_STACK_TEST_RECORDING_DIR=<...> \
  uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

- `LLAMA_STACK_TEST_INFERENCE_MODE`: `live` (default), `record`, or
`replay`
- `LLAMA_STACK_TEST_RECORDING_DIR`: Storage location (must be specified
for record or replay modes)
2025-07-29 12:41:31 -07:00
ehhuang
4019027070
chore: revert #2855 (#2939)
# What does this PR do?
revert https://github.com/meta-llama/llama-stack/pull/2855 to unblock
release (running out of disk space)

Error here:
4689354931

## Test Plan
2025-07-28 15:30:25 -07:00
Charlie Doern
46e2989312
fix: switch refresh to debug log (#2933)
# What does this PR do?
the server logs have a persistent `core: refreshing registry` log that
clogs up the output. Switch it to debug

this is what it looked like:

<img width="1126" height="1028" alt="Screenshot 2025-07-28 at 9 56
44 AM"
src="https://github.com/user-attachments/assets/a1880fd3-7fc7-4a97-bfb8-89a62e4c5c19"
/>

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-28 10:02:54 -07:00
Ashwin Bharambe
9583f468f8
feat(starter)!: simplify starter distro; litellm model registry changes (#2916) 2025-07-25 15:02:04 -07:00
Charlie Doern
3344d8a9e5
fix: separate build and run provider types (#2917)
Some checks failed
Coverage Badge / unit-tests (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests / discover-tests (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 5s
Test Llama Stack Build / generate-matrix (push) Successful in 4s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 5s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 5s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 9s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Update ReadTheDocs / update-readthedocs (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Integration Tests / test-matrix (push) Failing after 7s
Pre-commit / pre-commit (push) Successful in 1m13s
# What does this PR do?

in #2637, I combined the run and build config provider types to both use
`Provider`

since this includes a provider_id, a user must now specify this when
writing a build yaml. This is not very clear because all a user should
care about upon build is the code to be installed (the module and the
provider_type)

introduce `BuildProvider` and fixup the parts of the code impacted by
this

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-25 12:39:26 -07:00
Ashwin Bharambe
ed07a58b50
fix(registry): ensure clean shutdown (#2901)
Some checks failed
Coverage Badge / unit-tests (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests / discover-tests (push) Successful in 4s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 5s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 6s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 5s
Python Package Build Test / build (3.13) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 9s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Update ReadTheDocs / update-readthedocs (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Integration Tests / test-matrix (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 13s
Test Llama Stack Build / build (push) Failing after 4s
Pre-commit / pre-commit (push) Successful in 57s
Avoid the error message:

```
INFO     2025-07-24 21:51:54,530 __main__:598 server: Received interrupt signal, shutting down gracefully...                                          
ERROR    2025-07-24 21:51:54,692 asyncio:1826 uncategorized: Task was destroyed but it is pending!                                                    
         task: <Task pending name='Task-15' coro=<refresh_registry() running at                                                                       
         /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/stack.py:356> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=>  
```
2025-07-25 09:44:31 -04:00
Charlie Doern
de6919ecdd
refactor: install external providers from module (#2637)
# What does this PR do?

Today, external providers are installed via the `external_providers_dir`
in the config. This necessitates users to understand the `ProviderSpec`
and set up their directories accordingly. This process splits up the
config for the stack across multiple files, directories, and formats.

Most (if not all) external providers today have a
[get_provider_spec](559cb18fbb/src/ramalama_stack/provider.py (L9))
method that sits unused. Utilizing this method rather than the
providers.d route allows for a much easier installation process for
external providers and limits the amount of extra configuration a
regular user has to do to get their stack off the ground.

To accomplish this and wire it throughout the build process, Introduce
the concept of a `module` for users to specify for an external provider
upon build time. In order to facilitate this, align the build and run
spec to use `Provider` class rather than the stringified provider_type
that build currently uses.

For example, say this is in your build config:

```
- provider_id: ramalama
  provider_type: remote::ramalama
  module: ramalama_stack
```

during build (in the various `build_...` scripts), additionally to
installing any pip dependencies we will also install this module and use
the `get_provider_spec` method to retrieve the ProviderSpec that is
currently specified using `providers.d`.

In production so far, providing instructions for installing external
providers for users has been difficult: they need to install the module
as a pre-req, create the providers.d directory, copy in the provider
spec, and also copy in the necessary build/run yaml files. Accessing an
external provider should be as easy as possible, and pointing to its
installable module aligns more with the rest of our build and dependency
management process.

For now, `external_providers_dir` still exists as an alternate more
declarative method of using external providers.

## Test Plan

added an integration test installing an external provider from module
and more unit test coverage for `get_provider_registry`


( the warning in yellow is expected, the module is installed inside of
the build env, not where we are running the command)
<img width="1119" height="400" alt="Screenshot 2025-07-24 at 11 30
48 AM"
src="https://github.com/user-attachments/assets/1efbaf45-b9e8-451a-bd63-264ed664706d"
/>

<img width="1154" height="618" alt="Screenshot 2025-07-24 at 11 31
14 AM"
src="https://github.com/user-attachments/assets/feb2b3ea-c5dd-418e-9662-9a3bd5dd6bdc"
/>

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-25 15:41:26 +02:00
ehhuang
21bae296f2
feat(auth): API access control (#2822)
# What does this PR do?
- Added ability to specify `required_scope` when declaring an API. This
is part of the `@webmethod` decorator.
- If auth is enabled, a user can access an API only if
`user.attributes['scope']` includes the `required_scope`
- We add `required_scope='telemetry.read'` to the telemetry read APIs.

## Test Plan
CI with added tests

1. Enable server.auth with github token
2. Observe `client.telemetry.query_traces()` returns 403
2025-07-24 15:30:48 -07:00
Sébastien Han
632cf9eb72
feat: Bring Your Own API (BYOA) (#2228)
Some checks failed
Coverage Badge / unit-tests (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Installer CI / lint (push) Failing after 3s
Integration Tests / discover-tests (push) Successful in 3s
Installer CI / smoke-test-on-dev (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 10s
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 13s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 6s
Update ReadTheDocs / update-readthedocs (push) Failing after 8s
Integration Tests / test-matrix (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 5s
Test Llama Stack Build / build (push) Failing after 6s
Pre-commit / pre-commit (push) Successful in 57s
# What does this PR do?

Prototype on a new feature to allow new APIs to be plugged in Llama
Stack. Opened for early feedback on the approach and test appetite on
the functionality.

@ashwinb @raghotham open for early feedback, thanks!

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-07-24 13:41:14 -07:00
ehhuang
cbe89d2bdd
chore: return webmethod from find_matching_route (#2883)
This will be used to support API access control, i.e. Webmethod would
have a `required_scope` attribute, and we need access to that in the
middleware.
2025-07-24 11:37:21 -07:00
Ashwin Bharambe
1463b79218
feat(registry): make the Stack query providers for model listing (#2862)
This flips #2823 and #2805 by making the Stack periodically query the
providers for models rather than the providers going behind the back and
calling "register" on to the registry themselves. This also adds support
for model listing for all other providers via `ModelRegistryHelper`.
Once this is done, we do not need to manually list or register models
via `run.yaml` and it will remove both noise and annoyance (setting
`INFERENCE_MODEL` environment variables, for example) from the new user
experience.

In addition, it adds a configuration variable `allowed_models` which can
be used to optionally restrict the set of models exposed from a
provider.
2025-07-24 10:39:53 -07:00
Stefan Thaler
537dc693ee
chore: add mypy coverage to inspect.py and library_client.py in /distribution (#2707)
# What does this PR do?
Adds type guards in /distribution/inspect.py and ignores a valid-type
mypy error in library_client.py. This PR is part of issue #2647 . I'm
rather unsure whether ignoring the valid-type error is correct in this
case. It appears that args[0] is interpreted as [any] but I didn't find
any way to specify the type.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->


<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
2025-07-24 09:51:46 -07:00
Derek Higgins
fd2aab8582
fix: prevent shell redirection issues with pip dependencies (#2867)
- Use printf to to escape special characters (e.g. < > )
- Apply escaping to pip_dependencies and special_pip_deps

Resolves shell interpretation of >= operators as redirections that were
causing build failing to respect versions and unexpected file creation
in /app directory.

Closes: #2866

## Test Plan
Manually tested, will also be tested by existing CI

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-23 21:43:33 +02:00
Derek Higgins
427136bb63
fix: cleanup after build_container.sh (#2869)
- rm TEMP_DIR when build_container.sh succeeds
- prevents multiple temp directories with Containerfile being left in
/tmp

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-23 11:54:54 -07:00
Sébastien Han
c0563c0560
fix: honour deprecation of --config and --template (#2856)
Some checks failed
Coverage Badge / unit-tests (push) Failing after 1s
Integration Tests / discover-tests (push) Successful in 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 12s
Test Llama Stack Build / build-single-provider (push) Failing after 6s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 13s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 13s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 12s
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Python Package Build Test / build (3.13) (push) Failing after 6s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 11s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 10s
Test External Providers / test-external-providers (venv) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Integration Tests / test-matrix (push) Failing after 12s
Test Llama Stack Build / build (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 25s
Pre-commit / pre-commit (push) Successful in 1m33s
# What does this PR do?

https://github.com/meta-llama/llama-stack/pull/2716/ broke commands
like:

```
 python -m llama_stack.distribution.server.server --config
 llama_stack/templates/starter/run.yaml
 ```

 And will fail with:

 ```
 Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 626, in <module>
    main()
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 402, in main
    config_file = resolve_config_or_template(args.config, Mode.RUN)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/utils/config_resolution.py", line 43, in resolve_config_or_template
    config_path = Path(config_or_template)
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/pathlib.py", line 1162, in __init__
    super().__init__(*args)
  File "/opt/homebrew/Cellar/python@3.12/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/pathlib.py", line 373, in __init__
    raise TypeError(
TypeError: argument should be a str or an os.PathLike object where __fspath__ returns a str, not 'NoneType'
```

Complaining that no positional arguments are present. We now honour the
deprecation until --config and --template are removed completely.

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Both ` python -m llama_stack.distribution.server.server --config
llama_stack/templates/starter/run.yaml` and ` python -m
llama_stack.distribution.server.server
llama_stack/templates/starter/run.yaml` should run the server. Same for
`--template starter`.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-22 20:48:23 -07:00
Derek Higgins
340448e0aa
fix: optimize container build by enabling uv cache (#2855)
- Remove --no-cache flags from uv pip install commands to enable caching
- Mount host uv cache directory to container for persistent caching
- Set UV_LINK_MODE=copy to prevent uv using hardlinks
- When building the starter image
o Build time reduced from ~4:45 to ~3:05 on subsequent builds
(environment specific)
  o Eliminates re-downloading of 3G+ of data on each build 
  o Cache size: ~6.2G (when building starter image)

Fixes excessive data downloads during distro container builds.

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-22 16:51:52 -07:00
Ashwin Bharambe
3b83032555
feat(registry): more flexible model lookup (#2859)
This PR updates model registration and lookup behavior to be slightly
more general / flexible. See
https://github.com/meta-llama/llama-stack/issues/2843 for more details.

Note that this change is backwards compatible given the design of the
`lookup_model()` method.

## Test Plan

Added unit tests
2025-07-22 15:22:48 -07:00
Francisco Arceo
c8f274347d
chore: Adding Access Control for OpenAI Vector Stores methods (#2772)
# What does this PR do?

Refactors the vector store routing logic by moving OpenAI-compatible
vector store operations from the `VectorIORouter` to the
`VectorDBsRoutingTable`.

Closes https://github.com/meta-llama/llama-stack/issues/2761

## Test Plan

Added unit tests to cover new routing logic and ACL checks.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-21 16:22:44 -04:00
ehhuang
0d7a90b8bc
chore: merge --config and --template in server.py (#2716)
# What does this PR do?
Part of #2696 

## Test Plan
Run `llama stack run starter`

Error:
```

myenv ❯ llama stack run starters
WARNING  2025-07-10 12:12:43,052 llama_stack.cli.stack.run:82 server: Conda detected. Using conda environment myenv for the run.
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
                       [--image-type {conda,venv}] [--enable-ui]
                       [config | template]
llama stack run: error: Could not resolve config or template 'starters'.

Tried the following locations:
  1. As file path: /Users/erichuang/projects/llama-stack-git/starters
  2. As template: /Users/erichuang/projects/llama-stack-git/llama_stack/templates/starters/run.yaml
  3. As built distribution: (/Users/erichuang/.llama/distributions/llamastack-starters/starters-run.yaml, /Users/erichuang/.llama/distributions/starters/starters-run.yaml)

Available templates: dell, test-env, vllm-gpu, test-template, cerebras, openai-api-verification, sambanova, passthrough, direct-config, together, openai, fireworks, meta-reference-gpu, __pycache__, dev, ollama, watsonx, remote-vllm, llama_api, groq, dummy, oracle, nvidia, ci-tests, postgres-demo, test-stack, bedrock, starter, hf-serverless, hf-endpoint, tgi, open-benchmark, verification

Did you mean one of these templates?
  - starter
  - together
  - postgres-demo
```
2025-07-21 13:19:27 -07:00
Charlie Doern
9a03526672
fix: uvicorn respect log_config (#2842)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Integration Tests / discover-tests (push) Successful in 9s
Coverage Badge / unit-tests (push) Failing after 13s
Python Package Build Test / build (3.12) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 16s
Python Package Build Test / build (3.13) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 17s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 19s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 18s
Test External Providers / test-external-providers (venv) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s
Integration Tests / test-matrix (push) Failing after 12s
Pre-commit / pre-commit (push) Successful in 1m7s
2025-07-21 12:50:39 -07:00
Sébastien Han
019ddda138
fix: graceful SIGINT on server (#2831)
# What does this PR do?

After https://github.com/meta-llama/llama-stack/pull/2818, SIGINT will
print a stack trace. This is because uvicorn re-raises SIGINT and it
gets converted by Python internal signal handler (default handles
SIGINT) to KeyboardInterrupt exception. We know simply catch the
exception to get a clean exit, this is not changing the behavior on
SIGINT.

## Test Plan

Run the server, hit Ctrl+C or `kill -2 <server pid>` and expect a clean
exit with no stack trace.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-21 11:35:15 -07:00
Ashwin Bharambe
199f859eec
feat(vllm): periodically refresh models (#2823)
Just like #2805 but for vLLM.

We also make VLLM_URL env variable optional (not required) -- if not
specified, the provider silently sits idle and yells eventually if
someone tries to call a completion on it. This is done so as to allow
this provider to be present in the `starter` distribution.

## Test Plan

Set up vLLM, copy the starter template and set `{ refresh_models: true,
refresh_models_interval: 10 }` for the vllm provider and then run:

```
ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \
  uv run llama stack run --image-type venv /tmp/starter.yaml
```

Verify that `llama-stack-client models list` brings up the model
correctly from vLLM.
2025-07-18 15:53:09 -07:00
Ashwin Bharambe
68a2dfbad7
feat(ollama): periodically refresh models (#2805)
For self-hosted providers like Ollama (or vLLM), the backing server is
running a set of models. That server should be treated as the source of
truth and the Stack registry should just be a cache for those models. Of
course, in production environments, you may not want this (because you
know what model you are running statically) hence there's a config
boolean to control this behavior.

_This is part of a series of PRs aimed at removing the requirement of
needing to set `INFERENCE_MODEL` env variables for running Llama Stack
server._

## Test Plan

Copy and modify the starter.yaml template / config and enable
`refresh_models: true, refresh_models_interval: 10` for the ollama
provider. Then, run:

```
LLAMA_STACK_LOGGING=all=debug \
  ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml
```

See a gargantuan amount of logs, but verify that the provider is
periodically refreshing models. Stop and prune a model from ollama
server, restart the server. Verify that the model goes away when I call
`uv run llama-stack-client models list`
2025-07-18 12:20:36 -07:00
ehhuang
6d55f2f137
feat: enable ls client for files tests (#2769)
# What does this PR do?
titled

## Test Plan
CI
2025-07-18 12:10:30 -07:00
Charlie Doern
d994305f0a
fix: remove disabled providers from model dump (#2784)
# What does this PR do?

currently when running `llama stack run --template starter...` the
__disabled__ providers, their models, etc are printed alongside the
enabled ones making the output really confusing

in server.py add a utility `remove_disabled_providers` which
post-processes the model_dump output to remove any dict with
`provider_id: __disabled__`

we also have `debug` logs printing the disabled providers, so I think
its safe to say that is the only indicator we need when using starter.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan

before (output truncated because it was huge):


```
...
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Llama-3.2-11B-Vision-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-3.2-11B-Vision-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-3.2-11B-Vision-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-3.2-11B-Vision-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Llama-3.2-90B-Vision-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-3.2-90B-Vision-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-3.2-90B-Vision-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-3.2-90B-Vision-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Llama-4-Scout-17B-16E-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-4-Scout-17B-16E-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-4-Scout-17B-16E-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-4-Scout-17B-16E-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Llama-4-Maverick-17B-128E-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-4-Maverick-17B-128E-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-4-Maverick-17B-128E-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-4-Maverick-17B-128E-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Meta-Llama-Guard-3-8B
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Meta-Llama-Guard-3-8B
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-Guard-3-8B
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Meta-Llama-Guard-3-8B
         - metadata:
             embedding_dimension: 384
           model_id: all-MiniLM-L6-v2
           model_type: embedding
           provider_id: sentence-transformers
           provider_model_id: null
         providers:
           agents:
           - config:
               persistence_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/agents_store.db
                 type: sqlite
               responses_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/responses_store.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           datasetio:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/huggingface_datasetio.db
                 type: sqlite
             provider_id: huggingface
             provider_type: remote::huggingface
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/localfs_datasetio.db
                 type: sqlite
             provider_id: localfs
             provider_type: inline::localfs
           eval:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/meta_reference_eval.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           files:
           - config:
               metadata_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/files_metadata.db
                 type: sqlite
               storage_dir: /Users/charliedoern/.llama/distributions/starter/files
             provider_id: meta-reference-files
             provider_type: inline::localfs
           inference:
           - config:
               api_key: '********'
               base_url: https://api.cerebras.ai
             provider_id: __disabled__
             provider_type: remote::cerebras
           - config:
               url: http://localhost:11434
             provider_id: ollama
             provider_type: remote::ollama
           - config:
               api_token: '********'
               max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
               tls_verify: ${env.VLLM_TLS_VERIFY:=true}
               url: ${env.VLLM_URL}
             provider_id: __disabled__
             provider_type: remote::vllm
           - config:
               url: ${env.TGI_URL}
             provider_id: __disabled__
             provider_type: remote::tgi
           - config:
               api_token: '********'
               huggingface_repo: ${env.INFERENCE_MODEL}
             provider_id: __disabled__
             provider_type: remote::hf::serverless
           - config:
               api_token: '********'
               endpoint_name: ${env.INFERENCE_ENDPOINT_NAME}
             provider_id: __disabled__
             provider_type: remote::hf::endpoint
           - config:
               api_key: '********'
               url: https://api.fireworks.ai/inference/v1
             provider_id: __disabled__
             provider_type: remote::fireworks
           - config:
               api_key: '********'
               url: https://api.together.xyz/v1
             provider_id: __disabled__
             provider_type: remote::together
           - config: {}
             provider_id: __disabled__
             provider_type: remote::bedrock
           - config:
               api_token: '********'
               url: ${env.DATABRICKS_URL}
             provider_id: __disabled__
             provider_type: remote::databricks
           - config:
               api_key: '********'
               append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
               url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
             provider_id: __disabled__
             provider_type: remote::nvidia
           - config:
               api_token: '********'
               url: ${env.RUNPOD_URL:=}
             provider_id: __disabled__
             provider_type: remote::runpod
           - config:
               api_key: '********'
             provider_id: __disabled__
             provider_type: remote::openai
           - config:
               api_key: '********'
             provider_id: __disabled__
             provider_type: remote::anthropic
           - config:
               api_key: '********'
             provider_id: __disabled__
             provider_type: remote::gemini
           - config:
               api_key: '********'
               url: https://api.groq.com
             provider_id: __disabled__
             provider_type: remote::groq
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.fireworks.ai/inference/v1
             provider_id: __disabled__
             provider_type: remote::fireworks-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.llama.com/compat/v1/
             provider_id: __disabled__
             provider_type: remote::llama-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.together.xyz/v1
             provider_id: __disabled__
             provider_type: remote::together-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.groq.com/openai/v1
             provider_id: __disabled__
             provider_type: remote::groq-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.sambanova.ai/v1
             provider_id: __disabled__
             provider_type: remote::sambanova-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.cerebras.ai/v1
             provider_id: __disabled__
             provider_type: remote::cerebras-openai-compat
           - config:
               api_key: '********'
               url: https://api.sambanova.ai/v1
             provider_id: __disabled__
             provider_type: remote::sambanova
           - config:
               api_key: '********'
               url: ${env.PASSTHROUGH_URL}
             provider_id: __disabled__
             provider_type: remote::passthrough
           - config: {}
             provider_id: sentence-transformers
             provider_type: inline::sentence-transformers
           post_training:
           - config:
               checkpoint_format: huggingface
               device: cpu
               distributed_backend: null
             provider_id: huggingface
             provider_type: inline::huggingface
           safety:
           - config:
               excluded_categories: []
             provider_id: llama-guard
             provider_type: inline::llama-guard
           scoring:
           - config: {}
             provider_id: basic
             provider_type: inline::basic
           - config: {}
             provider_id: llm-as-judge
             provider_type: inline::llm-as-judge
           - config:
               openai_api_key: '********'
             provider_id: braintrust
             provider_type: inline::braintrust
           telemetry:
           - config:
               otel_exporter_otlp_endpoint: null
               service_name: "\u200B"
               sinks: console,sqlite
               sqlite_db_path: /Users/charliedoern/.llama/distributions/starter/trace_store.db
             provider_id: meta-reference
             provider_type: inline::meta-reference
           tool_runtime:
           - config:
               api_key: '********'
               max_results: 3
             provider_id: brave-search
             provider_type: remote::brave-search
           - config:
               api_key: '********'
               max_results: 3
             provider_id: tavily-search
             provider_type: remote::tavily-search
           - config: {}
             provider_id: rag-runtime
             provider_type: inline::rag-runtime
           - config: {}
             provider_id: model-context-protocol
             provider_type: remote::model-context-protocol
           vector_io:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/faiss_store.db
                 type: sqlite
             provider_id: faiss
             provider_type: inline::faiss
           - config:
               db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/sqlite_vec.db
               kvstore:
                 db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/sqlite_vec_registry.db
                 type: sqlite
             provider_id: __disabled__
             provider_type: inline::sqlite-vec
           - config:
               db_path: ${env.MILVUS_DB_PATH:=~/.llama/distributions/starter}/milvus.db
               kvstore:
                 db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/milvus_registry.db
                 type: sqlite
             provider_id: __disabled__
             provider_type: inline::milvus
           - config:
               url: ${env.CHROMADB_URL:=}
             provider_id: __disabled__
             provider_type: remote::chromadb
           - config:
               db: ${env.PGVECTOR_DB:=}
               host: ${env.PGVECTOR_HOST:=localhost}
               kvstore:
                 db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/pgvector_registry.db
                 type: sqlite
               password: '********'
               port: ${env.PGVECTOR_PORT:=5432}
               user: ${env.PGVECTOR_USER:=}
             provider_id: __disabled__
             provider_type: remote::pgvector
         scoring_fns: []
         server:
           auth: null
           host: null
           port: 8321
           quota: null
           tls_cafile: null
           tls_certfile: null
           tls_keyfile: null
         shields:
         - params: null
           provider_id: null
           provider_shield_id: ollama/__disabled__
           shield_id: __disabled__
         tool_groups:
         - args: null
           mcp_endpoint: null
           provider_id: tavily-search
           toolgroup_id: builtin::websearch
         - args: null
           mcp_endpoint: null
           provider_id: rag-runtime
           toolgroup_id: builtin::rag
         vector_dbs: []
         version: 2

```

after:

```
INFO     2025-07-16 13:00:32,604 __main__:448 server: Run configuration:
INFO     2025-07-16 13:00:32,606 __main__:450 server: apis:
         - agents
         - datasetio
         - eval
         - files
         - inference
         - post_training
         - safety
         - scoring
         - telemetry
         - tool_runtime
         - vector_io
         benchmarks: []
         datasets: []
         image_name: starter
         inference_store:
           db_path: /Users/charliedoern/.llama/distributions/starter/inference_store.db
           type: sqlite
         metadata_store:
           db_path: /Users/charliedoern/.llama/distributions/starter/registry.db
           type: sqlite
         models:
         - metadata: {}
           model_id: ollama/llama3.2:3b
           model_type: llm
           provider_id: ollama
           provider_model_id: llama3.2:3b
         - metadata:
             embedding_dimension: 384
           model_id: all-MiniLM-L6-v2
           model_type: embedding
           provider_id: sentence-transformers
         providers:
           agents:
           - config:
               persistence_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/agents_store.db
                 type: sqlite
               responses_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/responses_store.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           datasetio:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/huggingface_datasetio.db
                 type: sqlite
             provider_id: huggingface
             provider_type: remote::huggingface
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/localfs_datasetio.db
                 type: sqlite
             provider_id: localfs
             provider_type: inline::localfs
           eval:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/meta_reference_eval.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           files:
           - config:
               metadata_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/files_metadata.db
                 type: sqlite
               storage_dir: /Users/charliedoern/.llama/distributions/starter/files
             provider_id: meta-reference-files
             provider_type: inline::localfs
           inference:
           - config:
               url: http://localhost:11434
             provider_id: ollama
             provider_type: remote::ollama
           - config: {}
             provider_id: sentence-transformers
             provider_type: inline::sentence-transformers
           post_training:
           - config:
               checkpoint_format: huggingface
               device: cpu
             provider_id: huggingface
             provider_type: inline::huggingface
           safety:
           - config:
               excluded_categories: []
             provider_id: llama-guard
             provider_type: inline::llama-guard
           scoring:
           - config: {}
             provider_id: basic
             provider_type: inline::basic
           - config: {}
             provider_id: llm-as-judge
             provider_type: inline::llm-as-judge
           - config:
               openai_api_key: '********'
             provider_id: braintrust
             provider_type: inline::braintrust
           telemetry:
           - config:
               service_name: "\u200B"
               sinks: console,sqlite
               sqlite_db_path: /Users/charliedoern/.llama/distributions/starter/trace_store.db
             provider_id: meta-reference
             provider_type: inline::meta-reference
           tool_runtime:
           - config:
               api_key: '********'
               max_results: 3
             provider_id: brave-search
             provider_type: remote::brave-search
           - config:
               api_key: '********'
               max_results: 3
             provider_id: tavily-search
             provider_type: remote::tavily-search
           - config: {}
             provider_id: rag-runtime
             provider_type: inline::rag-runtime
           - config: {}
             provider_id: model-context-protocol
             provider_type: remote::model-context-protocol
           vector_io:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/faiss_store.db
                 type: sqlite
             provider_id: faiss
             provider_type: inline::faiss
         scoring_fns: []
         server:
           port: 8321
         shields: []
         tool_groups:
         - provider_id: tavily-search
           toolgroup_id: builtin::websearch
         - provider_id: rag-runtime
           toolgroup_id: builtin::rag
         vector_dbs: []
         version: 2
```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-18 10:44:35 -07:00
Ashwin Bharambe
9e3ae50306
feat(server): construct the stack in a persistent event loop (#2818)
When we call `construct_stack()`, providers are instantiated and
`initialize()` is called. This call can end up doing _anything_ at all
-- specifically, providers are free to create long running background
tasks as part of this. If we wrapped this within a `asyncio.run()` as in
the current code, these tasks get canceled when the stack construction
finishes. This is not correct. The PR addresses the issue by creating a
persistent event loop which is used for both the stack as well as for
running the uvicorn server. In other words, the lifetime of the
providers (and downstream async code) is now the same as the lifetime of
the uvicorn server.

## Test Plan

This should not affect any current code since we don't have background
tasks created right now. However,
https://github.com/meta-llama/llama-stack/pull/2805 will start using
this functionality.
2025-07-18 10:29:19 -07:00
Charlie Doern
6c516d391b
fix: de-clutter llama stack run logs (#2783)
# What does this PR do?

currently each disabled provider is printed as a warning, switch to
debug. This level of verbosity isn't necessary, especially if we intend
to grow the list of providers over time that can be in a single run yaml


## Test Plan

before:

<img width="1144" height="667" alt="Screenshot 2025-07-16 at 12 37
18 PM"
src="https://github.com/user-attachments/assets/d14dbf76-6e40-4996-8a27-111e6a987d71"
/>

after:
<img width="925" height="141" alt="Screenshot 2025-07-16 at 12 37 42 PM"
src="https://github.com/user-attachments/assets/81efdbe1-923c-4c5f-9731-f89729043920"
/>

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-16 09:44:26 -07:00
Nathan Weinberg
b3d86ca926
fix: stop image_name from being cast to an integer (#2759)
Some checks failed
Integration Tests / discover-tests (push) Successful in 3s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s
Integration Tests / test-matrix (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 12s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 13s
Test External Providers / test-external-providers (venv) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Unit Tests / unit-tests (3.13) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 18s
Update ReadTheDocs / update-readthedocs (push) Failing after 40s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 51s
Pre-commit / pre-commit (push) Successful in 2m1s
# What does this PR do?

https://github.com/meta-llama/llama-stack/pull/2490 introduced a new
function for type conversion of strings.

However, a side effect of this is that it will cast any string that can
be cast to an integer if possible, which for something like `image_name`
is not desired as we only accept strings for this field in the
`StackRunConfig`

This PR introduces logic to ensure that `image_name` remains a string 

Closes #2749

## Test Plan

You can run the original step to reproduce from the bug to verify this
manually
```bash
OPENAI_API_KEY=bogus llama stack build --image-type venv --image-name 2745 --providers inference=remote::openai --run
```

I have also added an additional unit test to prevent any future
regression here

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-15 09:44:21 -07:00
Francisco Arceo
31b088978a
fix: Fix /vector-stores/create API when vector store with duplicate name (#2617)
# What does this PR do?

Resolves https://github.com/meta-llama/llama-stack/issues/2735

Currently, if you test against OpenAI's Vector Stores API the
`client.vector_stores.search` call fails with an invalid vector_db
during routing (see the script referenced in the clickable item under
the Test Plan section).

This PR ensures that `client.vector_stores.search()` is compatible with
OpenAI's Vector Stores API.

Two biggest changes:
1. The `name`, which was previously used as the `vector_db_id`, has been
changed to be consistent with OpenAI's `vs_{uuid}` format.
2. The vector store ID has to be referenced by the ID, the name is not
reliable as every `client.vector_stores.create` results in a new vector
store.

NOTE: I believe this is a breaking change for end users as they'll need
to update their VectorDB identifiers.

## Test Plan
Unit tests:
```bash
./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v
```
Integration tests:
```bash
ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv

LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv
```

Unit tests and test script below 👇 

<details> 
<summary>Click here for script used to test OpenAI and Llama Stack
Vector Store implementation</summary>

```python
import json
import argparse
from openai import OpenAI, pagination
import logging
from colorama import Fore, Style, init
import traceback
import os

# Initialize colorama for color support in terminal
init(autoreset=True)

# Setup basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

DEMO_VECTOR_STORE_NAME = "Support FAQ FJA"
global DEMO_VECTOR_STORE_ID
global DEMO_VECTOR_STORE_ID2


def colored_print(color, text):
    """Prints text to the console with the specified color."""
    print(f"{color}{text}{Style.RESET_ALL}")


def log_and_print(color, message, level=logging.INFO):
    """Logs a message and prints it to the console with the specified color."""
    logging.log(level, message)
    colored_print(color, message)


def run_tests(client, prefix="openai"):
    """
    Runs all tests using the provided OpenAI client and saves the output
    to JSON files with the given prefix.
    """
    # Create the directory if it doesn't exist
    os.makedirs('openai_testing', exist_ok=True)

    # Default values in case tests fail
    global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
    DEMO_VECTOR_STORE_ID = None
    DEMO_VECTOR_STORE_ID2 = None

    def test_idempotent_vector_store_creation():
        """
        Test that creating a vector store with the same name is idempotent.
        """
        log_and_print(Fore.BLUE, "Starting vector store creation test...")
        try:
            vector_store = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )

            # Attempt to create the same vector store again
            vector_store2 = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )

            # Check instead of assert
            if vector_store2.id != vector_store.id:
                log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: f{vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID")

            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)

            global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
            DEMO_VECTOR_STORE_ID = vector_store.id
            DEMO_VECTOR_STORE_ID2 = vector_store2.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
        except Exception as e:
            log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            # Create a fallback vector store ID if needed
            if 'vector_store' in locals() and vector_store:
                DEMO_VECTOR_STORE_ID = vector_store.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2

    def test_vector_store_list():
        """
        Test listing vector stores.
        """
        log_and_print(Fore.BLUE, "Starting vector store list test...")
        try:
            vector_stores = client.vector_stores.list()

            # Check instead of assert
            if not isinstance(vector_stores, pagination.SyncCursorPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Vector store list test passed!")

            vector_stores_data = vector_stores.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f:
                json.dump(vector_stores_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_retrieve_vector_store():
        """
        Test retrieving a specific vector store.
        """
        log_and_print(Fore.BLUE, "Starting retrieve vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            vector_store = client.vector_stores.retrieve(
                vector_store_id=DEMO_VECTOR_STORE_ID,
            )

            # Check instead of assert
            if vector_store.id != DEMO_VECTOR_STORE_ID:
                log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Retrieve vector store test passed!")

            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_modify_vector_store():
        """
        Test modifying a vector store.
        """
        log_and_print(Fore.BLUE, "Starting modify vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            updated_vector_store = client.vector_stores.update(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                name="Updated Support FAQ FJA",
            )

            # Check instead of assert
            if updated_vector_store.name != "Updated Support FAQ FJA":
                log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Modify vector store test passed!")

            updated_vector_store_data = updated_vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f:
                json.dump(updated_vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_delete_vector_store():
        """
        Test deleting a vector store.
        """
        log_and_print(Fore.BLUE, "Starting delete vector store test...")
        if not DEMO_VECTOR_STORE_ID2:
            log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available",
                          level=logging.WARNING)
            return

        try:
            response = client.vector_stores.delete(
                vector_store_id=DEMO_VECTOR_STORE_ID2,
            )

            log_and_print(Fore.GREEN, "Delete vector store test passed!")

            response_data = response.to_dict()
            log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f:
                json.dump(response_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_create_vector_store_file():
        log_and_print(Fore.BLUE, "Starting create vector store file test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            # create jsonl of files as an example
            with open("mydata.jsonl", "w") as f:
                f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n')

            # Create a simple text file if my_data_small.txt doesn't exist
            if not os.path.exists("my_data_small.txt"):
                with open("my_data_small.txt", "w") as f:
                    f.write("This is a test file for vector store testing.\n")

            created_file = client.files.create(
                file=open("my_data_small.txt", "rb"),
                purpose="assistants",
            )

            created_file_data = created_file.to_dict()
            log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_create.json', 'w') as f:
                json.dump(created_file_data, f, indent=2)

            retrieved_files = client.files.retrieve(created_file.id)
            retrieved_files_data = retrieved_files.to_dict()
            log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f:
                json.dump(retrieved_files_data, f, indent=2)

            vector_store_file = client.vector_stores.files.create(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                file_id=created_file.id,
            )
            log_and_print(Fore.GREEN, "Create vector store file test passed!")
        except Exception as e:
            log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_search_vector_store():
        """
        Test searching a vector store.
        """
        log_and_print(Fore.BLUE, "Starting search vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            query = "What is the banana policy?"
            search_results = client.vector_stores.search(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                query=query,
                max_num_results=10,
                ranking_options={
                    'ranker': 'default-2024-11-15',
                    'score_threshold': 0.0,
                },
                rewrite_query=False,
            )

            # Check instead of assert
            if not isinstance(search_results, pagination.SyncPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Search vector store test passed!")

            search_results_dict = search_results.to_dict()
            log_and_print(Fore.WHITE, f"Search results = {search_results_dict}")
            with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f:
                json.dump(search_results_dict, f, indent=2)

            log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}")
        except Exception as e:
            log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    # Run all tests in sequence, even if some fail
    test_results = []

    try:
        result = test_idempotent_vector_store_creation()
        if result and len(result) == 2:
            DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result
        test_results.append(True)
    except Exception as e:
        log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
        test_results.append(False)

    for test_func in [
        test_vector_store_list,
        test_retrieve_vector_store,
        test_modify_vector_store,
        test_delete_vector_store,
        test_create_vector_store_file,
        test_search_vector_store
    ]:
        try:
            test_func()
            test_results.append(True)
        except Exception as e:
            log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            test_results.append(False)

    if all(test_results):
        log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!")
    else:
        failed_count = test_results.count(False)
        log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.")
    parser.add_argument(
        "--provider",
        type=str,
        default="llama",
        choices=["openai", "llama", "both"],
        help="Specify which environment to test: openai, llama, or both. Default is both.",
    )
    args = parser.parse_args()

    try:
        if args.provider in ("openai", "both"):
            openai_client = OpenAI()
            run_tests(openai_client, prefix="openai")

        if args.provider in ("llama", "both"):
            llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
            run_tests(llama_client, prefix="llama")

        log_and_print(Fore.GREEN, "All tests completed!")

    except Exception as e:
        log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
```
</details>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-15 11:24:41 -04:00
Sébastien Han
2e4eedce14
fix: container build on podman (#2723)
# What does this PR do?

COPY with chmod does not work, see
https://github.com/containers/buildah/issues/4614. Also Docker arguably
implements it.

Anyway, this command is not even needed since later don't we do:

```
RUN mkdir -p /.llama /.cache && chmod -R g+rw /app /.llama /.cache
```

And providers.d will get the right modes.

<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan

Build with CONTAINER_BINARY=podman and success

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-11 16:25:33 +02:00
ehhuang
d880c2df0e
fix: auth sql store: user is owner policy (#2674)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Installer CI / lint (push) Failing after 4s
Installer CI / smoke-test (push) Has been skipped
Integration Tests / discover-tests (push) Successful in 5s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 12s
Test Llama Stack Build / generate-matrix (push) Successful in 10s
Test External Providers / test-external-providers (venv) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 13s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 12s
Update ReadTheDocs / update-readthedocs (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 13s
Test Llama Stack Build / build-single-provider (push) Failing after 13s
Integration Tests / test-matrix (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 17s
Unit Tests / unit-tests (3.12) (push) Failing after 13s
Test Llama Stack Build / build (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 17s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 26s
Pre-commit / pre-commit (push) Successful in 1m8s
# What does this PR do?
The current authorized sql store implementation does not respect
user.principal (only checks attributes). This PR addresses that.


## Test Plan

Added test cases to integration tests.
2025-07-10 14:40:32 -07:00
Nathan Weinberg
bbe0199bb7
chore: update pre-commit hook versions (#2708)
While investigating the `uv.lock` changes made in
https://github.com/meta-llama/llama-stack/pull/2695 I noticed several of
the pre-commit hook versions were out of date

This PR updates them and fixes some new `ruff` errors

---------

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-10 16:47:59 +02:00
Charlie Doern
81ebaf6e9a
fix: properly represent paths in server logs (#2698)
# What does this PR do?

currently when logging the run yaml, if there are path objects in the
object they are represented as:

```
         external_providers_dir: !!python/object/apply:pathlib.PosixPath
         - '~'
         - .llama
         - providers.d
```

now, with a config.model_dump(mode="json"), it works properly

```
         external_providers_dir: ~/.llama/providers.d
```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-10 10:19:12 -04:00
Sébastien Han
9b7eecebcf
ci: test safety with starter (#2628)
Some checks failed
Integration Tests / test-matrix (server, 3.13, inspect) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 11s
Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.13, safety) (push) Failing after 25s
Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 27s
Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 9s
Test Llama Stack Build / generate-matrix (push) Successful in 14s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 16s
Test Llama Stack Build / build-single-provider (push) Failing after 14s
Integration Tests / test-matrix (server, 3.12, tool_runtime) (push) Failing after 1m7s
Update ReadTheDocs / update-readthedocs (push) Failing after 12s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 29s
Test External Providers / test-external-providers (venv) (push) Failing after 17s
Test Llama Stack Build / build (push) Failing after 13s
Unit Tests / unit-tests (3.12) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 35s
Python Package Build Test / build (3.12) (push) Failing after 31s
Python Package Build Test / build (3.13) (push) Failing after 29s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 34s
Pre-commit / pre-commit (push) Successful in 1m24s
# What does this PR do?

We are now testing the safety capability with the starter image. This
includes a few changes:

* Enable the safety integration test
* Relax the shield model requirements from llama-guard to make it work
  with llama-guard3:8b coming from Ollama
* Expose a shield for each inference provider in the starter distro. The
  shield will only be registered if the provider is enabled.

Closes: https://github.com/meta-llama/llama-stack/issues/2528

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-09 16:53:50 +02:00
ehhuang
c8bac888af
feat(auth): support github tokens (#2509)
# What does this PR do?

This PR adds GitHub OAuth authentication support to Llama Stack,
allowing users to
  authenticate using their GitHub credentials (#2508) . 

1. support verifying github acesss tokens
2. support provider-specific auth error messages
3. opportunistic reorganized the auth configs for better ergonomics

## Test Plan
Added unit tests.

Also tested e2e manually:
```
server:
  port: 8321
  auth:
    provider_config:
      type: github_token
```
```
~/projects/llama-stack/llama_stack/ui
❯ curl -v http://localhost:8321/v1/models
* Host localhost:8321 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8321...
* Connected to localhost (::1) port 8321
> GET /v1/models HTTP/1.1
> Host: localhost:8321
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 401 Unauthorized
< date: Fri, 27 Jun 2025 21:51:25 GMT
< server: uvicorn
< content-type: application/json
< x-trace-id: 5390c6c0654086c55d87c86d7cbf2f6a
< Transfer-Encoding: chunked
<
* Connection #0 to host localhost left intact
{"error": {"message": "Authentication required. Please provide a valid GitHub access token (https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) in the Authorization header (Bearer <token>)"}}
~/projects/llama-stack/llama_stack/ui
❯ ./scripts/unit-tests.sh


~/projects/llama-stack/llama_stack/ui
❯ curl "http://localhost:8321/v1/models" \
-H "Authorization: Bearer <token_obtained_from_github>" \

{"data":[{"identifier":"accounts/fireworks/models/llama-guard-3-11b-vision","provider_resource_id":"accounts/fireworks/models/llama-guard-3-11b-vision","provider_id":"fireworks","type":"model","metadata":{},"model_type":"llm"},{"identifier":"accounts/fireworks/models/llama-guard-3-8b","provider_resource_id":"accounts/fireworks/models/llama-guard-3-8b","provider_id":"fireworks","type":"model","metadata":{},"model_type":"llm"},{"identifier":"accounts/fireworks/models/llama-v3p1-405b-instruct","provider_resource_id":"accounts/f
```

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-07-08 11:02:36 -07:00
Charlie Doern
27b3cd570f
fix: use --template flag for server (#2643)
# What does this PR do?

currently when a template is used, we still use `--config`.

`server.py` has a dedicated `--template` flag and logic, use that
instead

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-08 00:48:50 -07:00
Sébastien Han
c4349f532b
feat: consolidate most distros into "starter" (#2516)
# What does this PR do?

* Removes a bunch of distros
* Removed distros were added into the "starter" distribution
* Doc for "starter" has been added
* Partially reverts https://github.com/meta-llama/llama-stack/pull/2482
  since inference providers are disabled by default and can be turned on
  manually via env variable.
* Disables safety in starter distro

Closes: https://github.com/meta-llama/llama-stack/issues/2502.

~Needs: https://github.com/meta-llama/llama-stack/pull/2482 for Ollama
to work properly in the CI.~

TODO:

- [ ] We can only update `install.sh` when we get a new release.
- [x] Update providers documentation
- [ ] Update notebooks to reference starter instead of ollama

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-04 15:58:03 +02:00
Akram Ben Aissi
f4950f4ef0
fix: AccessDeniedError leads to HTTP 500 instead of error 403 (#2595)
Resolves access control error visibility issues where 500 errors were
returned instead of proper 403 responses with actionable error messages.

• Enhance AccessDeniedError with detailed context and improve exception
handling
• Enhanced AccessDeniedError class to include user, action, and resource
context
  - Added constructor parameters for action, resource, and user
- Generate detailed error messages showing user principal, attributes,
and attempted resource
- Backward compatible with existing usage (falls back to generic
message)

• Updated exception handling in server.py
  - Import AccessDeniedError from access_control module
  - Return proper 403 status codes with detailed error messages
- Separate handling for PermissionError (generic) vs AccessDeniedError
(detailed)

• Enhanced error context at raise sites
- Updated routing_tables/common.py to pass action, resource, and user
context
- Updated agents persistence to include context in access denied errors
  - Provides better debugging information for access control issues

• Added comprehensive unit tests
  - Created tests/unit/server/test_server.py with 13 test cases
  - Covers AccessDeniedError with and without context
- Tests all exception types (ValidationError, BadRequestError,
AuthenticationRequiredError, etc.)
  - Validates proper HTTP status codes and error message formats


# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan

```
server:
  port: 8321
    access_policy:
    - permit:
        principal: admin
        actions: [create, read, delete]
        when: user with admin in groups
    - permit:
        actions: [read]
        when: user with system:authenticated in roles
```
then:

```
curl --request POST --url http://localhost:8321/v1/vector-dbs \
  --header "Authorization: Bearer your-bearer" \
  --data '{
    "vector_db_id": "my_demo_vector_db",
    "embedding_model": "ibm-granite/granite-embedding-125m-english",
    "embedding_dimension": 768,
    "provider_id": "milvus"
  }'
 
```

depending if user is in group admin or not, you should get the
`AccessDeniedError`. Before this PR, this was leading to an error 500
and `Traceback` displayed in the logs.
After the PR, logs display a simpler error (unless DEBUG logging is set)
and a 403 Forbidden error is returned on the HTTP side.

---------

Signed-off-by: Akram Ben Aissi <<akram.benaissi@gmail.com>>
2025-07-03 10:50:49 -07:00
ehhuang
3c43a2f529
fix: store configs (#2593)
# What does this PR do?
https://github.com/meta-llama/llama-stack/pull/2490 broke postgres_demo,
as the config expected a str but the value was converted to int.

This PR:
1. Updates the type of port in sqlstore to be int
2. template generation uses `dict` instead of `StackRunConfig` so as to
avoid failing pydantic typechecks.
3. Adds `replace_env_vars` to StackRunConfig instantiation in
`configure.py` (not sure why this wasn't needed before).

## Test Plan
`llama stack build --template postgres_demo --image-type conda --run`
2025-07-03 10:07:23 -07:00
Sébastien Han
25268854bc
fix: allow default empty vars for conditionals (#2570)
# What does this PR do?

We were not using conditionals correctly, conditionals can only be used
when the env variable is set, so `${env.ENVIRONMENT:+}` would return
None is ENVIRONMENT is not set.

If you want to create a conditional value, you need to do
`${env.ENVIRONMENT:=}`, this will pick the value of ENVIRONMENT if set,
otherwise will return None.

Closes: https://github.com/meta-llama/llama-stack/issues/2564

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-01 14:42:05 +02:00
Juanma
e7eb9f9adc
fix: dataset metadata without provider_id (#2527)
# What does this PR do?
Fixes an error when inferring dataset provider_id with metadata

Closes #[2506](https://github.com/meta-llama/llama-stack/issues/2506)

Signed-off-by: Juanma Barea <juanmabareamartinez@gmail.com>
2025-06-27 08:51:29 -04:00
Wen Zhou
8c3f2762fb
build: update temp. created Containerfile (#2492)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- conditionally created folder /.llama/providers.d if
external_providers_dir is set
- do not create /.cache folder, not in use anywhere
- combine chmod and copy to one command


<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
updated test:

```
export CONTAINER_BINARY=podman
LLAMA_STACK_DIR=. uv run llama stack build --template remote-vllm --image-type container --image-name  <name>
```
log:
```
Containerfile created successfully in /tmp/tmp.rPMunE39Aw/Containerfile

FROM python:3.11-slim
WORKDIR /app

RUN apt-get update && apt-get install -y        iputils-ping net-tools iproute2 dnsutils telnet        curl wget telnet git       procps psmisc lsof        traceroute        bubblewrap        gcc        && rm -rf /var/lib/apt/lists/*

ENV UV_SYSTEM_PYTHON=1
RUN pip install uv
RUN uv pip install --no-cache sentencepiece pillow pypdf transformers pythainlp faiss-cpu opentelemetry-sdk requests datasets chardet scipy nltk numpy matplotlib psycopg2-binary aiosqlite langdetect autoevals tree_sitter tqdm pandas chromadb-client opentelemetry-exporter-otlp-proto-http redis scikit-learn openai pymongo emoji sqlalchemy[asyncio] mcp aiosqlite fastapi fire httpx uvicorn opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
RUN uv pip install --no-cache sentence-transformers --no-deps
RUN uv pip install --no-cache torch torchvision --index-url https://download.pytorch.org/whl/cpu
# Allows running as non-root user
RUN mkdir -p /.llama/providers.d /.cache
RUN uv pip install --no-cache llama-stack
RUN pip uninstall -y uv
ENTRYPOINT ["python", "-m", "llama_stack.distribution.server.server", "--template", "remote-vllm"]

RUN chmod -R g+rw /app /.llama /.cache

PWD: /tmp/llama-stack
Containerfile: /tmp/tmp.rPMunE39Aw/Containerfile
+ podman build --progress=plain --security-opt label=disable --platform linux/amd64 -t distribution-remote-vllm:0.2.12 -f /tmp/tmp.rPMunE39Aw/Containerfile /tmp/llama-stack
....
Success!
Build Successful!
You can find the newly-built template here: /tmp/llama-stack/llama_stack/templates/remote-vllm/run.yaml
You can run the new Llama Stack distro via: llama stack run /tmp/llama-stack/llama_stack/templates/remote-vllm/run.yaml --image-type container
```

```
podman tag localhost/distribution-remote-vllm:dev quay.io/wenzhou/distribution-remote-vllm:2492_2
podman push quay.io/wenzhou/distribution-remote-vllm:2492_2



docker run --rm -p 8321:8321 -e INFERENCE_MODEL="meta-llama/Llama-2-7b-chat-hf" -e VLLM_URL="http://localhost:8000/v1" quay.io/wenzhou/distribution-remote-vllm:2492_2 --port 8321

INFO     2025-06-26 13:47:31,813 __main__:436 server: Using template remote-vllm config file:                                                         
         /app/llama-stack-source/llama_stack/templates/remote-vllm/run.yaml                                                                           
INFO     2025-06-26 13:47:31,818 __main__:438 server: Run configuration:                                                                              
INFO     2025-06-26 13:47:31,826 __main__:440 server: apis:                                                                                           
         - agents                                                                                                                                     
         - datasetio                                                                                                                                  
         - eval                                                                                                                                       
         - inference                                                                                                                                  
         - safety                                                                                                                                     
         - scoring                                                                                                                                    
         - telemetry                                                                                                                                  
         - tool_runtime                                                                                                                               
         - vector_io                                                                                                                                  
         benchmarks: []                                                                                                                               
         container_image: null                                                                                                                        
....                                                                                                 
```
-----
previous test:
local run` >llama stack build --template remote-vllm --image-type
container`
image stored in  `quay.io/wenzhou/distribution-remote-vllm:2492`

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
2025-06-27 10:23:12 +02:00
Ben Browning
0883944bc3
fix: Some missed env variable changes from PR 2490 (#2538)
Some checks failed
Integration Tests / test-matrix (http, 3.13, datasets) (push) Failing after 25s
Integration Tests / test-matrix (http, 3.13, providers) (push) Failing after 23s
Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 17s
Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 15s
Integration Tests / test-matrix (library, 3.13, datasets) (push) Failing after 13s
Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 28s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 8s
Test Llama Stack Build / generate-matrix (push) Successful in 6s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s
Test External Providers / test-external-providers (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 9s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s
Test Llama Stack Build / build (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 34s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 30s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 32s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 29s
Pre-commit / pre-commit (push) Successful in 1m1s
# What does this PR do?

Some templates were still using the old environment variable substition
syntax instead of the new one and were not getting substituted properly.

Also, some places didn't handle the new None vs old empty string ("")
values that come from the conditional environment variable substitution.

This gets the starter and remote-vllm distributions starting again, and
I tested various permutations of the starter as chroma and pgvector
needed some adjustments to their config classes to handle the new
possible `None` values. And, I had to tweak our `Provider` class to also
handle `None` values, for cases where we disable providers in the
starter config via environment variables.

This may not have caught everything that was missed, but I did grep
around quite a bit to try and find anything lingering.

## Test Plan

The following permutations now all run (or attempt to run to the point
of complaining that they can't connect to chroma, vllm, etc) when before
they failed immediately on startup because of bad environment variable
substitions:

```
uv run llama stack run llama_stack/templates/starter/run.yaml
ENABLE_SQLITE_VEC=true uv run llama stack run llama_stack/templates/starter/run.yaml
ENABLE_PGVECTOR=true uv run llama stack run llama_stack/templates/starter/run.yaml
ENABLE_CHROMADB=true uv run llama stack run llama_stack/templates/starter/run.yaml

uv run llama stack run llama_stack/templates/remote-vllm/run.yaml
```
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: raghotham <rsm@meta.com>
2025-06-26 17:59:15 -07:00
Sébastien Han
43c1f39bd6
refactor(env)!: enhanced environment variable substitution (#2490)
# What does this PR do?

This commit significantly improves the environment variable substitution
functionality in Llama Stack configuration files:
* The version field in configuration files has been changed from string
to integer type for better type consistency across build and run
configurations.

* The environment variable substitution system for ${env.FOO:} was fixed
and properly returns an error

* The environment variable substitution system for ${env.FOO+} returns
None instead of an empty strings, it better matches type annotations in
config fields

* The system includes automatic type conversion for boolean, integer,
and float values.

* The error messages have been enhanced to provide clearer guidance when
environment variables are missing, including suggestions for using
default values or conditional syntax.

* Comprehensive documentation has been added to the configuration guide
explaining all supported syntax patterns, best practices, and runtime
override capabilities.

* Multiple provider configurations have been updated to use the new
conditional syntax for optional API keys, making the system more
flexible for different deployment scenarios. The telemetry configuration
has been improved to properly handle optional endpoints with appropriate
validation, ensuring that required endpoints are specified when their
corresponding sinks are enabled.

* There were many instances of ${env.NVIDIA_API_KEY:} that should have
caused the code to fail. However, due to a bug, the distro server was
still being started, and early validation wasn’t triggered. As a result,
failures were likely being handled downstream by the providers. I’ve
maintained similar behavior by using ${env.NVIDIA_API_KEY:+}, though I
believe this is incorrect for many configurations. I’ll leave it to each
provider to correct it as needed.

* Environment variable substitution now uses the same syntax as Bash
parameter expansion.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-26 08:20:08 +05:30
Sébastien Han
36d70637b9
fix: finish conversion to StrEnum (#2514)
# What does this PR do?

We still had a few enum declared to behave like string as well as enum.
Let's use StrEnum for those.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-26 08:01:26 +05:30
Sébastien Han
ac5fd57387
chore: remove nested imports (#2515)
# What does this PR do?

* Given that our API packages use "import *" in `__init.py__` we don't
need to do `from llama_stack.apis.models.models` but simply from
llama_stack.apis.models. The decision to use `import *` is debatable and
should probably be revisited at one point.

* Remove unneeded Ruff F401 rule
* Consolidate Ruff F403 rule in the pyprojectfrom
llama_stack.apis.models.models

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-26 08:01:05 +05:30
Ben Browning
fa0b0c13d4
fix: Ollama should be optional in starter distro (#2482)
Some checks failed
Integration Tests / test-matrix (http, 3.13, vector_io) (push) Failing after 14s
Integration Tests / test-matrix (http, 3.13, scoring) (push) Failing after 11s
Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 9s
Integration Tests / test-matrix (http, 3.12, tool_runtime) (push) Failing after 18s
Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 7s
Integration Tests / test-matrix (http, 3.13, inspect) (push) Failing after 16s
Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 12s
Integration Tests / test-matrix (http, 3.13, tool_runtime) (push) Failing after 14s
Integration Tests / test-matrix (library, 3.13, inspect) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.13, agents) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.13, datasets) (push) Failing after 10s
Test Llama Stack Build / generate-matrix (push) Successful in 7s
Python Package Build Test / build (3.12) (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s
Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 8s
Update ReadTheDocs / update-readthedocs (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Test Llama Stack Build / build (push) Failing after 6s
Test Llama Stack Build / build-single-provider (push) Failing after 1m10s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 1m8s
Python Package Build Test / build (3.13) (push) Failing after 1m6s
Test External Providers / test-external-providers (venv) (push) Failing after 1m4s
Pre-commit / pre-commit (push) Successful in 2m33s
# What does this PR do?

Our starter distro required Ollama to be running (and a large list of
models available in that Ollama) to successfully start. This adjusts
things so that Ollama does not have to be running to use the starter
template / distro.

To accomplish this, a few changes were needed:

* The Ollama provider is now configurable whether it raises an Exception
or just logs a warning when it cannot reach the Ollama server on
startup. The default is to raise an exception (same as previous
behavior), but in the starter template we adjust this to just log a
warning so that we can bring the stack up without needing a running
Ollama server.

* The starter template no longer specifies a default list of models for
Ollama, as any models specified there need to actually be pulled and
available in Ollama. Instead, it adds a new
`OLLAMA_INFERENCE_MODEL` environment variable where users can provide an
optional model to register with the Ollama provider on startup.
Additional models can also be registered via the typical
`models.register(...)` at runtime.

* The vLLM template was adjusted to also allow an optional
`VLLM_INFERENCE_MODEL` specified on startup, so that the behavior
between vLLM and Ollama was consistent here to make it easy to get up
and running quickly.

* The default vector store was changed from sqlite-vec to faiss.
sqlite-vec can enabled via setting the `ENABLE_SQLITE_VEC` environment
variable, like we do for chromadb and pgvector. This is due to
sqlite-vec not shipping proper arm64 binaries, like we previously fixed
in #1530 for the ollama distribution.

## Test Plan

With this change, the following scenarios now work with the starter
template that did not before:

* no Ollama running
* Ollama running but not all of the Llama models pulled locally
* Ollama running with a custom model registered on startup
* vLLM running with a custom model registered on startup
* running the starter template on linux/arm64, like when running
containers on Mac without rosetta emulation

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-25 15:54:00 +02:00