# What does this PR do?
1. Creates a new `VectorStoreNotFoundError` class
2. Implements the new class where appropriate
Relates to #2379
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
1. Adds a broad schema for custom exception classes in the Llama Stack
project
2. Creates a new `DatasetNotFoundError` class
3. Implements the new class where appropriate
Relates to #2379
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
1. Creates a new `ModelNotFoundError` class
2. Implements the new class where appropriate
Relates to #2379
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
- Initialize route_impls to None in constructor to prevent
AttributeError
- Consolidate initialization checks to single point in request() method
- Improve error message to be more helpful ("Please call initialize()
first")
- Add comprehensive test suite to prevent regressions
The library client now has better error handling when users forget to
call initialize(), showing a clear ValueError instead of confusing
AttributeError. All initialization validation is now centralized in the
request() method, with internal methods (_call_non_streaming,
_call_streaming, _convert_body) relying on this single check for
cleaner, more maintainable code.
closes#2943
## Test Plan
`./scripts/unit-tests.sh`
Implements a comprehensive recording and replay system for inference API
calls that eliminates dependency on online inference providers during
testing. The system treats inference as deterministic by recording real
API responses and replaying them in subsequent test runs. Applies to
OpenAI clients (which should cover many inference requests) as well as
Ollama AsyncClient.
For storing, we use a hybrid system: Sqlite for fast lookups and JSON
files for easy greppability / debuggability.
As expected, tests become much much faster (more than 3x in just
inference testing.)
```bash
LLAMA_STACK_TEST_INFERENCE_MODE=record LLAMA_STACK_TEST_RECORDING_DIR=<...> \
uv run pytest -s -v tests/integration/inference \
--stack-config=starter \
-k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
--text-model="ollama/llama3.2:3b-instruct-fp16" \
--embedding-model=sentence-transformers/all-MiniLM-L6-v2
```
```bash
LLAMA_STACK_TEST_INFERENCE_MODE=replay LLAMA_STACK_TEST_RECORDING_DIR=<...> \
uv run pytest -s -v tests/integration/inference \
--stack-config=starter \
-k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
--text-model="ollama/llama3.2:3b-instruct-fp16" \
--embedding-model=sentence-transformers/all-MiniLM-L6-v2
```
- `LLAMA_STACK_TEST_INFERENCE_MODE`: `live` (default), `record`, or
`replay`
- `LLAMA_STACK_TEST_RECORDING_DIR`: Storage location (must be specified
for record or replay modes)
# What does this PR do?
the server logs have a persistent `core: refreshing registry` log that
clogs up the output. Switch it to debug
this is what it looked like:
<img width="1126" height="1028" alt="Screenshot 2025-07-28 at 9 56
44 AM"
src="https://github.com/user-attachments/assets/a1880fd3-7fc7-4a97-bfb8-89a62e4c5c19"
/>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
in #2637, I combined the run and build config provider types to both use
`Provider`
since this includes a provider_id, a user must now specify this when
writing a build yaml. This is not very clear because all a user should
care about upon build is the code to be installed (the module and the
provider_type)
introduce `BuildProvider` and fixup the parts of the code impacted by
this
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Avoid the error message:
```
INFO 2025-07-24 21:51:54,530 __main__:598 server: Received interrupt signal, shutting down gracefully...
ERROR 2025-07-24 21:51:54,692 asyncio:1826 uncategorized: Task was destroyed but it is pending!
task: <Task pending name='Task-15' coro=<refresh_registry() running at
/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/stack.py:356> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=>
```
# What does this PR do?
Today, external providers are installed via the `external_providers_dir`
in the config. This necessitates users to understand the `ProviderSpec`
and set up their directories accordingly. This process splits up the
config for the stack across multiple files, directories, and formats.
Most (if not all) external providers today have a
[get_provider_spec](559cb18fbb/src/ramalama_stack/provider.py (L9))
method that sits unused. Utilizing this method rather than the
providers.d route allows for a much easier installation process for
external providers and limits the amount of extra configuration a
regular user has to do to get their stack off the ground.
To accomplish this and wire it throughout the build process, Introduce
the concept of a `module` for users to specify for an external provider
upon build time. In order to facilitate this, align the build and run
spec to use `Provider` class rather than the stringified provider_type
that build currently uses.
For example, say this is in your build config:
```
- provider_id: ramalama
provider_type: remote::ramalama
module: ramalama_stack
```
during build (in the various `build_...` scripts), additionally to
installing any pip dependencies we will also install this module and use
the `get_provider_spec` method to retrieve the ProviderSpec that is
currently specified using `providers.d`.
In production so far, providing instructions for installing external
providers for users has been difficult: they need to install the module
as a pre-req, create the providers.d directory, copy in the provider
spec, and also copy in the necessary build/run yaml files. Accessing an
external provider should be as easy as possible, and pointing to its
installable module aligns more with the rest of our build and dependency
management process.
For now, `external_providers_dir` still exists as an alternate more
declarative method of using external providers.
## Test Plan
added an integration test installing an external provider from module
and more unit test coverage for `get_provider_registry`
( the warning in yellow is expected, the module is installed inside of
the build env, not where we are running the command)
<img width="1119" height="400" alt="Screenshot 2025-07-24 at 11 30
48 AM"
src="https://github.com/user-attachments/assets/1efbaf45-b9e8-451a-bd63-264ed664706d"
/>
<img width="1154" height="618" alt="Screenshot 2025-07-24 at 11 31
14 AM"
src="https://github.com/user-attachments/assets/feb2b3ea-c5dd-418e-9662-9a3bd5dd6bdc"
/>
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
- Added ability to specify `required_scope` when declaring an API. This
is part of the `@webmethod` decorator.
- If auth is enabled, a user can access an API only if
`user.attributes['scope']` includes the `required_scope`
- We add `required_scope='telemetry.read'` to the telemetry read APIs.
## Test Plan
CI with added tests
1. Enable server.auth with github token
2. Observe `client.telemetry.query_traces()` returns 403
# What does this PR do?
Prototype on a new feature to allow new APIs to be plugged in Llama
Stack. Opened for early feedback on the approach and test appetite on
the functionality.
@ashwinb @raghotham open for early feedback, thanks!
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
This flips #2823 and #2805 by making the Stack periodically query the
providers for models rather than the providers going behind the back and
calling "register" on to the registry themselves. This also adds support
for model listing for all other providers via `ModelRegistryHelper`.
Once this is done, we do not need to manually list or register models
via `run.yaml` and it will remove both noise and annoyance (setting
`INFERENCE_MODEL` environment variables, for example) from the new user
experience.
In addition, it adds a configuration variable `allowed_models` which can
be used to optionally restrict the set of models exposed from a
provider.
# What does this PR do?
Adds type guards in /distribution/inspect.py and ignores a valid-type
mypy error in library_client.py. This PR is part of issue #2647 . I'm
rather unsure whether ignoring the valid-type error is correct in this
case. It appears that args[0] is interpreted as [any] but I didn't find
any way to specify the type.
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
- Use printf to to escape special characters (e.g. < > )
- Apply escaping to pip_dependencies and special_pip_deps
Resolves shell interpretation of >= operators as redirections that were
causing build failing to respect versions and unexpected file creation
in /app directory.
Closes: #2866
## Test Plan
Manually tested, will also be tested by existing CI
Signed-off-by: Derek Higgins <derekh@redhat.com>
- rm TEMP_DIR when build_container.sh succeeds
- prevents multiple temp directories with Containerfile being left in
/tmp
Signed-off-by: Derek Higgins <derekh@redhat.com>
# What does this PR do?
https://github.com/meta-llama/llama-stack/pull/2716/ broke commands
like:
```
python -m llama_stack.distribution.server.server --config
llama_stack/templates/starter/run.yaml
```
And will fail with:
```
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 626, in <module>
main()
File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 402, in main
config_file = resolve_config_or_template(args.config, Mode.RUN)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/utils/config_resolution.py", line 43, in resolve_config_or_template
config_path = Path(config_or_template)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/pathlib.py", line 1162, in __init__
super().__init__(*args)
File "/opt/homebrew/Cellar/python@3.12/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/pathlib.py", line 373, in __init__
raise TypeError(
TypeError: argument should be a str or an os.PathLike object where __fspath__ returns a str, not 'NoneType'
```
Complaining that no positional arguments are present. We now honour the
deprecation until --config and --template are removed completely.
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Both ` python -m llama_stack.distribution.server.server --config
llama_stack/templates/starter/run.yaml` and ` python -m
llama_stack.distribution.server.server
llama_stack/templates/starter/run.yaml` should run the server. Same for
`--template starter`.
Signed-off-by: Sébastien Han <seb@redhat.com>
- Remove --no-cache flags from uv pip install commands to enable caching
- Mount host uv cache directory to container for persistent caching
- Set UV_LINK_MODE=copy to prevent uv using hardlinks
- When building the starter image
o Build time reduced from ~4:45 to ~3:05 on subsequent builds
(environment specific)
o Eliminates re-downloading of 3G+ of data on each build
o Cache size: ~6.2G (when building starter image)
Fixes excessive data downloads during distro container builds.
Signed-off-by: Derek Higgins <derekh@redhat.com>
This PR updates model registration and lookup behavior to be slightly
more general / flexible. See
https://github.com/meta-llama/llama-stack/issues/2843 for more details.
Note that this change is backwards compatible given the design of the
`lookup_model()` method.
## Test Plan
Added unit tests
# What does this PR do?
Refactors the vector store routing logic by moving OpenAI-compatible
vector store operations from the `VectorIORouter` to the
`VectorDBsRoutingTable`.
Closes https://github.com/meta-llama/llama-stack/issues/2761
## Test Plan
Added unit tests to cover new routing logic and ACL checks.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Part of #2696
## Test Plan
Run `llama stack run starter`
Error:
```
myenv ❯ llama stack run starters
WARNING 2025-07-10 12:12:43,052 llama_stack.cli.stack.run:82 server: Conda detected. Using conda environment myenv for the run.
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
[--image-type {conda,venv}] [--enable-ui]
[config | template]
llama stack run: error: Could not resolve config or template 'starters'.
Tried the following locations:
1. As file path: /Users/erichuang/projects/llama-stack-git/starters
2. As template: /Users/erichuang/projects/llama-stack-git/llama_stack/templates/starters/run.yaml
3. As built distribution: (/Users/erichuang/.llama/distributions/llamastack-starters/starters-run.yaml, /Users/erichuang/.llama/distributions/starters/starters-run.yaml)
Available templates: dell, test-env, vllm-gpu, test-template, cerebras, openai-api-verification, sambanova, passthrough, direct-config, together, openai, fireworks, meta-reference-gpu, __pycache__, dev, ollama, watsonx, remote-vllm, llama_api, groq, dummy, oracle, nvidia, ci-tests, postgres-demo, test-stack, bedrock, starter, hf-serverless, hf-endpoint, tgi, open-benchmark, verification
Did you mean one of these templates?
- starter
- together
- postgres-demo
```
# What does this PR do?
After https://github.com/meta-llama/llama-stack/pull/2818, SIGINT will
print a stack trace. This is because uvicorn re-raises SIGINT and it
gets converted by Python internal signal handler (default handles
SIGINT) to KeyboardInterrupt exception. We know simply catch the
exception to get a clean exit, this is not changing the behavior on
SIGINT.
## Test Plan
Run the server, hit Ctrl+C or `kill -2 <server pid>` and expect a clean
exit with no stack trace.
Signed-off-by: Sébastien Han <seb@redhat.com>
Just like #2805 but for vLLM.
We also make VLLM_URL env variable optional (not required) -- if not
specified, the provider silently sits idle and yells eventually if
someone tries to call a completion on it. This is done so as to allow
this provider to be present in the `starter` distribution.
## Test Plan
Set up vLLM, copy the starter template and set `{ refresh_models: true,
refresh_models_interval: 10 }` for the vllm provider and then run:
```
ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \
uv run llama stack run --image-type venv /tmp/starter.yaml
```
Verify that `llama-stack-client models list` brings up the model
correctly from vLLM.
For self-hosted providers like Ollama (or vLLM), the backing server is
running a set of models. That server should be treated as the source of
truth and the Stack registry should just be a cache for those models. Of
course, in production environments, you may not want this (because you
know what model you are running statically) hence there's a config
boolean to control this behavior.
_This is part of a series of PRs aimed at removing the requirement of
needing to set `INFERENCE_MODEL` env variables for running Llama Stack
server._
## Test Plan
Copy and modify the starter.yaml template / config and enable
`refresh_models: true, refresh_models_interval: 10` for the ollama
provider. Then, run:
```
LLAMA_STACK_LOGGING=all=debug \
ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml
```
See a gargantuan amount of logs, but verify that the provider is
periodically refreshing models. Stop and prune a model from ollama
server, restart the server. Verify that the model goes away when I call
`uv run llama-stack-client models list`
When we call `construct_stack()`, providers are instantiated and
`initialize()` is called. This call can end up doing _anything_ at all
-- specifically, providers are free to create long running background
tasks as part of this. If we wrapped this within a `asyncio.run()` as in
the current code, these tasks get canceled when the stack construction
finishes. This is not correct. The PR addresses the issue by creating a
persistent event loop which is used for both the stack as well as for
running the uvicorn server. In other words, the lifetime of the
providers (and downstream async code) is now the same as the lifetime of
the uvicorn server.
## Test Plan
This should not affect any current code since we don't have background
tasks created right now. However,
https://github.com/meta-llama/llama-stack/pull/2805 will start using
this functionality.
# What does this PR do?
currently each disabled provider is printed as a warning, switch to
debug. This level of verbosity isn't necessary, especially if we intend
to grow the list of providers over time that can be in a single run yaml
## Test Plan
before:
<img width="1144" height="667" alt="Screenshot 2025-07-16 at 12 37
18 PM"
src="https://github.com/user-attachments/assets/d14dbf76-6e40-4996-8a27-111e6a987d71"
/>
after:
<img width="925" height="141" alt="Screenshot 2025-07-16 at 12 37 42 PM"
src="https://github.com/user-attachments/assets/81efdbe1-923c-4c5f-9731-f89729043920"
/>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
https://github.com/meta-llama/llama-stack/pull/2490 introduced a new
function for type conversion of strings.
However, a side effect of this is that it will cast any string that can
be cast to an integer if possible, which for something like `image_name`
is not desired as we only accept strings for this field in the
`StackRunConfig`
This PR introduces logic to ensure that `image_name` remains a string
Closes#2749
## Test Plan
You can run the original step to reproduce from the bug to verify this
manually
```bash
OPENAI_API_KEY=bogus llama stack build --image-type venv --image-name 2745 --providers inference=remote::openai --run
```
I have also added an additional unit test to prevent any future
regression here
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
Resolves https://github.com/meta-llama/llama-stack/issues/2735
Currently, if you test against OpenAI's Vector Stores API the
`client.vector_stores.search` call fails with an invalid vector_db
during routing (see the script referenced in the clickable item under
the Test Plan section).
This PR ensures that `client.vector_stores.search()` is compatible with
OpenAI's Vector Stores API.
Two biggest changes:
1. The `name`, which was previously used as the `vector_db_id`, has been
changed to be consistent with OpenAI's `vs_{uuid}` format.
2. The vector store ID has to be referenced by the ID, the name is not
reliable as every `client.vector_stores.create` results in a new vector
store.
NOTE: I believe this is a breaking change for end users as they'll need
to update their VectorDB identifiers.
## Test Plan
Unit tests:
```bash
./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v
```
Integration tests:
```bash
ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv
```
Unit tests and test script below 👇
<details>
<summary>Click here for script used to test OpenAI and Llama Stack
Vector Store implementation</summary>
```python
import json
import argparse
from openai import OpenAI, pagination
import logging
from colorama import Fore, Style, init
import traceback
import os
# Initialize colorama for color support in terminal
init(autoreset=True)
# Setup basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
DEMO_VECTOR_STORE_NAME = "Support FAQ FJA"
global DEMO_VECTOR_STORE_ID
global DEMO_VECTOR_STORE_ID2
def colored_print(color, text):
"""Prints text to the console with the specified color."""
print(f"{color}{text}{Style.RESET_ALL}")
def log_and_print(color, message, level=logging.INFO):
"""Logs a message and prints it to the console with the specified color."""
logging.log(level, message)
colored_print(color, message)
def run_tests(client, prefix="openai"):
"""
Runs all tests using the provided OpenAI client and saves the output
to JSON files with the given prefix.
"""
# Create the directory if it doesn't exist
os.makedirs('openai_testing', exist_ok=True)
# Default values in case tests fail
global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
DEMO_VECTOR_STORE_ID = None
DEMO_VECTOR_STORE_ID2 = None
def test_idempotent_vector_store_creation():
"""
Test that creating a vector store with the same name is idempotent.
"""
log_and_print(Fore.BLUE, "Starting vector store creation test...")
try:
vector_store = client.vector_stores.create(
name=DEMO_VECTOR_STORE_NAME,
)
# Attempt to create the same vector store again
vector_store2 = client.vector_stores.create(
name=DEMO_VECTOR_STORE_NAME,
)
# Check instead of assert
if vector_store2.id != vector_store.id:
log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID",
level=logging.WARNING)
else:
log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: f{vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID")
vector_store_data = vector_store.to_dict()
log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f:
json.dump(vector_store_data, f, indent=2)
global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
DEMO_VECTOR_STORE_ID = vector_store.id
DEMO_VECTOR_STORE_ID2 = vector_store2.id
return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
except Exception as e:
log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
# Create a fallback vector store ID if needed
if 'vector_store' in locals() and vector_store:
DEMO_VECTOR_STORE_ID = vector_store.id
return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
def test_vector_store_list():
"""
Test listing vector stores.
"""
log_and_print(Fore.BLUE, "Starting vector store list test...")
try:
vector_stores = client.vector_stores.list()
# Check instead of assert
if not isinstance(vector_stores, pagination.SyncCursorPage):
log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}",
level=logging.WARNING)
else:
log_and_print(Fore.GREEN, "Vector store list test passed!")
vector_stores_data = vector_stores.to_dict()
log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f:
json.dump(vector_stores_data, f, indent=2)
except Exception as e:
log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_retrieve_vector_store():
"""
Test retrieving a specific vector store.
"""
log_and_print(Fore.BLUE, "Starting retrieve vector store test...")
if not DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available",
level=logging.WARNING)
return
try:
vector_store = client.vector_stores.retrieve(
vector_store_id=DEMO_VECTOR_STORE_ID,
)
# Check instead of assert
if vector_store.id != DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING)
else:
log_and_print(Fore.GREEN, "Retrieve vector store test passed!")
vector_store_data = vector_store.to_dict()
log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f:
json.dump(vector_store_data, f, indent=2)
except Exception as e:
log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_modify_vector_store():
"""
Test modifying a vector store.
"""
log_and_print(Fore.BLUE, "Starting modify vector store test...")
if not DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available",
level=logging.WARNING)
return
try:
updated_vector_store = client.vector_stores.update(
vector_store_id=DEMO_VECTOR_STORE_ID,
name="Updated Support FAQ FJA",
)
# Check instead of assert
if updated_vector_store.name != "Updated Support FAQ FJA":
log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING)
else:
log_and_print(Fore.GREEN, "Modify vector store test passed!")
updated_vector_store_data = updated_vector_store.to_dict()
log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f:
json.dump(updated_vector_store_data, f, indent=2)
except Exception as e:
log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_delete_vector_store():
"""
Test deleting a vector store.
"""
log_and_print(Fore.BLUE, "Starting delete vector store test...")
if not DEMO_VECTOR_STORE_ID2:
log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available",
level=logging.WARNING)
return
try:
response = client.vector_stores.delete(
vector_store_id=DEMO_VECTOR_STORE_ID2,
)
log_and_print(Fore.GREEN, "Delete vector store test passed!")
response_data = response.to_dict()
log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f:
json.dump(response_data, f, indent=2)
except Exception as e:
log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_create_vector_store_file():
log_and_print(Fore.BLUE, "Starting create vector store file test...")
if not DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available",
level=logging.WARNING)
return
try:
# create jsonl of files as an example
with open("mydata.jsonl", "w") as f:
f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n')
f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n')
f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n')
f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n')
f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n')
# Create a simple text file if my_data_small.txt doesn't exist
if not os.path.exists("my_data_small.txt"):
with open("my_data_small.txt", "w") as f:
f.write("This is a test file for vector store testing.\n")
created_file = client.files.create(
file=open("my_data_small.txt", "rb"),
purpose="assistants",
)
created_file_data = created_file.to_dict()
log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}")
with open(f'openai_testing/{prefix}_file_create.json', 'w') as f:
json.dump(created_file_data, f, indent=2)
retrieved_files = client.files.retrieve(created_file.id)
retrieved_files_data = retrieved_files.to_dict()
log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}")
with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f:
json.dump(retrieved_files_data, f, indent=2)
vector_store_file = client.vector_stores.files.create(
vector_store_id=DEMO_VECTOR_STORE_ID,
file_id=created_file.id,
)
log_and_print(Fore.GREEN, "Create vector store file test passed!")
except Exception as e:
log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_search_vector_store():
"""
Test searching a vector store.
"""
log_and_print(Fore.BLUE, "Starting search vector store test...")
if not DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available",
level=logging.WARNING)
return
try:
query = "What is the banana policy?"
search_results = client.vector_stores.search(
vector_store_id=DEMO_VECTOR_STORE_ID,
query=query,
max_num_results=10,
ranking_options={
'ranker': 'default-2024-11-15',
'score_threshold': 0.0,
},
rewrite_query=False,
)
# Check instead of assert
if not isinstance(search_results, pagination.SyncPage):
log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}",
level=logging.WARNING)
else:
log_and_print(Fore.GREEN, "Search vector store test passed!")
search_results_dict = search_results.to_dict()
log_and_print(Fore.WHITE, f"Search results = {search_results_dict}")
with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f:
json.dump(search_results_dict, f, indent=2)
log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}")
except Exception as e:
log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
# Run all tests in sequence, even if some fail
test_results = []
try:
result = test_idempotent_vector_store_creation()
if result and len(result) == 2:
DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result
test_results.append(True)
except Exception as e:
log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
test_results.append(False)
for test_func in [
test_vector_store_list,
test_retrieve_vector_store,
test_modify_vector_store,
test_delete_vector_store,
test_create_vector_store_file,
test_search_vector_store
]:
try:
test_func()
test_results.append(True)
except Exception as e:
log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
test_results.append(False)
if all(test_results):
log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!")
else:
failed_count = test_results.count(False)
log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.")
parser.add_argument(
"--provider",
type=str,
default="llama",
choices=["openai", "llama", "both"],
help="Specify which environment to test: openai, llama, or both. Default is both.",
)
args = parser.parse_args()
try:
if args.provider in ("openai", "both"):
openai_client = OpenAI()
run_tests(openai_client, prefix="openai")
if args.provider in ("llama", "both"):
llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
run_tests(llama_client, prefix="llama")
log_and_print(Fore.GREEN, "All tests completed!")
except Exception as e:
log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
```
</details>
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
COPY with chmod does not work, see
https://github.com/containers/buildah/issues/4614. Also Docker arguably
implements it.
Anyway, this command is not even needed since later don't we do:
```
RUN mkdir -p /.llama /.cache && chmod -R g+rw /app /.llama /.cache
```
And providers.d will get the right modes.
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
## Test Plan
Build with CONTAINER_BINARY=podman and success
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
The current authorized sql store implementation does not respect
user.principal (only checks attributes). This PR addresses that.
## Test Plan
Added test cases to integration tests.
While investigating the `uv.lock` changes made in
https://github.com/meta-llama/llama-stack/pull/2695 I noticed several of
the pre-commit hook versions were out of date
This PR updates them and fixes some new `ruff` errors
---------
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
currently when logging the run yaml, if there are path objects in the
object they are represented as:
```
external_providers_dir: !!python/object/apply:pathlib.PosixPath
- '~'
- .llama
- providers.d
```
now, with a config.model_dump(mode="json"), it works properly
```
external_providers_dir: ~/.llama/providers.d
```
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
We are now testing the safety capability with the starter image. This
includes a few changes:
* Enable the safety integration test
* Relax the shield model requirements from llama-guard to make it work
with llama-guard3:8b coming from Ollama
* Expose a shield for each inference provider in the starter distro. The
shield will only be registered if the provider is enabled.
Closes: https://github.com/meta-llama/llama-stack/issues/2528
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
currently when a template is used, we still use `--config`.
`server.py` has a dedicated `--template` flag and logic, use that
instead
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
* Removes a bunch of distros
* Removed distros were added into the "starter" distribution
* Doc for "starter" has been added
* Partially reverts https://github.com/meta-llama/llama-stack/pull/2482
since inference providers are disabled by default and can be turned on
manually via env variable.
* Disables safety in starter distro
Closes: https://github.com/meta-llama/llama-stack/issues/2502.
~Needs: https://github.com/meta-llama/llama-stack/pull/2482 for Ollama
to work properly in the CI.~
TODO:
- [ ] We can only update `install.sh` when we get a new release.
- [x] Update providers documentation
- [ ] Update notebooks to reference starter instead of ollama
Signed-off-by: Sébastien Han <seb@redhat.com>
Resolves access control error visibility issues where 500 errors were
returned instead of proper 403 responses with actionable error messages.
• Enhance AccessDeniedError with detailed context and improve exception
handling
• Enhanced AccessDeniedError class to include user, action, and resource
context
- Added constructor parameters for action, resource, and user
- Generate detailed error messages showing user principal, attributes,
and attempted resource
- Backward compatible with existing usage (falls back to generic
message)
• Updated exception handling in server.py
- Import AccessDeniedError from access_control module
- Return proper 403 status codes with detailed error messages
- Separate handling for PermissionError (generic) vs AccessDeniedError
(detailed)
• Enhanced error context at raise sites
- Updated routing_tables/common.py to pass action, resource, and user
context
- Updated agents persistence to include context in access denied errors
- Provides better debugging information for access control issues
• Added comprehensive unit tests
- Created tests/unit/server/test_server.py with 13 test cases
- Covers AccessDeniedError with and without context
- Tests all exception types (ValidationError, BadRequestError,
AuthenticationRequiredError, etc.)
- Validates proper HTTP status codes and error message formats
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
## Test Plan
```
server:
port: 8321
access_policy:
- permit:
principal: admin
actions: [create, read, delete]
when: user with admin in groups
- permit:
actions: [read]
when: user with system:authenticated in roles
```
then:
```
curl --request POST --url http://localhost:8321/v1/vector-dbs \
--header "Authorization: Bearer your-bearer" \
--data '{
"vector_db_id": "my_demo_vector_db",
"embedding_model": "ibm-granite/granite-embedding-125m-english",
"embedding_dimension": 768,
"provider_id": "milvus"
}'
```
depending if user is in group admin or not, you should get the
`AccessDeniedError`. Before this PR, this was leading to an error 500
and `Traceback` displayed in the logs.
After the PR, logs display a simpler error (unless DEBUG logging is set)
and a 403 Forbidden error is returned on the HTTP side.
---------
Signed-off-by: Akram Ben Aissi <<akram.benaissi@gmail.com>>
# What does this PR do?
https://github.com/meta-llama/llama-stack/pull/2490 broke postgres_demo,
as the config expected a str but the value was converted to int.
This PR:
1. Updates the type of port in sqlstore to be int
2. template generation uses `dict` instead of `StackRunConfig` so as to
avoid failing pydantic typechecks.
3. Adds `replace_env_vars` to StackRunConfig instantiation in
`configure.py` (not sure why this wasn't needed before).
## Test Plan
`llama stack build --template postgres_demo --image-type conda --run`
# What does this PR do?
We were not using conditionals correctly, conditionals can only be used
when the env variable is set, so `${env.ENVIRONMENT:+}` would return
None is ENVIRONMENT is not set.
If you want to create a conditional value, you need to do
`${env.ENVIRONMENT:=}`, this will pick the value of ENVIRONMENT if set,
otherwise will return None.
Closes: https://github.com/meta-llama/llama-stack/issues/2564
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Fixes an error when inferring dataset provider_id with metadata
Closes #[2506](https://github.com/meta-llama/llama-stack/issues/2506)
Signed-off-by: Juanma Barea <juanmabareamartinez@gmail.com>
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- conditionally created folder /.llama/providers.d if
external_providers_dir is set
- do not create /.cache folder, not in use anywhere
- combine chmod and copy to one command
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
updated test:
```
export CONTAINER_BINARY=podman
LLAMA_STACK_DIR=. uv run llama stack build --template remote-vllm --image-type container --image-name <name>
```
log:
```
Containerfile created successfully in /tmp/tmp.rPMunE39Aw/Containerfile
FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y iputils-ping net-tools iproute2 dnsutils telnet curl wget telnet git procps psmisc lsof traceroute bubblewrap gcc && rm -rf /var/lib/apt/lists/*
ENV UV_SYSTEM_PYTHON=1
RUN pip install uv
RUN uv pip install --no-cache sentencepiece pillow pypdf transformers pythainlp faiss-cpu opentelemetry-sdk requests datasets chardet scipy nltk numpy matplotlib psycopg2-binary aiosqlite langdetect autoevals tree_sitter tqdm pandas chromadb-client opentelemetry-exporter-otlp-proto-http redis scikit-learn openai pymongo emoji sqlalchemy[asyncio] mcp aiosqlite fastapi fire httpx uvicorn opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
RUN uv pip install --no-cache sentence-transformers --no-deps
RUN uv pip install --no-cache torch torchvision --index-url https://download.pytorch.org/whl/cpu
# Allows running as non-root user
RUN mkdir -p /.llama/providers.d /.cache
RUN uv pip install --no-cache llama-stack
RUN pip uninstall -y uv
ENTRYPOINT ["python", "-m", "llama_stack.distribution.server.server", "--template", "remote-vllm"]
RUN chmod -R g+rw /app /.llama /.cache
PWD: /tmp/llama-stack
Containerfile: /tmp/tmp.rPMunE39Aw/Containerfile
+ podman build --progress=plain --security-opt label=disable --platform linux/amd64 -t distribution-remote-vllm:0.2.12 -f /tmp/tmp.rPMunE39Aw/Containerfile /tmp/llama-stack
....
Success!
Build Successful!
You can find the newly-built template here: /tmp/llama-stack/llama_stack/templates/remote-vllm/run.yaml
You can run the new Llama Stack distro via: llama stack run /tmp/llama-stack/llama_stack/templates/remote-vllm/run.yaml --image-type container
```
```
podman tag localhost/distribution-remote-vllm:dev quay.io/wenzhou/distribution-remote-vllm:2492_2
podman push quay.io/wenzhou/distribution-remote-vllm:2492_2
docker run --rm -p 8321:8321 -e INFERENCE_MODEL="meta-llama/Llama-2-7b-chat-hf" -e VLLM_URL="http://localhost:8000/v1" quay.io/wenzhou/distribution-remote-vllm:2492_2 --port 8321
INFO 2025-06-26 13:47:31,813 __main__:436 server: Using template remote-vllm config file:
/app/llama-stack-source/llama_stack/templates/remote-vllm/run.yaml
INFO 2025-06-26 13:47:31,818 __main__:438 server: Run configuration:
INFO 2025-06-26 13:47:31,826 __main__:440 server: apis:
- agents
- datasetio
- eval
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
benchmarks: []
container_image: null
....
```
-----
previous test:
local run` >llama stack build --template remote-vllm --image-type
container`
image stored in `quay.io/wenzhou/distribution-remote-vllm:2492`
---------
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
# What does this PR do?
Some templates were still using the old environment variable substition
syntax instead of the new one and were not getting substituted properly.
Also, some places didn't handle the new None vs old empty string ("")
values that come from the conditional environment variable substitution.
This gets the starter and remote-vllm distributions starting again, and
I tested various permutations of the starter as chroma and pgvector
needed some adjustments to their config classes to handle the new
possible `None` values. And, I had to tweak our `Provider` class to also
handle `None` values, for cases where we disable providers in the
starter config via environment variables.
This may not have caught everything that was missed, but I did grep
around quite a bit to try and find anything lingering.
## Test Plan
The following permutations now all run (or attempt to run to the point
of complaining that they can't connect to chroma, vllm, etc) when before
they failed immediately on startup because of bad environment variable
substitions:
```
uv run llama stack run llama_stack/templates/starter/run.yaml
ENABLE_SQLITE_VEC=true uv run llama stack run llama_stack/templates/starter/run.yaml
ENABLE_PGVECTOR=true uv run llama stack run llama_stack/templates/starter/run.yaml
ENABLE_CHROMADB=true uv run llama stack run llama_stack/templates/starter/run.yaml
uv run llama stack run llama_stack/templates/remote-vllm/run.yaml
```
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: raghotham <rsm@meta.com>
# What does this PR do?
This commit significantly improves the environment variable substitution
functionality in Llama Stack configuration files:
* The version field in configuration files has been changed from string
to integer type for better type consistency across build and run
configurations.
* The environment variable substitution system for ${env.FOO:} was fixed
and properly returns an error
* The environment variable substitution system for ${env.FOO+} returns
None instead of an empty strings, it better matches type annotations in
config fields
* The system includes automatic type conversion for boolean, integer,
and float values.
* The error messages have been enhanced to provide clearer guidance when
environment variables are missing, including suggestions for using
default values or conditional syntax.
* Comprehensive documentation has been added to the configuration guide
explaining all supported syntax patterns, best practices, and runtime
override capabilities.
* Multiple provider configurations have been updated to use the new
conditional syntax for optional API keys, making the system more
flexible for different deployment scenarios. The telemetry configuration
has been improved to properly handle optional endpoints with appropriate
validation, ensuring that required endpoints are specified when their
corresponding sinks are enabled.
* There were many instances of ${env.NVIDIA_API_KEY:} that should have
caused the code to fail. However, due to a bug, the distro server was
still being started, and early validation wasn’t triggered. As a result,
failures were likely being handled downstream by the providers. I’ve
maintained similar behavior by using ${env.NVIDIA_API_KEY:+}, though I
believe this is incorrect for many configurations. I’ll leave it to each
provider to correct it as needed.
* Environment variable substitution now uses the same syntax as Bash
parameter expansion.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
We still had a few enum declared to behave like string as well as enum.
Let's use StrEnum for those.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
* Given that our API packages use "import *" in `__init.py__` we don't
need to do `from llama_stack.apis.models.models` but simply from
llama_stack.apis.models. The decision to use `import *` is debatable and
should probably be revisited at one point.
* Remove unneeded Ruff F401 rule
* Consolidate Ruff F403 rule in the pyprojectfrom
llama_stack.apis.models.models
Signed-off-by: Sébastien Han <seb@redhat.com>