# What does this PR do?
Most third-party actions use hashes for pinning, but not all of them do.
This does proper hash pinning on all remaining actions that were still using tags.
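For example (the SHA below is illustrative; real pins come from each action's releases):
```yaml
# Before: a mutable tag that can be re-pointed at different code
- uses: actions/checkout@v4

# After: pinned to an immutable commit SHA, keeping the tag as a comment
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
```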
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
We are dropping configuration via CLI flags almost entirely. If any
server configuration has to be tweaked, it must be done through the
`server` section in the run.yaml.
This is unfortunately a breaking change for whoever was using:
* `--tls-*`
* `--disable_ipv6`
`--port` stays around and gets special treatment, since we believe it's
common for developers to change the port for quick experimentation.
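For reference, a minimal sketch of the equivalent `server` section in run.yaml (key names are inferred from the old flags, so verify against the config schema):
```yaml
server:
  port: 8321
  tls_certfile: /path/to/cert.pem  # replaces --tls-certfile
  tls_keyfile: /path/to/key.pem    # replaces --tls-keyfile
```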
Closes: https://github.com/meta-llama/llama-stack/issues/1076
## Test Plan
Simply do `llama stack run <config>`; nothing should break :)
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
This PR introduces a reusable GitHub Actions workflow for pulling and
running an Ollama model, with caching to avoid repeated downloads.
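A minimal sketch of what such a reusable workflow can look like (the workflow inputs, cache key, and default model are assumptions, not the exact implementation):
```yaml
on:
  workflow_call:
    inputs:
      model:
        type: string
        default: "llama3.2:3b-instruct-fp16"

jobs:
  setup-ollama:
    runs-on: ubuntu-latest
    steps:
      - name: Restore cached models
        uses: actions/cache@v4
        with:
          path: ~/.ollama/models
          key: ollama-model-${{ inputs.model }}
      - name: Install Ollama
        run: curl -fsSL https://ollama.com/install.sh | sh
      - name: Pull model (cheap on a cache hit)
        run: ollama pull "${{ inputs.model }}"
```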
Closes: #1949
## Test Plan
1. Trigger a workflow that uses the Ollama setup. Confirm that:
- The model is pulled successfully.
- It is placed in the correct directory (currently the official default
location, not ~ollama/.ollama/models as a review comment suggested; this still needs confirming).
2. Re-run the same workflow to validate that:
- The model is restored from the cache.
- Execution succeeds with the cached model.
All merges produced by GitHub are pushes to main, which makes the check
fail. The check is local by design and not meant for CI.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
* pull the embedding model so that it's not pulled during the distro
server startup sequence
* cache the models
* collect logs at the end of the workflow (a sketch of this step follows)
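A sketch of the log-collection step (artifact name and log paths are assumptions):
```yaml
- name: Upload server and Ollama logs
  if: always()  # run even when earlier steps failed
  uses: actions/upload-artifact@v4
  with:
    name: logs
    path: |
      server.log
      ollama.log
```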
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
* new workflow job **build-ubi9-container-distribution**
* runs on the default `ubuntu-latest` runner
* uses the existing `dev` template
* invokes `uv run llama stack build` with `.container_base =
"registry.access.redhat.com/ubi9/ubi-minimal:latest"`
* inspects the resulting image to verify its entrypoint
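A rough sketch of the job (the yq edit, template path, and resulting image name are assumptions based on the description above):
```yaml
build-ubi9-container-distribution:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Point the dev template at a UBI9 base image
      run: |
        yq -i '.container_base = "registry.access.redhat.com/ubi9/ubi-minimal:latest"' \
          llama_stack/templates/dev/build.yaml
    - name: Build the container
      run: uv run llama stack build --config llama_stack/templates/dev/build.yaml --image-type container
    - name: Verify the entrypoint
      run: docker inspect --format '{{ json .Config.Entrypoint }}' localhost/distribution-dev:dev
```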
Closes #1994
## Test Plan
- CI now includes the `build-ubi9-container-distribution` job and will
turn green when that job passes on changes to build files
# What does this PR do?
This builds on top of
https://github.com/meta-llama/llama-stack/pull/2037 to include some
additional changes to fix integration test builds.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# What does this PR do?
This commit adds a new authentication system to the Llama Stack server
with support for Kubernetes and custom authentication providers. Key
changes include:
- Implemented KubernetesAuthProvider for validating Kubernetes service
account tokens
- Implemented CustomAuthProvider for validating tokens against external
endpoints - this is the same code that was already present.
- Added test for Kubernetes
- Updated server configuration to support authentication settings
- Added documentation for authentication configuration and usage
The authentication system supports:
- Bearer token validation
- Kubernetes service account token validation
- Custom authentication endpoints
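For the custom-endpoint flavor, the config might look like this (the `endpoint` field name and URL are assumptions; check the provider implementation):
```yaml
server:
  port: 8321
  auth:
    provider_type: custom
    config:
      endpoint: https://auth.example.com/validate  # hypothetical validation endpoint
```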
## Test Plan
Set up a Kubernetes cluster using Kind or Minikube.
Run a server with:
```
server:
  port: 8321
  auth:
    provider_type: kubernetes
    config:
      api_server_url: http://url
      ca_cert_path: path/to/cert  # optional
```
Run:
```
curl -s -L -H "Authorization: Bearer $(kubectl create token my-user)" http://127.0.0.1:8321/v1/providers
```
Or replace "my-user" with your service account.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Add installation script for Llama Stack Meta Reference distro (Docker
only).
Closes #1374
## Test Plan
./install.sh
---------
Co-authored-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Remove `distributions/**` from integration, external provider, and unit
tests
## Test Plan
N/A
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Update External Providers CI to not run on changes to docs, rfcs, and
scripts
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
As part of the build process, we now include the generated run.yaml
(based on the provided build configuration file) in the container. We
updated the entrypoint to use this run configuration as well.
Given this simple distribution configuration:
```
# build.yaml
version: '2'
distribution_spec:
  description: Use (an external) Ollama server for running LLM inference
  providers:
    inference:
      - remote::ollama
    vector_io:
      - inline::faiss
    safety:
      - inline::llama-guard
    agents:
      - inline::meta-reference
    telemetry:
      - inline::meta-reference
    eval:
      - inline::meta-reference
    datasetio:
      - remote::huggingface
      - inline::localfs
    scoring:
      - inline::basic
      - inline::llm-as-judge
      - inline::braintrust
    tool_runtime:
      - remote::brave-search
      - remote::tavily-search
      - inline::code-interpreter
      - inline::rag-runtime
      - remote::model-context-protocol
      - remote::wolfram-alpha
  container_image: "registry.access.redhat.com/ubi9"
image_type: container
image_name: test
```
Build it:
```
llama stack build --config build.yaml
```
Run it:
```
podman run --rm \
  -p 8321:8321 \
  -e OLLAMA_URL=http://host.containers.internal:11434 \
  --name llama-stack-server \
  localhost/leseb-test:0.2.2
```
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Allow users to specify only the providers they want in the `llama stack
build` command. If a user wants a non-interactive build but doesn't want
to use a template, `--providers` lets them specify something like
`--providers inference=remote::ollama` for a distro with JUST
ollama.
## Test Plan
`llama stack build --providers inference=remote::ollama --image-type
venv`
<img width="1084" alt="Screenshot 2025-03-20 at 9 34 14 AM"
src="https://github.com/user-attachments/assets/502b5fa2-edab-4267-a595-4f987204a6a9"
/>
`llama stack run --image-type venv
/Users/charliedoern/projects/Documents/llama-stack/venv-run.yaml`
<img width="1149" alt="Screenshot 2025-03-20 at 9 35 19 AM"
src="https://github.com/user-attachments/assets/433765f3-6b7f-4383-9241-dad085b69228"
/>
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
5.4.0 to 5.4.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v5.4.1 🌈 Add support for pep440 version specifiers</h2>
<h2>Changes</h2>
<p>With this release you can also use <a
href="https://peps.python.org/pep-0440/#version-specifiers">pep440
version specifiers</a> as <code>required-version</code> in the files
<code>uv.toml</code> and <code>pyproject.toml</code>, and in the
<code>version</code> input:</p>
<pre lang="yaml"><code>- name: Install a pep440-specifier-satisfying
version of uv
  uses: astral-sh/setup-uv@v5
  with:
    version: ">=0.4.25,<0.5"
</code></pre>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Add support for pep440 version identifiers <a
href="https://github.com/eifinger"><code>@eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/353">#353</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known checksums for 0.6.10 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/345">#345</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>Add pep440 to docs header <a
href="https://github.com/eifinger"><code>@eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/355">#355</a>)</li>
<li>Fix glob syntax link <a
href="https://github.com/flying-sheep"><code>@flying-sheep</code></a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/349">#349</a>)</li>
<li>Add link to supported glob patterns <a
href="https://github.com/eifinger"><code>@eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/348">#348</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0c5e2b8115"><code>0c5e2b8</code></a>
Add pep440 to docs header (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/355">#355</a>)</li>
<li><a
href="794ea9455c"><code>794ea94</code></a>
Add support for pep440 version identifiers (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/353">#353</a>)</li>
<li><a
href="2d49baf2b6"><code>2d49baf</code></a>
chore: update known checksums for 0.6.10 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/345">#345</a>)</li>
<li><a
href="4fa25599ce"><code>4fa2559</code></a>
Fix glob syntax link (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/349">#349</a>)</li>
<li><a
href="224dce1d79"><code>224dce1</code></a>
Add link to supported glob patterns (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/348">#348</a>)</li>
<li>See full diff in <a
href="22695119d7...0c5e2b8115">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Workflow:
0. Checkout
1. Install uv
2. Install Ollama
3. Pull Ollama image
4. Start Ollama in background
5. Set Up Environment and Install Dependencies
6. Wait for Ollama to start
7. Start Llama Stack server in background
8. Wait for Llama Stack server to be ready
9. Run Integration Tests
Changes:
(4) starts loading the Ollama model; it does not start Ollama itself.
The model is only loaded when first used, so this step is removed.
(6) is already handled in (2), so this step is removed.
(2) is renamed to reflect its dual purpose.
# What does this PR do?
Providers that live outside of the llama-stack codebase are now
supported.
A new property `external_providers_dir` has been added to the main
config and can be configured as follows:
```
external_providers_dir: /etc/llama-stack/providers.d/
```
Where the expected structure is:
```
providers.d/
  inference/
    custom_ollama.yaml
    vllm.yaml
  vector_io/
    qdrant.yaml
```
Where `custom_ollama.yaml` is:
```
adapter:
  adapter_type: custom_ollama
  pip_packages: ["ollama", "aiohttp"]
  config_class: llama_stack_ollama_provider.config.OllamaImplConfig
  module: llama_stack_ollama_provider
api_dependencies: []
optional_api_dependencies: []
```
Obviously the package must be installed on the system; here is the
`llama_stack_ollama_provider` example:
```
$ uv pip show llama-stack-ollama-provider
Using Python 3.10.16 environment at: /Users/leseb/Documents/AI/llama-stack/.venv
Name: llama-stack-ollama-provider
Version: 0.1.0
Location: /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages
Editable project location: /private/var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.ZBHU5Ezxg4/ollama/llama-stack-ollama-provider
Requires:
Required-by:
```
Closes: https://github.com/meta-llama/llama-stack/issues/658
Signed-off-by: Sébastien Han <seb@redhat.com>
Previously, the integration tests started the server, but never really
used it because `--stack-config=ollama` uses the ollama template and the
inline "llama stack as library" client, not the HTTP client.
This PR makes sure we test it both ways.
We also add agents tests to the mix.
## Test Plan
GitHub
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
When multiple commits are pushed to a PR, multiple CI builds will be
triggered. This PR ensures that we only run one concurrent build for
each PR to reduce CI loads.
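This maps to a standard GitHub Actions `concurrency` block (the group key shown is the common pattern, not necessarily the exact one used here):
```yaml
concurrency:
  # one in-flight run per workflow and ref; new pushes cancel the old run
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```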
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
In this PR, we added a new open benchmark, IFEval, based on the paper
https://arxiv.org/abs/2311.07911, to measure a model's
instruction-following capability.
## Test Plan
Spin up a Llama Stack server with the open-benchmark template, then run
`llama-stack-client --endpoint xxx eval run-benchmark
"meta-reference-ifeval" --model-id "meta-llama/Llama-3.3-70B-Instruct"
--output-dir "/home/markchen1015/" --num-examples 20` on the client side
and collect the aggregate eval results.
# What does this PR do?
This is a follow up from
https://github.com/meta-llama/llama-stack/pull/1463. cc @yanxi0830
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
This makes it easier to see the status of both and to identify failed
builds.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Run additional tests in a matrix to accelerate the process and clearly
identify failing providers.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
This PR adds dependabot updates for Python dependencies. In addition:
* Consistent weekly schedule on a specific day
* Specific commit messages
* `open-pull-requests-limit` is intentional to avoid upgrading
dependencies that will likely cause regressions. We want to keep the
focus here on security updates only
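A sketch of the corresponding `.github/dependabot.yml` entry (day, prefix, and limit are illustrative values):
```yaml
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
      day: "saturday"
    commit-message:
      prefix: "chore(deps):"
    # a limit of 0 disables version bumps while still allowing security updates
    open-pull-requests-limit: 0
```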
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Rather than have unit and functional tests run on all PRs, we should
only run them on PRs that change relevant files.
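The usual mechanism is a `paths` filter on the trigger; the patterns below are illustrative, not the exact list from this change:
```yaml
on:
  pull_request:
    paths:
      - 'llama_stack/**'
      - 'tests/unit/**'
      - 'pyproject.toml'
```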
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
A PTY is unnecessary for interactive mode since `subprocess.run()`
already inherits the calling terminal’s stdin, stdout, and stderr,
allowing natural interaction. Using a PTY can introduce unwanted side
effects like buffering issues and inconsistent signal handling. Standard
input/output is sufficient for most interactive programs.
This commit simplifies the command execution by:
1. Removing PTY-based execution in favor of direct subprocess handling
2. Consolidating command execution into a single run_command function
3. Improving error handling with specific subprocess error types
4. Adding proper type hints and documentation
5. Maintaining Ctrl+C handling for graceful interruption
## Test Plan
```
llama stack run
```
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Useful for local development. Now you can just trigger the script and
not care about specific arguments to pass to run unit tests.
## Test Plan
```
$ . ./venv/bin/activate
$ ./scripts/run_tests.sh
$ echo $?
0
```
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Co-authored-by: Nathan Weinberg <31703736+nathan-weinberg@users.noreply.github.com>
# What does this PR do?
- Issues/PRs inactive for 60 days are marked as stale
- Stale items are closed after 30 additional days of inactivity
- Adds appropriate warning and closing messages
- Sets daily schedule for stale checks
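Roughly, using `actions/stale` (the message text here is illustrative):
```yaml
on:
  schedule:
    - cron: '0 0 * * *'  # daily stale check

jobs:
  stale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/stale@v9
        with:
          days-before-stale: 60
          days-before-close: 30
          stale-issue-message: 'This issue has been inactive for 60 days and is now marked stale.'
          close-issue-message: 'Closing after 30 additional days of inactivity.'
```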
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Added a GitHub Action to run inference tests for the Ollama provider.
This ensures we have coverage for Ollama integration.
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Introduced a new CI job that dynamically generates a build matrix based
on available templates from `llama_stack/templates/*/build.yaml`.
This allows automated testing for all templates without manual
intervention.
The CI currently builds for venv and containers.
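Conceptually, the matrix generation works like this (job names and the exact listing command are assumptions):
```yaml
jobs:
  generate-matrix:
    runs-on: ubuntu-latest
    outputs:
      templates: ${{ steps.set-matrix.outputs.templates }}
    steps:
      - uses: actions/checkout@v4
      - id: set-matrix
        run: |
          # collect every template directory that ships a build.yaml
          templates=$(ls llama_stack/templates/*/build.yaml | awk -F'/' '{print $3}' | jq -R -s -c 'split("\n")[:-1]')
          echo "templates=$templates" >> "$GITHUB_OUTPUT"

  build:
    needs: generate-matrix
    strategy:
      matrix:
        template: ${{ fromJson(needs.generate-matrix.outputs.templates) }}
        image-type: [venv, container]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: uv run llama stack build --template ${{ matrix.template }} --image-type ${{ matrix.image-type }}
```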
Signed-off-by: Sébastien Han <seb@redhat.com>
~Will pass once https://github.com/meta-llama/llama-stack/pull/1228
merges.~
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
additional artifacts make test results more human-readable
## Test Plan
Ran locally
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
This PR adds a simple unit test badge to the project README
It also modifies the workflow to run on merges to main, so that the
status reflected in the README is that of main and not pull request
branches
---------
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
Python unit tests running via GitHub Actions were only running with
Python 3.10, but the project supports all Python versions greater than
or equal to 3.10. This commit adds 3.11, 3.12, and 3.13 to the test
matrix for better coverage and confidence for non-3.10 users.
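The matrix change itself is small (a sketch; surrounding steps omitted):
```yaml
strategy:
  matrix:
    python-version: ["3.10", "3.11", "3.12", "3.13"]
steps:
  - uses: actions/setup-python@v5
    with:
      python-version: ${{ matrix.python-version }}
```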
## Test Plan
All tests pass locally with Python 3.11, 3.12, and 3.13.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
As I brought up in #1515, it shouldn't be necessary to tie the unit
test runner to an exact z-stream of Python 3.10.
This updates the runner to always use the latest z-stream of Python 3.10.
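In workflow terms, that means requesting only the minor version and letting the tooling resolve the newest patch release (sketch):
```yaml
# "3.10" resolves to the latest 3.10.x; "3.10.16" would pin a single z-stream
- uses: actions/setup-python@v5
  with:
    python-version: "3.10"
```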
## Test Plan
```shell
$ uv run -p 3.10 --with-editable . --with-editable ".[dev]" --with-editable ".[unit]" pytest --cov=llama_stack -s -v tests/unit/ --junitxml=pytest-report.xml
```
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
The `test` section has been updated to include only the essential
dependencies needed for running integration tests, which are shared
across all providers. If a provider requires additional dependencies,
please add them to your environment separately. When using uv to
run your tests, you can specify extra dependencies with the
`--with` flag.
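For example, to pull a provider-specific dependency in only for the test run (the package name is illustrative):
```yaml
- name: Run integration tests with a provider extra
  run: uv run --with ollama pytest -s -v tests/integration/
```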
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
This PR allows unit test code coverage percentages to be reported in PR
builds. Today, the output only tells the end user which tests passed
and which failed:
<img width="744" alt="Screenshot 2025-03-10 at 9 44 28 AM"
src="https://github.com/user-attachments/assets/40b1a578-951f-4b74-8a37-a39c039b1d7e"
/>
If a contributor is creating a new module within Llama Stack and starts
writing unit tests for that module, it might be difficult for Llama
Stack maintainers to immediately determine the code coverage percentage
for that new module.
To allow for code coverage reporting in the CI, we simply need to
install `pytest-cov` so we can use the `--cov` flag with the existing
`pytest` command.
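As a workflow step, the change boils down to this (a sketch; the project's actual pytest invocation may carry more flags):
```yaml
- name: Run unit tests with coverage reporting
  run: uv run --with pytest-cov pytest --cov=llama_stack tests/unit/
```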
Ideally, it would be nicer to have a bot report code coverage, but this
PR can be a temporary solution.
## Test Plan
I ran these changes locally:
<img width="1455" alt="Screenshot 2025-03-10 at 10 01 53 AM"
src="https://github.com/user-attachments/assets/dfd765c6-5979-42a3-b899-7713a3f202e6"
/>
PR build to confirm the expected behavior:
<img width="1326" alt="Screenshot 2025-03-10 at 12 47 36 PM"
src="https://github.com/user-attachments/assets/fe94f1e6-fbb5-4e57-9902-197502c50621"
/>
Signed-off-by: Courtney Pacheco <6019922+courtneypacheco@users.noreply.github.com>