# What does this PR do?
Dropped Python 3.10 support, updated pyproject and dependencies, and
removed some blocks of code with special handling for enum.StrEnum.
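
For context, `enum.StrEnum` was added to the standard library in Python 3.11, so code that still supports 3.10 typically needs a fallback. The sketch below shows the kind of compatibility shim that becomes unnecessary once 3.10 is dropped. It is an illustrative assumption, not the code removed in this PR, and `Color` is a hypothetical enum.

```python
import sys
from enum import Enum

if sys.version_info >= (3, 11):
    # Python 3.11+ ships StrEnum in the standard library.
    from enum import StrEnum
else:
    # Hypothetical fallback of the sort needed on Python 3.10.
    class StrEnum(str, Enum):
        def __str__(self) -> str:
            return str(self.value)


class Color(StrEnum):  # hypothetical example enum
    RED = "red"
    GREEN = "green"
```

With 3.11 as the minimum version, only the plain `from enum import StrEnum` import would remain.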
Closes #2458
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Instead of downloading the models each time we now have a single Ollama
container that is baked with the models pulled and ready to use.
This will remove the CI flakiness on model pulling.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Expand the test matrix to include Python 3.10, 3.11, and 3.12 to ensure
the project runs correctly on these versions. This will give us
confidence to begin considering an increase to the project's minimum
supported Python version.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Use a composite action to avoid repeating similar steps and to
centralize the defaults.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Adds an inline HF SFTTrainer provider. Alongside torchtune, this is a
super popular option for running training jobs. The config allows a user
to specify some key fields such as a model, chat_template, device, etc.
The provider comes with one recipe, `finetune_single_device`, which works
both with and without LoRA.
Any model that is a valid HF identifier can be given and the model will
be pulled.
This has been tested so far with CPU and MPS device types, but should be
compatible with CUDA out of the box.
The provider processes the given dataset into the proper format,
establishes the various steps per epoch, steps per save, steps per eval,
sets a sane SFTConfig, and runs n_epochs of training
If `checkpoint_dir` is None, no model is saved. If there is a checkpoint
dir, a model is saved every `save_steps` and at the end of training.
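
The step bookkeeping described above can be sketched roughly as follows. This is a minimal illustration under assumed names: `Schedule`, `compute_schedule`, and `saves_per_epoch` are hypothetical and not taken from the provider, which may derive these values differently.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Schedule:
    steps_per_epoch: int
    total_steps: int
    save_steps: Optional[int]  # None means no checkpoints are written


def compute_schedule(
    num_examples: int,
    batch_size: int,
    n_epochs: int,
    checkpoint_dir: Optional[str],
    saves_per_epoch: int = 1,  # hypothetical knob, not from the PR
) -> Schedule:
    if min(num_examples, batch_size, n_epochs, saves_per_epoch) <= 0:
        raise ValueError("all numeric arguments must be positive")
    # A partial final batch still counts as one optimizer step.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * n_epochs
    save_steps = None
    if checkpoint_dir is not None:
        # Save at least once per step interval; never use an interval of 0.
        save_steps = max(1, steps_per_epoch // saves_per_epoch)
    return Schedule(steps_per_epoch, total_steps, save_steps)
```

For example, 10 examples with a batch size of 4 over 2 epochs gives 3 steps per epoch and 6 steps in total, with `save_steps` left as `None` when no checkpoint directory is given.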
## Test Plan
Re-enabled the post_training integration test suite with a single test
that loads the simpleqa dataset
(https://huggingface.co/datasets/llamastack/simpleqa) and a tiny granite
model (https://huggingface.co/ibm-granite/granite-3.3-2b-instruct). The
test now uses the llama stack client and the proper post_training API,
and runs one step with a batch_size of 1. This test runs on CPU on the
Ubuntu runner, so it needs to be a small batch and a single step.
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Most third-party actions use hashes for pinning, but not all do.
Apply proper hash pinning to all remaining actions that use tags.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
This PR introduces a reusable GitHub Actions workflow for pulling and
running an Ollama model, with caching to avoid repeated downloads.
Closes: #1949
## Test Plan
1. Trigger a workflow that uses the Ollama setup. Confirm that:
- The model is pulled successfully.
- It is placed in the correct directory (currently the official one, not
`~ollama/.ollama/models` as per the comment, so this needs to be confirmed).
2. Re-run the same workflow to validate that:
- The model is restored from the cache.
- Execution succeeds with the cached model.
# What does this PR do?
* pull the embedding model so that it's not pulled during the distro
server startup sequence
* cache the models
* collect logs at the end of the workflow
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
This builds on top of
https://github.com/meta-llama/llama-stack/pull/2037 to include some
additional changes to fix integration tests builds.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# What does this PR do?
Remove `distributions/**` from integration, external provider, and unit
tests
## Test Plan
N/A
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
5.4.0 to 5.4.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v5.4.1 🌈 Add support for pep440 version specifiers</h2>
<h2>Changes</h2>
<p>With this release you can also use <a
href="https://peps.python.org/pep-0440/#version-specifiers">pep440
version specifiers</a> as <code>required-version</code> in
files <code>uv.toml</code> and <code>pyproject.toml</code>, and in the
<code>version</code> input:</p>
<pre lang="yaml"><code>- name: Install a pep440-specifier-satisfying version of uv
  uses: astral-sh/setup-uv@v5
  with:
    version: ">=0.4.25,<0.5"
</code></pre>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Add support for pep440 version identifiers <a
href="https://github.com/eifinger"><code>@eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/353">#353</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known checksums for 0.6.10 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/345">#345</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>Add pep440 to docs header <a
href="https://github.com/eifinger"><code>@eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/355">#355</a>)</li>
<li>Fix glob syntax link <a
href="https://github.com/flying-sheep"><code>@flying-sheep</code></a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/349">#349</a>)</li>
<li>Add link to supported glob patterns <a
href="https://github.com/eifinger"><code>@eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/348">#348</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0c5e2b8115"><code>0c5e2b8</code></a>
Add pep440 to docs header (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/355">#355</a>)</li>
<li><a
href="794ea9455c"><code>794ea94</code></a>
Add support for pep440 version identifiers (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/353">#353</a>)</li>
<li><a
href="2d49baf2b6"><code>2d49baf</code></a>
chore: update known checksums for 0.6.10 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/345">#345</a>)</li>
<li><a
href="4fa25599ce"><code>4fa2559</code></a>
Fix glob syntax link (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/349">#349</a>)</li>
<li><a
href="224dce1d79"><code>224dce1</code></a>
Add link to supported glob patterns (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/348">#348</a>)</li>
<li>See full diff in <a
href="22695119d7...0c5e2b8115">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
workflow -
0. Checkout
1. Install uv
2. Install Ollama
3. Pull Ollama image
4. Start Ollama in background
5. Set Up Environment and Install Dependencies
6. Wait for Ollama to start
7. Start Llama Stack server in background
8. Wait for Llama Stack server to be ready
9. Run Integration Tests
changes -
(4) starts loading the Ollama model; it does not start Ollama.
The model will be loaded when used, so this step is removed.
(6) is handled in (2), so this step is removed.
(2) is renamed to reflect its dual purpose.
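
The remaining "wait for the Llama Stack server to be ready" step (8) amounts to polling until a health probe succeeds or a deadline passes. A minimal sketch, assuming a caller-supplied probe: `wait_until_ready` and its parameters are hypothetical names, and the real workflow may well do this in shell instead.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused and similar errors mean "not up yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A probe could, for instance, call `urllib.request.urlopen` against whatever health endpoint the server exposes. The endpoint is deliberately left unspecified here.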
Previously, the integration tests started the server, but never really
used it because `--stack-config=ollama` uses the ollama template and the
inline "llama stack as library" client, not the HTTP client.
This PR makes sure we test it both ways.
We also add agents tests to the mix.
## Test Plan
GitHub
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
When multiple commits are pushed to a PR, multiple CI builds will be
triggered. This PR ensures that only one build runs at a time for
each PR, to reduce CI load.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This PR adds a new open eval benchmark, IFEval, based on the paper
https://arxiv.org/abs/2311.07911, to measure a model's
instruction-following capability.
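
IFEval scores a model on instructions whose compliance can be checked programmatically. The sketch below illustrates that idea with two toy checks and an aggregate score. The check functions and the scoring helper are hypothetical and are not the benchmark's implementation.

```python
from typing import Callable, List

# A check takes a model response and reports whether one instruction was followed.
Check = Callable[[str], bool]


def at_least_n_words(n: int) -> Check:
    """Instruction of the form: 'answer in at least n words'."""
    return lambda response: len(response.split()) >= n


def contains_keyword(keyword: str) -> Check:
    """Instruction of the form: 'mention <keyword>' (case-insensitive)."""
    return lambda response: keyword.lower() in response.lower()


def instruction_accuracy(response: str, checks: List[Check]) -> float:
    """Fraction of instructions the response satisfies."""
    if not checks:
        raise ValueError("at least one check is required")
    passed = sum(1 for check in checks if check(response))
    return passed / len(checks)
```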
## Test Plan
Spin up a llama stack server with the open-benchmark template, then
run `llama-stack-client --endpoint xxx eval run-benchmark
"meta-reference-ifeval" --model-id "meta-llama/Llama-3.3-70B-Instruct"
--output-dir "/home/markchen1015/" --num-examples 20` on the client side
and get the eval aggregate results.
# What does this PR do?
This makes it easier to know the statuses of both and to identify failed
builds.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Run additional tests in a matrix to accelerate the process and clearly
identify failing providers.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Rather than having unit and functional tests run on all PRs, we should
only run them on PRs that change relevant files.
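
The effect of such path filtering can be sketched as: run a test job only if at least one changed file matches a relevant pattern. The patterns and the `should_run` helper are hypothetical, and `fnmatchcase` only approximates GitHub Actions' own glob rules.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Sequence


def should_run(changed_files: Iterable[str], patterns: Sequence[str]) -> bool:
    """Return True if any changed file matches any of the given glob patterns."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in patterns
    )
```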
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
A PTY is unnecessary for interactive mode since `subprocess.run()`
already inherits the calling terminal’s stdin, stdout, and stderr,
allowing natural interaction. Using a PTY can introduce unwanted side
effects like buffering issues and inconsistent signal handling. Standard
input/output is sufficient for most interactive programs.
This commit simplifies the command execution by:
1. Removing PTY-based execution in favor of direct subprocess handling
2. Consolidating command execution into a single run_command function
3. Improving error handling with specific subprocess error types
4. Adding proper type hints and documentation
5. Maintaining Ctrl+C handling for graceful interruption
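
A minimal sketch of what such a consolidated helper could look like. The name `run_command` comes from the description above, but the signature, return convention, and messages here are assumptions rather than the project's actual code.

```python
import subprocess
import sys
from typing import List


def run_command(command: List[str]) -> int:
    """Run a command, inheriting the caller's stdin, stdout, and stderr.

    Returns the child's exit code, 127 if the executable is missing,
    or 130 if interrupted with Ctrl+C (both values are assumed conventions).
    """
    try:
        # No PTY and no pipes: the child talks to the terminal directly.
        result = subprocess.run(command, check=True)
        return result.returncode
    except subprocess.CalledProcessError as exc:
        print(f"Command failed with exit code {exc.returncode}", file=sys.stderr)
        return exc.returncode
    except FileNotFoundError:
        print(f"Command not found: {command[0]}", file=sys.stderr)
        return 127
    except KeyboardInterrupt:
        print("Interrupted by user", file=sys.stderr)
        return 130
```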
## Test Plan
```
llama stack run
```
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Added a GitHub Action to run inference tests for the Ollama provider.
This ensures we have coverage for Ollama integration.
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>