# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR migrates `unittest` to `pytest` in
`tests/unit/providers/nvidia/test_eval.py`.
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Part of https://github.com/llamastack/llama-stack/issues/2680
Supersedes https://github.com/llamastack/llama-stack/pull/2791
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
This PR removes all routes which we had marked deprecated for the 0.3.0
release.
This includes:
- all the `/v1/openai/v1/` routes (the corresponding /v1 routes still
exist of course)
- the /agents API (which is superseded completely by Responses +
Conversations)
- several alpha routes which had a "v1" route to aide transitioning to
"v1alpha"
This is the corresponding client-python change:
https://github.com/llamastack/llama-stack-client-python/pull/294
The llama-stack-client now uses /`v1/openai/v1/models` which returns
OpenAI-compatible model objects with 'id' and 'custom_metadata' fields
instead of the Resource-style 'identifier' field. Updated api_recorder
to handle the new endpoint and modified tests to access model metadata
appropriately. Deleted stale model recordings for re-recording.
**NOTE: CI will be red on this one since it is dependent on
https://github.com/llamastack/llama-stack-client-python/pull/291/files
landing. I verified locally that it is green.**
# What does this PR do?
This API hasn't received any traction and close to zero interest from
the community. Let's revisit in the future if things change.
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
We need to remove `/v1/openai/v1` paths shortly. There is one trouble --
our current `/v1/openai/v1/models` endpoint provides different data than
`/v1/models`. Unfortunately our tests target the latter (llama-stack
customized) behavior. We need to get to true OpenAI compatibility.
This is step 1: adding `custom_metadata` field to `OpenAIModel` that
includes all the extra stuff we add in the native `/v1/models` response.
This can be extracted on the consumer end by look at
`__pydantic_extra__` or other similar fields.
This PR:
- Adds `custom_metadata` field to `OpenAIModel` class in
`src/llama_stack/apis/models/models.py`
- Modified `openai_list_models()` in
`src/llama_stack/core/routing_tables/models.py` to populate
custom_metadata
Next Steps
1. Update stainless client to use `/v1/openai/v1/models` instead of
`/v1/models`
2. Migrate tests to read from `custom_metadata`
3. Remove `/v1/openai/v1/` prefix entirely and consolidate to single
`/v1/models` endpoint
Fixes race condition causing "database is locked" errors during
concurrent writes to SQLite, particularly in streaming responses with
guardrails where multiple inference calls write simultaneously.
Enable Write-Ahead Logging (WAL) mode for SQLite which allows multiple
concurrent readers and one writer without blocking. Set busy_timeout to
5s so SQLite retries instead of failing immediately. Remove the logic
that disabled write queues for SQLite since WAL mode eliminates the
locking issues that prompted disabling them.
Fixes: test_output_safety_guardrails_safe_content[stream=True] flake
# What does this PR do?
This prevents interference from already running servers, and allows
multiple concurrent integration test runs. Unleash the AIs!
## Test Plan
start a LS server at port 8321
Then observe test uses port 8322:
❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config
server:ci-tests --inference-mode replay --setup ollama --suite base
--pattern '(telemetry or safety)'
=== Llama Stack Integration Test Runner ===
Stack Config: server:ci-tests
Setup: ollama
Inference Mode: replay
Test Suite: base
Test Subdirs:
Test Pattern: (telemetry or safety)
Checking llama packages
llama-stack 0.4.0.dev0 /Users/erichuang/projects/new_test_server
llama-stack-client 0.3.0
ollama 0.6.0
=== Applying Setup Environment Variables ===
Setting SQLITE_STORE_DIR:
/var/folders/cz/vyh7y1d11xg881lsxsshnc5c0000gn/T/tmp.bKLsaVAxyU
Setting stack config type: server
Setting up environment variables:
export OLLAMA_URL='http://0.0.0.0:11434'
export SAFETY_MODEL='ollama/llama-guard3:1b'
Will use port: 8322
=== Starting Llama Stack Server ===
Waiting for Llama Stack Server to start on port 8322...
✅ Llama Stack Server started successfully
This commit introduces Mergify, a powerful bot designed to assist with
automated merging and other CI-related tasks. As an initial step, we
enable a basic feature: automatically notifying users when a pull
request has merge conflicts.
When a conflict is detected, Mergify will add a label to the PR. This
label will be removed once the conflict is resolved.
This is foundation PR to activate the bot and start using it for
backports too.
In the future, we plan to expand Mergify’s role to include auto-merging,
as discussed in #1667, once the project is ready.
Signed-off-by: Sébastien Han <seb@redhat.com>
Without this hint Qwen3-0.6B tends to reply with the full name
and sometimes doesn't reply with the correct drafted year.
---------
Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
This seems to be an ancient artifact when we were using readthedocs? Now
docusaurus read the specs directly.
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Fixes latent bug where UV_INDEX_STRATEGY was only exported to GITHUB_ENV
but not to the current shell.
While this bug doesn't currently affect main (since UV_EXTRA_INDEX_URL
is only set on release branches), it's a latent bug that could cause
issues if the logic changes in the future or if someone tests with
UV_EXTRA_INDEX_URL set.
The setup-runner action only exported UV_INDEX_STRATEGY to GITHUB_ENV
(for subsequent steps), not to the current shell environment. Since uv
sync runs in the same step, it would never see the variable if it were
set.
This fix adds `export UV_INDEX_STRATEGY=unsafe-best-match` to make the
variable available in the current shell before running uv commands.
Related: #4019 (same fix for release-0.3.x where the bug is actively
triggered)
# What does this PR do?
llama stack run --providers takes a list of providers in the format of
api1=provider1,api2=provider2
this allows users to run with a simple list of providers.
given the architecture of `create_app`, this run config needs to be
written to disk. use ~/.llama/distribution/providers-run/run.yaml each
time for consistency
resolves#3956
## Test Plan
new unit tests to ensure --providers.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Fixes container builds failing with UV index strategy errors when build
args are passed with empty values.
Docker ARGs declared with empty defaults (ARG UV_INDEX_STRATEGY="")
become environment variables with empty string values in RUN commands.
UV interprets these as if --index-strategy "" was passed on the command
line, causing build failures with "error: a value is required for
'--index-strategy <UV_INDEX_STRATEGY>'".
This is a footgun because empty string ≠ unset variable, and ARGs
silently propagate to all RUN commands, only failing when declared with
empty defaults.
The fix unsets UV_EXTRA_INDEX_URL and UV_INDEX_STRATEGY at the start of
RUN blocks, saves the values early, and only restores them for editable
installs with RC dependencies. All other install modes (PyPI, test-pypi,
client) now run with a clean environment.
Backports UV index configuration fixes from `release-0.3.x` (PR #4002).
The main issue: when we created the release branch infrastructure, we
configured UV to use `test.pypi` as the PRIMARY index to resolve RC
dependencies. This caused UV to look for ALL packages there first, which
led to problems - some packages don't have binary wheels on `test.pypi`,
so UV tried building from source and failed (like the `psycopg2-binary`
issue we hit).
The fix is simple: use PyPI as primary (default) and `test.pypi` as an
EXTRA index. UV will check PyPI first for everything, and only fall back
to `test.pypi` for packages not found there (like our RC client
versions).
This PR includes:
- Fixed `install-llama-stack-client` action to output
`UV_EXTRA_INDEX_URL` instead of `UV_INDEX_URL`
- New `uv-run-with-index.sh` wrapper that auto-detects release branches
and sets UV env vars
- Updated pre-commit hooks (`uv-lock`, codegen, etc.) to use the wrapper
- Pass UV env vars as Docker build args in all locations
- Scope UV env vars properly in Containerfile (inline for llama-stack
install, explicitly unset before distribution deps)
- Export UV env vars to `GITHUB_ENV` in setup-runner for cross-step
persistence
The wrapper detects release branches automatically in both CI and local
environments, so this "just works" without manual configuration. On main
(non-release branch), the wrapper becomes a no-op.
Tested and validated on `release-0.3.x` where all CI checks pass.
# What does this PR do?
Allow filtering for v1alpha, v1beta, deprecated and v1. Backward
incompatible change since by default it only returns v1 apis now.
## Test Plan
added unit test
Fixes CI failures on release branches where uv sync can't resolve RC
dependencies.
The problem: on release branches like `release-0.3.x`, pyproject.toml
requires `llama-stack-client>=0.3.1rc1`. But RC versions only exist on
test.pypi, not PyPI. So uv sync fails before we even get a chance to
install the client from git.
The fix is simple - on release branches, pre-install the client from the
matching git branch first, then run uv sync. This satisfies the RC
requirement and lets dependency resolution succeed.
Modified setup-runner and pre-commit workflows to do this. Also cleaned
up some duplicate logic in setup-test-environment that's now handled
centrally.
Example failure:
5415478835
Replace unused `LLAMA_STACK_CLIENT_DIR` env var (from old `llama stack
build`) with direct `uv pip install` for release branch client
installation.
cc @ehhuang
# What does this PR do?
Add rerank API for NVIDIA Inference Provider.
<!-- If resolving an issue, uncomment and update the line below -->
Closes#3278
## Test Plan
Unit test:
```
pytest tests/unit/providers/nvidia/test_rerank_inference.py
```
Integration test:
```
pytest -s -v tests/integration/inference/test_rerank.py --stack-config="inference=nvidia" --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3 --env NVIDIA_API_KEY="" --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
Standardize CI workflows to use `release-X.Y.x` branch pattern instead
of multiple numeric variants.
That's the pattern we are settling on. See
https://github.com/llamastack/llama-stack-ops/pull/20 for reference.
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR fixes the handling of the external_providers_dir configuration
field to align with its ongoing deprecation, in favor of the provider
`module` specification approach.
It addresses the issue in #3950, where using the default provided
run.yaml config resulted in the `external_providers_dir` parameter being
set to the literal string `None`, and crashing the llama-stack server
when starting.
<!-- If resolving an issue, uncomment and update the line below -->
Closes#3950
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
- Built a new container image from `podman build . -f
containers/Containerfile --build-arg DISTRO_NAME=starter --tag
llama-stack:starter`
- Tested it locally with `podman run -it localhost/llama-stack:starter`
- Tested it on an OpenShift 4.19 cluster, deployed via the
llama-stack-k8s-operator.
Signed-off-by: Doug Edgar <dedgar@redhat.com>
… case variations
The ollama/llama3.2:3b-instruct-fp16 model returns string values with
trailing whitespace in structured JSON output. Updated test assertions
to use case-insensitive substring matching instead of exact equality.
Use .lower() for case-insensitive comparison
Check if expected value is contained in actual value (handles
whitespace)
Closes: #3996
Signed-off-by: Derek Higgins <derekh@redhat.com>
We will be updating our release procedure to be more "normal" or "sane".
We will
- create release branches like normal people
- land cherry-picks onto those branches
- run releases off of those branches
- no more "rc" branch pollution either
Given that, this PR cleans things up a bit
- Remove `-maint` suffix from release branch patterns in CI workflows
- Update branch matching to `release-X.Y.x` format
The integration-tests.sh script already sets LLAMA_STACK_TEST_STACK_CONFIG_TYPE
based on the stack config. Our custom detection logic was unnecessary and
potentially interfering. Revert to relying on the environment variable set
by the test script.
The LLAMA_STACK_DISABLE_GUNICORN environment variable is still set correctly
when stack_mode == 'server', which happens for both server: and docker: configs.
The telemetry fixture was only checking LLAMA_STACK_TEST_STACK_CONFIG_TYPE
environment variable, which defaults to 'library_client'. In CI, tests run
with --stack-config=docker:ci-tests, which wasn't being detected as server mode.
This commit checks the --stack-config argument and treats both 'server:' and
'docker:' prefixes as server mode, ensuring LLAMA_STACK_DISABLE_GUNICORN is
set when needed for telemetry span collection.
Telemetry tests use an OTLP collector that expects single-process
telemetry spans. Gunicorn's multi-process architecture spawns multiple
workers, each with separate telemetry instrumentation, preventing the
test collector from capturing all spans.
This commit adds LLAMA_STACK_DISABLE_GUNICORN environment variable
support and sets it in telemetry test configuration to ensure
single-process Uvicorn is used during tests while maintaining
production multi-process behavior.
Fixes failing tests:
- test_streaming_chunk_count
- test_telemetry_format_completeness
This should be "remote::vllm". This causes some log probs tests to be
skipped with remote vllm. (They
fail if run).
Signed-off-by: Derek Higgins <derekh@redhat.com>
`mypy` became very slow for the common path. This can make local
pre-commit runs very slow. Let's restore that.
- restore fast mirrors-mypy hook for local runs
- add optional mypy-full hook and docs so devs can match CI
- run full mypy in CI with a hint when failures occur
### Test Plan
- uv run pre-commit run mypy --all-files
- uv run pre-commit run mypy-full --hook-stage manual --all-files
- uv run --group dev --group type_checking mypy