# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR fixes the handling of the external_providers_dir configuration
field to align with its ongoing deprecation, in favor of the provider
`module` specification approach.
It addresses the issue in #3950, where using the default provided
run.yaml config resulted in the `external_providers_dir` parameter being
set to the literal string `None`, and crashing the llama-stack server
when starting.
<!-- If resolving an issue, uncomment and update the line below -->
Closes#3950
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
- Built a new container image from `podman build . -f
containers/Containerfile --build-arg DISTRO_NAME=starter --tag
llama-stack:starter`
- Tested it locally with `podman run -it localhost/llama-stack:starter`
- Tested it on an OpenShift 4.19 cluster, deployed via the
llama-stack-k8s-operator.
Signed-off-by: Doug Edgar <dedgar@redhat.com>
… case variations
The ollama/llama3.2:3b-instruct-fp16 model returns string values with
trailing whitespace in structured JSON output. Updated test assertions
to use case-insensitive substring matching instead of exact equality.
Use .lower() for case-insensitive comparison
Check if expected value is contained in actual value (handles
whitespace)
Closes: #3996
Signed-off-by: Derek Higgins <derekh@redhat.com>
We will be updating our release procedure to be more "normal" or "sane".
We will
- create release branches like normal people
- land cherry-picks onto those branches
- run releases off of those branches
- no more "rc" branch pollution either
Given that, this PR cleans things up a bit
- Remove `-maint` suffix from release branch patterns in CI workflows
- Update branch matching to `release-X.Y.x` format
This should be "remote::vllm". This causes some log probs tests to be
skipped with remote vllm. (They
fail if run).
Signed-off-by: Derek Higgins <derekh@redhat.com>
`mypy` became very slow for the common path. This can make local
pre-commit runs very slow. Let's restore that.
- restore fast mirrors-mypy hook for local runs
- add optional mypy-full hook and docs so devs can match CI
- run full mypy in CI with a hint when failures occur
### Test Plan
- uv run pre-commit run mypy --all-files
- uv run pre-commit run mypy-full --hook-stage manual --all-files
- uv run --group dev --group type_checking mypy
# What does this PR do?
chunk_id in the Chunk class executes actual logic to compute a chunk ID.
This sort of logic should not live in the API spec.
Instead, the providers should be in charge of calling generate_chunk_id,
and pass it to `Chunk`.
this removes the incorrect dependency between Provider impl and API impl
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
When running ./scripts/integration-tests.sh --network host on mac fails
regularly due to how Docker runs on MacOS.
if on mac, keep network bridge mode.
before:
=== Starting Docker Container ===
Using image: localhost/distribution-ci-tests:dev
WARNING: Published ports are discarded when using host network mode
Waiting for Docker container to start...
❌ Docker container failed to start
Container logs:
INFO 2025-10-29 18:38:32,180 llama_stack.cli.stack.run:100 cli: Using
run configuration:
/workspace/src/llama_stack/distributions/ci-tests/run.yaml
... (stack starts but is not reachable on network)
after:
=== Starting Docker Container ===
Using image: localhost/distribution-ci-tests:dev
Using bridge networking with port mapping (non-Linux) Waiting for Docker
container to start...
✅ Docker container started successfully
=== Running Integration Tests ===
## Test Plan
integration tests pass!
Signed-off-by: Charlie Doern <cdoern@redhat.com>
## Summary
When users provide API keys via `X-LlamaStack-Provider-Data` header,
`models.list()` now returns models they can access from those providers,
not just pre-registered models from the registry.
This complements the routing fix from f88416ef8 which enabled inference
calls with `provider_id/model_id` format for unregistered models. Users
can now discover which models are available to them before making
inference requests.
The implementation reuses
`NeedsRequestProviderData.get_request_provider_data()` to validate
credentials, then dynamically fetches models from providers without
caching them since they're user-specific. Registry models take
precedence to respect any pre-configured aliases.
## Test Script
```python
#!/usr/bin/env python3
import json
import os
from openai import OpenAI
# Test 1: Without provider_data header
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="dummy")
models = client.models.list()
anthropic_without = [m.id for m in models.data if m.id and "anthropic" in m.id]
print(f"Without header: {len(models.data)} models, {len(anthropic_without)} anthropic")
# Test 2: With provider_data header containing Anthropic API key
anthropic_api_key = os.environ["ANTHROPIC_API_KEY"]
client_with_key = OpenAI(
base_url="http://localhost:8321/v1/openai/v1",
api_key="dummy",
default_headers={
"X-LlamaStack-Provider-Data": json.dumps({"anthropic_api_key": anthropic_api_key})
}
)
models_with_key = client_with_key.models.list()
anthropic_with = [m.id for m in models_with_key.data if m.id and "anthropic" in m.id]
print(f"With header: {len(models_with_key.data)} models, {len(anthropic_with)} anthropic")
print(f"Anthropic models: {anthropic_with}")
assert len(anthropic_with) > len(anthropic_without), "Should have more anthropic models with API key"
print("\n✓ Test passed!")
```
Run with a stack that has Anthropic provider configured (but without API
key in config):
```bash
ANTHROPIC_API_KEY=sk-ant-... python test_provider_data_models.py
```
## Summary
Fixes all mypy type errors in `providers/inline/agents/meta_reference/`
and removes exclusions from pyproject.toml.
## Changes
- Fix type annotations for Safety API message parameters
(OpenAIMessageParam)
- Add Action enum usage in access control checks
- Correct method signatures to match API supertype (parameter ordering)
- Handle optional return types with proper None checks
- Remove 3 meta_reference exclusions from mypy config
**Files fixed:** 25 errors across 3 files (safety.py, persistence.py,
agents.py)
## Summary
Resolves all mypy errors in meta reference agent OpenAI responses
implementation by adding proper type narrowing, None checks, and
Sequence type support.
## Changes
- Fixed streaming.py, openai_responses.py, utils.py, tool_executor.py,
agent_instance.py
- Added Sequence type support to schema generator (ensures correct JSON
schema generation)
- Applied union type narrowing and None checks throughout
## Test plan
- All modified files pass mypy type checking (0 errors)
- Schema generator produces correct `type: array` for Sequence types
---------
Co-authored-by: Claude <noreply@anthropic.com>
Error fixes in Agents implementation (`meta-reference` provider) --
adding proper type annotations and using type narrowing for optional
attributes. Essentially a bunch of `if x and x_foo := getattr(x, "foo")`
instead of `x.foo` directly
Part of ongoing mypy remediation effort.
---------
Co-authored-by: Claude <noreply@anthropic.com>
# What does this PR do?
this commit adds a new pre-commit hook to scan for non-FIPS compliant
function usage within llama-stack
Closes#3427
## Test Plan
Ran locally
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
This adds automated backward compatibility testing for `run.yaml` files.
As we evolve `StackRunConfig`, changes can inadvertently break existing
user configurations. This workflow catches those breaks before merge.
We test old run.yaml files (from main and the latest release) against
the PR's new code. If configs that worked before now fail, the PR is
blocked unless explicitly acknowledged as a breaking change.
**Two test layers:**
- Schema validation: Quick pytest checks that configs parse without
errors
- Integration tests: Full test suite execution to catch runtime semantic
issues (cross-field validations, provider initialization, etc.)
**What we test against:**
- main branch: Breaking changes here block the PR (this is the gate)
- Latest release: Informational only - shows if we've drifted from what
users have
If tests fail, the PR author must acknowledge the breaking change by
adding `!:` to the PR title (e.g., `feat!: change xyz`) or including
`BREAKING CHANGE:` in a commit message. Once acknowledged, the check
passes with a warning.
These jobs are run:
1. `check-main-compatibility` - Schema validation of all distribution
run.yaml files from main
2. `test-integration-main` - Full integration test suite using main's
ci-tests run.yaml
3. `test-integration-release` - Integration tests with latest release
config (informational)
4. `check-schema-release-compatibility` - Schema checks against release
(informational)
The integration tests catch issues that schema validation alone would
miss, like assertion failures in
`StackRunConfig.validate_server_stores()` or provider-specific runtime
logic.
Resolves#3311
Related to #3237
Remove unused methods that became obsolete after d266c59c: o
_compute_and_log_token_usage
o _count_tokens
o stream_tokens_and_compute_metrics
o count_tokens_and_compute_metrics
These methods are no longer referenced anywhere in the codebase
following the removal of deprecated inference.chat_completion
implementations.
---------
Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
# What does this PR do?
- Adds OpenAI files provider
- Note that file content retrieval is pretty limited by `purpose`
https://community.openai.com/t/file-uploads-error-why-can-t-i-download-files-with-purpose-user-data/1357013?utm_source=chatgpt.com
## Test Plan
Modify run yaml to use openai files provider:
```
files:
- provider_id: openai
provider_type: remote::openai
config:
api_key: ${env.OPENAI_API_KEY:=}
metadata_store:
backend: sql_default
table_name: openai_files_metadata
# Then run files tests
❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern test_files
```
This PR enables routing of fully qualified model IDs of the form
`provider_id/model_id` even when the models are not registered with the
Stack.
Here's the situation: assume a remote inference provider which works
only when users provide their own API keys via
`X-LlamaStack-Provider-Data` header. By definition, we cannot list
models and hence update our routing registry. But because we _require_ a
provider ID in the models now, we can identify which provider to route
to and let that provider decide.
Note that we still try to look up our registry since it may have a
pre-registered alias. Just that we don't outright fail when we are not
able to look it up.
Also, updated inference router so that the responses have the _exact_
model that the request had.
## Test Plan
Added an integration test
Closes#3929
---------
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>