Commit graph

760 commits

Author SHA1 Message Date
ehhuang
84a84ee85c
fix: last_id when listing files in vector store (#4079)
# What does this PR do?
the last_id should be the id of the last item in the returned list, not
the unfiltered list.

## Test Plan
fixed test
2025-11-05 14:10:10 -08:00
Emilio Garcia
ba50790a28
feat(tests): metrics tests (#3966)
# What does this PR do?
1. Make telemetry tests as easy as possible for users by expanding the
`SpanStub` data class and creating the `MetricStub` dataclass as a way
to consistently marshal telemetry data in test fixtures and unmarshal
and handle it in tests.
2. Structure server and client tests to always follow the same standards
for consistent testing experience by using the `SpanStub` and
`MetricStub` data class objects.
3. Enable Metrics Testing for completions endpoint
4. Correct token metrics to use histograms instead of counts to capture
tokens per request rather than a cumulative count of tokens over the
lifecycle of the server.

## Test Plan
These are tests
2025-11-05 10:26:15 -08:00
Roy Belio
2619f3552e
fix: show built-in distributions in llama stack list (#4040)
# What does this PR do?
Fixes issue #3922 where `llama stack list` only showed distributions
after they were run. This PR makes the command show all available
distributions immediately on a fresh install.

Closes #3922

## Changes
- **Updated `_get_distribution_dirs()`** to discover both built-in and
built distributions:
- Built-in distributions from `src/llama_stack/distributions/` (e.g.,
starter, nvidia, dell)
  - Built distributions from `~/.llama/distributions`
- **Added a "Source" column** to distinguish between "built-in" and
"built" distributions
- **Built distributions override built-in ones** with the same name
(expected behavior)
- **Updated config file detection logic** to handle both naming
conventions:
  - Built-in: `build.yaml` and `run.yaml`
  - Built: `{name}-build.yaml` and `{name}-run.yaml`

## Test Plan
### Unit Tests
Added comprehensive unit tests in
`tests/unit/distribution/test_stack_list.py`:
```bash
uv run pytest tests/unit/distribution/test_stack_list.py -v
```
**Result**:  All 8 tests pass
- `test_builtin_distros_shown_without_running` - Verifies the core fix
for issue #3922
- `test_builtin_and_built_distros_shown_together` - Ensures both types
are shown
- `test_built_distribution_overrides_builtin` - Tests override behavior
- `test_empty_distributions` - Edge case handling
- `test_config_files_detection_builtin` - Config file detection for
built-in distros
- `test_config_files_detection_built` - Config file detection for built
distros
- `test_llamastack_prefix_stripped` - Name normalization
- `test_hidden_directories_ignored` - Filters hidden directories

### Manual Testing
**Before the fix** (simulated with empty `~/.llama/distributions`):
```bash
$ llama stack list
No stacks found in ~/.llama/distributions
```

**After the fix**:
```bash
$ llama stack list
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Stack Name        ┃ Source   ┃ Path              ┃ Build Config ┃ Run Config ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ ci-tests          │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ dell              │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ meta-reference-g… │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ nvidia            │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ open-benchmark    │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ postgres-demo     │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ starter           │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ starter-gpu       │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ watsonx           │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
└───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘
```

**After running a distribution**:
```bash
$ llama stack run starter  # Creates ~/.llama/distributions/starter
$ llama stack list
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Stack Name        ┃ Source   ┃ Path              ┃ Build Config ┃ Run Config ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ ...               │ built-in │ ...               │ Yes          │ Yes        │
│ starter           │ built    │ ~/.llama/distri…  │ No           │ No         │
│ ...               │ built-in │ ...               │ Yes          │ Yes        │
└───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘
```
Note how `starter` now shows as "built" and points to
`~/.llama/distributions`, overriding the built-in version.

## Breaking Changes
**No breaking changes** - This is a bug fix that improves user
experience with minimal risk:
- No programmatic parsing of output found in the codebase
- Table format is clearly for human consumption
- The new "Source" column helps users understand where distributions
come from
- The behavior change is exactly what users expect (seeing all available
distributions)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-05 10:16:28 -08:00
Ashwin Bharambe
4d3069bfa5
chore(ci): remove unused recordings (#4074)
Added a script to cleanup recordings. While doing this, moved the CI
matrix generation to a separate script so there is a single source of
truth for the matrix.

Ran the cleanup script as:
```
PYTHONPATH=. python scripts/cleanup_recordings.py
```

Also added this as part of the pre-commit workflow to ensure that the
recordings are always up to date and that no stale recordings are left
in the repo.
2025-11-05 09:21:58 -08:00
Sébastien Han
fd1603beef
chore: remove unused classes (#4077)
# What does this PR do?

These were maybe be included in the webmethod?
The unit test was pointless too since the request was never used
anywhere?

This shouldn't be in the API definition, if we never consume it.

## Test Plan

CI with pre-commit on OpenAPI spec generation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-05 16:45:23 +01:00
Ashwin Bharambe
a8a8aa56c0
chore!: remove the agents (sessions and turns) API (#4055)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 1m10s
- Removes the deprecated agents (sessions and turns) API that was marked
alpha in 0.3.0
- Cleans up unused imports and orphaned types after the API removal
- Removes `SessionNotFoundError` and `AgentTurnInputType` which are no
longer needed

The agents API is completely superseded by the Responses + Conversations
APIs, and the client SDK Agent class already uses those implementations.

Corresponding client-side PR:
https://github.com/llamastack/llama-stack-client-python/pull/295
2025-11-04 09:38:39 -08:00
Mustafa Elbehery
a6ddbae0ed
chore(test): migrate unit tests from unittest to pytest nvidia test eval (#3249)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 14s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
UI Tests / ui-tests (22) (push) Successful in 1m16s
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR migrates `unittest` to `pytest` in
`tests/unit/providers/nvidia/test_eval.py`.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Part of https://github.com/llamastack/llama-stack/issues/2680

Supersedes https://github.com/llamastack/llama-stack/pull/2791

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-11-04 10:29:07 +01:00
Ashwin Bharambe
cb40da210f
fix: update tests for OpenAI-style models endpoint (#4053)
The llama-stack-client now uses /`v1/openai/v1/models` which returns
OpenAI-compatible model objects with 'id' and 'custom_metadata' fields
instead of the Resource-style 'identifier' field. Updated api_recorder
to handle the new endpoint and modified tests to access model metadata
appropriately. Deleted stale model recordings for re-recording.

**NOTE: CI will be red on this one since it is dependent on
https://github.com/llamastack/llama-stack-client-python/pull/291/files
landing. I verified locally that it is green.**
2025-11-03 17:30:08 -08:00
Ashwin Bharambe
44096512b5
feat: add custom_metadata to OpenAIModel to unify /v1/models with /v1/openai/v1/models (#4051)
We need to remove `/v1/openai/v1` paths shortly. There is one trouble --
our current `/v1/openai/v1/models` endpoint provides different data than
`/v1/models`. Unfortunately our tests target the latter (llama-stack
customized) behavior. We need to get to true OpenAI compatibility.

This is step 1: adding `custom_metadata` field to `OpenAIModel` that
includes all the extra stuff we add in the native `/v1/models` response.
This can be extracted on the consumer end by look at
`__pydantic_extra__` or other similar fields.

This PR:
- Adds `custom_metadata` field to `OpenAIModel` class in
`src/llama_stack/apis/models/models.py`
- Modified `openai_list_models()` in
`src/llama_stack/core/routing_tables/models.py` to populate
custom_metadata

Next Steps
1. Update stainless client to use `/v1/openai/v1/models` instead of
`/v1/models`
2. Migrate tests to read from `custom_metadata`
3. Remove `/v1/openai/v1/` prefix entirely and consolidate to single
`/v1/models` endpoint
2025-11-03 15:56:07 -08:00
Derek Higgins
1562277cfd
ci: test adjustments for Qwen3-0.6B (#3978)
Without this hint Qwen3-0.6B tends to reply with the full name
and sometimes doesn't reply with the correct drafted year.

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 12:19:35 -08:00
Matthew Farrellee
1263448de2
fix: allowed_models config did not filter models (#4030)
# What does this PR do?

closes #4022 

## Test Plan

ci w/ new tests

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 11:43:39 -08:00
Charlie Doern
30f8921240
fix: generate provider config when using --providers (#4044)
# What does this PR do?

call the sample_run_config method for providers that have it when
generating a run config using `llama stack run --providers`. This will
propagate API keys

resolves #4032


## Test Plan

new unit test checks the output of using `--providers` to ensure
`api_key` is in the config.

manual testing:

```
╰─ llama stack list-deps --providers=inference=remote::openai --format uv | sh
Using Python 3.12.11 environment at: venv
Audited 7 packages in 8ms

╰─ llama stack run --providers=inference=remote::openai
INFO     2025-11-03 14:33:02,094 llama_stack.cli.stack.run:161 cli: Writing generated config to:
         /Users/charliedoern/.llama/distributions/providers-run/run.yaml
INFO     2025-11-03 14:33:02,096 llama_stack.cli.stack.run:169 cli: Using run configuration:
         /Users/charliedoern/.llama/distributions/providers-run/run.yaml
INFO     2025-11-03 14:33:02,099 llama_stack.cli.stack.run:228 cli: HTTPS enabled with certificates:
           Key: None
           Cert: None
INFO     2025-11-03 14:33:02,099 llama_stack.cli.stack.run:230 cli: Listening on 0.0.0.0:8321
INFO     2025-11-03 14:33:02,145 llama_stack.core.server.server:513 core::server: Run configuration:
INFO     2025-11-03 14:33:02,146 llama_stack.core.server.server:516 core::server: apis:
         - inference
         image_name: providers-run
         providers:
           inference:
           - config:
               api_key: '********'
               base_url: https://api.openai.com/v1
             provider_id: openai
             provider_type: remote::openai
         registered_resources:
           benchmarks: []
           datasets: []
           models: []
           scoring_fns: []
           shields: []
           tool_groups: []
           vector_stores: []
         server:
           port: 8321
           workers: 1
         storage:
           backends:
             kv_default:
               db_path: /Users/charliedoern/.llama/distributions/providers-run/kvstore.db
               type: kv_sqlite
             sql_default:
               db_path: /Users/charliedoern/.llama/distributions/providers-run/sql_store.db
               type: sql_sqlite
           stores:
             conversations:
               backend: sql_default
               table_name: openai_conversations
             inference:
               backend: sql_default
               max_write_queue_size: 10000
               num_writers: 4
               table_name: inference_store
             metadata:
               backend: kv_default
               namespace: registry
             prompts:
               backend: kv_default
               namespace: prompts
         telemetry:
           enabled: false
         version: 2

INFO     2025-11-03 14:33:02,299 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue
         disabled for SQLite to avoid concurrency issues
INFO     2025-11-03 14:33:05,272 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils:
         OpenAIInferenceAdapter.list_provider_model_ids() returned 105 models
INFO     2025-11-03 14:33:05,368 uvicorn.error:84 uncategorized: Started server process [69109]
INFO     2025-11-03 14:33:05,369 uvicorn.error:48 uncategorized: Waiting for application startup.
INFO     2025-11-03 14:33:05,370 llama_stack.core.server.server:172 core::server: Starting up Llama Stack server
         (version: 0.3.0)
INFO     2025-11-03 14:33:05,370 llama_stack.core.stack:495 core: starting registry refresh task
INFO     2025-11-03 14:33:05,370 uvicorn.error:62 uncategorized: Application startup complete.
INFO     2025-11-03 14:33:05,371 uvicorn.error:216 uncategorized: Uvicorn running on http://0.0.0.0:8321 (Press CTRL+C
         to quit)
INFO     2025-11-03 14:34:19,242 uvicorn.access:473 uncategorized: 127.0.0.1:63102 - "POST /v1/chat/completions
         HTTP/1.1" 200
```

client:

```
curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
 "model": "openai/gpt-5",
 "messages": [
     {"role": "user", "content": "What is 1 + 2"}
 ]
}'
{"id":"...","choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"3","refusal":null,"role":"assistant","annotations":[],"audio":null,"function_call":null,"tool_calls":null}}],"created":1762198455,"model":"openai/gpt-5","object":"chat.completion","service_tier":"default","system_fingerprint":null,"usage":{"completion_tokens":10,"prompt_tokens":13,"total_tokens":23,"completion_tokens_details":{"accepted_prediction_tokens":0,"audio_tokens":0,"reasoning_tokens":0,"rejected_prediction_tokens":0},"prompt_tokens_details":{"audio_tokens":0,"cached_tokens":0}}}%
```

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 11:37:58 -08:00
Charlie Doern
93401836b7
feat: llama stack run --providers (#3989)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 3s
Pre-commit / pre-commit (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Test Llama Stack Build / build (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 56s
# What does this PR do?

llama stack run --providers takes a list of providers in the format of
api1=provider1,api2=provider2

this allows users to run with a simple list of providers.

given the architecture of `create_app`, this run config needs to be
written to disk. use ~/.llama/distribution/providers-run/run.yaml each
time for consistency

resolves #3956

## Test Plan

new unit tests to ensure --providers.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-31 16:21:32 -07:00
raghotham
62603d25c2
chore(api)!: /v1/inspect only lists v1 apis by default (#3948)
# What does this PR do?
Allow filtering for v1alpha, v1beta, deprecated and v1. Backward
incompatible change since by default it only returns v1 apis now.

## Test Plan
added unit test
2025-10-31 11:55:46 -07:00
Jiayi Ni
fa7699d2c3
feat: Add rerank API for NVIDIA Inference Provider (#3329)
# What does this PR do?
Add rerank API for NVIDIA Inference Provider.

<!-- If resolving an issue, uncomment and update the line below -->
Closes #3278 

## Test Plan
Unit test:
```
pytest tests/unit/providers/nvidia/test_rerank_inference.py
```

Integration test: 
```
pytest -s -v tests/integration/inference/test_rerank.py   --stack-config="inference=nvidia"   --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3   --env NVIDIA_API_KEY=""   --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
2025-10-30 21:42:09 -07:00
Doug Edgar
e8cd8508b5
fix: handle missing external_providers_dir (#3974)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Pre-commit / pre-commit (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 50s
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR fixes the handling of the external_providers_dir configuration
field to align with its ongoing deprecation, in favor of the provider
`module` specification approach.

It addresses the issue in #3950, where using the default provided
run.yaml config resulted in the `external_providers_dir` parameter being
set to the literal string `None`, and crashing the llama-stack server
when starting.

<!-- If resolving an issue, uncomment and update the line below -->
Closes #3950 

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

- Built a new container image from `podman build . -f
containers/Containerfile --build-arg DISTRO_NAME=starter --tag
llama-stack:starter`
- Tested it locally with `podman run -it localhost/llama-stack:starter`
- Tested it on an OpenShift 4.19 cluster, deployed via the
llama-stack-k8s-operator.

Signed-off-by: Doug Edgar <dedgar@redhat.com>
2025-10-30 17:01:31 -07:00
Derek Higgins
ff2b270e2f
fix: relax structured output test assertions to handle whitespace and… (#3997)
… case variations

The ollama/llama3.2:3b-instruct-fp16 model returns string values with
trailing whitespace in structured JSON output. Updated test assertions
to use case-insensitive substring matching instead of exact equality.

Use .lower() for case-insensitive comparison
Check if expected value is contained in actual value (handles
whitespace)

Closes: #3996

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-30 16:55:23 -07:00
Ashwin Bharambe
77c8bc6fa7
fix(ci): add back server:ci-tests to replay tests (#3976)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Pre-commit / pre-commit (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (push) Failing after 7s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
API Conformance Tests / check-schema-compatibility (push) Successful in 15s
Python Package Build Test / build (3.12) (push) Failing after 39s
Unit Tests / unit-tests (3.12) (push) Failing after 40s
UI Tests / ui-tests (22) (push) Successful in 42s
It is useful for local debugging. If both server and docker are failing,
you can just run server locally to debug which is much easier to do.
2025-10-30 11:02:59 -07:00
ehhuang
5e20938832
fix: remove LLAMA_STACK_TEST_FORCE_SERVER_RESTART setting in fixture (#3982)
# What does this PR do?
this is meant to be a manual flag

## Test Plan
CI
2025-10-30 09:13:04 -07:00
Derek Higgins
19d85003de
test: Updated test skips that were marked with "inline::vllm" (#3979)
This should be "remote::vllm". This causes some log probs tests to be
skipped with remote vllm. (They
fail if run).

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-30 14:48:21 +01:00
Charlie Doern
e8ecc99524
fix!: remove chunk_id property from Chunk class (#3954)
# What does this PR do?

chunk_id in the Chunk class executes actual logic to compute a chunk ID.
This sort of logic should not live in the API spec.

Instead, the providers should be in charge of calling generate_chunk_id,
and pass it to `Chunk`.

this removes the incorrect dependency between Provider impl and API impl

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-29 18:59:59 -07:00
Ashwin Bharambe
c9d4b6c54f
chore(mypy): part-04 resolve mypy errors in meta_reference agents (#3969)
## Summary
Fixes all mypy type errors in `providers/inline/agents/meta_reference/`
and removes exclusions from pyproject.toml.

## Changes
- Fix type annotations for Safety API message parameters
(OpenAIMessageParam)
- Add Action enum usage in access control checks
- Correct method signatures to match API supertype (parameter ordering)
- Handle optional return types with proper None checks
- Remove 3 meta_reference exclusions from mypy config

**Files fixed:** 25 errors across 3 files (safety.py, persistence.py,
agents.py)
2025-10-29 13:37:28 -07:00
Ashwin Bharambe
e809d21357
feat: add backward compatibility tests for run.yaml (#3952)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 42s
Vector IO Integration Tests / test-matrix (push) Failing after 45s
API Conformance Tests / check-schema-compatibility (push) Successful in 54s
UI Tests / ui-tests (22) (push) Successful in 52s
Pre-commit / pre-commit (push) Successful in 3m28s
This adds automated backward compatibility testing for `run.yaml` files.
As we evolve `StackRunConfig`, changes can inadvertently break existing
user configurations. This workflow catches those breaks before merge.

We test old run.yaml files (from main and the latest release) against
the PR's new code. If configs that worked before now fail, the PR is
blocked unless explicitly acknowledged as a breaking change.

**Two test layers:**
- Schema validation: Quick pytest checks that configs parse without
errors
- Integration tests: Full test suite execution to catch runtime semantic
issues (cross-field validations, provider initialization, etc.)

**What we test against:**
- main branch: Breaking changes here block the PR (this is the gate)
- Latest release: Informational only - shows if we've drifted from what
users have

If tests fail, the PR author must acknowledge the breaking change by
adding `!:` to the PR title (e.g., `feat!: change xyz`) or including
`BREAKING CHANGE:` in a commit message. Once acknowledged, the check
passes with a warning.

These jobs are run:
1. `check-main-compatibility` - Schema validation of all distribution
run.yaml files from main
2. `test-integration-main` - Full integration test suite using main's
ci-tests run.yaml
3. `test-integration-release` - Integration tests with latest release
config (informational)
4. `check-schema-release-compatibility` - Schema checks against release
(informational)

The integration tests catch issues that schema validation alone would
miss, like assertion failures in
`StackRunConfig.validate_server_stores()` or provider-specific runtime
logic.

Resolves #3311
Related to #3237
2025-10-28 21:51:56 -07:00
ehhuang
1aa8979050
test: enable telemetry tests in server mode (#3927)
# What does this PR do?
- added a server-based test OLTP collector

## Test Plan
CI
2025-10-28 16:33:48 -07:00
ehhuang
1f9d48cd54
feat: openai files provider (#3946)
# What does this PR do?
- Adds OpenAI files provider 
- Note that file content retrieval is pretty limited by `purpose`
https://community.openai.com/t/file-uploads-error-why-can-t-i-download-files-with-purpose-user-data/1357013?utm_source=chatgpt.com

## Test Plan
Modify run yaml to use openai files provider:
```
  files:
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY:=}
      metadata_store:
        backend: sql_default
        table_name: openai_files_metadata

# Then run files tests
❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern test_files
```
2025-10-28 16:25:03 -07:00
Ashwin Bharambe
f88416ef87
fix(inference): enable routing of models with provider_data alone (#3928)
This PR enables routing of fully qualified model IDs of the form
`provider_id/model_id` even when the models are not registered with the
Stack.

Here's the situation: assume a remote inference provider which works
only when users provide their own API keys via
`X-LlamaStack-Provider-Data` header. By definition, we cannot list
models and hence update our routing registry. But because we _require_ a
provider ID in the models now, we can identify which provider to route
to and let that provider decide.

Note that we still try to look up our registry since it may have a
pre-registered alias. Just that we don't outright fail when we are not
able to look it up.

Also, updated inference router so that the responses have the _exact_
model that the request had.

## Test Plan

Added an integration test

Closes #3929

---------

Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
2025-10-28 11:16:37 -07:00
Ashwin Bharambe
4e6c769cc4
fix(context): prevent provider data leak between streaming requests (#3924)
## Summary

- `preserve_contexts_async_generator` left `PROVIDER_DATA_VAR` (and
other context vars) populated after a streaming generator completed on
HEAD~1, so the asyncio context for request N+1 started with request N's
provider payload.
- FastAPI dependencies and middleware execute before
`request_provider_data_context` rebinds the header data, meaning
auth/logging hooks could observe a prior tenant's credentials or treat
them as authenticated. Traces and any background work that inspects the
context outside the `with` block leak as well—this is a real security
regression, not just a CLI artifact.
- The wrapper now restores each tracked `ContextVar` to the value it
held before the iteration (falling back to clearing when necessary)
after every yield and when the generator terminates, so provider data is
wiped while callers that set their own defaults keep them.

## Test Plan

- `uv run pytest tests/unit/core/test_provider_data_context.py -q`
- `uv run pytest tests/unit/distribution/test_context.py -q`

Both suites fail on HEAD~1 and pass with this change.
2025-10-27 23:01:12 -07:00
ehhuang
b7dd3f5c56
chore!: BREAKING CHANGE: vector_db_id -> vector_store_id (#3923)
# What does this PR do?


## Test Plan
CI
vector_io tests will fail until next client sync

passed with
https://github.com/llamastack/llama-stack-client-python/pull/286 checked
out locally
2025-10-27 14:26:06 -07:00
Matthew Farrellee
a9b00db421
feat: add provider data keys for Cerebras, Databricks, NVIDIA, and RunPod (#3734)
# What does this PR do?

add provider-data key passing support to Cerebras, Databricks, NVIDIA
and RunPod

also, added missing tests for Fireworks, Anthropic, Gemini, SambaNova,
and vLLM

addresses #3517 

## Test Plan

ci w/ new tests

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-27 13:09:35 -07:00
IAN MILLER
98a5047f9d
feat(prompts): attach prompts to storage stores in run configs (#3893)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR is responsible for attaching prompts to storage stores in run
configs. It allows to specify prompts as stores in different
distributions. The need of this functionality was initiated in #3514

> Note, #3514 is divided on three separate PRs. Current PR is the first
of three.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Manual testing and updated CI unit tests

Prerequisites:

1. `uv run --with llama-stack llama stack list-deps starter | xargs -L1
uv pip install`

2. `llama stack run starter `

```
INFO     2025-10-23 15:36:17,387 llama_stack.cli.stack.run:100 cli: Using run configuration:                            
         /Users/ianmiller/llama-stack/llama_stack/distributions/starter/run.yaml                                        
INFO     2025-10-23 15:36:17,423 llama_stack.cli.stack.run:157 cli: HTTPS enabled with certificates:                    
           Key: None                                                                                                    
           Cert: None                                                                                                   
INFO     2025-10-23 15:36:17,424 llama_stack.cli.stack.run:159 cli: Listening on ['::', '0.0.0.0']:8321                 
INFO     2025-10-23 15:36:17,749 llama_stack.core.server.server:521 core::server: Run configuration:                    
INFO     2025-10-23 15:36:17,756 llama_stack.core.server.server:524 core::server: apis:                                 
         - agents                                                                                                       
         - batches                                                                                                      
         - datasetio                                                                                                    
         - eval                                                                                                         
         - files                                                                                                        
         - inference                                                                                                    
         - post_training                                                                                                
         - safety                                                                                                       
         - scoring                                                                                                      
         - tool_runtime                                                                                                 
         - vector_io                                                                                                    
         image_name: starter                                                                                            
         providers:                                                                                                     
           agents:                                                                                                      
           - config:                                                                                                    
               persistence:                                                                                             
                 agent_state:                                                                                           
                   backend: kv_default                                                                                  
                   namespace: agents                                                                                    
                 responses:                                                                                             
                   backend: sql_default                                                                                 
                   max_write_queue_size: 10000                                                                          
                   num_writers: 4                                                                                       
                   table_name: responses                                                                                
             provider_id: meta-reference                                                                                
             provider_type: inline::meta-reference                                                                      
           batches:                                                                                                     
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: batches                                                                                     
             provider_id: reference                                                                                     
             provider_type: inline::reference                                                                           
           datasetio:                                                                                                   
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: datasetio::huggingface                                                                      
             provider_id: huggingface                                                                                   
             provider_type: remote::huggingface                                                                         
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: datasetio::localfs                                                                          
             provider_id: localfs                                                                                       
             provider_type: inline::localfs                                                                             
           eval:                                                                                                        
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: eval                                                                                        
             provider_id: meta-reference                                                                                
             provider_type: inline::meta-reference                                                                      
           files:                                                                                                       
           - config:                                                                                                    
               metadata_store:                                                                                          
                 backend: sql_default                                                                                   
                 table_name: files_metadata                                                                             
               storage_dir: /Users/ianmiller/.llama/distributions/starter/files                                         
             provider_id: meta-reference-files                                                                          
             provider_type: inline::localfs                                                                             
           inference:                                                                                                   
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.fireworks.ai/inference/v1                                                               
             provider_id: fireworks                                                                                     
             provider_type: remote::fireworks                                                                           
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.together.xyz/v1                                                                         
             provider_id: together                                                                                      
             provider_type: remote::together                                                                            
           - config: {}                                                                                                 
             provider_id: bedrock                                                                                       
             provider_type: remote::bedrock                                                                             
           - config:                                                                                                    
               api_key: '********'                                                                                      
               base_url: https://api.openai.com/v1                                                                      
             provider_id: openai                                                                                        
             provider_type: remote::openai                                                                              
           - config:                                                                                                    
               api_key: '********'                                                                                      
             provider_id: anthropic                                                                                     
             provider_type: remote::anthropic                                                                           
           - config:                                                                                                    
               api_key: '********'                                                                                      
             provider_id: gemini                                                                                        
             provider_type: remote::gemini                                                                              
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.groq.com                                                                                
             provider_id: groq                                                                                          
             provider_type: remote::groq                                                                                
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.sambanova.ai/v1                                                                         
             provider_id: sambanova                                                                                     
             provider_type: remote::sambanova                                                                           
           - config: {}                                                                                                 
             provider_id: sentence-transformers                                                                         
             provider_type: inline::sentence-transformers                                                               
           post_training:                                                                                               
           - config:                                                                                                    
               checkpoint_format: meta                                                                                  
             provider_id: torchtune-cpu                                                                                 
             provider_type: inline::torchtune-cpu                                                                       
           safety:                                                                                                      
           - config:                                                                                                    
               excluded_categories: []                                                                                  
             provider_id: llama-guard                                                                                   
             provider_type: inline::llama-guard                                                                         
           - config: {}                                                                                                 
             provider_id: code-scanner                                                                                  
             provider_type: inline::code-scanner                                                                        
           scoring:                                                                                                     
           - config: {}                                                                                                 
             provider_id: basic                                                                                         
             provider_type: inline::basic                                                                               
           - config: {}                                                                                                 
             provider_id: llm-as-judge                                                                                  
             provider_type: inline::llm-as-judge                                                                        
           - config:                                                                                                    
               openai_api_key: '********'                                                                               
             provider_id: braintrust                                                                                    
             provider_type: inline::braintrust                                                                          
           tool_runtime:                                                                                                
           - config:                                                                                                    
               api_key: '********'                                                                                      
               max_results: 3                                                                                           
             provider_id: brave-search                                                                                  
             provider_type: remote::brave-search                                                                        
           - config:                                                                                                    
               api_key: '********'                                                                                      
               max_results: 3                                                                                           
             provider_id: tavily-search                                                                                 
             provider_type: remote::tavily-search                                                                       
           - config: {}                                                                                                 
             provider_id: rag-runtime                                                                                   
             provider_type: inline::rag-runtime                                                                         
           - config: {}                                                                                                 
             provider_id: model-context-protocol                                                                        
             provider_type: remote::model-context-protocol                                                              
           vector_io:                                                                                                   
           - config:                                                                                                    
               persistence:                                                                                             
                 backend: kv_default                                                                                    
                 namespace: vector_io::faiss                                                                            
             provider_id: faiss                                                                                         
             provider_type: inline::faiss                                                                               
           - config:                                                                                                    
               db_path: /Users/ianmiller/.llama/distributions/starter/sqlite_vec.db                                     
               persistence:                                                                                             
                 backend: kv_default                                                                                    
                 namespace: vector_io::sqlite_vec                                                                       
             provider_id: sqlite-vec                                                                                    
             provider_type: inline::sqlite-vec                                                                          
         registered_resources:                                                                                          
           benchmarks: []                                                                                               
           datasets: []                                                                                                 
           models: []                                                                                                   
           scoring_fns: []                                                                                              
           shields: []                                                                                                  
           tool_groups:                                                                                                 
           - provider_id: tavily-search                                                                                 
             toolgroup_id: builtin::websearch                                                                           
           - provider_id: rag-runtime                                                                                   
             toolgroup_id: builtin::rag                                                                                 
           vector_stores: []                                                                                            
         server:                                                                                                        
           port: 8321                                                                                                   
         storage:                                                                                                       
           backends:                                                                                                    
             kv_default:                                                                                                
               db_path: /Users/ianmiller/.llama/distributions/starter/kvstore.db                                        
               type: kv_sqlite                                                                                          
             sql_default:                                                                                               
               db_path: /Users/ianmiller/.llama/distributions/starter/sql_store.db                                      
               type: sql_sqlite                                                                                         
           stores:                                                                                                      
             conversations:                                                                                             
               backend: sql_default                                                                                     
               table_name: openai_conversations                                                                         
             inference:                                                                                                 
               backend: sql_default                                                                                     
               max_write_queue_size: 10000                                                                              
               num_writers: 4                                                                                           
               table_name: inference_store                                                                              
             metadata:                                                                                                  
               backend: kv_default                                                                                      
               namespace: registry                                                                                      
             prompts:                                                                                                   
               backend: kv_default                                                                                      
               namespace: prompts                                                                                       
         telemetry:                                                                                                     
           enabled: true                                                                                                
         vector_stores:                                                                                                 
           default_embedding_model:                                                                                     
             model_id: nomic-ai/nomic-embed-text-v1.5                                                                   
             provider_id: sentence-transformers                                                                         
           default_provider_id: faiss                                                                                   
         version: 2                                                                                                     
                                                                                                                        
INFO     2025-10-23 15:36:20,032 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue        
         disabled for SQLite to avoid concurrency issues                                                                
WARNING  2025-10-23 15:36:20,422 llama_stack.providers.inline.telemetry.meta_reference.telemetry:84 telemetry:          
         OTEL_EXPORTER_OTLP_ENDPOINT is not set, skipping telemetry                                                     
INFO     2025-10-23 15:36:22,379 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils:               
         OpenAIInferenceAdapter.list_provider_model_ids() returned 105 models                                           
INFO     2025-10-23 15:36:22,703 uvicorn.error:84 uncategorized: Started server process [17328]                         
INFO     2025-10-23 15:36:22,704 uvicorn.error:48 uncategorized: Waiting for application startup.                       
INFO     2025-10-23 15:36:22,706 llama_stack.core.server.server:179 core::server: Starting up Llama Stack server        
         (version: 0.3.0)                                                                                               
INFO     2025-10-23 15:36:22,707 llama_stack.core.stack:470 core: starting registry refresh task                        
INFO     2025-10-23 15:36:22,708 uvicorn.error:62 uncategorized: Application startup complete.                          
INFO     2025-10-23 15:36:22,708 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321      
         (Press CTRL+C to quit)   
```
As you can see, prompts are attached to stores in config

Testing:

1. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \                 
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.",
    "variables": ["name", "company", "role", "tone"]
  }'
```

`{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is
{{role}} at {{company}}. Remember, {{name}}, to be
{{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}%
`

2. Get prompt:

`curl -X GET
http://localhost:8321/v1/prompts/pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f`

`{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is
{{role}} at {{company}}. Remember, {{name}}, to be
{{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}%
`

3. Query sqlite KV storage to check created prompt:

```
sqlite> .mode column
sqlite> .headers on
sqlite> SELECT * FROM kvstore WHERE key LIKE 'prompts:v1:%';
key                                                           value                                                         expiration
------------------------------------------------------------  ------------------------------------------------------------  ----------
prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e  {"prompt_id": "pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab            
163f:1                                                        5f6e163f", "prompt": "Hello {{name}}! You are working at {{c            
                                                              ompany}}. Your role is {{role}} at {{company}}. Remember, {{            
                                                              name}}, to be {{tone}}.", "version": 1, "variables": ["name"            
                                                              , "company", "role", "tone"], "is_default": false}                      

prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e  1                                                                       
163f:default                                                                                                                          
sqlite> 
```
2025-10-27 11:12:12 -07:00
Luis Tomas Bolivar
63422e5b36
fix!: Enhance response API support to not fail with tool calling (#3385)
Some checks failed
Python Package Build Test / build (3.12) (push) Failing after 8s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 6s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 10s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Unit Tests / unit-tests (3.12) (push) Failing after 19s
Test External API and Providers / test-external (venv) (push) Failing after 1m3s
Vector IO Integration Tests / test-matrix (push) Failing after 1m6s
API Conformance Tests / check-schema-compatibility (push) Successful in 1m17s
UI Tests / ui-tests (22) (push) Successful in 1m18s
Pre-commit / pre-commit (push) Successful in 3m5s
# What does this PR do?
Introduces two main fixes to enhance the stability of Responses API when
dealing with tool calling responses and structured outputs.

### Changes Made

1. It added OpenAIResponseOutputMessageMCPCall and ListTools to
OpenAIResponseInput but
https://github.com/llamastack/llama-stack/pull/3810 got merge that did
the same in a different way. Still this PR does it in a way that keep
the sync between OpenAIResponsesOutput and the allowed objects in
OpenAIResponseInput.

2. Add protection in case self.ctx.response_format does not have type
attribute

BREAKING CHANGE: OpenAIResponseInput now uses OpenAIResponseOutput union
type.
This is semantically equivalent - all previously accepted types are
still supported
via the OpenAIResponseOutput union. This improves type consistency and
maintainability.
2025-10-27 09:33:02 -07:00
Luis Tomas Bolivar
f18b5eb537
fix: Avoid BadRequestError due to invalid max_tokens (#3667)
This patch ensures if max tokens is not defined, then is set to None
instead of 0 when calling openai_chat_completion. This way some
providers (like gemini) that cannot handle the `max_tokens = 0` will not
fail

Issue: #3666
2025-10-27 09:27:21 -07:00
Derek Higgins
00d8414597
fix(tests): limit vector store providers for record mode in CI tests (#3898)
The vector_provider_wrapper was only limiting providers to
faiss/sqlite-vec for replay mode, but CI tests also run in record mode
with the same limited set of providers. This caused test failures when
trying to test against milvus, chromadb, pgvector, weaviate, and qdrant
which aren't configured in the record job.
2025-10-27 09:22:49 -07:00
ehhuang
8265d4efc8
chore(telemetry): code cleanup (#3897)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 14s
UI Tests / ui-tests (22) (push) Successful in 43s
Pre-commit / pre-commit (push) Successful in 1m35s
# What does this PR do?
Clean up telemetry code since the telemetry API has been remove.
- moved telemetry files out of providers to core
- removed from Api

## Test Plan

❯ OTEL_SERVICE_NAME=llama_stack
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 uv run llama stack run
starter
❯ curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

-> verify traces in Grafana

CI
2025-10-23 23:13:02 -07:00
ehhuang
9916cb3b17
chore: support default model in moderations API (#3890)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 41s
Pre-commit / pre-commit (push) Successful in 1m33s
# What does this PR do?
https://platform.openai.com/docs/api-reference/moderations supports
optional model parameter.

This PR adds support for using moderations API with model=None if a
default shield id is provided via safety config.

## Test Plan
added tests

manual test:
```
> SAFETY_MODEL='together/meta-llama/Llama-Guard-4-12B'   uv run llama stack run starter
> curl http://localhost:8321/v1/moderations \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
        "hello"
    ]
  }'
```
2025-10-23 16:03:53 -07:00
Ashwin Bharambe
0e57233a0a
chore(misc): update datasets, benchmarks to use alpha, beta prefixes (#3891)
This will be landed together with
https://github.com/llamastack/llama-stack-client-python/pull/282 (hence
CI will be red on this one.)

I have verified locally that tests pass with the updated version of the
client-sdk.
2025-10-22 15:26:35 -07:00
Ashwin Bharambe
7918188f1e
fix(ci): enable responses tests in CI; suppress expected MCP auth error logs (#3889)
Let us enable responses suite in CI now.

Also a minor fix: MCP tool tests intentionally trigger authentication
failures to verify error handling, but the resulting error logs clutter
test output.
2025-10-22 14:59:42 -07:00
Ashwin Bharambe
7b90e0e9c8
test: suppress expected error logs in SSE test (#3886)
Our unit test outputs are filled with all kinds of obscene logs. This
makes it really hard to spot real issues quickly. The problem is that
these logs are necessary to output at the given logging level when the
server is operating normally. It's just that we don't want to see some
of them (especially the noisy ones) during tests.

This PR begins the cleanup. We pytest's caplog fixture to for
suppression.
2025-10-22 14:34:32 -07:00
Ashwin Bharambe
30ba8c8655
fix(responses): sync conversation before yielding terminal events in streaming (#3888)
Move conversation sync logic before yield to ensure it executes even
when
streaming consumers break early after receiving response.completed
event.

## Test Plan

```
OLLAMA_URL=http://localhost:11434 \
  pytest -sv tests/integration/responses/ \
  --stack-config server:ci-tests \
  --text-model ollama/llama3.2:3b-instruct-fp16 \
  --inference-mode live \
  -k conversation_multi
```

This test now passes.
2025-10-22 14:31:12 -07:00
dependabot[bot]
8885cea8d7
fix(conversations)!: update Conversations API definitions (was: bump openai from 1.107.0 to 2.5.0) (#3847)
Bumps [openai](https://github.com/openai/openai-python) from 1.107.0 to
2.5.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/openai/openai-python/releases">openai's
releases</a>.</em></p>
<blockquote>
<h2>v2.5.0</h2>
<h2>2.5.0 (2025-10-17)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.4.0...v2.5.0">v2.4.0...v2.5.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> api update (<a
href="8b280d57d6">8b280d5</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li>bump <code>httpx-aiohttp</code> version to 0.1.9 (<a
href="67f2f0afe5">67f2f0a</a>)</li>
</ul>
<h2>v2.4.0</h2>
<h2>2.4.0 (2025-10-16)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.3.0...v2.4.0">v2.3.0...v2.4.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> Add support for gpt-4o-transcribe-diarize on
audio/transcriptions endpoint (<a
href="bdbe9b8f44">bdbe9b8</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li>fix dangling comment (<a
href="da14e99606">da14e99</a>)</li>
<li><strong>internal:</strong> detect missing future annotations with
ruff (<a
href="2672b8f072">2672b8f</a>)</li>
</ul>
<h2>v2.3.0</h2>
<h2>2.3.0 (2025-10-10)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.2.0...v2.3.0">v2.2.0...v2.3.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> comparison filter in/not in (<a
href="aa49f626a6">aa49f62</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li><strong>package:</strong> bump jiter to &gt;=0.10.0 to support
Python 3.14 (<a
href="https://redirect.github.com/openai/openai-python/issues/2618">#2618</a>)
(<a
href="aa445cab5c">aa445ca</a>)</li>
</ul>
<h2>v2.2.0</h2>
<h2>2.2.0 (2025-10-06)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.1.0...v2.2.0">v2.1.0...v2.2.0</a></p>
<h3>Features</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/openai/openai-python/blob/main/CHANGELOG.md">openai's
changelog</a>.</em></p>
<blockquote>
<h2>2.5.0 (2025-10-17)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.4.0...v2.5.0">v2.4.0...v2.5.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> api update (<a
href="8b280d57d6">8b280d5</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li>bump <code>httpx-aiohttp</code> version to 0.1.9 (<a
href="67f2f0afe5">67f2f0a</a>)</li>
</ul>
<h2>2.4.0 (2025-10-16)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.3.0...v2.4.0">v2.3.0...v2.4.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> Add support for gpt-4o-transcribe-diarize on
audio/transcriptions endpoint (<a
href="bdbe9b8f44">bdbe9b8</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li>fix dangling comment (<a
href="da14e99606">da14e99</a>)</li>
<li><strong>internal:</strong> detect missing future annotations with
ruff (<a
href="2672b8f072">2672b8f</a>)</li>
</ul>
<h2>2.3.0 (2025-10-10)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.2.0...v2.3.0">v2.2.0...v2.3.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> comparison filter in/not in (<a
href="aa49f626a6">aa49f62</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li><strong>package:</strong> bump jiter to &gt;=0.10.0 to support
Python 3.14 (<a
href="https://redirect.github.com/openai/openai-python/issues/2618">#2618</a>)
(<a
href="aa445cab5c">aa445ca</a>)</li>
</ul>
<h2>2.2.0 (2025-10-06)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.1.0...v2.2.0">v2.1.0...v2.2.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> dev day 2025 launches (<a
href="38ac0093eb">38ac009</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="513ae76253"><code>513ae76</code></a>
release: 2.5.0 (<a
href="https://redirect.github.com/openai/openai-python/issues/2694">#2694</a>)</li>
<li><a
href="ebf32212f7"><code>ebf3221</code></a>
release: 2.4.0</li>
<li><a
href="e043d7b164"><code>e043d7b</code></a>
chore: fix dangling comment</li>
<li><a
href="25cbb74f83"><code>25cbb74</code></a>
feat(api): Add support for gpt-4o-transcribe-diarize on
audio/transcriptions ...</li>
<li><a
href="8cdfd0650e"><code>8cdfd06</code></a>
codegen metadata</li>
<li><a
href="d5c64434b7"><code>d5c6443</code></a>
codegen metadata</li>
<li><a
href="b20a9e7b81"><code>b20a9e7</code></a>
chore(internal): detect missing future annotations with ruff</li>
<li><a
href="e5f93f5dae"><code>e5f93f5</code></a>
release: 2.3.0</li>
<li><a
href="044878859c"><code>0448788</code></a>
feat(api): comparison filter in/not in</li>
<li><a
href="85a91ade61"><code>85a91ad</code></a>
chore(package): bump jiter to &gt;=0.10.0 to support Python 3.14 (<a
href="https://redirect.github.com/openai/openai-python/issues/2618">#2618</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/openai/openai-python/compare/v1.107.0...v2.5.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=openai&package-manager=uv&previous-version=1.107.0&new-version=2.5.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-22 12:32:48 -07:00
Jiayi Ni
bb1ebb3c6b
feat: Add rerank models and rerank API change (#3831)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- Extend the model type to include rerank models.
- Implement `rerank()` method in inference router.
- Add `rerank_model_list` to `OpenAIMixin` to enable providers to
register and identify rerank models
- Update documentation.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
```
pytest tests/unit/providers/utils/inference/test_openai_mixin.py
```
2025-10-22 12:02:28 -07:00
Ashwin Bharambe
c0c0e337d9 misc(tests): add recordings for responses tests 2025-10-21 16:39:08 -07:00
slekkala1
eb2b240594
fix: remove consistency checks (#3881)
# What does this PR do?
metadata is conflicting with the default embedding model set on server
side via extra body, removing the check and just letting metadata take
precedence over extra body

`ValueError: Embedding model inconsistent between metadata
('text-embedding-3-small') and extra_body
     ('sentence-transformers/nomic-ai/nomic-embed-text-v1.5')`
## Test Plan
CI
2025-10-21 14:40:14 -07:00
Ashwin Bharambe
bd3c473208
revert: "chore(cleanup)!: remove tool_runtime.rag_tool" (#3877)
Reverts llamastack/llama-stack#3871

This PR broke RAG (even from Responses -- there _is_ a dependency)
2025-10-21 11:22:06 -07:00
Ashwin Bharambe
0e96279bee
chore(cleanup)!: remove tool_runtime.rag_tool (#3871)
Kill the `builtin::rag` tool group completely since it is no longer
targeted. We use the Responses implementation for knowledge_search which
uses the `openai_vector_stores` pathway.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-20 22:26:21 -07:00
Ashwin Bharambe
122de785c4
chore(cleanup)!: kill vector_db references as far as possible (#3864)
There should not be "vector db" anywhere.
2025-10-20 20:06:16 -07:00
ehhuang
444f6c88f3
chore: remove build.py (#3869)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 3s
Test llama stack list-deps / list-deps-from-config (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 20s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 23s
Test llama stack list-deps / list-deps (push) Failing after 18s
UI Tests / ui-tests (22) (push) Successful in 57s
Pre-commit / pre-commit (push) Successful in 1m52s
# What does this PR do?


## Test Plan
CI
2025-10-20 16:28:15 -07:00
Francisco Arceo
48581bf651
chore: Updating how default embedding model is set in stack (#3818)
# What does this PR do?

Refactor setting default vector store provider and embedding model to
use an optional `vector_stores` config in the `StackRunConfig` and clean
up code to do so (had to add back in some pieces of VectorDB). Also
added remote Qdrant and Weaviate to starter distro (based on other PR
where inference providers were added for UX).

New config is simply (default for Starter distro):

```yaml
vector_stores:
  default_provider_id: faiss
  default_embedding_model:
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5
```

## Test Plan
CI and Unit tests.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-20 14:22:45 -07:00
Ashwin Bharambe
2c43285e22
feat(stores)!: use backend storage references instead of configs (#3697)
**This PR changes configurations in a backward incompatible way.**

Run configs today repeat full SQLite/Postgres snippets everywhere a
store is needed, which means duplicated credentials, extra connection
pools, and lots of drift between files. This PR introduces named storage
backends so the stack and providers can share a single catalog and
reference those backends by name.

## Key Changes

- Add `storage.backends` to `StackRunConfig`, register each KV/SQL
backend once at startup, and validate that references point to the right
family.
- Move server stores under `storage.stores` with lightweight references
(backend + namespace/table) instead of full configs.
- Update every provider/config/doc to use the new reference style;
docs/codegen now surface the simplified YAML.

## Migration

Before:
```yaml
metadata_store:
  type: sqlite
  db_path: ~/.llama/distributions/foo/registry.db
inference_store:
  type: postgres
  host: ${env.POSTGRES_HOST}
  port: ${env.POSTGRES_PORT}
  db: ${env.POSTGRES_DB}
  user: ${env.POSTGRES_USER}
  password: ${env.POSTGRES_PASSWORD}
conversations_store:
  type: postgres
  host: ${env.POSTGRES_HOST}
  port: ${env.POSTGRES_PORT}
  db: ${env.POSTGRES_DB}
  user: ${env.POSTGRES_USER}
  password: ${env.POSTGRES_PASSWORD}
```

After:
```yaml
storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: ~/.llama/distributions/foo/kvstore.db
    sql_default:
      type: sql_postgres
      host: ${env.POSTGRES_HOST}
      port: ${env.POSTGRES_PORT}
      db: ${env.POSTGRES_DB}
      user: ${env.POSTGRES_USER}
      password: ${env.POSTGRES_PASSWORD}
  stores:
    metadata:
      backend: kv_default
      namespace: registry
    inference:
      backend: sql_default
      table_name: inference_store
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      backend: sql_default
      table_name: openai_conversations
```

Provider configs follow the same pattern—for example, a Chroma vector
adapter switches from:

```yaml
providers:
  vector_io:
  - provider_id: chromadb
    provider_type: remote::chromadb
    config:
      url: ${env.CHROMADB_URL}
      kvstore:
        type: sqlite
        db_path: ~/.llama/distributions/foo/chroma.db
```

to:

```yaml
providers:
  vector_io:
  - provider_id: chromadb
    provider_type: remote::chromadb
    config:
      url: ${env.CHROMADB_URL}
      persistence:
        backend: kv_default
        namespace: vector_io::chroma_remote
```

Once the backends are declared, everything else just points at them, so
rotating credentials or swapping to Postgres happens in one place and
the stack reuses a single connection pool.
2025-10-20 13:20:09 -07:00
Shabana Baig
add64e8e2a
feat: Add instructions parameter in response object (#3741)
# Problem
The current inline provider appends the user provided instructions to
messages as a system prompt, but the returned response object does not
contain the instructions field (as specified in the OpenAI responses
spec).

# What does this PR do?
This pull request adds the instruction field to the response object
definition and updates the inline provider. It also ensures that
instructions from previous response is not carried over to the next
response (as specified in the openAI spec).

Closes #[3566](https://github.com/llamastack/llama-stack/issues/3566)

## Test Plan

- Tested manually for change in model response w.r.t supplied
instructions field.
- Added unit test to check that the instructions from previous response
is not carried over to the next response.
- Added integration tests to check instructions parameter in the
returned response object.
- Added new recordings for the integration tests.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-20 13:10:37 -07:00