Commit graph

3186 commits

Author SHA1 Message Date
Wojciech-Rebisz
07c28cd519
fix: Avoid model_limits KeyError (#4060)
# What does this PR do?
It avoids model_limit KeyError while trying to get embedding models for
Watsonx

<!-- If resolving an issue, uncomment and update the line below -->
Closes https://github.com/llamastack/llama-stack/issues/4059

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Start server with watsonx distro:
```bash
llama stack list-deps watsonx | xargs -L1 uv pip install
uv run llama stack run watsonx
```
Run 
```python
client = LlamaStackClient(base_url=base_url)
client.models.list()
```
Check if there is any embedding model available (currently there is not
a single one)
2025-11-05 10:34:40 -08:00
Emilio Garcia
ba50790a28
feat(tests): metrics tests (#3966)
# What does this PR do?
1. Make telemetry tests as easy as possible for users by expanding the
`SpanStub` data class and creating the `MetricStub` dataclass as a way
to consistently marshal telemetry data in test fixtures and unmarshal
and handle it in tests.
2. Structure server and client tests to always follow the same standards
for consistent testing experience by using the `SpanStub` and
`MetricStub` data class objects.
3. Enable Metrics Testing for completions endpoint
4. Correct token metrics to use histograms instead of counts to capture
tokens per request rather than a cumulative count of tokens over the
lifecycle of the server.

## Test Plan
These are tests
2025-11-05 10:26:15 -08:00
Roy Belio
2619f3552e
fix: show built-in distributions in llama stack list (#4040)
# What does this PR do?
Fixes issue #3922 where `llama stack list` only showed distributions
after they were run. This PR makes the command show all available
distributions immediately on a fresh install.

Closes #3922

## Changes
- **Updated `_get_distribution_dirs()`** to discover both built-in and
built distributions:
- Built-in distributions from `src/llama_stack/distributions/` (e.g.,
starter, nvidia, dell)
  - Built distributions from `~/.llama/distributions`
- **Added a "Source" column** to distinguish between "built-in" and
"built" distributions
- **Built distributions override built-in ones** with the same name
(expected behavior)
- **Updated config file detection logic** to handle both naming
conventions:
  - Built-in: `build.yaml` and `run.yaml`
  - Built: `{name}-build.yaml` and `{name}-run.yaml`

## Test Plan
### Unit Tests
Added comprehensive unit tests in
`tests/unit/distribution/test_stack_list.py`:
```bash
uv run pytest tests/unit/distribution/test_stack_list.py -v
```
**Result**:  All 8 tests pass
- `test_builtin_distros_shown_without_running` - Verifies the core fix
for issue #3922
- `test_builtin_and_built_distros_shown_together` - Ensures both types
are shown
- `test_built_distribution_overrides_builtin` - Tests override behavior
- `test_empty_distributions` - Edge case handling
- `test_config_files_detection_builtin` - Config file detection for
built-in distros
- `test_config_files_detection_built` - Config file detection for built
distros
- `test_llamastack_prefix_stripped` - Name normalization
- `test_hidden_directories_ignored` - Filters hidden directories

### Manual Testing
**Before the fix** (simulated with empty `~/.llama/distributions`):
```bash
$ llama stack list
No stacks found in ~/.llama/distributions
```

**After the fix**:
```bash
$ llama stack list
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Stack Name        ┃ Source   ┃ Path              ┃ Build Config ┃ Run Config ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ ci-tests          │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ dell              │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ meta-reference-g… │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ nvidia            │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ open-benchmark    │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ postgres-demo     │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ starter           │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ starter-gpu       │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ watsonx           │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
└───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘
```

**After running a distribution**:
```bash
$ llama stack run starter  # Creates ~/.llama/distributions/starter
$ llama stack list
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Stack Name        ┃ Source   ┃ Path              ┃ Build Config ┃ Run Config ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ ...               │ built-in │ ...               │ Yes          │ Yes        │
│ starter           │ built    │ ~/.llama/distri…  │ No           │ No         │
│ ...               │ built-in │ ...               │ Yes          │ Yes        │
└───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘
```
Note how `starter` now shows as "built" and points to
`~/.llama/distributions`, overriding the built-in version.

## Breaking Changes
**No breaking changes** - This is a bug fix that improves user
experience with minimal risk:
- No programmatic parsing of output found in the codebase
- Table format is clearly for human consumption
- The new "Source" column helps users understand where distributions
come from
- The behavior change is exactly what users expect (seeing all available
distributions)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-05 10:16:28 -08:00
Ashwin Bharambe
4d3069bfa5
chore(ci): remove unused recordings (#4074)
Added a script to cleanup recordings. While doing this, moved the CI
matrix generation to a separate script so there is a single source of
truth for the matrix.

Ran the cleanup script as:
```
PYTHONPATH=. python scripts/cleanup_recordings.py
```

Also added this as part of the pre-commit workflow to ensure that the
recordings are always up to date and that no stale recordings are left
in the repo.
2025-11-05 09:21:58 -08:00
Sébastien Han
fd1603beef
chore: remove unused classes (#4077)
# What does this PR do?

These were maybe be included in the webmethod?
The unit test was pointless too since the request was never used
anywhere?

This shouldn't be in the API definition, if we never consume it.

## Test Plan

CI with pre-commit on OpenAPI spec generation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-05 16:45:23 +01:00
Ashwin Bharambe
392e01dc79 chore: add stainless config
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 2s
Pre-commit / pre-commit (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
Unit Tests / unit-tests (3.13) (push) Failing after 7s
UI Tests / ui-tests (22) (push) Successful in 1m13s
name it to indicate it is not yet source of truth to avoid confusion
2025-11-04 15:44:07 -08:00
ehhuang
95b0493fae
chore: move src/llama_stack/ui to src/llama_stack_ui (#4068)
# What does this PR do?
This better separates UI from backend code, which was a point of
confusion often for our beloved AI friends.


## Test Plan
CI
2025-11-04 15:21:49 -08:00
Ashwin Bharambe
5850e3473f fix: remove straggler openapi HTML file 2025-11-04 14:54:33 -08:00
Ashwin Bharambe
0c49a53c97
chore(api)!: remove tool_runtime.rag_tool from the API surface (#4067)
RAG aka file search is implemented via the Responses API by specifying
the file-search tool. The backend implementation remains unchanged. This
PR merely removes the directly exposed API surface which allowed users
to directly perform searches from the client.

This facility is now available via the `client.vector_store.search()`
OpenAI compatible API.
2025-11-04 14:50:54 -08:00
Ashwin Bharambe
a8a8aa56c0
chore!: remove the agents (sessions and turns) API (#4055)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 1m10s
- Removes the deprecated agents (sessions and turns) API that was marked
alpha in 0.3.0
- Cleans up unused imports and orphaned types after the API removal
- Removes `SessionNotFoundError` and `AgentTurnInputType` which are no
longer needed

The agents API is completely superseded by the Responses + Conversations
APIs, and the client SDK Agent class already uses those implementations.

Corresponding client-side PR:
https://github.com/llamastack/llama-stack-client-python/pull/295
2025-11-04 09:38:39 -08:00
Mustafa Elbehery
a6ddbae0ed
chore(test): migrate unit tests from unittest to pytest nvidia test eval (#3249)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 14s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
UI Tests / ui-tests (22) (push) Successful in 1m16s
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR migrates `unittest` to `pytest` in
`tests/unit/providers/nvidia/test_eval.py`.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Part of https://github.com/llamastack/llama-stack/issues/2680

Supersedes https://github.com/llamastack/llama-stack/pull/2791

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-11-04 10:29:07 +01:00
Ashwin Bharambe
053fc0ac39
chore!: remove all deprecated routes (including /openai/v1/ ones) (#4054)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 1m13s
This PR removes all routes which we had marked deprecated for the 0.3.0
release.

This includes:
- all the `/v1/openai/v1/` routes (the corresponding /v1 routes still
exist of course)
- the /agents API (which is superseded completely by Responses +
Conversations)
- several alpha routes which had a "v1" route to aide transitioning to
"v1alpha"

This is the corresponding client-python change:
https://github.com/llamastack/llama-stack-client-python/pull/294
2025-11-03 19:00:59 -08:00
Nathan Weinberg
62b3ad349a
fix: return to hardcoded model IDs for Vertex AI (#4041)
# What does this PR do?
partial revert of b67aef2

Vertex AI doesn't offer an endpoint for listing models from Google's
Model Garden

Return to hardcoded values until such an endpoint is available

Closes #3988 

## Test Plan
Server side, set up your Vertex AI env vars (`VERTEX_AI_PROJECT`,
`VERTEX_AI_LOCATION`, and `GOOGLE_APPLICATION_CREDENTIALS`) and run the
starter distribution
```bash
$ llama stack list-deps starter | xargs -L1 uv pip install
$ llama stack run starter
```

Client side, formerly broken cURL requests now working
```bash
$ curl http://127.0.0.1:8321/v1/models | jq '.data | map(select(.provider_id == "vertexai"))'
[
  {
    "identifier": "vertexai/vertex_ai/gemini-2.0-flash",
    "provider_resource_id": "vertex_ai/gemini-2.0-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-flash",
    "provider_resource_id": "vertex_ai/gemini-2.5-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-pro",
    "provider_resource_id": "vertex_ai/gemini-2.5-pro",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  }
]
$ curl -fsS http://127.0.0.1:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"vertexai/vertex_a
i/gemini-2.5-flash\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"max_tokens\": 128, \"temperature\": 0.0}" | jq 
{                                                                                                                                    
  "id": "p8oIaYiQF8_PptQPo-GH8QQ",                                                                                                   
  "choices": [                                                                                                                       
    {                                                                                                                                
      "finish_reason": "stop",                                                                                                       
      "index": 0,                                                                                                                    
      "logprobs": null,                                                                                                              
      "message": {                                                                                                                   
        "content": "Hello there! How can I help you today?",                                                                         
        "refusal": null,                                                                                                             
        "role": "assistant",                                                                                                         
        "annotations": null,                                                                                                         
        "audio": null,                                                                                                               
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
...
```

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-11-03 17:38:16 -08:00
Ashwin Bharambe
cb40da210f
fix: update tests for OpenAI-style models endpoint (#4053)
The llama-stack-client now uses /`v1/openai/v1/models` which returns
OpenAI-compatible model objects with 'id' and 'custom_metadata' fields
instead of the Resource-style 'identifier' field. Updated api_recorder
to handle the new endpoint and modified tests to access model metadata
appropriately. Deleted stale model recordings for re-recording.

**NOTE: CI will be red on this one since it is dependent on
https://github.com/llamastack/llama-stack-client-python/pull/291/files
landing. I verified locally that it is green.**
2025-11-03 17:30:08 -08:00
Sébastien Han
4a5ef65286
chore!: remove SDG API (#4035)
# What does this PR do?

This API hasn't received any traction and close to zero interest from
the community. Let's revisit in the future if things change.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 16:12:06 -08:00
Ashwin Bharambe
44096512b5
feat: add custom_metadata to OpenAIModel to unify /v1/models with /v1/openai/v1/models (#4051)
We need to remove `/v1/openai/v1` paths shortly. There is one trouble --
our current `/v1/openai/v1/models` endpoint provides different data than
`/v1/models`. Unfortunately our tests target the latter (llama-stack
customized) behavior. We need to get to true OpenAI compatibility.

This is step 1: adding `custom_metadata` field to `OpenAIModel` that
includes all the extra stuff we add in the native `/v1/models` response.
This can be extracted on the consumer end by look at
`__pydantic_extra__` or other similar fields.

This PR:
- Adds `custom_metadata` field to `OpenAIModel` class in
`src/llama_stack/apis/models/models.py`
- Modified `openai_list_models()` in
`src/llama_stack/core/routing_tables/models.py` to populate
custom_metadata

Next Steps
1. Update stainless client to use `/v1/openai/v1/models` instead of
`/v1/models`
2. Migrate tests to read from `custom_metadata`
3. Remove `/v1/openai/v1/` prefix entirely and consolidate to single
`/v1/models` endpoint
2025-11-03 15:56:07 -08:00
Ashwin Bharambe
2381714904
fix: enable SQLite WAL mode to prevent database locking errors (#4048)
Fixes race condition causing "database is locked" errors during
concurrent writes to SQLite, particularly in streaming responses with
guardrails where multiple inference calls write simultaneously.

Enable Write-Ahead Logging (WAL) mode for SQLite which allows multiple
concurrent readers and one writer without blocking. Set busy_timeout to
5s so SQLite retries instead of failing immediately. Remove the logic
that disabled write queues for SQLite since WAL mode eliminates the
locking issues that prompted disabling them.

Fixes: test_output_safety_guardrails_safe_content[stream=True] flake
2025-11-03 15:27:41 -08:00
ehhuang
628e38b3d5
test: always start a new server in integration-tests.sh (#4050)
# What does this PR do?
This prevents interference from already running servers, and allows
multiple concurrent integration test runs. Unleash the AIs!

## Test Plan
start a LS server at port 8321

Then observe test uses port 8322:

❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config
server:ci-tests --inference-mode replay --setup ollama --suite base
--pattern '(telemetry or safety)'
=== Llama Stack Integration Test Runner ===
Stack Config: server:ci-tests
Setup: ollama
Inference Mode: replay
Test Suite: base
Test Subdirs:
Test Pattern: (telemetry or safety)

Checking llama packages
llama-stack 0.4.0.dev0 /Users/erichuang/projects/new_test_server
llama-stack-client                       0.3.0
ollama                                   0.6.0
=== Applying Setup Environment Variables ===
Setting SQLITE_STORE_DIR:
/var/folders/cz/vyh7y1d11xg881lsxsshnc5c0000gn/T/tmp.bKLsaVAxyU
Setting stack config type: server
Setting up environment variables:
export OLLAMA_URL='http://0.0.0.0:11434'
export SAFETY_MODEL='ollama/llama-guard3:1b'

Will use port: 8322
=== Starting Llama Stack Server ===
Waiting for Llama Stack Server to start on port 8322...
 Llama Stack Server started successfully
2025-11-03 15:23:10 -08:00
Sébastien Han
da57b51fb6
ci: introduce Mergify bot to notify on PR conflicts (#4043)
This commit introduces Mergify, a powerful bot designed to assist with
automated merging and other CI-related tasks. As an initial step, we
enable a basic feature: automatically notifying users when a pull
request has merge conflicts.

When a conflict is detected, Mergify will add a label to the PR. This
label will be removed once the conflict is resolved.
This is foundation PR to activate the bot and start using it for
backports too.

In the future, we plan to expand Mergify’s role to include auto-merging,
as discussed in #1667, once the project is ready.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-03 12:21:19 -08:00
Derek Higgins
1562277cfd
ci: test adjustments for Qwen3-0.6B (#3978)
Without this hint Qwen3-0.6B tends to reply with the full name
and sometimes doesn't reply with the correct drafted year.

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 12:19:35 -08:00
Matthew Farrellee
1263448de2
fix: allowed_models config did not filter models (#4030)
# What does this PR do?

closes #4022 

## Test Plan

ci w/ new tests

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 11:43:39 -08:00
Charlie Doern
30f8921240
fix: generate provider config when using --providers (#4044)
# What does this PR do?

call the sample_run_config method for providers that have it when
generating a run config using `llama stack run --providers`. This will
propagate API keys

resolves #4032


## Test Plan

new unit test checks the output of using `--providers` to ensure
`api_key` is in the config.

manual testing:

```
╰─ llama stack list-deps --providers=inference=remote::openai --format uv | sh
Using Python 3.12.11 environment at: venv
Audited 7 packages in 8ms

╰─ llama stack run --providers=inference=remote::openai
INFO     2025-11-03 14:33:02,094 llama_stack.cli.stack.run:161 cli: Writing generated config to:
         /Users/charliedoern/.llama/distributions/providers-run/run.yaml
INFO     2025-11-03 14:33:02,096 llama_stack.cli.stack.run:169 cli: Using run configuration:
         /Users/charliedoern/.llama/distributions/providers-run/run.yaml
INFO     2025-11-03 14:33:02,099 llama_stack.cli.stack.run:228 cli: HTTPS enabled with certificates:
           Key: None
           Cert: None
INFO     2025-11-03 14:33:02,099 llama_stack.cli.stack.run:230 cli: Listening on 0.0.0.0:8321
INFO     2025-11-03 14:33:02,145 llama_stack.core.server.server:513 core::server: Run configuration:
INFO     2025-11-03 14:33:02,146 llama_stack.core.server.server:516 core::server: apis:
         - inference
         image_name: providers-run
         providers:
           inference:
           - config:
               api_key: '********'
               base_url: https://api.openai.com/v1
             provider_id: openai
             provider_type: remote::openai
         registered_resources:
           benchmarks: []
           datasets: []
           models: []
           scoring_fns: []
           shields: []
           tool_groups: []
           vector_stores: []
         server:
           port: 8321
           workers: 1
         storage:
           backends:
             kv_default:
               db_path: /Users/charliedoern/.llama/distributions/providers-run/kvstore.db
               type: kv_sqlite
             sql_default:
               db_path: /Users/charliedoern/.llama/distributions/providers-run/sql_store.db
               type: sql_sqlite
           stores:
             conversations:
               backend: sql_default
               table_name: openai_conversations
             inference:
               backend: sql_default
               max_write_queue_size: 10000
               num_writers: 4
               table_name: inference_store
             metadata:
               backend: kv_default
               namespace: registry
             prompts:
               backend: kv_default
               namespace: prompts
         telemetry:
           enabled: false
         version: 2

INFO     2025-11-03 14:33:02,299 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue
         disabled for SQLite to avoid concurrency issues
INFO     2025-11-03 14:33:05,272 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils:
         OpenAIInferenceAdapter.list_provider_model_ids() returned 105 models
INFO     2025-11-03 14:33:05,368 uvicorn.error:84 uncategorized: Started server process [69109]
INFO     2025-11-03 14:33:05,369 uvicorn.error:48 uncategorized: Waiting for application startup.
INFO     2025-11-03 14:33:05,370 llama_stack.core.server.server:172 core::server: Starting up Llama Stack server
         (version: 0.3.0)
INFO     2025-11-03 14:33:05,370 llama_stack.core.stack:495 core: starting registry refresh task
INFO     2025-11-03 14:33:05,370 uvicorn.error:62 uncategorized: Application startup complete.
INFO     2025-11-03 14:33:05,371 uvicorn.error:216 uncategorized: Uvicorn running on http://0.0.0.0:8321 (Press CTRL+C
         to quit)
INFO     2025-11-03 14:34:19,242 uvicorn.access:473 uncategorized: 127.0.0.1:63102 - "POST /v1/chat/completions
         HTTP/1.1" 200
```

client:

```
curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
 "model": "openai/gpt-5",
 "messages": [
     {"role": "user", "content": "What is 1 + 2"}
 ]
}'
{"id":"...","choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"3","refusal":null,"role":"assistant","annotations":[],"audio":null,"function_call":null,"tool_calls":null}}],"created":1762198455,"model":"openai/gpt-5","object":"chat.completion","service_tier":"default","system_fingerprint":null,"usage":{"completion_tokens":10,"prompt_tokens":13,"total_tokens":23,"completion_tokens_details":{"accepted_prediction_tokens":0,"audio_tokens":0,"reasoning_tokens":0,"rejected_prediction_tokens":0},"prompt_tokens_details":{"audio_tokens":0,"cached_tokens":0}}}%
```

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 11:37:58 -08:00
Ashwin Bharambe
415fd9e36b
chore: bump version to 0.4.0.dev0 (#4018)
Some checks failed
Test llama stack list-deps / generate-matrix (push) Successful in 4s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s
Test llama stack list-deps / show-single-provider (push) Failing after 5s
Test llama stack list-deps / list-deps-from-config (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Test llama stack list-deps / list-deps (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 16s
Pre-commit / pre-commit (push) Failing after 21s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 20s
Test Llama Stack Build / build (push) Failing after 15s
UI Tests / ui-tests (22) (push) Successful in 1m12s
Automated version bump after releasing 0.3.1

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-03 09:36:04 -08:00
Sébastien Han
d4aa348b60
chore: remove HTML generation for openapi spec (#4039)
# What does this PR do?

This seems to be an ancient artifact when we were using readthedocs? Now
docusaurus read the specs directly.

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-03 18:03:40 +01:00
dependabot[bot]
7e294d33d9
chore(github-deps): bump astral-sh/setup-uv from 6.0.1 to 7.1.2 (#4023)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.0.1 to 7.1.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v7.1.2 🌈 Speed up extraction on Windows</h2>
<h2>Changes</h2>
<p><a href="https://github.com/lazka"><code>@​lazka</code></a> fixed a
bug that caused extracting uv to take up to 30s. Thank you!</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Use tar for extracting the uv zip file on Windows too <a
href="https://github.com/lazka"><code>@​lazka</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/660">#660</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known checksums for 0.9.5 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/663">#663</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump dependencies <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/664">#664</a>)</li>
<li>Bump github/codeql-action from 4.30.8 to 4.30.9 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/652">#652</a>)</li>
</ul>
<h2>v7.1.1 🌈 Fix empty workdir detection and lowest resolution
strategy</h2>
<h2>Changes</h2>
<p>This release fixes a bug where the <code>working-directory</code>
input was not used to detect an empty work dir. It also fixes the
<code>lowest</code> resolution strategy resolving to latest when only a
lower bound was specified.</p>
<p>Special thanks to <a
href="https://github.com/tpgillam"><code>@​tpgillam</code></a> for the
first contribution!</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Fix &quot;lowest&quot; resolution strategy with lower-bound only <a
href="https://github.com/tpgillam"><code>@​tpgillam</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/649">#649</a>)</li>
<li>Use working-directory to detect empty workdir <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/645">#645</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known checksums for 0.9.4 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/651">#651</a>)</li>
<li>chore: update known checksums for 0.9.3 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/644">#644</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>Change version in docs to v7 <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/647">#647</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump github/codeql-action from 4.30.7 to 4.30.8 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/639">#639</a>)</li>
<li>Bump actions/setup-node from 5.0.0 to 6.0.0 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/641">#641</a>)</li>
<li>Bump eifinger/actionlint-action from 1.9.1 to 1.9.2 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/634">#634</a>)</li>
<li>Update lockfile with latest npm <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/636">#636</a>)</li>
</ul>
<h2>v7.1.0 🌈 Support all the use cases</h2>
<h2>Changes</h2>
<p><strong>Support all the use cases!!!</strong></p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="85856786d1"><code>8585678</code></a>
Bump dependencies (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/664">#664</a>)</li>
<li><a
href="22d500a65c"><code>22d500a</code></a>
Bump github/codeql-action from 4.30.8 to 4.30.9 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/652">#652</a>)</li>
<li><a
href="14d557131d"><code>14d5571</code></a>
chore: update known checksums for 0.9.5 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/663">#663</a>)</li>
<li><a
href="29cd2350cd"><code>29cd235</code></a>
Use tar for extracting the uv zip file on Windows too (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/660">#660</a>)</li>
<li><a
href="2ddd2b9cb3"><code>2ddd2b9</code></a>
chore: update known checksums for 0.9.4 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/651">#651</a>)</li>
<li><a
href="b7bf78939d"><code>b7bf789</code></a>
Fix &quot;lowest&quot; resolution strategy with lower-bound only (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/649">#649</a>)</li>
<li><a
href="cb6c0a53d9"><code>cb6c0a5</code></a>
Change version in docs to v7 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/647">#647</a>)</li>
<li><a
href="dffc6292f2"><code>dffc629</code></a>
Use working-directory to detect empty workdir (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/645">#645</a>)</li>
<li><a
href="6e346e1653"><code>6e346e1</code></a>
chore: update known checksums for 0.9.3 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/644">#644</a>)</li>
<li><a
href="3ccd0fd498"><code>3ccd0fd</code></a>
Bump github/codeql-action from 4.30.7 to 4.30.8 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/639">#639</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/setup-uv/compare/v6.0.1...85856786d1ce8acfbcc2f13a5f3fbd6b938f9f41">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.0.1&new-version=7.1.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-03 13:43:04 +01:00
Sébastien Han
3dbff6bf3f
fix: help mypy & fix precommit on main (#4037)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 2s
Pre-commit / pre-commit (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 10s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
API Conformance Tests / check-schema-compatibility (push) Successful in 21s
UI Tests / ui-tests (22) (push) Successful in 1m15s
# What does this PR do?

Add type to help mypy figure out.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-03 05:39:50 -05:00
Ashwin Bharambe
d45137a399
fix(ci): export UV_INDEX_STRATEGY to current shell before running uv sync (#4020)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 16s
UI Tests / ui-tests (22) (push) Successful in 1m6s
Fixes latent bug where UV_INDEX_STRATEGY was only exported to GITHUB_ENV
but not to the current shell.

While this bug doesn't currently affect main (since UV_EXTRA_INDEX_URL
is only set on release branches), it's a latent bug that could cause
issues if the logic changes in the future or if someone tests with
UV_EXTRA_INDEX_URL set.

The setup-runner action only exported UV_INDEX_STRATEGY to GITHUB_ENV
(for subsequent steps), not to the current shell environment. Since uv
sync runs in the same step, it would never see the variable if it were
set.

This fix adds `export UV_INDEX_STRATEGY=unsafe-best-match` to make the
variable available in the current shell before running uv commands.

Related: #4019 (same fix for release-0.3.x where the bug is actively
triggered)
2025-11-01 12:57:24 -07:00
Charlie Doern
93401836b7
feat: llama stack run --providers (#3989)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 3s
Pre-commit / pre-commit (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Test Llama Stack Build / build (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 56s
# What does this PR do?

llama stack run --providers takes a list of providers in the format of
api1=provider1,api2=provider2

this allows users to run with a simple list of providers.

given the architecture of `create_app`, this run config needs to be
written to disk. use ~/.llama/distribution/providers-run/run.yaml each
time for consistency

resolves #3956

## Test Plan

new unit tests to ensure --providers.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-31 16:21:32 -07:00
Ashwin Bharambe
b2a5428a14
fix(ci): unset empty UV index env vars to prevent uv errors (#4012)
Fixes container builds failing with UV index strategy errors when build
args are passed with empty values.

Docker ARGs declared with empty defaults (ARG UV_INDEX_STRATEGY="")
become environment variables with empty string values in RUN commands.
UV interprets these as if --index-strategy "" was passed on the command
line, causing build failures with "error: a value is required for
'--index-strategy <UV_INDEX_STRATEGY>'".

This is a footgun because empty string ≠ unset variable, and ARGs
silently propagate to all RUN commands, only failing when declared with
empty defaults.

The fix unsets UV_EXTRA_INDEX_URL and UV_INDEX_STRATEGY at the start of
RUN blocks, saves the values early, and only restores them for editable
installs with RC dependencies. All other install modes (PyPI, test-pypi,
client) now run with a clean environment.
2025-10-31 13:29:14 -07:00
Ashwin Bharambe
f8fe3018af
fix(ci): use test.pypi as extra index for RC dependencies (#4009)
Backports UV index configuration fixes from `release-0.3.x` (PR #4002). 

The main issue: when we created the release branch infrastructure, we
configured UV to use `test.pypi` as the PRIMARY index to resolve RC
dependencies. This caused UV to look for ALL packages there first, which
led to problems - some packages don't have binary wheels on `test.pypi`,
so UV tried building from source and failed (like the `psycopg2-binary`
issue we hit).

The fix is simple: use PyPI as primary (default) and `test.pypi` as an
EXTRA index. UV will check PyPI first for everything, and only fall back
to `test.pypi` for packages not found there (like our RC client
versions).

This PR includes:
- Fixed `install-llama-stack-client` action to output
`UV_EXTRA_INDEX_URL` instead of `UV_INDEX_URL`
- New `uv-run-with-index.sh` wrapper that auto-detects release branches
and sets UV env vars
- Updated pre-commit hooks (`uv-lock`, codegen, etc.) to use the wrapper
- Pass UV env vars as Docker build args in all locations
- Scope UV env vars properly in Containerfile (inline for llama-stack
install, explicitly unset before distribution deps)
- Export UV env vars to `GITHUB_ENV` in setup-runner for cross-step
persistence

The wrapper detects release branches automatically in both CI and local
environments, so this "just works" without manual configuration. On main
(non-release branch), the wrapper becomes a no-op.

Tested and validated on `release-0.3.x` where all CI checks pass.
2025-10-31 12:55:43 -07:00
raghotham
62603d25c2
chore(api)!: /v1/inspect only lists v1 apis by default (#3948)
# What does this PR do?
Allow filtering for v1alpha, v1beta, deprecated and v1. Backward
incompatible change since by default it only returns v1 apis now.

## Test Plan
added unit test
2025-10-31 11:55:46 -07:00
Ashwin Bharambe
61aab1889b
fix(ci): remove precommit trigger workflow (#4008)
Not safe!
2025-10-31 11:41:26 -07:00
Francisco Arceo
7b79cd05d5
feat: Adding Prompts to admin UI (#3987)
# What does this PR do?

1. Updates Llama Stack Typescript client to include `prompts`api in
playground client.
2. Updates the UI to display prompts and execute basic CRUD operations
for prompts.

(2) adds an explicit "Preview" section when creating the prompt to show
users how the Prompts API behaves as you dynamically edit the prompt
content. See example here:

<p align="center"><img width="468.5" height="333" alt="Screenshot
2025-10-31 at 12 22 34 PM"
src="https://github.com/user-attachments/assets/3542ce7f-56fe-4fb4-b0a3-5cfba5917f6d"
/></p>

Some screen shots:

<details><Summary>Click me to expand!</Summary>

### Prompts List with Prompts
<img width="1906" height="1108" alt="Screenshot 2025-10-31 at 12 20
05 PM"
src="https://github.com/user-attachments/assets/494a4748-ea6a-4527-8cfe-8959cb741c0f"
/>

### Empty Prompts List
<img width="1889" height="1123" alt="Screenshot 2025-10-31 at 12 08
44 PM"
src="https://github.com/user-attachments/assets/ac95b807-d311-4725-86da-0258b3cce81a"
/>

### Create Prompt
<img width="1918" height="1167" alt="Screenshot 2025-10-31 at 11 03
29 AM"
src="https://github.com/user-attachments/assets/b3100a78-f4f3-410f-af89-f7e7fe4a89e7"
/>

### Submit Prompt with error
<img width="1901" height="1213" alt="Screenshot 2025-10-31 at 12 09
28 PM"
src="https://github.com/user-attachments/assets/dca71354-a602-449d-a0d8-0ed3d009a275"
/>
</details>

## Closes https://github.com/llamastack/llama-stack/issues/3322

## Test Plan
Added tests and manual testing.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-31 11:37:25 -07:00
Ashwin Bharambe
c2fd17474e fix: stop printing server log, it is confusing
Some checks failed
Pre-commit / pre-commit (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 54s
2025-10-31 11:22:08 -07:00
Ashwin Bharambe
5f95c1f8cc
fix(ci): install client from release branch before uv sync (#4001)
Fixes CI failures on release branches where uv sync can't resolve RC
dependencies.

The problem: on release branches like `release-0.3.x`, pyproject.toml
requires `llama-stack-client>=0.3.1rc1`. But RC versions only exist on
test.pypi, not PyPI. So uv sync fails before we even get a chance to
install the client from git.

The fix is simple - on release branches, pre-install the client from the
matching git branch first, then run uv sync. This satisfies the RC
requirement and lets dependency resolution succeed.

Modified setup-runner and pre-commit workflows to do this. Also cleaned
up some duplicate logic in setup-test-environment that's now handled
centrally.

Example failure:
5415478835
2025-10-31 06:16:20 -07:00
Ashwin Bharambe
6d80ca4bf7
fix(ci): replace unused LLAMA_STACK_CLIENT_DIR with direct install (#4000)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
Unit Tests / unit-tests (3.13) (push) Failing after 11s
UI Tests / ui-tests (22) (push) Successful in 27s
Replace unused `LLAMA_STACK_CLIENT_DIR` env var (from old `llama stack
build`) with direct `uv pip install` for release branch client
installation.

cc @ehhuang
2025-10-30 22:09:25 -07:00
Jiayi Ni
fa7699d2c3
feat: Add rerank API for NVIDIA Inference Provider (#3329)
# What does this PR do?
Add rerank API for NVIDIA Inference Provider.

<!-- If resolving an issue, uncomment and update the line below -->
Closes #3278 

## Test Plan
Unit test:
```
pytest tests/unit/providers/nvidia/test_rerank_inference.py
```

Integration test: 
```
pytest -s -v tests/integration/inference/test_rerank.py   --stack-config="inference=nvidia"   --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3   --env NVIDIA_API_KEY=""   --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
2025-10-30 21:42:09 -07:00
Ashwin Bharambe
c396de57a4
ci: standardize release branch pattern to release-X.Y.x (#3999)
Standardize CI workflows to use `release-X.Y.x` branch pattern instead
of multiple numeric variants.

That's the pattern we are settling on. See
https://github.com/llamastack/llama-stack-ops/pull/20 for reference.
2025-10-30 21:33:32 -07:00
Doug Edgar
e8cd8508b5
fix: handle missing external_providers_dir (#3974)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Pre-commit / pre-commit (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 50s
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR fixes the handling of the external_providers_dir configuration
field to align with its ongoing deprecation, in favor of the provider
`module` specification approach.

It addresses the issue in #3950, where using the default provided
run.yaml config resulted in the `external_providers_dir` parameter being
set to the literal string `None`, and crashing the llama-stack server
when starting.

<!-- If resolving an issue, uncomment and update the line below -->
Closes #3950 

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

- Built a new container image from `podman build . -f
containers/Containerfile --build-arg DISTRO_NAME=starter --tag
llama-stack:starter`
- Tested it locally with `podman run -it localhost/llama-stack:starter`
- Tested it on an OpenShift 4.19 cluster, deployed via the
llama-stack-k8s-operator.

Signed-off-by: Doug Edgar <dedgar@redhat.com>
2025-10-30 17:01:31 -07:00
Derek Higgins
ff2b270e2f
fix: relax structured output test assertions to handle whitespace and… (#3997)
… case variations

The ollama/llama3.2:3b-instruct-fp16 model returns string values with
trailing whitespace in structured JSON output. Updated test assertions
to use case-insensitive substring matching instead of exact equality.

Use .lower() for case-insensitive comparison
Check if expected value is contained in actual value (handles
whitespace)

Closes: #3996

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-30 16:55:23 -07:00
ehhuang
0e384a55a1
feat: support workers in run config (#3992)
# What does this PR do?


## Test Plan
Set workers: 4 in run.yaml. Start server and observe logs multiple
times.
2025-10-30 16:34:12 -07:00
Ashwin Bharambe
6f90a7af4b
ci: target release-X.Y.x branches instead of release-X.Y.x-maint (#3995)
We will be updating our release procedure to be more "normal" or "sane".
We will
- create release branches like normal people
- land cherry-picks onto those branches
- run releases off of those branches
- no more "rc" branch pollution either

Given that, this PR cleans things up a bit
- Remove `-maint` suffix from release branch patterns in CI workflows
- Update branch matching to `release-X.Y.x` format
2025-10-30 16:27:13 -07:00
Ashwin Bharambe
90234d6973
ci: support release branches and match client branch (#3990)
- Update workflows to trigger on release-X.Y.x-maint branches
- When PR targets release branch, fetch matching branch from
llama-stack-client-python
- Falls back to main if matching client branch doesn't exist
- Updated workflows:
  - integration-tests.yml
  - integration-auth-tests.yml
  - integration-sql-store-tests.yml
  - integration-vector-io-tests.yml
  - unit-tests.yml
  - backward-compat.yml
  - pre-commit.yml
2025-10-30 15:20:34 -07:00
Ashwin Bharambe
c2ae42b343
fix(ci): show pre-commit output easily on failure (#3985)
Right now, the failed Step which is opened by GH by default tells me to
just go up and click and scroll through for no reason.
2025-10-30 11:48:20 -07:00
Ashwin Bharambe
77c8bc6fa7
fix(ci): add back server:ci-tests to replay tests (#3976)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Pre-commit / pre-commit (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (push) Failing after 7s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
API Conformance Tests / check-schema-compatibility (push) Successful in 15s
Python Package Build Test / build (3.12) (push) Failing after 39s
Unit Tests / unit-tests (3.12) (push) Failing after 40s
UI Tests / ui-tests (22) (push) Successful in 42s
It is useful for local debugging. If both server and docker are failing,
you can just run server locally to debug which is much easier to do.
2025-10-30 11:02:59 -07:00
ehhuang
5e20938832
fix: remove LLAMA_STACK_TEST_FORCE_SERVER_RESTART setting in fixture (#3982)
# What does this PR do?
this is meant to be a manual flag

## Test Plan
CI
2025-10-30 09:13:04 -07:00
Sébastien Han
b4ea05ada9
chore: add batches to openapi schema (#3980)
# What does this PR do?

While working on https://github.com/llamastack/llama-stack/pull/3944 I
realized that the batches API wasn't generated.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-30 07:08:35 -07:00
Derek Higgins
19d85003de
test: Updated test skips that were marked with "inline::vllm" (#3979)
This should be "remote::vllm". This causes some log probs tests to be
skipped with remote vllm. (They
fail if run).

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-30 14:48:21 +01:00
Ashwin Bharambe
174ef162b3
fix(mypy): add fast and full mypy modes (#3975)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 3s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Pre-commit / pre-commit (push) Failing after 2s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test llama stack list-deps / show-single-provider (push) Failing after 4s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test llama stack list-deps / generate-matrix (push) Successful in 5s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
Test Llama Stack Build / build (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
UI Tests / ui-tests (22) (push) Successful in 38s
`mypy` became very slow for the common path. This can make local
pre-commit runs very slow. Let's restore that.

- restore fast mirrors-mypy hook for local runs
- add optional mypy-full hook and docs so devs can match CI
- run full mypy in CI with a hint when failures occur

### Test Plan
- uv run pre-commit run mypy --all-files
- uv run pre-commit run mypy-full --hook-stage manual --all-files
- uv run --group dev --group type_checking mypy
2025-10-29 19:02:32 -07:00
Charlie Doern
e8ecc99524
fix!: remove chunk_id property from Chunk class (#3954)
# What does this PR do?

chunk_id in the Chunk class executes actual logic to compute a chunk ID.
This sort of logic should not live in the API spec.

Instead, the providers should be in charge of calling generate_chunk_id,
and pass it to `Chunk`.

this removes the incorrect dependency between Provider impl and API impl

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-29 18:59:59 -07:00