Commit graph

3197 commits

Author SHA1 Message Date
Charlie Doern
9df073450f
feat: remove core.telemetry as a dependency of llama_stack.apis (#4064)
# What does this PR do?

Remove a circular dependency by moving tracing from the API protocol
definitions to the router implementation layer.

This gets us closer to having a self-contained API package with no
cross-cutting dependencies on other parts of the llama stack codebase.
To the best of our ability, `llama_stack.apis` should contain only type and
protocol definitions.

Changes:
- Create apis/common/tracing.py with marker decorator (zero core
dependencies)
- Add the _new_ `@telemetry_traceable` marker decorator to 11 protocol
classes
- Apply actual tracing in core/resolver.py in `instantiate_provider`
based on protocol marker
- Move MetricResponseMixin from core to apis (it's an API response type)
- APIs package is now self-contained with zero core dependencies

The tracing functionality remains identical: the actual trace_protocol from
core is applied to router implementations at runtime when both telemetry is
enabled and the protocol has the `__marked_for_tracing__` marker.
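The marker-then-wrap split described above can be sketched in a few lines. This is only an illustration under stated assumptions: `fake_trace_protocol`, `instantiate`, `Inference`, `InferenceRouter`, and the `CALLS` list are hypothetical stand-ins, not the real `trace_protocol` or `instantiate_provider` code. Only the `@telemetry_traceable` name and the `__marked_for_tracing__` attribute come from the description.

```python
import functools
from typing import Protocol

CALLS: list[str] = []  # records traced method names, for demonstration only


def telemetry_traceable(cls):
    """Marker decorator: only sets a flag, so it needs no core imports."""
    cls.__marked_for_tracing__ = True
    return cls


def _traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        CALLS.append(fn.__name__)  # a real tracer would open a span here
        return fn(*args, **kwargs)
    return wrapper


def fake_trace_protocol(impl_cls):
    """Hypothetical stand-in for core's trace_protocol."""
    wrapped = {
        name: _traced(attr)
        for name, attr in vars(impl_cls).items()
        if callable(attr) and not name.startswith("_")
    }
    # Subclass instead of mutating impl_cls, so untraced use stays untouched.
    return type(impl_cls.__name__, (impl_cls,), wrapped)


@telemetry_traceable
class Inference(Protocol):  # hypothetical protocol definition
    def chat(self, prompt: str) -> str: ...


class InferenceRouter:  # hypothetical router implementation
    def chat(self, prompt: str) -> str:
        return f"echo: {prompt}"


def instantiate(impl_cls, protocol, telemetry_enabled: bool):
    """Apply tracing only if telemetry is on AND the protocol is marked."""
    if telemetry_enabled and getattr(protocol, "__marked_for_tracing__", False):
        impl_cls = fake_trace_protocol(impl_cls)
    return impl_cls()
```

The point of the split is that the API side carries only a boolean attribute, while the wrapping logic lives wherever the implementation is instantiated.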

## Test Plan

Manual integration test confirms identical behavior to main branch:

```bash
llama stack list-deps --format uv starter | sh
export OLLAMA_URL=http://localhost:11434
llama stack run starter

curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama/gpt-oss:20b",
       "messages": [{"role": "user", "content": "Say hello"}],
       "max_tokens": 10}'
```

Verified identical between main and this branch:
- trace_id present in response
- metrics array with prompt_tokens, completion_tokens, total_tokens
- Server logs show trace_protocol applied to all routers

Existing telemetry integration tests (tests/integration/telemetry/) validate
trace context propagation and span attributes.


relates to #3895

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-06 10:58:30 -08:00
Derek Higgins
dc9497a3b2
ci: Temporarily disable Telemetry during tests (#4090)
Closes: #4089

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-11-06 17:53:02 +01:00
Derek Higgins
03d23db910
ci: vllm ci job update (#4088)
Add missing recording for vllm in library mode
Add Docker env (missed during rebase)

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-11-06 16:59:55 +01:00
Derek Higgins
c62a09ab76
ci: Add vLLM support to integration testing infrastructure (with qwen) (#3545)
- Introduces vLLM provider support to the record/replay testing
framework
- Enables both recording and replay of vLLM API interactions alongside
existing Ollama support

The changes enable testing of vLLM functionality. vLLM tests focus on
inference capabilities, while Ollama continues to exercise the full API
surface, including vision features.

--
This is an alternative to #3128, using qwen3 instead of llama 3.2 1B;
qwen3 appears to be more capable at structured output and tool calls.

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-06 10:36:40 +01:00
Ashwin Bharambe
bef1b044bd
refactor(passthrough): use AsyncOpenAI instead of AsyncLlamaStackClient (#4085)
We'd like to remove the dependence of `llama-stack` on
`llama-stack-client`. This is a necessary step.

A few small cleanups:
- Also enables `embeddings`
- Remove ModelRegistryHelper dependency (unused)
- Consolidate to auth_credential field via RemoteInferenceProviderConfig
- Implement list_models() to fetch from downstream /v1/models

## Test Plan

Tested using this script
https://gist.github.com/ashwinb/6356463d10f989c0682ab3bff8589581

Output:
```
Listing models from downstream server...
Available models: ['passthrough/ollama/nomic-embed-text:latest', 'passthrough/ollama/all-minilm:l6-v2', 'passthrough/ollama/llama3.2-vision:11b', 'passthrough/ollama/llama3.2-vision:latest', 'passthrough/ollama/llama-guard3:1b', 'passthrough/ollama/llama3.2:1b', 'passthrough/ollama/all-minilm:latest', 'passthrough/ollama/llama3.2:3b', 'passthrough/ollama/llama3.2:3b-instruct-fp16', 'passthrough/bedrock/meta.llama3-1-8b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-70b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-405b-instruct-v1:0', 'passthrough/sentence-transformers/nomic-ai/nomic-embed-text-v1.5']

Using LLM model: passthrough/ollama/llama3.2-vision:11b

Making inference request...

Response: 4.

--- Testing streaming ---
Streamed response: ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='1', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='', index=0, logprobs=None)], created=1762381674, model='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
...
5ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1762381674, model='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
```
2025-11-05 18:15:11 -08:00
ehhuang
b335419faa
fix: actualize chunking strategy in vector store create API (#4086)
# What does this PR do?

- When create vector store is called without a chunking strategy, we
actualize the strategy used so that the value is persisted instead of
strategy='None'
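A rough sketch of what "actualizing" a default could look like follows. The `StaticChunking` class, its field names, the numeric defaults, and `create_vector_store` are all hypothetical illustrations, not the actual vector store API. The idea shown is only that the resolved default is written into the stored record rather than a literal `None`.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StaticChunking:
    # Hypothetical strategy type; the numbers are illustrative defaults.
    max_chunk_size_tokens: int = 800
    chunk_overlap_tokens: int = 400


def create_vector_store(name: str, chunking_strategy: Optional[StaticChunking] = None) -> dict:
    """Return the record that would be persisted for a new vector store."""
    # Resolve the default up front so the persisted value reflects the
    # strategy that is really used, never a bare None.
    strategy = chunking_strategy if chunking_strategy is not None else StaticChunking()
    return {"name": name, "chunking_strategy": asdict(strategy)}
```

Persisting the resolved value means a later change to the default does not silently alter how existing stores are described.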

## Test Plan
updated tests
2025-11-05 15:47:54 -08:00
Roy Belio
c672a5d792
feat: ability to use postgres as store for starter distro (#4076)
## What does this PR do?

The starter distribution now comes with all the required packages to
support persistent stores—like the agent store, metadata, and
inference—using PostgreSQL. Users can enable PostgreSQL support by
setting the `ENABLE_POSTGRES_STORE=1` environment variable.

This PR consolidates the functionality from the removed `postgres-demo`
distribution into the starter distribution, reducing maintenance
overhead.

**Closes: #2619**  
**Supersedes: #2851** (rebased and updated)

## Changes Made

1. **Added PostgreSQL support to starter distribution**
   - New `run-with-postgres-store.yaml` configuration
   - Automatic config switching via `ENABLE_POSTGRES_STORE` environment
     variable
   - Removed separate `postgres-demo` distribution

2. **Updated to new build system**
   - Integrated postgres switching logic into Containerfile entrypoint
   - Uses new `storage_backends` and `storage_stores` API
   - Properly configured both PostgreSQL KV store and SQL store

3. **Updated dependencies**
   - Added `psycopg2-binary` and `asyncpg` to starter distribution
   - All postgres-related dependencies automatically included

## How to Use

### With Docker (PostgreSQL):
```bash
docker run \
  -e ENABLE_POSTGRES_STORE=1 \
  -e POSTGRES_HOST=your_postgres_host \
  -e POSTGRES_PORT=5432 \
  -e POSTGRES_DB=llamastack \
  -e POSTGRES_USER=llamastack \
  -e POSTGRES_PASSWORD=llamastack \
  -e OPENAI_API_KEY=your_key \
  llamastack/distribution-starter
```

### PostgreSQL environment variables:
- `POSTGRES_HOST`: Postgres host (default: `localhost`)
- `POSTGRES_PORT`: Postgres port (default: `5432`)
- `POSTGRES_DB`: Postgres database name (default: `llamastack`)
- `POSTGRES_USER`: Postgres username (default: `llamastack`)
- `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`)
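For illustration, the variables and defaults listed above could be resolved like this. The helper name `postgres_settings` is hypothetical; the starter distribution itself performs this substitution in its run configuration, not through this function.

```python
import os
from typing import Mapping, Optional

# Defaults exactly as listed in the PR description.
POSTGRES_DEFAULTS = {
    "POSTGRES_HOST": "localhost",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DB": "llamastack",
    "POSTGRES_USER": "llamastack",
    "POSTGRES_PASSWORD": "llamastack",
}


def postgres_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: read each variable, falling back to its default."""
    env = os.environ if env is None else env
    return {key: env.get(key, default) for key, default in POSTGRES_DEFAULTS.items()}
```

Passing an explicit mapping instead of relying on `os.environ` keeps the helper easy to exercise in tests.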

## Test Plan

All pre-commit hooks pass (mypy, ruff, distro-codegen)  
`llama stack list-deps starter` confirms psycopg2-binary is included  
Storage configuration correctly uses PostgreSQL backends  
Container builds successfully with postgres support  

## Credits

Original work by @leseb in #2851. Rebased and updated by @r-bit-rry to
work with latest main.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Sébastien Han @leseb

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-11-05 15:37:06 -08:00
ehhuang
9d5c34af27
fix!: BREAKING CHANGE: vector_store: search API response fix (#4080)
# What does this PR do?
- search_query in the vector store search API should be a list,
according to https://github.com/openai/openai-openapi


## Test Plan
modified tests


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/4080).
* #4086
* __->__ #4080
2025-11-05 15:01:48 -08:00
ehhuang
84a84ee85c
fix: last_id when listing files in vector store (#4079)
# What does this PR do?
the last_id should be the id of the last item in the returned list, not
the unfiltered list.
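A small sketch of the corrected behavior, assuming a hypothetical `list_files` helper and a simple `status` filter (neither is the actual implementation): the cursor fields are computed from the page that is returned, after filtering.

```python
from typing import Optional


def list_files(files: list, status: Optional[str] = None, limit: int = 20) -> dict:
    """Hypothetical listing helper illustrating the last_id fix."""
    filtered = [f for f in files if status is None or f["status"] == status]
    page = filtered[:limit]
    return {
        "data": page,
        # Derive the cursors from the returned page, not from `files`.
        "first_id": page[0]["id"] if page else None,
        "last_id": page[-1]["id"] if page else None,
        "has_more": len(filtered) > len(page),
    }
```

If `last_id` came from the unfiltered list, a client paging with it as a cursor could skip or repeat items.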

## Test Plan
fixed test
2025-11-05 14:10:10 -08:00
Ashwin Bharambe
d9cf5cd480
fix(ci): use --no-cache instead of --no-cache-dir (#4081)
This is necessary to make sure GPU dockers can be built on CI without
running out of space.
2025-11-05 12:14:02 -08:00
Charlie Doern
c899b50723
fix: print help for list-deps if no args (#4078)
# What does this PR do?

list-deps takes positional args OR options like --providers.

The issue is that these args need to be optional, since by nature
one or the other can be specified.

Add a check to list-deps for `if not args.providers and not
args.config`. If this is true, help is printed and we exit.
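The check can be sketched with plain `argparse`. The parser below is a simplified, hypothetical stand-in for the real `list-deps` command (the `prog` name and the return codes are assumptions); only the `if not args.providers and not args.config` condition is taken from the description.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="list-deps-sketch")
    # Both inputs are optional at the parser level because either one may be given.
    parser.add_argument("config", nargs="?", default=None)
    parser.add_argument("--providers", default=None)
    return parser


def run(argv: list) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.providers and not args.config:
        # Neither input given: show help instead of failing later.
        parser.print_help()
        return 1  # assumed exit status; the description only says "we exit"
    return 0
```

Because argparse cannot express "at least one of a positional or an option" directly, the explicit check after parsing is the simplest way to cover the empty invocation.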

resolves #4075

## Test Plan
before:

```
╰─ llama stack list-deps
Traceback (most recent call last):
  File "/Users/charliedoern/projects/Documents/llama-stack/venv/bin/llama", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 52, in main
    parser.run(args)
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 43, in run
    args.func(args)
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/list_deps.py", line 51, in _run_stack_list_deps_command
    return run_stack_list_deps_command(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/_list_deps.py", line 135, in run_stack_list_deps_command
    normal_deps, special_deps, external_provider_dependencies = get_provider_dependencies(build_config)
                                                                                          ^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'build_config' where it is not associated with a value

```

after:

```
╰─ llama stack list-deps
usage: llama stack list-deps [-h] [--providers PROVIDERS] [--format {uv,deps-only}] [config | distro]

list the dependencies for a llama stack distribution

positional arguments:
  config | distro       Path to config file to use or name of known distro (llama stack list for a list). (default: None)

options:
  -h, --help            show this help message and exit
  --providers PROVIDERS
                        sync dependencies for a list of providers and only those providers. This list is formatted like: api1=provider1,api2=provider2. Where there can be multiple
                        providers per API. (default: None)
  --format {uv,deps-only}
                        Output format: 'uv' shows shell commands, 'deps-only' shows just the list of dependencies without `uv` (default) (default: deps-only)
 ```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-05 11:34:08 -08:00
Wojciech-Rebisz
07c28cd519
fix: Avoid model_limits KeyError (#4060)
# What does this PR do?
It avoids a `model_limits` KeyError when fetching embedding models for
watsonx.
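The general shape of such a fix is a tolerant lookup. In the sketch below, the `model_limits` key comes from the PR title, while the `embedding_dimension` field and the helper name are assumptions made for illustration; the provider's actual code may differ.

```python
from typing import Optional


def embedding_dimension(spec: dict, default: Optional[int] = None) -> Optional[int]:
    """Hypothetical helper: read a nested limit without raising KeyError."""
    # `or {}` also covers specs where the key exists but is null.
    limits = spec.get("model_limits") or {}
    return limits.get("embedding_dimension", default)
```

Indexing with `spec["model_limits"]` raises as soon as one model spec omits the key, which is the failure mode the fix avoids.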

Closes https://github.com/llamastack/llama-stack/issues/4059

## Test Plan
Start server with watsonx distro:
```bash
llama stack list-deps watsonx | xargs -L1 uv pip install
uv run llama stack run watsonx
```
Run 
```python
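from llama_stack_client import LlamaStackClient
base_url = "http://localhost:8321"  # assumed: local server address
client = LlamaStackClient(base_url=base_url)
client.models.list()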
```
Check if there is any embedding model available (currently there is not
a single one)
2025-11-05 10:34:40 -08:00
Emilio Garcia
ba50790a28
feat(tests): metrics tests (#3966)
# What does this PR do?
1. Make telemetry tests as easy as possible for users by expanding the
`SpanStub` data class and creating the `MetricStub` dataclass as a way
to consistently marshal telemetry data in test fixtures and unmarshal
and handle it in tests.
2. Structure server and client tests to always follow the same standards
for consistent testing experience by using the `SpanStub` and
`MetricStub` data class objects.
3. Enable Metrics Testing for completions endpoint
4. Correct token metrics to use histograms instead of counts to capture
tokens per request rather than a cumulative count of tokens over the
lifecycle of the server.
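The difference in item 4 can be made concrete with a toy model. The two classes below are simplified, hypothetical stand-ins for telemetry instruments, not the OpenTelemetry API; the bucket boundaries are arbitrary.

```python
from bisect import bisect_left


class ToyCounter:
    """Cumulative total: per-request values are lost."""

    def __init__(self) -> None:
        self.total = 0

    def add(self, value: int) -> None:
        self.total += value


class ToyHistogram:
    """Keeps a distribution of per-request values in buckets."""

    def __init__(self, bounds: list) -> None:
        self.bounds = sorted(bounds)
        # One bucket per upper bound, plus an overflow bucket.
        self.bucket_counts = [0] * (len(self.bounds) + 1)
        self.count = 0
        self.sum = 0

    def record(self, value: int) -> None:
        # bisect_left puts value == bound into that bound's bucket.
        self.bucket_counts[bisect_left(self.bounds, value)] += 1
        self.count += 1
        self.sum += value


counter = ToyCounter()
histogram = ToyHistogram(bounds=[10, 100])
for tokens_in_request in (12, 30, 7):
    counter.add(tokens_in_request)
    histogram.record(tokens_in_request)
```

After the loop the counter only knows the running total, while the histogram still shows how many requests fell into each size range.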

## Test Plan
These are tests
2025-11-05 10:26:15 -08:00
Roy Belio
2619f3552e
fix: show built-in distributions in llama stack list (#4040)
# What does this PR do?
Fixes issue #3922 where `llama stack list` only showed distributions
after they were run. This PR makes the command show all available
distributions immediately on a fresh install.

Closes #3922

## Changes
- **Updated `_get_distribution_dirs()`** to discover both built-in and
built distributions:
  - Built-in distributions from `src/llama_stack/distributions/` (e.g.,
    starter, nvidia, dell)
  - Built distributions from `~/.llama/distributions`
- **Added a "Source" column** to distinguish between "built-in" and
"built" distributions
- **Built distributions override built-in ones** with the same name
(expected behavior)
- **Updated config file detection logic** to handle both naming
conventions:
  - Built-in: `build.yaml` and `run.yaml`
  - Built: `{name}-build.yaml` and `{name}-run.yaml`
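The discovery and override rules above might be sketched as follows. The function names and return shapes are hypothetical and do not reproduce `_get_distribution_dirs()`; only the two sources, the override order, the hidden-directory filter, and the file naming conventions are taken from the description.

```python
from pathlib import Path


def discover_distributions(builtin_root: Path, built_root: Path) -> dict:
    """Map distro name -> (path, source); built entries override built-in ones."""
    found = {}
    # Scan built-in first so that a built distro with the same name wins.
    for root, source in ((builtin_root, "built-in"), (built_root, "built")):
        if not root.is_dir():
            continue
        for child in sorted(root.iterdir()):
            if child.is_dir() and not child.name.startswith("."):
                found[child.name] = (child, source)
    return found


def config_files(path: Path, name: str, source: str) -> tuple:
    """Return the (build, run) config paths for the given naming convention."""
    if source == "built-in":
        return path / "build.yaml", path / "run.yaml"
    return path / f"{name}-build.yaml", path / f"{name}-run.yaml"
```

Relying on dictionary insertion order keeps the override rule explicit: whichever source is scanned last replaces earlier entries with the same name.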

## Test Plan
### Unit Tests
Added comprehensive unit tests in
`tests/unit/distribution/test_stack_list.py`:
```bash
uv run pytest tests/unit/distribution/test_stack_list.py -v
```
**Result**: All 8 tests pass
- `test_builtin_distros_shown_without_running` - Verifies the core fix
for issue #3922
- `test_builtin_and_built_distros_shown_together` - Ensures both types
are shown
- `test_built_distribution_overrides_builtin` - Tests override behavior
- `test_empty_distributions` - Edge case handling
- `test_config_files_detection_builtin` - Config file detection for
built-in distros
- `test_config_files_detection_built` - Config file detection for built
distros
- `test_llamastack_prefix_stripped` - Name normalization
- `test_hidden_directories_ignored` - Filters hidden directories

### Manual Testing
**Before the fix** (simulated with empty `~/.llama/distributions`):
```bash
$ llama stack list
No stacks found in ~/.llama/distributions
```

**After the fix**:
```bash
$ llama stack list
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Stack Name        ┃ Source   ┃ Path              ┃ Build Config ┃ Run Config ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ ci-tests          │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ dell              │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ meta-reference-g… │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ nvidia            │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ open-benchmark    │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ postgres-demo     │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ starter           │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ starter-gpu       │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ watsonx           │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
└───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘
```

**After running a distribution**:
```bash
$ llama stack run starter  # Creates ~/.llama/distributions/starter
$ llama stack list
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Stack Name        ┃ Source   ┃ Path              ┃ Build Config ┃ Run Config ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ ...               │ built-in │ ...               │ Yes          │ Yes        │
│ starter           │ built    │ ~/.llama/distri…  │ No           │ No         │
│ ...               │ built-in │ ...               │ Yes          │ Yes        │
└───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘
```
Note how `starter` now shows as "built" and points to
`~/.llama/distributions`, overriding the built-in version.

## Breaking Changes
**No breaking changes** - This is a bug fix that improves user
experience with minimal risk:
- No programmatic parsing of output found in the codebase
- Table format is clearly for human consumption
- The new "Source" column helps users understand where distributions
come from
- The behavior change is exactly what users expect (seeing all available
distributions)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-05 10:16:28 -08:00
Ashwin Bharambe
4d3069bfa5
chore(ci): remove unused recordings (#4074)
Added a script to cleanup recordings. While doing this, moved the CI
matrix generation to a separate script so there is a single source of
truth for the matrix.

Ran the cleanup script as:
```
PYTHONPATH=. python scripts/cleanup_recordings.py
```

Also added this as part of the pre-commit workflow to ensure that the
recordings are always up to date and that no stale recordings are left
in the repo.
2025-11-05 09:21:58 -08:00
Sébastien Han
fd1603beef
chore: remove unused classes (#4077)
# What does this PR do?

These were maybe included in the webmethod?
The unit test was pointless too, since the request was never used
anywhere?

This shouldn't be in the API definition if we never consume it.

## Test Plan

CI with pre-commit on OpenAPI spec generation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-05 16:45:23 +01:00
Ashwin Bharambe
392e01dc79
chore: add stainless config
Name it to indicate that it is not yet the source of truth, to avoid confusion.
2025-11-04 15:44:07 -08:00
ehhuang
95b0493fae
chore: move src/llama_stack/ui to src/llama_stack_ui (#4068)
# What does this PR do?
This better separates UI code from backend code; the mix was often a
point of confusion for our beloved AI friends.


## Test Plan
CI
2025-11-04 15:21:49 -08:00
Ashwin Bharambe
5850e3473f
fix: remove straggler openapi HTML file
2025-11-04 14:54:33 -08:00
Ashwin Bharambe
0c49a53c97
chore(api)!: remove tool_runtime.rag_tool from the API surface (#4067)
RAG aka file search is implemented via the Responses API by specifying
the file-search tool. The backend implementation remains unchanged. This
PR merely removes the directly exposed API surface which allowed users
to directly perform searches from the client.

This facility is now available via the `client.vector_store.search()`
OpenAI compatible API.
2025-11-04 14:50:54 -08:00
Ashwin Bharambe
a8a8aa56c0
chore!: remove the agents (sessions and turns) API (#4055)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 1m10s
- Removes the deprecated agents (sessions and turns) API that was marked
alpha in 0.3.0
- Cleans up unused imports and orphaned types after the API removal
- Removes `SessionNotFoundError` and `AgentTurnInputType` which are no
longer needed

The agents API is completely superseded by the Responses + Conversations
APIs, and the client SDK Agent class already uses those implementations.

Corresponding client-side PR:
https://github.com/llamastack/llama-stack-client-python/pull/295
2025-11-04 09:38:39 -08:00
Mustafa Elbehery
a6ddbae0ed
chore(test): migrate unit tests from unittest to pytest nvidia test eval (#3249)
# What does this PR do?
This PR migrates `unittest` to `pytest` in
`tests/unit/providers/nvidia/test_eval.py`.

Part of https://github.com/llamastack/llama-stack/issues/2680

Supersedes https://github.com/llamastack/llama-stack/pull/2791

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-11-04 10:29:07 +01:00
Ashwin Bharambe
053fc0ac39
chore!: remove all deprecated routes (including /openai/v1/ ones) (#4054)
This PR removes all routes which we had marked deprecated for the 0.3.0
release.

This includes:
- all the `/v1/openai/v1/` routes (the corresponding /v1 routes still
exist of course)
- the /agents API (which is superseded completely by Responses +
Conversations)
- several alpha routes which had a "v1" route to aid the transition to
"v1alpha"
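For callers that still have the old prefix hard-coded, a path rewrite along these lines may be enough. `migrate_path` is a hypothetical helper, and it assumes a one-to-one mapping onto the corresponding `/v1` routes, which the description states still exist.

```python
LEGACY_PREFIX = "/v1/openai/v1/"


def migrate_path(path: str) -> str:
    """Hypothetical helper: map a removed /v1/openai/v1/ route to its /v1/ form."""
    if path.startswith(LEGACY_PREFIX):
        return "/v1/" + path[len(LEGACY_PREFIX):]
    return path
```

The helper leaves every other path untouched, so it can be applied to all outgoing request paths without special cases.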

This is the corresponding client-python change:
https://github.com/llamastack/llama-stack-client-python/pull/294
2025-11-03 19:00:59 -08:00
Nathan Weinberg
62b3ad349a
fix: return to hardcoded model IDs for Vertex AI (#4041)
# What does this PR do?
Partial revert of b67aef2.

Vertex AI doesn't offer an endpoint for listing models from Google's
Model Garden.

Return to hardcoded values until such an endpoint is available.

Closes #3988 

## Test Plan
Server side, set up your Vertex AI env vars (`VERTEX_AI_PROJECT`,
`VERTEX_AI_LOCATION`, and `GOOGLE_APPLICATION_CREDENTIALS`) and run the
starter distribution
```bash
$ llama stack list-deps starter | xargs -L1 uv pip install
$ llama stack run starter
```

Client side, formerly broken cURL requests now working
```bash
$ curl http://127.0.0.1:8321/v1/models | jq '.data | map(select(.provider_id == "vertexai"))'
[
  {
    "identifier": "vertexai/vertex_ai/gemini-2.0-flash",
    "provider_resource_id": "vertex_ai/gemini-2.0-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-flash",
    "provider_resource_id": "vertex_ai/gemini-2.5-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-pro",
    "provider_resource_id": "vertex_ai/gemini-2.5-pro",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  }
]
$ curl -fsS http://127.0.0.1:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"vertexai/vertex_ai/gemini-2.5-flash\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"max_tokens\": 128, \"temperature\": 0.0}" | jq
{
  "id": "p8oIaYiQF8_PptQPo-GH8QQ",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Hello there! How can I help you today?",
        "refusal": null,
        "role": "assistant",
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
...
```

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-11-03 17:38:16 -08:00
Ashwin Bharambe
cb40da210f
fix: update tests for OpenAI-style models endpoint (#4053)
The llama-stack-client now uses `/v1/openai/v1/models`, which returns
OpenAI-compatible model objects with 'id' and 'custom_metadata' fields
instead of the Resource-style 'identifier' field. Updated api_recorder
to handle the new endpoint and modified tests to access model metadata
appropriately. Deleted stale model recordings for re-recording.

**NOTE: CI will be red on this one since it is dependent on
https://github.com/llamastack/llama-stack-client-python/pull/291/files
landing. I verified locally that it is green.**
2025-11-03 17:30:08 -08:00
Sébastien Han
4a5ef65286
chore!: remove SDG API (#4035)
# What does this PR do?

This API hasn't gained any traction and has drawn close to zero interest
from the community. Let's revisit in the future if things change.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 16:12:06 -08:00
Ashwin Bharambe
44096512b5
feat: add custom_metadata to OpenAIModel to unify /v1/models with /v1/openai/v1/models (#4051)
We need to remove `/v1/openai/v1` paths shortly. There is one problem:
our current `/v1/openai/v1/models` endpoint provides different data than
`/v1/models`. Unfortunately our tests target the latter (llama-stack
customized) behavior. We need to get to true OpenAI compatibility.

This is step 1: adding a `custom_metadata` field to `OpenAIModel` that
includes all the extra stuff we add in the native `/v1/models` response.
This can be extracted on the consumer end by looking at
`__pydantic_extra__` or other similar fields.
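
A minimal sketch of the consumer side, assuming the model arrives as a plain
dict. The helper `flatten_model` and the keys shown inside `custom_metadata`
are hypothetical and only illustrate the idea of carrying the extra fields
alongside the OpenAI-style `id`:

```python
from typing import Any


def flatten_model(openai_model: dict[str, Any]) -> dict[str, Any]:
    """Merge an OpenAI-style model object with its `custom_metadata`."""
    # A missing or null `custom_metadata` is treated as "no extra fields".
    extra = openai_model.get("custom_metadata") or {}
    # Keep the OpenAI `id`, then layer the llama-stack specific fields on top.
    return {"id": openai_model["id"], **extra}


# Hypothetical payload; the metadata keys are assumptions for illustration.
example = {
    "id": "vertexai/vertex_ai/gemini-2.5-flash",
    "object": "model",
    "custom_metadata": {"model_type": "llm", "provider_id": "vertexai"},
}
flat = flatten_model(example)
```

Reading from one well-known field keeps the top-level object OpenAI-compatible
while still exposing the extra data to clients that want it.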

This PR:
- Adds `custom_metadata` field to `OpenAIModel` class in
`src/llama_stack/apis/models/models.py`
- Modifies `openai_list_models()` in
`src/llama_stack/core/routing_tables/models.py` to populate
`custom_metadata`

Next Steps
1. Update stainless client to use `/v1/openai/v1/models` instead of
`/v1/models`
2. Migrate tests to read from `custom_metadata`
3. Remove `/v1/openai/v1/` prefix entirely and consolidate to single
`/v1/models` endpoint
2025-11-03 15:56:07 -08:00
Ashwin Bharambe
2381714904
fix: enable SQLite WAL mode to prevent database locking errors (#4048)
Fixes race condition causing "database is locked" errors during
concurrent writes to SQLite, particularly in streaming responses with
guardrails where multiple inference calls write simultaneously.

Enable Write-Ahead Logging (WAL) mode for SQLite which allows multiple
concurrent readers and one writer without blocking. Set busy_timeout to
5s so SQLite retries instead of failing immediately. Remove the logic
that disabled write queues for SQLite since WAL mode eliminates the
locking issues that prompted disabling them.
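
For illustration, the two settings can be applied with Python's standard
`sqlite3` module roughly as follows. This is a sketch, not the code from this
change: the helper name `open_wal_connection` and the temporary database path
are assumptions, and the real store may configure SQLite through a different
layer.

```python
import os
import sqlite3
import tempfile


def open_wal_connection(db_path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    """Open a SQLite connection with WAL journaling and a busy timeout."""
    conn = sqlite3.connect(db_path)
    # WAL lets readers proceed while a single writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    # Wait up to busy_timeout_ms for a lock instead of failing immediately.
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return conn


# WAL needs a file-backed database, so use a temporary file for the demo.
demo_path = os.path.join(tempfile.mkdtemp(), "example.db")
demo_conn = open_wal_connection(demo_path)
journal_mode = demo_conn.execute("PRAGMA journal_mode").fetchone()[0]
busy_timeout = demo_conn.execute("PRAGMA busy_timeout").fetchone()[0]
demo_conn.close()
```

WAL still allows only one writer at a time; the busy timeout is what turns a
brief lock conflict into a short wait rather than an error.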

Fixes: test_output_safety_guardrails_safe_content[stream=True] flake
2025-11-03 15:27:41 -08:00
ehhuang
628e38b3d5
test: always start a new server in integration-tests.sh (#4050)
# What does this PR do?
This prevents interference from already running servers, and allows
multiple concurrent integration test runs. Unleash the AIs!
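
One way to pick a port that does not collide with a running server is to probe
upward from the default. The sketch below is an assumption about the general
technique, not the logic in `integration-tests.sh`; the helper `find_free_port`
is hypothetical.

```python
import socket


def find_free_port(start: int = 8321, max_tries: int = 100, host: str = "127.0.0.1") -> int:
    """Return the first port at or above `start` that can be bound on `host`."""
    last = min(start + max_tries, 65536)
    for port in range(start, last):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # Port is taken (or not permitted); try the next one.
            return port
    raise RuntimeError(f"no free port found in range {start}-{last - 1}")
```

The probe socket is closed before the server binds, so another process could
still grab the port in between; a real runner would need to tolerate that.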

## Test Plan
Start a Llama Stack server on port 8321.

Then observe that the test uses port 8322:

❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config
server:ci-tests --inference-mode replay --setup ollama --suite base
--pattern '(telemetry or safety)'
=== Llama Stack Integration Test Runner ===
Stack Config: server:ci-tests
Setup: ollama
Inference Mode: replay
Test Suite: base
Test Subdirs:
Test Pattern: (telemetry or safety)

Checking llama packages
llama-stack 0.4.0.dev0 /Users/erichuang/projects/new_test_server
llama-stack-client                       0.3.0
ollama                                   0.6.0
=== Applying Setup Environment Variables ===
Setting SQLITE_STORE_DIR:
/var/folders/cz/vyh7y1d11xg881lsxsshnc5c0000gn/T/tmp.bKLsaVAxyU
Setting stack config type: server
Setting up environment variables:
export OLLAMA_URL='http://0.0.0.0:11434'
export SAFETY_MODEL='ollama/llama-guard3:1b'

Will use port: 8322
=== Starting Llama Stack Server ===
Waiting for Llama Stack Server to start on port 8322...
 Llama Stack Server started successfully
2025-11-03 15:23:10 -08:00
Sébastien Han
da57b51fb6
ci: introduce Mergify bot to notify on PR conflicts (#4043)
This commit introduces Mergify, a powerful bot designed to assist with
automated merging and other CI-related tasks. As an initial step, we
enable a basic feature: automatically notifying users when a pull
request has merge conflicts.

When a conflict is detected, Mergify will add a label to the PR. This
label will be removed once the conflict is resolved.
This is a foundation PR to activate the bot and start using it for
backports too.

In the future, we plan to expand Mergify’s role to include auto-merging,
as discussed in #1667, once the project is ready.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-03 12:21:19 -08:00
Derek Higgins
1562277cfd
ci: test adjustments for Qwen3-0.6B (#3978)
Without this hint Qwen3-0.6B tends to reply with the full name
and sometimes doesn't reply with the correct drafted year.

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 12:19:35 -08:00
Matthew Farrellee
1263448de2
fix: allowed_models config did not filter models (#4030)
# What does this PR do?

closes #4022 

## Test Plan

ci w/ new tests

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 11:43:39 -08:00
Charlie Doern
30f8921240
fix: generate provider config when using --providers (#4044)
# What does this PR do?

Call the `sample_run_config` method for providers that have it when
generating a run config using `llama stack run --providers`. This will
propagate API keys.
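
The "for providers that have it" check can be sketched as below. Both config
classes and the helper `generate_provider_config` are hypothetical stand-ins,
and the placeholder values are not what the real providers return:

```python
from typing import Any


class WithSample:
    """Hypothetical provider config that can describe a sample run config."""

    @classmethod
    def sample_run_config(cls) -> dict[str, Any]:
        # Placeholder values for illustration only.
        return {"api_key": "<placeholder>", "base_url": "https://api.openai.com/v1"}


class WithoutSample:
    """Hypothetical provider config with no sample method."""


def generate_provider_config(config_cls: type) -> dict[str, Any]:
    """Use `sample_run_config` when the class defines it, else an empty config."""
    sample = getattr(config_cls, "sample_run_config", None)
    return sample() if callable(sample) else {}
```

Falling back to an empty config keeps providers without a sample method
working, while the others get fields such as `api_key` written to the file.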

resolves #4032


## Test Plan

new unit test checks the output of using `--providers` to ensure
`api_key` is in the config.

manual testing:

```
╰─ llama stack list-deps --providers=inference=remote::openai --format uv | sh
Using Python 3.12.11 environment at: venv
Audited 7 packages in 8ms

╰─ llama stack run --providers=inference=remote::openai
INFO     2025-11-03 14:33:02,094 llama_stack.cli.stack.run:161 cli: Writing generated config to:
         /Users/charliedoern/.llama/distributions/providers-run/run.yaml
INFO     2025-11-03 14:33:02,096 llama_stack.cli.stack.run:169 cli: Using run configuration:
         /Users/charliedoern/.llama/distributions/providers-run/run.yaml
INFO     2025-11-03 14:33:02,099 llama_stack.cli.stack.run:228 cli: HTTPS enabled with certificates:
           Key: None
           Cert: None
INFO     2025-11-03 14:33:02,099 llama_stack.cli.stack.run:230 cli: Listening on 0.0.0.0:8321
INFO     2025-11-03 14:33:02,145 llama_stack.core.server.server:513 core::server: Run configuration:
INFO     2025-11-03 14:33:02,146 llama_stack.core.server.server:516 core::server: apis:
         - inference
         image_name: providers-run
         providers:
           inference:
           - config:
               api_key: '********'
               base_url: https://api.openai.com/v1
             provider_id: openai
             provider_type: remote::openai
         registered_resources:
           benchmarks: []
           datasets: []
           models: []
           scoring_fns: []
           shields: []
           tool_groups: []
           vector_stores: []
         server:
           port: 8321
           workers: 1
         storage:
           backends:
             kv_default:
               db_path: /Users/charliedoern/.llama/distributions/providers-run/kvstore.db
               type: kv_sqlite
             sql_default:
               db_path: /Users/charliedoern/.llama/distributions/providers-run/sql_store.db
               type: sql_sqlite
           stores:
             conversations:
               backend: sql_default
               table_name: openai_conversations
             inference:
               backend: sql_default
               max_write_queue_size: 10000
               num_writers: 4
               table_name: inference_store
             metadata:
               backend: kv_default
               namespace: registry
             prompts:
               backend: kv_default
               namespace: prompts
         telemetry:
           enabled: false
         version: 2

INFO     2025-11-03 14:33:02,299 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue
         disabled for SQLite to avoid concurrency issues
INFO     2025-11-03 14:33:05,272 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils:
         OpenAIInferenceAdapter.list_provider_model_ids() returned 105 models
INFO     2025-11-03 14:33:05,368 uvicorn.error:84 uncategorized: Started server process [69109]
INFO     2025-11-03 14:33:05,369 uvicorn.error:48 uncategorized: Waiting for application startup.
INFO     2025-11-03 14:33:05,370 llama_stack.core.server.server:172 core::server: Starting up Llama Stack server
         (version: 0.3.0)
INFO     2025-11-03 14:33:05,370 llama_stack.core.stack:495 core: starting registry refresh task
INFO     2025-11-03 14:33:05,370 uvicorn.error:62 uncategorized: Application startup complete.
INFO     2025-11-03 14:33:05,371 uvicorn.error:216 uncategorized: Uvicorn running on http://0.0.0.0:8321 (Press CTRL+C
         to quit)
INFO     2025-11-03 14:34:19,242 uvicorn.access:473 uncategorized: 127.0.0.1:63102 - "POST /v1/chat/completions
         HTTP/1.1" 200
```

client:

```
curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
 "model": "openai/gpt-5",
 "messages": [
     {"role": "user", "content": "What is 1 + 2"}
 ]
}'
{"id":"...","choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"3","refusal":null,"role":"assistant","annotations":[],"audio":null,"function_call":null,"tool_calls":null}}],"created":1762198455,"model":"openai/gpt-5","object":"chat.completion","service_tier":"default","system_fingerprint":null,"usage":{"completion_tokens":10,"prompt_tokens":13,"total_tokens":23,"completion_tokens_details":{"accepted_prediction_tokens":0,"audio_tokens":0,"reasoning_tokens":0,"rejected_prediction_tokens":0},"prompt_tokens_details":{"audio_tokens":0,"cached_tokens":0}}}
```

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 11:37:58 -08:00
Ashwin Bharambe
415fd9e36b
chore: bump version to 0.4.0.dev0 (#4018)
Automated version bump after releasing 0.3.1

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-03 09:36:04 -08:00
Sébastien Han
d4aa348b60
chore: remove HTML generation for openapi spec (#4039)
# What does this PR do?

This seems to be an ancient artifact from when we were using readthedocs.
Docusaurus now reads the specs directly.

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-03 18:03:40 +01:00
dependabot[bot]
7e294d33d9
chore(github-deps): bump astral-sh/setup-uv from 6.0.1 to 7.1.2 (#4023)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.0.1 to 7.1.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v7.1.2 🌈 Speed up extraction on Windows</h2>
<h2>Changes</h2>
<p><a href="https://github.com/lazka"><code>@​lazka</code></a> fixed a
bug that caused extracting uv to take up to 30s. Thank you!</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Use tar for extracting the uv zip file on Windows too <a
href="https://github.com/lazka"><code>@​lazka</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/660">#660</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known checksums for 0.9.5 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/663">#663</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump dependencies <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/664">#664</a>)</li>
<li>Bump github/codeql-action from 4.30.8 to 4.30.9 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/652">#652</a>)</li>
</ul>
<h2>v7.1.1 🌈 Fix empty workdir detection and lowest resolution
strategy</h2>
<h2>Changes</h2>
<p>This release fixes a bug where the <code>working-directory</code>
input was not used to detect an empty work dir. It also fixes the
<code>lowest</code> resolution strategy resolving to latest when only a
lower bound was specified.</p>
<p>Special thanks to <a
href="https://github.com/tpgillam"><code>@​tpgillam</code></a> for the
first contribution!</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Fix &quot;lowest&quot; resolution strategy with lower-bound only <a
href="https://github.com/tpgillam"><code>@​tpgillam</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/649">#649</a>)</li>
<li>Use working-directory to detect empty workdir <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/645">#645</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known checksums for 0.9.4 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/651">#651</a>)</li>
<li>chore: update known checksums for 0.9.3 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/644">#644</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>Change version in docs to v7 <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/647">#647</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump github/codeql-action from 4.30.7 to 4.30.8 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/639">#639</a>)</li>
<li>Bump actions/setup-node from 5.0.0 to 6.0.0 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/641">#641</a>)</li>
<li>Bump eifinger/actionlint-action from 1.9.1 to 1.9.2 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/634">#634</a>)</li>
<li>Update lockfile with latest npm <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/636">#636</a>)</li>
</ul>
<h2>v7.1.0 🌈 Support all the use cases</h2>
<h2>Changes</h2>
<p><strong>Support all the use cases!!!</strong></p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="85856786d1"><code>8585678</code></a>
Bump dependencies (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/664">#664</a>)</li>
<li><a
href="22d500a65c"><code>22d500a</code></a>
Bump github/codeql-action from 4.30.8 to 4.30.9 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/652">#652</a>)</li>
<li><a
href="14d557131d"><code>14d5571</code></a>
chore: update known checksums for 0.9.5 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/663">#663</a>)</li>
<li><a
href="29cd2350cd"><code>29cd235</code></a>
Use tar for extracting the uv zip file on Windows too (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/660">#660</a>)</li>
<li><a
href="2ddd2b9cb3"><code>2ddd2b9</code></a>
chore: update known checksums for 0.9.4 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/651">#651</a>)</li>
<li><a
href="b7bf78939d"><code>b7bf789</code></a>
Fix &quot;lowest&quot; resolution strategy with lower-bound only (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/649">#649</a>)</li>
<li><a
href="cb6c0a53d9"><code>cb6c0a5</code></a>
Change version in docs to v7 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/647">#647</a>)</li>
<li><a
href="dffc6292f2"><code>dffc629</code></a>
Use working-directory to detect empty workdir (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/645">#645</a>)</li>
<li><a
href="6e346e1653"><code>6e346e1</code></a>
chore: update known checksums for 0.9.3 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/644">#644</a>)</li>
<li><a
href="3ccd0fd498"><code>3ccd0fd</code></a>
Bump github/codeql-action from 4.30.7 to 4.30.8 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/639">#639</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/setup-uv/compare/v6.0.1...85856786d1ce8acfbcc2f13a5f3fbd6b938f9f41">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.0.1&new-version=7.1.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-03 13:43:04 +01:00
Sébastien Han
3dbff6bf3f
fix: help mypy & fix precommit on main (#4037)
# What does this PR do?

Add a type annotation to help mypy figure it out.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-03 05:39:50 -05:00
Ashwin Bharambe
d45137a399
fix(ci): export UV_INDEX_STRATEGY to current shell before running uv sync (#4020)
Fixes latent bug where UV_INDEX_STRATEGY was only exported to GITHUB_ENV
but not to the current shell.

While this bug doesn't currently affect main (since UV_EXTRA_INDEX_URL
is only set on release branches), it's a latent bug that could cause
issues if the logic changes in the future or if someone tests with
UV_EXTRA_INDEX_URL set.

The setup-runner action only exported UV_INDEX_STRATEGY to GITHUB_ENV
(for subsequent steps), not to the current shell environment. Since uv
sync runs in the same step, it would never see the variable if it were
set.

This fix adds `export UV_INDEX_STRATEGY=unsafe-best-match` to make the
variable available in the current shell before running uv commands.
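
The difference between the two can be reproduced outside CI. The sketch below
uses Python rather than the shell from the action: a temporary file stands in
for `GITHUB_ENV`, and the helper `child_sees` is hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def child_sees(name: str, env: dict[str, str]) -> str:
    """Run a child process with `env` and return its view of variable `name`."""
    code = "import os, sys; print(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Start from the current environment without the variable under test.
base_env = {k: v for k, v in os.environ.items() if k != "UV_INDEX_STRATEGY"}

# Writing to the stand-in GITHUB_ENV file does not change this environment.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as github_env:
    github_env.write("UV_INDEX_STRATEGY=unsafe-best-match\n")
before = child_sees("UV_INDEX_STRATEGY", base_env)

# Setting it in the environment makes it visible to the next command.
exported_env = {**base_env, "UV_INDEX_STRATEGY": "unsafe-best-match"}
after = child_sees("UV_INDEX_STRATEGY", exported_env)
os.unlink(github_env.name)
```

Here `before` is empty and `after` carries the value, which mirrors why a
command in the same step needs the explicit `export`.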

Related: #4019 (same fix for release-0.3.x where the bug is actively
triggered)
2025-11-01 12:57:24 -07:00
Charlie Doern
93401836b7
feat: llama stack run --providers (#3989)
# What does this PR do?

`llama stack run --providers` takes a list of providers in the format
`api1=provider1,api2=provider2`.

This allows users to run with a simple list of providers.
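
As a rough sketch of how such a string could be split, assuming nothing about
the actual CLI code (the function `parse_providers` is hypothetical, and
allowing several providers per API is an assumption):

```python
def parse_providers(spec: str) -> dict[str, list[str]]:
    """Parse 'api1=provider1,api2=provider2' into {api: [providers]}."""
    result: dict[str, list[str]] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # Tolerate stray commas.
        api, sep, provider = item.partition("=")
        if not sep or not api or not provider:
            raise ValueError(f"expected api=provider, got {item!r}")
        # Assumption: the same API may be listed more than once.
        result.setdefault(api, []).append(provider)
    return result


parsed = parse_providers("inference=remote::openai,api2=provider2")
```

Using `partition` splits only on the first `=`, so provider types that contain
`::` pass through unchanged.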

Given the architecture of `create_app`, this run config needs to be
written to disk. Use `~/.llama/distribution/providers-run/run.yaml` each
time for consistency.

resolves #3956

## Test Plan

New unit tests cover `--providers`.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-31 16:21:32 -07:00
Ashwin Bharambe
b2a5428a14
fix(ci): unset empty UV index env vars to prevent uv errors (#4012)
Fixes container builds failing with UV index strategy errors when build
args are passed with empty values.

Docker ARGs declared with empty defaults (ARG UV_INDEX_STRATEGY="")
become environment variables with empty string values in RUN commands.
UV interprets these as if --index-strategy "" was passed on the command
line, causing build failures with "error: a value is required for
'--index-strategy <UV_INDEX_STRATEGY>'".

This is a footgun because empty string ≠ unset variable, and ARGs
silently propagate to all RUN commands, only failing when declared with
empty defaults.

The fix unsets UV_EXTRA_INDEX_URL and UV_INDEX_STRATEGY at the start of
RUN blocks, saves the values early, and only restores them for editable
installs with RC dependencies. All other install modes (PyPI, test-pypi,
client) now run with a clean environment.
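
The "empty string is not the same as unset" point can be illustrated in
Python. This is a simplified sketch, not the Containerfile logic: it only drops
empty values, whereas the fix also saves and selectively restores them. The
helper `clean_uv_env` is hypothetical.

```python
def clean_uv_env(
    env: dict[str, str],
    names: tuple[str, ...] = ("UV_EXTRA_INDEX_URL", "UV_INDEX_STRATEGY"),
) -> dict[str, str]:
    """Return a copy of `env` with empty-valued UV variables removed."""
    cleaned = dict(env)
    for name in names:
        # An empty string still counts as "set"; drop it so it reads as unset.
        if cleaned.get(name) == "":
            del cleaned[name]
    return cleaned


with_empty = clean_uv_env({"UV_INDEX_STRATEGY": "", "PATH": "/usr/bin"})
with_value = clean_uv_env({"UV_INDEX_STRATEGY": "unsafe-best-match"})
```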
2025-10-31 13:29:14 -07:00
Ashwin Bharambe
f8fe3018af
fix(ci): use test.pypi as extra index for RC dependencies (#4009)
Backports UV index configuration fixes from `release-0.3.x` (PR #4002). 

The main issue: when we created the release branch infrastructure, we
configured UV to use `test.pypi` as the PRIMARY index to resolve RC
dependencies. This caused UV to look for ALL packages there first, which
led to problems - some packages don't have binary wheels on `test.pypi`,
so UV tried building from source and failed (like the `psycopg2-binary`
issue we hit).

The fix is simple: use PyPI as primary (default) and `test.pypi` as an
EXTRA index. UV will check PyPI first for everything, and only fall back
to `test.pypi` for packages not found there (like our RC client
versions).

This PR includes:
- Fixed `install-llama-stack-client` action to output
`UV_EXTRA_INDEX_URL` instead of `UV_INDEX_URL`
- New `uv-run-with-index.sh` wrapper that auto-detects release branches
and sets UV env vars
- Updated pre-commit hooks (`uv-lock`, codegen, etc.) to use the wrapper
- Pass UV env vars as Docker build args in all locations
- Scope UV env vars properly in Containerfile (inline for llama-stack
install, explicitly unset before distribution deps)
- Export UV env vars to `GITHUB_ENV` in setup-runner for cross-step
persistence

The wrapper detects release branches automatically in both CI and local
environments, so this "just works" without manual configuration. On main
(non-release branch), the wrapper becomes a no-op.
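
The branch check described above might look roughly like this. It is a sketch
under assumptions, not the contents of `uv-run-with-index.sh`: the function
name, the branch-name pattern, and the index URL constant are all illustrative.

```python
import re

# Assumption: the test.pypi "simple" index is the extra index being added.
TEST_PYPI_INDEX = "https://test.pypi.org/simple/"


def uv_env_for_branch(branch: str) -> dict[str, str]:
    """Return extra UV variables for release branches, nothing otherwise."""
    # Assumption: release branches are named like "release-0.3.x".
    if not re.fullmatch(r"release-\d+\.\d+\.x", branch):
        return {}  # No-op on main and other branches.
    return {
        "UV_EXTRA_INDEX_URL": TEST_PYPI_INDEX,
        "UV_INDEX_STRATEGY": "unsafe-best-match",
    }
```

Returning an empty mapping on non-release branches is what makes the wrapper a
no-op there: the caller adds nothing to the environment before invoking uv.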

Tested and validated on `release-0.3.x` where all CI checks pass.
2025-10-31 12:55:43 -07:00
raghotham
62603d25c2
chore(api)!: /v1/inspect only lists v1 apis by default (#3948)
# What does this PR do?
Allow filtering for v1alpha, v1beta, deprecated and v1. This is a backward
incompatible change, since by default it now returns only v1 APIs.
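
A sketch of the filtering behavior, with the caveat that the route records, the
`api_filter` parameter name, and the handling of deprecated routes are
assumptions made for illustration rather than the endpoint's real signature:

```python
from typing import Optional

# Hypothetical route records; only the shape matters for this example.
ROUTES = [
    {"path": "/v1/models", "level": "v1", "deprecated": False},
    {"path": "/v1/old-example", "level": "v1", "deprecated": True},
    {"path": "/v1alpha/example", "level": "v1alpha", "deprecated": False},
    {"path": "/v1beta/example", "level": "v1beta", "deprecated": False},
]


def list_routes(routes: list[dict], api_filter: Optional[str] = None) -> list[str]:
    """Return route paths for `api_filter`, defaulting to stable v1 routes."""
    if api_filter == "deprecated":
        return [r["path"] for r in routes if r["deprecated"]]
    level = api_filter or "v1"
    # Assumption: deprecated routes only show up under the "deprecated" filter.
    return [r["path"] for r in routes if r["level"] == level and not r["deprecated"]]
```

Callers that previously relied on seeing every route would need to pass an
explicit filter, which is the backward-incompatible part.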

## Test Plan
added unit test
2025-10-31 11:55:46 -07:00
Ashwin Bharambe
61aab1889b
fix(ci): remove precommit trigger workflow (#4008)
Not safe!
2025-10-31 11:41:26 -07:00
Francisco Arceo
7b79cd05d5
feat: Adding Prompts to admin UI (#3987)
# What does this PR do?

1. Updates the Llama Stack TypeScript client to include the `prompts` API in
the playground client.
2. Updates the UI to display prompts and execute basic CRUD operations
for prompts.

(2) adds an explicit "Preview" section when creating the prompt to show
users how the Prompts API behaves as you dynamically edit the prompt
content. See example here:

<p align="center"><img width="468.5" height="333" alt="Screenshot
2025-10-31 at 12 22 34 PM"
src="https://github.com/user-attachments/assets/3542ce7f-56fe-4fb4-b0a3-5cfba5917f6d"
/></p>

Some screenshots:

<details><Summary>Click me to expand!</Summary>

### Prompts List with Prompts
<img width="1906" height="1108" alt="Screenshot 2025-10-31 at 12 20
05 PM"
src="https://github.com/user-attachments/assets/494a4748-ea6a-4527-8cfe-8959cb741c0f"
/>

### Empty Prompts List
<img width="1889" height="1123" alt="Screenshot 2025-10-31 at 12 08
44 PM"
src="https://github.com/user-attachments/assets/ac95b807-d311-4725-86da-0258b3cce81a"
/>

### Create Prompt
<img width="1918" height="1167" alt="Screenshot 2025-10-31 at 11 03
29 AM"
src="https://github.com/user-attachments/assets/b3100a78-f4f3-410f-af89-f7e7fe4a89e7"
/>

### Submit Prompt with error
<img width="1901" height="1213" alt="Screenshot 2025-10-31 at 12 09
28 PM"
src="https://github.com/user-attachments/assets/dca71354-a602-449d-a0d8-0ed3d009a275"
/>
</details>

## Closes https://github.com/llamastack/llama-stack/issues/3322

## Test Plan
Added tests and manual testing.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-31 11:37:25 -07:00
Ashwin Bharambe
c2fd17474e
fix: stop printing server log, it is confusing
2025-10-31 11:22:08 -07:00
Ashwin Bharambe
5f95c1f8cc
fix(ci): install client from release branch before uv sync (#4001)
Fixes CI failures on release branches where uv sync can't resolve RC
dependencies.

The problem: on release branches like `release-0.3.x`, pyproject.toml
requires `llama-stack-client>=0.3.1rc1`. But RC versions only exist on
test.pypi, not PyPI. So uv sync fails before we even get a chance to
install the client from git.

The fix is simple - on release branches, pre-install the client from the
matching git branch first, then run uv sync. This satisfies the RC
requirement and lets dependency resolution succeed.

Modified setup-runner and pre-commit workflows to do this. Also cleaned
up some duplicate logic in setup-test-environment that's now handled
centrally.
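
The ordering described above can be sketched as a small planner that decides which commands to run for a given branch. This is only an illustration, not the workflow's actual code: the `install_plan` helper, the client repository URL, and the exact `uv` arguments are assumptions.

```python
import re

# Assumption: illustrative client repository URL; the real workflows may
# reference the client differently.
CLIENT_REPO = "https://github.com/llamastack/llama-stack-client-python.git"
# Release branches are assumed to follow the release-X.Y.x pattern.
RELEASE_BRANCH = re.compile(r"release-\d+\.\d+\.x")


def install_plan(branch: str) -> list:
    """Return the commands to run, in order, for the given branch (hypothetical helper)."""
    plan = []
    if RELEASE_BRANCH.fullmatch(branch):
        # Pre-install the client from the matching git branch so the RC
        # requirement is already satisfied before `uv sync` runs.
        plan.append(["uv", "pip", "install", f"git+{CLIENT_REPO}@{branch}"])
    plan.append(["uv", "sync"])
    return plan
```

On a non-release branch such as `main`, the plan contains only `uv sync`, so the default path is unchanged.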

Example failure:
5415478835
2025-10-31 06:16:20 -07:00
Ashwin Bharambe
6d80ca4bf7
fix(ci): replace unused LLAMA_STACK_CLIENT_DIR with direct install (#4000)
Replace unused `LLAMA_STACK_CLIENT_DIR` env var (from old `llama stack
build`) with direct `uv pip install` for release branch client
installation.

cc @ehhuang
2025-10-30 22:09:25 -07:00
Jiayi Ni
fa7699d2c3
feat: Add rerank API for NVIDIA Inference Provider (#3329)
# What does this PR do?
Add rerank API for NVIDIA Inference Provider.

Closes #3278 

## Test Plan
Unit test:
```
pytest tests/unit/providers/nvidia/test_rerank_inference.py
```

Integration test: 
```
pytest -s -v tests/integration/inference/test_rerank.py   --stack-config="inference=nvidia"   --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3   --env NVIDIA_API_KEY=""   --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
2025-10-30 21:42:09 -07:00
Ashwin Bharambe
c396de57a4
ci: standardize release branch pattern to release-X.Y.x (#3999)
Standardize CI workflows to use `release-X.Y.x` branch pattern instead
of multiple numeric variants.

That's the pattern we are settling on. See
https://github.com/llamastack/llama-stack-ops/pull/20 for reference.
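
A minimal sketch of checking a branch name against the `release-X.Y.x` pattern, assuming X and Y are plain decimal numbers and the trailing `x` is literal. The `parse_release_branch` helper is hypothetical and is not taken from the workflows themselves.

```python
import re
from typing import Optional, Tuple

# Assumption: X and Y are decimal integers; the final "x" is a literal character.
RELEASE_BRANCH = re.compile(r"release-(\d+)\.(\d+)\.x")


def parse_release_branch(branch: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) for a release-X.Y.x branch, otherwise None."""
    match = RELEASE_BRANCH.fullmatch(branch)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Using `fullmatch` rejects near misses such as `release-0.3` or `release-0.3.1`, which keeps a single canonical spelling for release branches.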
2025-10-30 21:33:32 -07:00
Doug Edgar
e8cd8508b5
fix: handle missing external_providers_dir (#3974)
# What does this PR do?
This PR fixes the handling of the external_providers_dir configuration
field to align with its ongoing deprecation, in favor of the provider
`module` specification approach.

It addresses the issue in #3950, where using the default provided
run.yaml config resulted in the `external_providers_dir` parameter being
set to the literal string `None`, which crashed the llama-stack server
on startup.
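
One way to guard against this class of bug is to normalize the value before it is used as a path. The sketch below is an assumption about how such a guard could look, not the actual patch; `normalize_external_providers_dir` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def normalize_external_providers_dir(value: Optional[str]) -> Optional[Path]:
    """Treat a missing value, an empty string, or the literal string "None" as unset."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "" or text == "None":
        # A templated config can render an unset field as the string "None".
        return None
    return Path(text).expanduser()
```

Callers can then skip loading external providers whenever the helper returns `None`, instead of trying to open a directory literally named `None`.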

Closes #3950 

## Test Plan

- Built a new container image with `podman build . -f
containers/Containerfile --build-arg DISTRO_NAME=starter --tag
llama-stack:starter`
- Tested it locally with `podman run -it localhost/llama-stack:starter`
- Tested it on an OpenShift 4.19 cluster, deployed via the
llama-stack-k8s-operator.

Signed-off-by: Doug Edgar <dedgar@redhat.com>
2025-10-30 17:01:31 -07:00