Commit graph

3021 commits

Author SHA1 Message Date
mergify[bot]
01736b1f5c
chore: bump mcp package version (backport #4287) (#4288)
# What does this PR do?

Address

https://github.com/modelcontextprotocol/python-sdk/security/advisories/GHSA-9h52-p55h-vw2f

<hr>This is an automatic backport of pull request #4287 done by
[Mergify](https://mergify.com).

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-12-03 17:48:59 +01:00
github-actions[bot]
2682916d6d chore: update lockfiles for 0.3.4rc2 2025-12-03 15:39:31 +00:00
github-actions[bot]
18ed8cc0c9 Release candidate 0.3.4rc2 2025-12-03 15:30:46 +00:00
mergify[bot]
0899f78943
fix: Avoid model_limits KeyError (backport #4060) (#4283)
# What does this PR do?
It avoids a `model_limits` KeyError when retrieving embedding models for
Watsonx.
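
A minimal sketch of the defensive lookup this implies (the dict shape and helper name are hypothetical, for illustration only):

```python
# Hypothetical sketch: tolerate catalog entries that have no "model_limits"
# key instead of indexing the dict directly and raising KeyError.
def embedding_dimension(model_spec: dict) -> int | None:
    limits = model_spec.get("model_limits")  # None when the key is absent
    if not limits:
        return None
    return limits.get("embedding_dimension")
```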


Closes https://github.com/llamastack/llama-stack/issues/4059

## Test Plan

Start server with watsonx distro:
```bash
llama stack list-deps watsonx | xargs -L1 uv pip install
uv run llama stack run watsonx
```
Run 
```python
client = LlamaStackClient(base_url=base_url)
client.models.list()
```
Check if there is any embedding model available (currently there is not
a single one)<hr>This is an automatic backport of pull request #4060
done by [Mergify](https://mergify.com).

Co-authored-by: Wojciech-Rebisz <147821486+Wojciech-Rebisz@users.noreply.github.com>
2025-12-03 10:56:24 +01:00
mergify[bot]
9b68b38c55
fix: Add policies to adapters (backport #4277) (#4279)
The configured policy wasn't being passed in; the default was being used
instead (e.g. in the S3 files provider).
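
A rough sketch of the pattern being fixed, assuming a provider config that carries the access-control policy (the class and attribute names are illustrative, not the actual adapter API):

```python
# Illustrative only: thread the configured policy through to the adapter
# instead of silently falling back to the built-in default policy.
class FilesAdapter:
    def __init__(self, config, default_policy) -> None:
        # before the fix, effectively: self.policy = default_policy
        self.policy = config.policy if config.policy is not None else default_policy
```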

Closes: #4276
<hr>This is an automatic backport of pull request #4277 done by
[Mergify](https://mergify.com).

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Derek Higgins <derekh@redhat.com>
2025-12-02 13:27:54 -08:00
github-actions[bot]
63e2e7534f chore: update lockfiles for 0.3.4rc1
2025-12-02 14:50:13 +00:00
github-actions[bot]
6eac1005ab Release candidate 0.3.4rc1 2025-12-02 14:41:51 +00:00
Sébastien Han
384981094a
fix: uninitialised enable_write_queue (#4264)
# What does this PR do?

- Fix uv.lock
- Fix uninitialised variable

**Against stable branch, main does not have this issue.**

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-12-02 09:37:21 -05:00
mergify[bot]
c7fd3c4151
chore: bump starlette version (backport #4158) (#4248)
# What does this PR do?

Require at least 0.49.1 which fixes a security vulnerability in the
parsing logic of the Range header in FileResponse. Release note:
https://github.com/Kludex/starlette/releases/tag/0.49.1
<hr>This is an automatic backport of pull request #4158 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-12-01 10:21:16 -08:00
github-actions[bot]
1d251b489a chore: update lockfiles for 0.3.3
2025-11-24 21:15:11 +00:00
github-actions[bot]
2424a3d3b2 build: Bump version to 0.3.3 2025-11-24 21:12:52 +00:00
github-actions[bot]
ff6d8d5a50 chore: update lockfiles for 0.3.3rc1 2025-11-24 20:55:14 +00:00
github-actions[bot]
4f19fac36e Release candidate 0.3.3rc1 2025-11-24 20:10:59 +00:00
mergify[bot]
2d5ed5d0f5
fix: update hard-coded google model names (backport #4212) (#4229)
# What does this PR do?
When we send model names to Google's OpenAI-compatible API, we must use the
"google" name prefix. Google does not recognize the "vertexai" model
names.
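
A hedged sketch of the renaming implied above (the helper and its behavior are illustrative; the actual change updates hard-coded model entries):

```python
# Illustrative: strip the Stack-side "vertexai/" namespace so the upstream
# request uses the "google/..." name that Google's endpoint recognizes.
def to_upstream_model_name(stack_model_id: str) -> str:
    provider_id, _, provider_model_id = stack_model_id.partition("/")
    if provider_id == "vertexai":
        # e.g. "vertexai/google/gemini-2.5-flash" -> "google/gemini-2.5-flash"
        return provider_model_id
    return stack_model_id
```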

Closes #4211

## Test Plan
```bash
uv venv --python python312
. .venv/bin/activate
llama stack list-deps starter | xargs -L1 uv pip install
llama stack run starter
```

Test that this shows the gemini models with their correct names:
```bash
curl http://127.0.0.1:8321/v1/models | jq '.data | map(select(.custom_metadata.provider_id == "vertexai"))'
```

Test that this chat completion works:
```bash
curl -X POST   -H "Content-Type: application/json"   "http://127.0.0.1:8321/v1/chat/completions"   -d '{
        "model": "vertexai/google/gemini-2.5-flash",
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant."
          },
          {
            "role": "user",
            "content": "Hello! Can you tell me a joke?"
          }
        ],
        "temperature": 1.0,
        "max_tokens": 256
      }'
```<hr>This is an automatic backport of pull request #4212 done by
[Mergify](https://mergify.com).

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ken Dreyer <kdreyer@redhat.com>
2025-11-24 11:32:14 -08:00
mergify[bot]
05b4394cf9
fix: enforce allowed_models during inference requests (backport #4197) (#4228)
The `allowed_models` configuration was only being applied when listing
models via the `/v1/models` endpoint, but the actual inference requests
weren't checking this restriction. This meant users could directly
request any model the provider supports by specifying it in their
inference call, completely bypassing the intended cost controls.

The fix adds validation to all three inference methods (chat
completions, completions, and embeddings) that checks the requested
model against the allowed_models list before making the provider API
call.
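
A minimal sketch of that check (names are illustrative, not the exact router code):

```python
# Illustrative guard, run before each provider call (chat completions,
# completions, and embeddings); allowed_models=None means "no restriction".
def check_model_allowed(model_id: str, allowed_models: list[str] | None) -> None:
    if allowed_models is not None and model_id not in allowed_models:
        raise ValueError(f"model '{model_id}' is not in this provider's allowed_models")
```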

### Test plan

Added unit tests <hr>This is an automatic backport of pull request #4197
done by [Mergify](https://mergify.com).

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-24 11:31:36 -08:00
mergify[bot]
0df6d4601f
fix(docs): fix glob vulnerability (backport #4193) (#4227)
Add an npm override so the docs workspace resolves glob@10.5+.
<hr>This is an automatic backport of pull request #4193 done by
[Mergify](https://mergify.com).

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-24 11:31:15 -08:00
mergify[bot]
b9299a20ed
fix: enable SQLite WAL mode to prevent database locking errors (backport #4048) (#4226)
Fixes race condition causing "database is locked" errors during
concurrent writes to SQLite, particularly in streaming responses with
guardrails where multiple inference calls write simultaneously.

Enable Write-Ahead Logging (WAL) mode for SQLite which allows multiple
concurrent readers and one writer without blocking. Set busy_timeout to
5s so SQLite retries instead of failing immediately. Remove the logic
that disabled write queues for SQLite since WAL mode eliminates the
locking issues that prompted disabling them.
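
For reference, the two PRAGMAs involved look roughly like this with the standard `sqlite3` module (the provider itself goes through an async wrapper, so treat this as a sketch):

```python
import sqlite3

conn = sqlite3.connect("llama_stack.db")  # path is illustrative
# WAL allows concurrent readers alongside a single writer.
conn.execute("PRAGMA journal_mode=WAL;")
# Wait up to 5s for a locked database instead of failing immediately.
conn.execute("PRAGMA busy_timeout=5000;")
```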

Fixes: test_output_safety_guardrails_safe_content[stream=True]
flake<hr>This is an automatic backport of pull request #4048 done by
[Mergify](https://mergify.com).

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-24 11:30:57 -08:00
mergify[bot]
46bd95e453
fix: Vector store persistence across server restarts (backport #3977) (#4225)
# What does this PR do?

This PR fixes a bug in LlamaStack 0.3.0 where vector stores created via
the OpenAI-compatible API (`POST /v1/vector_stores`) would fail with
`VectorStoreNotFoundError` after server restart when attempting
operations like `vector_io.insert()` or `vector_io.query()`.

The bug affected **6 vector IO providers**: `pgvector`, `sqlite_vec`,
`chroma`, `milvus`, `qdrant`, and `weaviate`.

Created with the assistance of: claude-4.5-sonnet

## Root Cause

All affected providers had a broken
`_get_and_cache_vector_store_index()` method that:
1. Did not load existing vector stores from persistent storage during
initialization
2. Attempted to use `vector_store_table` (which was either `None` or a
`KVStore` without the required `get_vector_store()` method)
3. Could not reload vector stores after server restart or cache miss

## Solution

This PR implements a consistent pattern across all 6 providers:

1. **Load vector stores during initialization** - Pre-populate the cache
from KV store on startup
2. **Fix lazy loading** - Modified `_get_and_cache_vector_store_index()`
to load directly from KV store instead of relying on
`vector_store_table`
3. **Remove broken dependency** - Eliminated reliance on the
`vector_store_table` pattern
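
A simplified sketch of the lazy-loading fix (points 1 and 2 above), assuming a KV store keyed by vector store id; the key format, helper names, and surrounding class are illustrative:

```python
# Illustrative sketch of _get_and_cache_vector_store_index(): on a cache miss,
# reload the vector store from the KV store so it survives server restarts.
# self.cache, self.kvstore, self._build_index, and VectorStoreNotFoundError
# are assumed to exist on the provider.
import json

async def _get_and_cache_vector_store_index(self, vector_store_id: str):
    if vector_store_id in self.cache:
        return self.cache[vector_store_id]
    stored = await self.kvstore.get(f"vector_stores:{vector_store_id}")
    if stored is None:
        raise VectorStoreNotFoundError(vector_store_id)
    index = await self._build_index(json.loads(stored))  # provider-specific rebuild
    self.cache[vector_store_id] = index
    return index
```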

## Testing steps

### 1.1 Configure the stack

Create or use an existing configuration with a vector IO provider.

**Example `run.yaml`:**

```yaml
vector_io_store:
  - provider_id: pgvector
    provider_type: remote::pgvector
    config:
      host: localhost
      port: 5432
      db: llamastack
      user: llamastack
      password: llamastack

inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config:
      model: sentence-transformers/all-MiniLM-L6-v2
```

### 1.2 Start the server

```bash
llama stack run run.yaml --port 5000
```

Wait for the server to fully start. You should see:

```
INFO: Started server process
INFO: Application startup complete
```

---

## Step 2: Create a Vector Store

### 2.1 Create via API

```bash
curl -X POST http://localhost:5000/v1/vector_stores \
  -H "Content-Type: application/json" \
  -d '{
    "name": "test-persistence-store",
    "extra_body": {
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "embedding_dimension": 384,
      "provider_id": "pgvector"
    }
  }' | jq
```

### 2.2 Expected Response

```json
{
  "id": "vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
  "object": "vector_store",
  "name": "test-persistence-store",
  "status": "completed",
  "created_at": 1730304000,
  "file_counts": {
    "total": 0,
    "completed": 0,
    "in_progress": 0,
    "failed": 0,
    "cancelled": 0
  },
  "usage_bytes": 0
}
```

**Save the `id` field** (e.g.,
`vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d`) — you’ll need it for the next
steps.

---

## Step 3: Insert Data (Before Restart)

### 3.1 Insert chunks into the vector store

```bash
export VS_ID="vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"

curl -X POST http://localhost:5000/vector-io/insert \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"chunks\": [
      {
        \"content\": \"Python is a high-level programming language known for its readability.\",
        \"metadata\": {\"source\": \"doc1\", \"page\": 1}
      },
      {
        \"content\": \"Machine learning enables computers to learn from data without explicit programming.\",
        \"metadata\": {\"source\": \"doc2\", \"page\": 1}
      },
      {
        \"content\": \"Neural networks are inspired by biological neurons in the brain.\",
        \"metadata\": {\"source\": \"doc3\", \"page\": 1}
      }
    ]
  }"
```

### 3.2 Expected Response

Status: **200 OK**  
Response: *Empty or success confirmation*

---

## Step 4: Query Data (Before Restart – Baseline)

### 4.1 Query the vector store

```bash
curl -X POST http://localhost:5000/vector-io/query \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"query\": \"What is machine learning?\"
  }" | jq
```

### 4.2 Expected Response

```json
{
  "chunks": [
    {
      "content": "Machine learning enables computers to learn from data without explicit programming.",
      "metadata": {"source": "doc2", "page": 1}
    },
    {
      "content": "Neural networks are inspired by biological neurons in the brain.",
      "metadata": {"source": "doc3", "page": 1}
    }
  ],
  "scores": [0.85, 0.72]
}
```

**Checkpoint:** Works correctly before restart.

---

## Step 5: Restart the Server (Critical Test)

### 5.1 Stop the server

In the terminal where it’s running:

```
Ctrl + C
```

Wait for:

```
Shutting down...
```

### 5.2 Restart the server

```bash
llama stack run run.yaml --port 5000
```

Wait for:

```
INFO: Started server process
INFO: Application startup complete
```

The vector store cache is now empty, but data should persist.

---

## Step 6: Verify Vector Store Exists (After Restart)

### 6.1 List vector stores

```bash
curl http://localhost:5000/v1/vector_stores | jq
```

### 6.2 Expected Response

```json
{
  "object": "list",
  "data": [
    {
      "id": "vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
      "name": "test-persistence-store",
      "status": "completed"
    }
  ]
}
```

**Checkpoint:** Vector store should be listed.

---

## Step 7: Insert Data (After Restart – THE BUG TEST)

### 7.1 Insert new chunks

```bash
curl -X POST http://localhost:5000/vector-io/insert \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"chunks\": [
      {
        \"content\": \"This chunk was inserted AFTER the server restart.\",
        \"metadata\": {\"source\": \"post-restart\", \"test\": true}
      }
    ]
  }"
```

### 7.2 Expected Results

**With Fix (Correct):**
```
Status: 200 OK
Response: Success
```

**Without Fix (Bug):**
```json
{
  "detail": "VectorStoreNotFoundError: Vector Store 'vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d' not found."
}
```

**Critical Test:** If insertion succeeds, the fix works.

---

## Step 8: Query Data (After Restart – Verification)

### 8.1 Query all data

```bash
curl -X POST http://localhost:5000/vector-io/query \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"query\": \"restart\"
  }" | jq
```

### 8.2 Expected Response

```json
{
  "chunks": [
    {
      "content": "This chunk was inserted AFTER the server restart.",
      "metadata": {"source": "post-restart", "test": true}
    }
  ],
  "scores": [0.95]
}
```

**Checkpoint:** Both old and new data are queryable.

---

## Step 9: Multiple Restart Test (Extra Verification)

### 9.1 Restart again

```bash
Ctrl + C
llama stack run run.yaml --port 5000
```

### 9.2 Query after restart

```bash
curl -X POST http://localhost:5000/vector-io/query \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"query\": \"programming\"
  }" | jq
```

**Expected:** Works correctly across multiple restarts.



<hr>This is an automatic backport of pull request #3977 done by
[Mergify](https://mergify.com).

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Juan Pérez de Algaba <124347725+jperezdealgaba@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-11-24 11:30:21 -08:00
mergify[bot]
f216eb99be
fix: allowed_models config did not filter models (backport #4030) (#4223)
# What does this PR do?

closes #4022 

## Test Plan

ci w/ new tests<hr>This is an automatic backport of pull request #4030
done by [Mergify](https://mergify.com).

Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-24 11:29:53 -08:00
github-actions[bot]
49a290e53e chore: update lockfiles for 0.3.2
2025-11-12 23:21:28 +00:00
github-actions[bot]
1536b8e890 build: Bump version to 0.3.2 2025-11-12 23:19:12 +00:00
github-actions[bot]
dbef00de28 chore: update lockfiles for 0.3.2rc3 2025-11-12 22:48:00 +00:00
github-actions[bot]
01ff0cb9e2 Release candidate 0.3.2rc3 2025-11-12 22:33:56 +00:00
github-actions[bot]
56a723c800 Release candidate 0.3.2rc2 2025-11-12 22:12:27 +00:00
github-actions[bot]
096a3c6013 Release candidate 0.3.2rc1 2025-11-12 21:43:15 +00:00
mergify[bot]
641d5144be
fix(inference): enable routing of models with provider_data alone (backport #3928) (#4142)
This PR enables routing of fully qualified model IDs of the form
`provider_id/model_id` even when the models are not registered with the
Stack.

Here's the situation: assume a remote inference provider that works
only when users provide their own API keys via the
`X-LlamaStack-Provider-Data` header. By definition, we cannot list
models and hence cannot update our routing registry. But because we now
_require_ a provider ID in model IDs, we can identify which provider to
route to and let that provider decide.

Note that we still consult our registry, since it may contain a
pre-registered alias; we just don't fail outright when the lookup misses.

Also updated the inference router so that responses carry the _exact_
model name the request used.
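
Roughly, the routing decision described above looks like this (a hedged sketch with illustrative names; the real router handles more cases):

```python
# Illustrative sketch: resolve "provider_id/model_id" even when the model was
# never registered, by falling back to the provider named in the prefix.
def resolve_provider(model: str, registry: dict[str, str]) -> str:
    if model in registry:  # a pre-registered alias still wins
        return registry[model]
    provider_id, sep, _rest = model.partition("/")
    if not sep:
        raise ValueError(f"unknown model '{model}': not registered and no provider prefix")
    return provider_id  # let the named provider decide whether it can serve it
```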

## Test Plan

Added an integration test

Closes #3929<hr>This is an automatic backport of pull request #3928 done
by [Mergify](https://mergify.com).

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
2025-11-12 13:41:27 -08:00
mergify[bot]
a6c3a9cadf
fix: harden storage semantics (backport #4118) (#4138)
Fixes issues in the storage system by guaranteeing immediate durability
for responses and ensuring background writers stay alive. Three related
fixes:

* Responses to the OpenAI-compatible API now write directly to
Postgres/SQLite inside the request instead of detouring through an async
queue that might never drain; this restores the expected
read-after-write behavior and removes the "response not found" races
reported by users.

* The access-control shim was stamping owner_principal/access_attributes
as SQL NULL, which Postgres interprets as non-public rows; fixing it to
use the empty-string/JSON-null pattern means conversations and responses
stored without an authenticated user stay queryable (matching SQLite).

* The inference-store queue remains for batching, but its worker tasks
now start lazily on the live event loop so server startup doesn't cancel
them—writes keep flowing even when the stack is launched via llama stack
run.
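
Below is a minimal sketch of the lazy-start pattern for the background writer described in the last point, assuming plain asyncio (names are illustrative, not the actual store code):

```python
import asyncio
from typing import Any

class LazyWriteQueue:
    """Illustrative: the drain task is created on first use, on the live loop."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue[Any] = asyncio.Queue()
        self._worker: asyncio.Task | None = None

    async def put(self, item: Any) -> None:
        # Start the worker lazily on the event loop that is serving requests,
        # so server startup can't cancel it before it ever runs.
        if self._worker is None or self._worker.done():
            self._worker = asyncio.create_task(self._drain())
        await self._queue.put(item)

    async def _drain(self) -> None:
        while True:
            item = await self._queue.get()
            ...  # flush the item (or a batch) to Postgres/SQLite
```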

Closes #4115 

### Test Plan

Added a matrix entry to test our "base" suite against Postgres as the
store.<hr>This is an automatic backport of pull request #4118 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-12 13:01:21 -08:00
mergify[bot]
56d87f5133
chore(ci): remove unused recordings (backport #4074) (#4141)
Added a script to clean up recordings. While doing this, moved the CI
matrix generation to a separate script so there is a single source of
truth for the matrix.

Ran the cleanup script as:
```
PYTHONPATH=. python scripts/cleanup_recordings.py
```

Also added this as part of the pre-commit workflow to ensure that the
recordings are always up to date and that no stale recordings are left
in the repo.
<hr>This is an automatic backport of pull request #4074 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-12 12:36:28 -08:00
mergify[bot]
0d525d9a24
docs: clarify model identification uses provider_model_id not model_id (backport #4128) (#4137)
Updated documentation to accurately reflect current behavior where
models are identified as provider_id/provider_model_id in the system.

Changes:

- Clarify that model_id is for configuration purposes only
- Explain models are accessed as provider_id/provider_model_id
- Remove outdated aliasing example that suggested model_id could be used
  as a custom identifier

This corrects the documentation which previously suggested model_id
could be used to create friendly aliases, which is not how the code
actually works.
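
For illustration, a hedged example of what this means in practice (the client usage is assumed, not taken from the PR):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
# Models are identified as "<provider_id>/<provider_model_id>",
# e.g. "vertexai/google/gemini-2.5-flash"; model_id is not a friendly alias.
for model in client.models.list():
    print(model)
```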
<hr>This is an automatic backport of pull request #4128 done by
[Mergify](https://mergify.com).

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Derek Higgins <derekh@redhat.com>
2025-11-12 10:41:23 -08:00
mergify[bot]
bae22060de
docs: use 'uv pip' to avoid pitfalls of using 'pip' in virtual environment (backport #4122) (#4136)
# What does this PR do?
In the **Detailed Tutorial**, at **Step 3**, the **Install with venv**
option creates a new virtual environment `client`, activates it, then
attempts to install the llama-stack-client using pip.
```
uv venv client --python 3.12
source client/bin/activate
pip install llama-stack-client    <- this is the problematic line
```
However, the pip command will likely fail because `uv venv` does not, by
default, install pip into the virtual environment it creates. The pip
command will error either because pip doesn't exist at all, or, if a pip
does exist outside of the virtual environment, it will return a different
error message; in the latter case it may be unclear to the user why the
install is failing.

This PR changes 'pip' to 'uv pip', allowing the install action to
function in the virtual environment as intended, and without the need
for pip to be installed.




## Test Plan
1. Use linux or WSL (virtual environments on Windows use `Scripts`
folder instead of `bin` [virtualenv
#993ba13](993ba1316a)
which doesn't align with the tutorial)
2. Clone the `llama-stack` repo
3. Run the following and verify success:
```
uv venv client --python 3.12
source client/bin/activate
```
4. Run the updated command:
```
uv pip install llama-stack-client
```
5. Observe that the console output confirms the virtual environment
`client` was used:

> Using Python 3.12.3 environment at: **client**<hr>This is an automatic
backport of pull request #4122 done by [Mergify](https://mergify.com).

Co-authored-by: paulengineer <154521137+paulengineer@users.noreply.github.com>
2025-11-12 10:41:15 -08:00
mergify[bot]
a380b5fcb1
fix: print help for list-deps if no args (backport #4078) (#4083)
# What does this PR do?

list-deps takes positional args OR options like `--providers`.

The issue is that these args need to be optional, since by nature one or
the other can be specified.

Add a check to list-deps for `if not args.providers and not
args.config`. If this is true, help is printed and we exit.
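
A minimal sketch of that guard (argument names follow the CLI help shown in the test plan below; treat the function as illustrative, not the exact code):

```python
# Illustrative: bail out with help when neither a config/distro nor
# --providers was supplied, instead of falling through to an unbound
# build_config and crashing.
def run_stack_list_deps_command(args, parser) -> int:
    if not args.providers and not args.config:
        parser.print_help()
        return 1
    ...  # resolve build_config and print the provider dependencies
    return 0
```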

resolves #4075

## Test Plan
before:

```
╰─ llama stack list-deps
Traceback (most recent call last):
  File "/Users/charliedoern/projects/Documents/llama-stack/venv/bin/llama", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 52, in main
    parser.run(args)
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 43, in run
    args.func(args)
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/list_deps.py", line 51, in _run_stack_list_deps_command
    return run_stack_list_deps_command(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/_list_deps.py", line 135, in run_stack_list_deps_command
    normal_deps, special_deps, external_provider_dependencies = get_provider_dependencies(build_config)
                                                                                          ^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'build_config' where it is not associated with a value

```

after:

```
╰─ llama stack list-deps
usage: llama stack list-deps [-h] [--providers PROVIDERS] [--format {uv,deps-only}] [config | distro]

list the dependencies for a llama stack distribution

positional arguments:
  config | distro       Path to config file to use or name of known distro (llama stack list for a list). (default: None)

options:
  -h, --help            show this help message and exit
  --providers PROVIDERS
                        sync dependencies for a list of providers and only those providers. This list is formatted like: api1=provider1,api2=provider2. Where there can be multiple
                        providers per API. (default: None)
  --format {uv,deps-only}
                        Output format: 'uv' shows shell commands, 'deps-only' shows just the list of dependencies without `uv` (default) (default: deps-only)
 ```
<hr>This is an automatic backport of pull request #4078 done by [Mergify](https://mergify.com).

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Charlie Doern <cdoern@redhat.com>
2025-11-05 14:58:47 -08:00
Ashwin Bharambe
8b878e9d48
fix(ci): export UV_INDEX_STRATEGY to current shell before running uv sync (#4019)
Fixes #4017 follow-up issue where UV_INDEX_STRATEGY was only exported to
GITHUB_ENV but not to the current shell.

The commit e0bb7529 fixed the empty string issue but introduced a new
bug: UV_INDEX_STRATEGY was only exported to GITHUB_ENV (for subsequent
steps), not to the current shell environment. Since uv sync runs in the
same step, it never saw the variable.

This caused all CI runs on release-0.3.x to fail with dependency
resolution errors like:

```
setuptools was found on https://test.pypi.org/simple/, but not at the requested version.
A compatible version may be available on PyPI. Use --index-strategy unsafe-best-match.
```

This fix adds `export UV_INDEX_STRATEGY=unsafe-best-match` to make the
variable available in the current shell before running uv commands.

Note: Main branch doesn't hit this bug because UV_EXTRA_INDEX_URL is
only set on release branches.
2025-11-01 12:54:19 -07:00
Ashwin Bharambe
e0bb7529ed
fix: only set UV_INDEX_STRATEGY when UV_EXTRA_INDEX_URL is present (#4017)
Cherry-pick of bc12fe6c4 to release-0.3.x

Fixes GitHub Actions workflows failing with UV index strategy errors
when testing on RC tags and non-release branches.

The issue was that UV_INDEX_STRATEGY was being set to an empty string in
the environment, causing UV to fail with "error: a value is required for
'--index-strategy'".

The fix removes UV_INDEX_STRATEGY from the env block and only sets it to
'unsafe-best-match' when UV_EXTRA_INDEX_URL is actually present.
2025-10-31 16:22:01 -07:00
github-actions[bot]
bdd330a94a chore: update lockfiles for 0.3.1 2025-10-31 22:56:35 +00:00
github-actions[bot]
033c1abf29 build: Bump version to 0.3.1 2025-10-31 22:54:10 +00:00
github-actions[bot]
dd6aee179d chore: update lockfiles for 0.3.1rc5 2025-10-31 22:43:57 +00:00
github-actions[bot]
dc3674a82d Release candidate 0.3.1rc5 2025-10-31 22:35:23 +00:00
Ashwin Bharambe
7ac81f69fe build: fix uv.lock 2025-10-31 15:31:01 -07:00
github-actions[bot]
7fdfeef44a Release candidate 0.3.1rc4 2025-10-31 22:15:04 +00:00
Ashwin Bharambe
bf1693c2ee build: fix uv.lock 2025-10-31 15:07:39 -07:00
github-actions[bot]
13c6695fd3 Release candidate 0.3.1rc3 2025-10-31 21:29:01 +00:00
Ashwin Bharambe
e8a3dfbe96
docs: A getting started notebook featuring simple agent examples (#4015)
Cherry-pick of #3955 to release-0.3.x

Adds a getting started notebook with simple agent examples to help users
get started with llama-stack agents.

Co-authored-by: Omar Abdelwahab <omaryashraf10@gmail.com>
Co-authored-by: Omar Abdelwahab <omara@fb.com>
2025-10-31 14:27:22 -07:00
Ashwin Bharambe
637f8bef9c build: fix uv.lock 2025-10-31 14:01:45 -07:00
github-actions[bot]
a5372dbdf5 Release candidate 0.3.1rc2 2025-10-31 20:50:32 +00:00
Ashwin Bharambe
9f1e4a07c9
feat: support workers in run config (#4014)
Cherry-pick of #3992 to release-0.3.x

Adds support for configuring the number of workers in run.yaml
configuration files.

Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
2025-10-31 13:48:55 -07:00
Ashwin Bharambe
b088665227
fix(ci): unset empty UV index env vars to prevent uv errors (#4013)
Cherry-pick of #4012 to release-0.3.x

Fixes container builds failing with UV index strategy errors when build
args are passed with empty values.

Docker ARGs declared with empty defaults (ARG UV_INDEX_STRATEGY="")
become environment variables with empty string values in RUN commands.
UV interprets these as if --index-strategy "" was passed on the command
line, causing build failures with "error: a value is required for
'--index-strategy <UV_INDEX_STRATEGY>'".

This is a footgun because empty string ≠ unset variable, and ARGs
silently propagate to all RUN commands, only failing when declared with
empty defaults.

The fix unsets UV_EXTRA_INDEX_URL and UV_INDEX_STRATEGY at the start of
RUN blocks, saves the values early, and only restores them for editable
installs with RC dependencies. All other install modes (PyPI, test-pypi,
client) now run with a clean environment.
2025-10-31 13:45:47 -07:00
Ashwin Bharambe
73d70546d4
chore(release-0.3.x): handle missing external_providers_dir (#4011)
Cherry-pick of #3974 to release-0.3.x branch.

## Summary
- Fixes handling of missing external_providers_dir in stack
configuration

## Original PR
Fixes from #3974

Signed-off-by: Doug Edgar <dedgar@redhat.com>
Co-authored-by: Doug Edgar <dedgar@redhat.com>
2025-10-31 12:55:34 -07:00
Ashwin Bharambe
a488d8ce10
fix(ci): install client from release branch before uv sync (#4002)
Backport of #4001 to release-0.3.x branch.

Fixes CI failures on release branches where uv sync can't resolve RC
dependencies.

## The Problem

On release branches like `release-0.3.x`, pyproject.toml requires
`llama-stack-client>=0.3.1rc1`. RC versions only exist on test.pypi, not
PyPI. This causes multiple CI failures:

1. `uv sync` fails because it can't resolve RC versions from PyPI
2. pre-commit hooks (uv-lock, codegen) fail for the same reason  
3. mypy workflow section needs uv installed

## The Solution

Configure UV to use test.pypi when on release branches:

- Set `UV_INDEX_URL=https://test.pypi.org/simple/` (primary)
- Set `UV_EXTRA_INDEX_URL=https://pypi.org/simple/` (fallback)
- Set `UV_INDEX_STRATEGY=unsafe-best-match` to check both indexes

This allows `uv sync` to resolve common packages from PyPI and RC
versions from test.pypi.

## Additional Fixes

- Export UV env vars to `GITHUB_ENV` so pre-commit hooks inherit them
- Install uv in pre-commit workflow for mypy section
- Handle missing `type_checking` dependency group on release-0.3.x
- Regenerate uv.lock with RC versions for the release branch

## Changes

- Created reusable `install-llama-stack-client` action for configuration
- Modified `setup-runner` to set UV environment variables before sync
- Modified `pre-commit` workflow to configure client and export env vars
- Updated uv.lock with RC versions from test.pypi

This is a cherry-pick of commits afa9f0882, c86e6e906, 626639bee, and
081566321 from main, plus additional fixes for release branch
compatibility.
2025-10-31 11:44:05 -07:00
github-actions[bot]
f8272b2faf Release candidate 0.3.1rc1 2025-10-31 04:54:54 +00:00
Ashwin Bharambe
39f33f7f12
feat(cherry-pick): fixes for 0.3.1 release (#3998)
## Summary

Cherry-picks 5 critical fixes from main to the release-0.3.x branch for
the v0.3.1 release, plus CI workflow updates.

**Note**: This recreates the cherry-picks from the closed PR #3991, now
targeting the renamed `release-0.3.x` branch (previously
`release-0.3.x-maint`).

## Commits

1. **2c56a8560** - fix(context): prevent provider data leak between
streaming requests (#3924)
- **CRITICAL SECURITY FIX**: Prevents provider credentials from leaking
between requests
   - Fixed import path for 0.3.0 compatibility

2. **ddd32b187** - fix(inference): enable routing of models with
provider_data alone (#3928)
   - Enables routing for fully qualified model IDs with provider_data
   - Resolved merge conflicts, adapted for 0.3.0 structure

3. **f7c2973aa** - fix: Avoid BadRequestError due to invalid max_tokens
(#3667)
- Fixes failures with Gemini and other providers that reject
max_tokens=0
   - Non-breaking API change

4. **d7f9da616** - fix(responses): sync conversation before yielding
terminal events in streaming (#3888)
- Ensures conversation sync executes even when streaming consumers break
early

5. **0ffa8658b** - fix(logging): ensure logs go to stderr, loggers obey
levels (#3885)
   - Fixes logging infrastructure

6. **75b49cb3c** - ci: support release branches and match client branch
(#3990)
   - Updates CI workflows to support release-X.Y.x branches
- Matches client branch from llama-stack-client-python for release
testing
   - Fixes artifact name collisions

## Adaptations for 0.3.0

- Fixed import paths: `llama_stack.core.telemetry.tracing` →
`llama_stack.providers.utils.telemetry.tracing`
- Fixed import paths: `llama_stack.core.telemetry.telemetry` →
`llama_stack.apis.telemetry`
- Changed `self.telemetry_enabled` → `self.telemetry` (0.3.0 attribute
name)
- Removed `rerank()` method that doesn't exist in 0.3.0

## Testing

All imports verified and tests should pass once CI is set up.
2025-10-30 21:51:42 -07:00