Commit graph

3219 commits

Author SHA1 Message Date
Ashwin Bharambe
aa2bd82b1d
fix(ci): add recordings for responses suite due to web search type changing (#4104)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Pre-commit / pre-commit (push) Failing after 2s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test llama stack list-deps / generate-matrix (push) Successful in 3s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 4s
Test llama stack list-deps / show-single-provider (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 1m3s
#4103 broke (even though the PR itself was green) trunk
2025-11-07 10:42:07 -08:00
Aakanksha Duggal
b83184f7ef
feat(responses)!: Add web_search_2025_08_26 to the WebSearchToolTypes (#4103)
# What does this PR do?
Resolves #4102 

1. Added `web_search_2025_08_26` to the `WebSearchToolTypes` list and
the `OpenAIResponseInputToolWebSearch.type` Literal union
2. No changes needed to tool execution logic - all `web_search` types
map to the same underlying tool
3. Backward compatibility is maintained - existing `web_search`,
`web_search_preview`, and `web_search_preview_2025_03_11` types continue
to work
4. Added an integration test case using {"type":
"web_search_2025_08_26"} to verify it works correctly
5. Updated `docs/docs/providers/openai_responses_limitations.mdx` to
reflect that `web_search_2025_08_26` is now supported.
6. Removed incorrect references to `MOD1/MOD2/MOD3` (which don't exist
in the codebase)


<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

---------

Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-07 10:01:12 -08:00
Ashwin Bharambe
f49cb0b717
chore: Stack server no longer depends on llama-stack-client (#4094)
This dependency has been bothering folks for a long time (cc @leseb). We
really needed it due to "library client" which is primarily used for our
tests and is not a part of the Stack server. Anyone who needs to use the
library client can certainly install `llama-stack-client` in their
environment to make that work.

Updated the notebook references to install `llama-stack-client`
additionally when setting things up.
2025-11-07 09:54:09 -08:00
Lê Nam Khánh
68c976a2d8
docs: fix typos in some files (#4101)
This PR fixes typos in the file file using codespell.
2025-11-07 16:07:46 +01:00
Ashwin Bharambe
b68a25d377
fix(tests): bring back some responses tests (#4098)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 2s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 1m6s
https://github.com/llamastack/llama-stack/pull/4055 cleaned the agents
implementation but while doing so it removed some tests which actually
corresponded to the responses implementation. This PR brings those tests
and assocated recordings back.

(We should likely combine all responses tests into one suite, but that
is beyond the scope of this PR.)
2025-11-07 07:49:38 +01:00
Sumanth Kamenani
e894e36eea
feat: add OpenAI-compatible Bedrock provider (#3748)
Some checks failed
Pre-commit / pre-commit (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 9s
UI Tests / ui-tests (22) (push) Successful in 48s
Implements AWS Bedrock inference provider using OpenAI-compatible
endpoint for Llama models available through Bedrock.

Closes: #3410


## What does this PR do?

Adds AWS Bedrock as an inference provider using the OpenAI-compatible
endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the
standard llama-stack inference API.

The implementation uses LiteLLM's OpenAI client under the hood, so it
gets all the OpenAI compatibility features. The provider handles
per-request API key overrides via headers.

## Test Plan

**Tested the following scenarios:**
- Non-streaming completion - basic request/response flow
- Streaming completion - SSE streaming with chunked responses
- Multi-turn conversations - context retention across turns
- Tool calling - function calling with proper tool_calls format

# Bedrock OpenAI-Compatible Provider - Test Results


**Model:** `bedrock-inference/openai.gpt-oss-20b-1:0`


---

## Test 1: Model Listing

**Request:**
```http
GET /v1/models HTTP/1.1
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": [
    {"identifier": "bedrock-inference/openai.gpt-oss-20b-1:0", ...},
    {"identifier": "bedrock-inference/openai.gpt-oss-40b-1:0", ...}
  ]
}
```

---

## Test 2: Non-Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "choices": [{
    "finish_reason": "stop",
    "message": {"content": "...Hello from Bedrock"}
  }],
  "usage": {"prompt_tokens": 79, "completion_tokens": 50, "total_tokens": 129}
}
```

---

## Test 3: Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Count from 1 to 5"}],
  "stream": true
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: text/event-stream

[6 SSE chunks received]
Final content: "1, 2, 3, 4, 5"
```

---

## Test 4: Error Handling - Invalid Model

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "invalid-model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 404 Not Found
Content-Type: application/json

{
  "detail": "Model 'invalid-model-id' not found. Use 'client.models.list()' to list available Models."
}
```

---

## Test 5: Multi-Turn Conversation

**Request 1:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "My name is Alice"}]
}
```

**Response 1:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Nice to meet you, Alice! How can I help you today?"}
  }]
}
```

**Request 2 (with history):**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "user", "content": "My name is Alice"},
    {"role": "assistant", "content": "...Nice to meet you, Alice!..."},
    {"role": "user", "content": "What is my name?"}
  ]
}
```

**Response 2:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Your name is Alice."}
  }],
  "usage": {"prompt_tokens": 183, "completion_tokens": 42}
}
```

**Context retained across turns**

---

## Test 6: System Messages

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "system", "content": "You are Shakespeare. Respond only in Shakespearean English."},
    {"role": "user", "content": "Tell me about the weather"}
  ]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "Lo! I heed thy request..."}
  }],
  "usage": {"completion_tokens": 813}
}
```


---

## Test 7: Tool Calling

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}
    }
  }]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "tool_calls": [{
        "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"}
      }]
    }
  }]
}
```

---

## Test 8: Sampling Parameters

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "Say hello"}],
  "temperature": 0.7,
  "top_p": 0.9
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! 👋 How can I help you today?"}
  }]
}
```

---

## Test 9: Authentication Error Handling

### Subtest A: Invalid API Key

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "invalid-fake-key-12345"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```

---

### Subtest B: Empty API Key (Fallback to Config)

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": ""}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! How can I assist you today?"}
  }]
}
```

 **Fell back to config key**

---

### Subtest C: Malformed Token

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```
2025-11-06 17:18:18 -08:00
Ashwin Bharambe
a2c4c12384
chore(ui): remove the Streamlit UI (#4097) 2025-11-06 15:51:57 -08:00
Sébastien Han
939a2db58f
chore: update stainless config (#4096)
# What does this PR do?

Removed in https://github.com/llamastack/llama-stack/pull/4067

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-06 15:58:13 -05:00
Charlie Doern
9df073450f
feat: remove core.telemetry as a dependency of llama_stack.apis (#4064)
Some checks failed
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 55s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Pre-commit / pre-commit (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
# What does this PR do?

Remove circular dependency by moving tracing from API protocol
definitions
 to router implementation layer.

This gets us closer to having a self contained API package with no other
cross-cutting dependencies to other parts of the llama stack codebase.
To the best of our ability, the llama_stack.api should only be type and
protocol definitions.

  Changes:
- Create apis/common/tracing.py with marker decorator (zero core
dependencies)
- Add the _new_ `@telemetry_traceable` marker decorator to 11 protocol
classes
- Apply actual tracing in core/resolver.py in `instantiate_provider`
based on protocol marker
- Move MetricResponseMixin from core to apis (it's an API response type)
  - APIs package is now self-contained with zero core dependencies

The tracing functionality remains identical - actual trace_protocol from
core
is applied to router implementations at runtime when both telemetry is
enabled
  and the protocol has the `__marked_for_tracing__` marker.

  ## Test Plan

  Manual integration test confirms identical behavior to main branch:

  ```bash
  llama stack list-deps --format uv starter | sh
  export OLLAMA_URL=http://localhost:11434
  llama stack run starter

  curl -X POST http://localhost:8321/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "ollama/gpt-oss:20b",
         "messages": [{"role": "user", "content": "Say hello"}],
         "max_tokens": 10}'
         
```

  Verified identical between main and this branch:
  - trace_id present in response
  - metrics array with prompt_tokens, completion_tokens, total_tokens
  - Server logs show trace_protocol applied to all routers

  Existing telemetry integration tests (tests/integration/telemetry/) validate
  trace context propagation and span attributes.


relates to #3895

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-06 10:58:30 -08:00
Derek Higgins
dc9497a3b2
ci: Temperarily disable Telemetry during tests (#4090)
Closes: #4089

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-11-06 17:53:02 +01:00
Derek Higgins
03d23db910
ci: vllm ci job update (#4088)
Add missing recording for vllm in library mode
Add Docker env (missed during rebase)

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-11-06 16:59:55 +01:00
Enrico Zimuel
98a3df09bb
Added Elasticsearch in docs + ci-test build 2025-11-06 15:39:23 +01:00
Enrico Zimuel
9bfe5245a0
Fixed issue with steps variable in CI 2025-11-06 15:39:23 +01:00
Enrico Zimuel
16674d6756
Added Elasticsearch in CI + minor fix 2025-11-06 15:39:23 +01:00
Enrico Zimuel
12f8d96f48
Added tests + docs + CI for Elasticsearch 2025-11-06 15:39:21 +01:00
Enrico Zimuel
f1c8a200c8
Draft for Elasticsearch as vector db 2025-11-06 15:37:40 +01:00
Derek Higgins
c62a09ab76
ci: Add vLLM support to integration testing infrastructure (with qwen) (#3545)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 4s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Pre-commit / pre-commit (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 14s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 22s
UI Tests / ui-tests (22) (push) Successful in 57s
o Introduces vLLM provider support to the record/replay testing
framework
o Enabling both recording and replay of vLLM API interactions alongside
existing Ollama support.

The changes enable testing of vLLM functionality. vLLM tests focus on
inference capabilities, while Ollama continues to exercise the full API
surface
including vision features.

--
This is an alternative to #3128 , using qwen3 instead of llama 3.2 1B
appears to be more capable at structure output and tool calls.

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-06 10:36:40 +01:00
Ashwin Bharambe
bef1b044bd
refactor(passthrough): use AsyncOpenAI instead of AsyncLlamaStackClient (#4085)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 48s
We'd like to remove the dependence of `llama-stack` on
`llama-stack-client`. This is a necessary step.

A few small cleanups
- Enables `embeddings` now also
- Remove ModelRegistryHelper dependency (unused)
- Consolidate to auth_credential field via RemoteInferenceProviderConfig
- Implement list_models() to fetch from downstream /v1/models

## Test Plan

Tested using this script
https://gist.github.com/ashwinb/6356463d10f989c0682ab3bff8589581

Output:
```
Listing models from downstream server...
Available models: ['passthrough/ollama/nomic-embed-text:latest', 'passthrough/ollama/all-minilm:l6-v2', 'passthrough/ollama/llama3.2-vision:11b', 'passthrough/ollama/llama3.2-vision:latest', 'passthrough/ollama/llama-guard3:1b', 'passthrough/o
llama/llama3.2:1b', 'passthrough/ollama/all-minilm:latest', 'passthrough/ollama/llama3.2:3b', 'passthrough/ollama/llama3.2:3b-instruct-fp16', 'passthrough/bedrock/meta.llama3-1-8b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-70b-instruct
-v1:0', 'passthrough/bedrock/meta.llama3-1-405b-instruct-v1:0', 'passthrough/sentence-transformers/nomic-ai/nomic-embed-text-v1.5']

Using LLM model: passthrough/ollama/llama3.2-vision:11b

Making inference request...

Response: 4.

--- Testing streaming ---
Streamed response: ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='1', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='', index=0, logprobs=None)], created=1762381674, m
odel='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
...
5ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1762381674, model='passthrou
gh/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
```
2025-11-05 18:15:11 -08:00
ehhuang
b335419faa
fix: actualize chunking strategy in vector store create API (#4086)
# What does this PR do?

- when create vector store is called without chunk strategy, we actually
the strategy used so that the value is persisted instead of
strategy='None'

## Test Plan
updated tests
2025-11-05 15:47:54 -08:00
Roy Belio
c672a5d792
feat: ability to use postgres as store for starter distro (#4076)
## What does this PR do?

The starter distribution now comes with all the required packages to
support persistent stores—like the agent store, metadata, and
inference—using PostgreSQL. Users can enable PostgreSQL support by
setting the `ENABLE_POSTGRES_STORE=1` environment variable.

This PR consolidates the functionality from the removed `postgres-demo`
distribution into the starter distribution, reducing maintenance
overhead.

**Closes: #2619**  
**Supersedes: #2851** (rebased and updated)

## Changes Made

1. **Added PostgreSQL support to starter distribution**
   - New `run-with-postgres-store.yaml` configuration
- Automatic config switching via `ENABLE_POSTGRES_STORE` environment
variable
   - Removed separate `postgres-demo` distribution

2. **Updated to new build system**
   - Integrated postgres switching logic into Containerfile entrypoint
   - Uses new `storage_backends` and `storage_stores` API
   - Properly configured both PostgreSQL KV store and SQL store

3. **Updated dependencies**
   - Added `psycopg2-binary` and `asyncpg` to starter distribution
   - All postgres-related dependencies automatically included

## How to Use

### With Docker (PostgreSQL):
```bash
docker run \
  -e ENABLE_POSTGRES_STORE=1 \
  -e POSTGRES_HOST=your_postgres_host \
  -e POSTGRES_PORT=5432 \
  -e POSTGRES_DB=llamastack \
  -e POSTGRES_USER=llamastack \
  -e POSTGRES_PASSWORD=llamastack \
  -e OPENAI_API_KEY=your_key \
  llamastack/distribution-starter
```

### PostgreSQL environment variables:
- `POSTGRES_HOST`: Postgres host (default: `localhost`)
- `POSTGRES_PORT`: Postgres port (default: `5432`)
- `POSTGRES_DB`: Postgres database name (default: `llamastack`)
- `POSTGRES_USER`: Postgres username (default: `llamastack`)
- `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`)

## Test Plan

All pre-commit hooks pass (mypy, ruff, distro-codegen)  
`llama stack list-deps starter` confirms psycopg2-binary is included  
Storage configuration correctly uses PostgreSQL backends  
Container builds successfully with postgres support  

## Credits

Original work by @leseb in #2851. Rebased and updated by @r-bit-rry to
work with latest main.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Sébastien Han @leseb

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-11-05 15:37:06 -08:00
ehhuang
9d5c34af27
fix!: BREAKING CHANGE: vector_store: search API response fix (#4080)
# What does this PR do?
- search_query in the vector store search API should be a list,
according to https://github.com/openai/openai-openapi


## Test Plan
modified tests


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/4080).
* #4086
* __->__ #4080
2025-11-05 15:01:48 -08:00
ehhuang
84a84ee85c
fix: last_id when listing files in vector store (#4079)
# What does this PR do?
the last_id should be the id of the last item in the returned list, not
the unfiltered list.

## Test Plan
fixed test
2025-11-05 14:10:10 -08:00
Ashwin Bharambe
d9cf5cd480
fix(ci): use --no-cache instead of --no-cache-dir (#4081)
This is necessary to make sure GPU dockers can be built on CI without
running out of space.
2025-11-05 12:14:02 -08:00
Charlie Doern
c899b50723
fix: print help for list-deps if no args (#4078)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / generate-matrix (push) Successful in 4s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test llama stack list-deps / generate-matrix (push) Successful in 5s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test llama stack list-deps / show-single-provider (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 5s
Pre-commit / pre-commit (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Test llama stack list-deps / list-deps (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 16s
UI Tests / ui-tests (22) (push) Successful in 57s
# What does this PR do?

list-deps takes  positional args OR things like --providers

the issue with this, is that these args need to be optional since by
nature, one or the other can be specified.

add a check to list-deps that checks `if not args.providers and not
args.config`. If this is true, help is printed and we exit.

resolves #4075

## Test Plan
before:

```
╰─ llama stack list-deps
Traceback (most recent call last):
  File "/Users/charliedoern/projects/Documents/llama-stack/venv/bin/llama", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 52, in main
    parser.run(args)
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 43, in run
    args.func(args)
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/list_deps.py", line 51, in _run_stack_list_deps_command
    return run_stack_list_deps_command(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/_list_deps.py", line 135, in run_stack_list_deps_command
    normal_deps, special_deps, external_provider_dependencies = get_provider_dependencies(build_config)
                                                                                          ^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'build_config' where it is not associated with a value

```

after:

```
╰─ llama stack list-deps
usage: llama stack list-deps [-h] [--providers PROVIDERS] [--format {uv,deps-only}] [config | distro]

list the dependencies for a llama stack distribution

positional arguments:
  config | distro       Path to config file to use or name of known distro (llama stack list for a list). (default: None)

options:
  -h, --help            show this help message and exit
  --providers PROVIDERS
                        sync dependencies for a list of providers and only those providers. This list is formatted like: api1=provider1,api2=provider2. Where there can be multiple
                        providers per API. (default: None)
  --format {uv,deps-only}
                        Output format: 'uv' shows shell commands, 'deps-only' shows just the list of dependencies without `uv` (default) (default: deps-only)
 ```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-05 11:34:08 -08:00
Wojciech-Rebisz
07c28cd519
fix: Avoid model_limits KeyError (#4060)
# What does this PR do?
It avoids model_limit KeyError while trying to get embedding models for
Watsonx

<!-- If resolving an issue, uncomment and update the line below -->
Closes https://github.com/llamastack/llama-stack/issues/4059

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Start server with watsonx distro:
```bash
llama stack list-deps watsonx | xargs -L1 uv pip install
uv run llama stack run watsonx
```
Run 
```python
client = LlamaStackClient(base_url=base_url)
client.models.list()
```
Check if there is any embedding model available (currently there is not
a single one)
2025-11-05 10:34:40 -08:00
Emilio Garcia
ba50790a28
feat(tests): metrics tests (#3966)
# What does this PR do?
1. Make telemetry tests as easy as possible for users by expanding the
`SpanStub` data class and creating the `MetricStub` dataclass as a way
to consistently marshal telemetry data in test fixtures and unmarshal
and handle it in tests.
2. Structure server and client tests to always follow the same standards
for consistent testing experience by using the `SpanStub` and
`MetricStub` data class objects.
3. Enable Metrics Testing for completions endpoint
4. Correct token metrics to use histograms instead of counts to capture
tokens per request rather than a cumulative count of tokens over the
lifecycle of the server.

## Test Plan
These are tests
2025-11-05 10:26:15 -08:00
Roy Belio
2619f3552e
fix: show built-in distributions in llama stack list (#4040)
# What does this PR do?
Fixes issue #3922 where `llama stack list` only showed distributions
after they were run. This PR makes the command show all available
distributions immediately on a fresh install.

Closes #3922

## Changes
- **Updated `_get_distribution_dirs()`** to discover both built-in and
built distributions:
- Built-in distributions from `src/llama_stack/distributions/` (e.g.,
starter, nvidia, dell)
  - Built distributions from `~/.llama/distributions`
- **Added a "Source" column** to distinguish between "built-in" and
"built" distributions
- **Built distributions override built-in ones** with the same name
(expected behavior)
- **Updated config file detection logic** to handle both naming
conventions:
  - Built-in: `build.yaml` and `run.yaml`
  - Built: `{name}-build.yaml` and `{name}-run.yaml`

## Test Plan
### Unit Tests
Added comprehensive unit tests in
`tests/unit/distribution/test_stack_list.py`:
```bash
uv run pytest tests/unit/distribution/test_stack_list.py -v
```
**Result**:  All 8 tests pass
- `test_builtin_distros_shown_without_running` - Verifies the core fix
for issue #3922
- `test_builtin_and_built_distros_shown_together` - Ensures both types
are shown
- `test_built_distribution_overrides_builtin` - Tests override behavior
- `test_empty_distributions` - Edge case handling
- `test_config_files_detection_builtin` - Config file detection for
built-in distros
- `test_config_files_detection_built` - Config file detection for built
distros
- `test_llamastack_prefix_stripped` - Name normalization
- `test_hidden_directories_ignored` - Filters hidden directories

### Manual Testing
**Before the fix** (simulated with empty `~/.llama/distributions`):
```bash
$ llama stack list
No stacks found in ~/.llama/distributions
```

**After the fix**:
```bash
$ llama stack list
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Stack Name        ┃ Source   ┃ Path              ┃ Build Config ┃ Run Config ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ ci-tests          │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ dell              │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ meta-reference-g… │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ nvidia            │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ open-benchmark    │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ postgres-demo     │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ starter           │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ starter-gpu       │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ watsonx           │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
└───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘
```

**After running a distribution**:
```bash
$ llama stack run starter  # Creates ~/.llama/distributions/starter
$ llama stack list
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Stack Name        ┃ Source   ┃ Path              ┃ Build Config ┃ Run Config ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ ...               │ built-in │ ...               │ Yes          │ Yes        │
│ starter           │ built    │ ~/.llama/distri…  │ No           │ No         │
│ ...               │ built-in │ ...               │ Yes          │ Yes        │
└───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘
```
Note how `starter` now shows as "built" and points to
`~/.llama/distributions`, overriding the built-in version.

## Breaking Changes
**No breaking changes** - This is a bug fix that improves user
experience with minimal risk:
- No programmatic parsing of output found in the codebase
- Table format is clearly for human consumption
- The new "Source" column helps users understand where distributions
come from
- The behavior change is exactly what users expect (seeing all available
distributions)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-05 10:16:28 -08:00
Ashwin Bharambe
4d3069bfa5
chore(ci): remove unused recordings (#4074)
Added a script to cleanup recordings. While doing this, moved the CI
matrix generation to a separate script so there is a single source of
truth for the matrix.

Ran the cleanup script as:
```
PYTHONPATH=. python scripts/cleanup_recordings.py
```

Also added this as part of the pre-commit workflow to ensure that the
recordings are always up to date and that no stale recordings are left
in the repo.
2025-11-05 09:21:58 -08:00
Sébastien Han
fd1603beef
chore: remove unused classes (#4077)
# What does this PR do?

These were maybe be included in the webmethod?
The unit test was pointless too since the request was never used
anywhere?

This shouldn't be in the API definition, if we never consume it.

## Test Plan

CI with pre-commit on OpenAPI spec generation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-05 16:45:23 +01:00
Enrico Zimuel
e238529c00
Merge branch 'elasticsearch-integration' of github.com:ezimuel/llama-stack into elasticsearch-integration 2025-11-05 15:00:51 +01:00
Enrico Zimuel
153e5f8971
Merge remote-tracking branch 'upstream/main' into elasticsearch-integration 2025-11-05 14:58:43 +01:00
Enrico Zimuel
ce4fa525f8
Fixed issue with steps variable in CI 2025-11-05 14:58:34 +01:00
Ashwin Bharambe
392e01dc79 chore: add stainless config
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 2s
Pre-commit / pre-commit (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
Unit Tests / unit-tests (3.13) (push) Failing after 7s
UI Tests / ui-tests (22) (push) Successful in 1m13s
name it to indicate it is not yet source of truth to avoid confusion
2025-11-04 15:44:07 -08:00
ehhuang
95b0493fae
chore: move src/llama_stack/ui to src/llama_stack_ui (#4068)
# What does this PR do?
This better separates UI from backend code, which was a point of
confusion often for our beloved AI friends.


## Test Plan
CI
2025-11-04 15:21:49 -08:00
Ashwin Bharambe
5850e3473f fix: remove straggler openapi HTML file 2025-11-04 14:54:33 -08:00
Ashwin Bharambe
0c49a53c97
chore(api)!: remove tool_runtime.rag_tool from the API surface (#4067)
RAG aka file search is implemented via the Responses API by specifying
the file-search tool. The backend implementation remains unchanged. This
PR merely removes the directly exposed API surface which allowed users
to directly perform searches from the client.

This facility is now available via the `client.vector_store.search()`
OpenAI compatible API.
2025-11-04 14:50:54 -08:00
Ashwin Bharambe
7cb2121c60
Merge branch 'main' into elasticsearch-integration 2025-11-04 11:40:21 -08:00
Ashwin Bharambe
a8a8aa56c0
chore!: remove the agents (sessions and turns) API (#4055)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 1m10s
- Removes the deprecated agents (sessions and turns) API that was marked
alpha in 0.3.0
- Cleans up unused imports and orphaned types after the API removal
- Removes `SessionNotFoundError` and `AgentTurnInputType` which are no
longer needed

The agents API is completely superseded by the Responses + Conversations
APIs, and the client SDK Agent class already uses those implementations.

Corresponding client-side PR:
https://github.com/llamastack/llama-stack-client-python/pull/295
2025-11-04 09:38:39 -08:00
Enrico Zimuel
a6995508e3
Added Elasticsearch in CI + minor fix 2025-11-04 16:14:09 +01:00
Enrico Zimuel
1478e672c8
Added elasticsearch in conftest 2025-11-04 14:33:41 +01:00
Mustafa Elbehery
a6ddbae0ed
chore(test): migrate unit tests from unittest to pytest nvidia test eval (#3249)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 14s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
UI Tests / ui-tests (22) (push) Successful in 1m16s
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR migrates `unittest` to `pytest` in
`tests/unit/providers/nvidia/test_eval.py`.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Part of https://github.com/llamastack/llama-stack/issues/2680

Supersedes https://github.com/llamastack/llama-stack/pull/2791

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-11-04 10:29:07 +01:00
Ashwin Bharambe
053fc0ac39
chore!: remove all deprecated routes (including /openai/v1/ ones) (#4054)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 1m13s
This PR removes all routes which we had marked deprecated for the 0.3.0
release.

This includes:
- all the `/v1/openai/v1/` routes (the corresponding /v1 routes still
exist of course)
- the /agents API (which is superseded completely by Responses +
Conversations)
- several alpha routes which had a "v1" route to aide transitioning to
"v1alpha"

This is the corresponding client-python change:
https://github.com/llamastack/llama-stack-client-python/pull/294
2025-11-03 19:00:59 -08:00
Nathan Weinberg
62b3ad349a
fix: return to hardcoded model IDs for Vertex AI (#4041)
# What does this PR do?
partial revert of b67aef2

Vertex AI doesn't offer an endpoint for listing models from Google's
Model Garden

Return to hardcoded values until such an endpoint is available

Closes #3988 

## Test Plan
Server side, set up your Vertex AI env vars (`VERTEX_AI_PROJECT`,
`VERTEX_AI_LOCATION`, and `GOOGLE_APPLICATION_CREDENTIALS`) and run the
starter distribution
```bash
$ llama stack list-deps starter | xargs -L1 uv pip install
$ llama stack run starter
```

Client side, formerly broken cURL requests now working
```bash
$ curl http://127.0.0.1:8321/v1/models | jq '.data | map(select(.provider_id == "vertexai"))'
[
  {
    "identifier": "vertexai/vertex_ai/gemini-2.0-flash",
    "provider_resource_id": "vertex_ai/gemini-2.0-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-flash",
    "provider_resource_id": "vertex_ai/gemini-2.5-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-pro",
    "provider_resource_id": "vertex_ai/gemini-2.5-pro",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  }
]
$ curl -fsS http://127.0.0.1:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"vertexai/vertex_a
i/gemini-2.5-flash\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"max_tokens\": 128, \"temperature\": 0.0}" | jq 
{                                                                                                                                    
  "id": "p8oIaYiQF8_PptQPo-GH8QQ",                                                                                                   
  "choices": [                                                                                                                       
    {                                                                                                                                
      "finish_reason": "stop",                                                                                                       
      "index": 0,                                                                                                                    
      "logprobs": null,                                                                                                              
      "message": {                                                                                                                   
        "content": "Hello there! How can I help you today?",                                                                         
        "refusal": null,                                                                                                             
        "role": "assistant",                                                                                                         
        "annotations": null,                                                                                                         
        "audio": null,                                                                                                               
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
...
```

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-11-03 17:38:16 -08:00
Ashwin Bharambe
cb40da210f
fix: update tests for OpenAI-style models endpoint (#4053)
The llama-stack-client now uses /`v1/openai/v1/models` which returns
OpenAI-compatible model objects with 'id' and 'custom_metadata' fields
instead of the Resource-style 'identifier' field. Updated api_recorder
to handle the new endpoint and modified tests to access model metadata
appropriately. Deleted stale model recordings for re-recording.

**NOTE: CI will be red on this one since it is dependent on
https://github.com/llamastack/llama-stack-client-python/pull/291/files
landing. I verified locally that it is green.**
2025-11-03 17:30:08 -08:00
Sébastien Han
4a5ef65286
chore!: remove SDG API (#4035)
# What does this PR do?

This API hasn't received any traction and close to zero interest from
the community. Let's revisit in the future if things change.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 16:12:06 -08:00
Ashwin Bharambe
44096512b5
feat: add custom_metadata to OpenAIModel to unify /v1/models with /v1/openai/v1/models (#4051)
We need to remove `/v1/openai/v1` paths shortly. There is one trouble --
our current `/v1/openai/v1/models` endpoint provides different data than
`/v1/models`. Unfortunately our tests target the latter (llama-stack
customized) behavior. We need to get to true OpenAI compatibility.

This is step 1: adding `custom_metadata` field to `OpenAIModel` that
includes all the extra stuff we add in the native `/v1/models` response.
This can be extracted on the consumer end by look at
`__pydantic_extra__` or other similar fields.

This PR:
- Adds `custom_metadata` field to `OpenAIModel` class in
`src/llama_stack/apis/models/models.py`
- Modified `openai_list_models()` in
`src/llama_stack/core/routing_tables/models.py` to populate
custom_metadata

Next Steps
1. Update stainless client to use `/v1/openai/v1/models` instead of
`/v1/models`
2. Migrate tests to read from `custom_metadata`
3. Remove `/v1/openai/v1/` prefix entirely and consolidate to single
`/v1/models` endpoint
2025-11-03 15:56:07 -08:00
Ashwin Bharambe
2381714904
fix: enable SQLite WAL mode to prevent database locking errors (#4048)
Fixes race condition causing "database is locked" errors during
concurrent writes to SQLite, particularly in streaming responses with
guardrails where multiple inference calls write simultaneously.

Enable Write-Ahead Logging (WAL) mode for SQLite which allows multiple
concurrent readers and one writer without blocking. Set busy_timeout to
5s so SQLite retries instead of failing immediately. Remove the logic
that disabled write queues for SQLite since WAL mode eliminates the
locking issues that prompted disabling them.

Fixes: test_output_safety_guardrails_safe_content[stream=True] flake
2025-11-03 15:27:41 -08:00
ehhuang
628e38b3d5
test: always start a new server in integration-tests.sh (#4050)
# What does this PR do?
This prevents interference from already running servers, and allows
multiple concurrent integration test runs. Unleash the AIs!

## Test Plan
start a LS server at port 8321

Then observe test uses port 8322:

❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config
server:ci-tests --inference-mode replay --setup ollama --suite base
--pattern '(telemetry or safety)'
=== Llama Stack Integration Test Runner ===
Stack Config: server:ci-tests
Setup: ollama
Inference Mode: replay
Test Suite: base
Test Subdirs:
Test Pattern: (telemetry or safety)

Checking llama packages
llama-stack 0.4.0.dev0 /Users/erichuang/projects/new_test_server
llama-stack-client                       0.3.0
ollama                                   0.6.0
=== Applying Setup Environment Variables ===
Setting SQLITE_STORE_DIR:
/var/folders/cz/vyh7y1d11xg881lsxsshnc5c0000gn/T/tmp.bKLsaVAxyU
Setting stack config type: server
Setting up environment variables:
export OLLAMA_URL='http://0.0.0.0:11434'
export SAFETY_MODEL='ollama/llama-guard3:1b'

Will use port: 8322
=== Starting Llama Stack Server ===
Waiting for Llama Stack Server to start on port 8322...
 Llama Stack Server started successfully
2025-11-03 15:23:10 -08:00
Sébastien Han
da57b51fb6
ci: introduce Mergify bot to notify on PR conflicts (#4043)
This commit introduces Mergify, a powerful bot designed to assist with
automated merging and other CI-related tasks. As an initial step, we
enable a basic feature: automatically notifying users when a pull
request has merge conflicts.

When a conflict is detected, Mergify will add a label to the PR. This
label will be removed once the conflict is resolved.
This is foundation PR to activate the bot and start using it for
backports too.

In the future, we plan to expand Mergify’s role to include auto-merging,
as discussed in #1667, once the project is ready.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-03 12:21:19 -08:00
Derek Higgins
1562277cfd
ci: test adjustments for Qwen3-0.6B (#3978)
Without this hint Qwen3-0.6B tends to reply with the full name
and sometimes doesn't reply with the correct drafted year.

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 12:19:35 -08:00