Commit graph

58 commits

Author SHA1 Message Date
Ken Dreyer
dabebdd230
fix: update hard-coded google model names (#4212)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Test External API and Providers / test-external (venv) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (push) Failing after 36s
UI Tests / ui-tests (22) (push) Successful in 44s
Unit Tests / unit-tests (3.13) (push) Failing after 1m21s
Unit Tests / unit-tests (3.12) (push) Failing after 1m59s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m33s
Pre-commit / pre-commit (push) Successful in 3m0s
# What does this PR do?
When we send the model names to Google's openai API, we must use the
"google" name prefix. Google does not recognize the "vertexai" model
names.

Closes #4211

## Test Plan
```bash
uv venv --python python312
. .venv/bin/activate
llama stack list-deps starter | xargs -L1 uv pip install
llama stack run starter
```

Test that this shows the gemini models with their correct names:
```bash
curl http://127.0.0.1:8321/v1/models | jq '.data | map(select(.custom_metadata.provider_id == "vertexai"))'
```

Test that this chat completion works:
```bash
curl -X POST   -H "Content-Type: application/json"   "http://127.0.0.1:8321/v1/chat/completions"   -d '{
        "model": "vertexai/google/gemini-2.5-flash",
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant."
          },
          {
            "role": "user",
            "content": "Hello! Can you tell me a joke?"
          }
        ],
        "temperature": 1.0,
        "max_tokens": 256
      }'
```
2025-11-21 13:12:01 -08:00
Ken Dreyer
dc4665af17
feat!: change bedrock bearer token env variable to match AWS docs & boto3 convention (#4152)
Some checks failed
Integration Tests (Replay) / generate-matrix (push) Successful in 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Python Package Build Test / build (3.12) (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 6s
Test Llama Stack Build / build-single-provider (push) Successful in 50s
Vector IO Integration Tests / test-matrix (push) Failing after 56s
Test Llama Stack Build / build (push) Successful in 49s
UI Tests / ui-tests (22) (push) Successful in 1m1s
Test External API and Providers / test-external (venv) (push) Failing after 1m18s
Unit Tests / unit-tests (3.13) (push) Failing after 1m58s
Unit Tests / unit-tests (3.12) (push) Failing after 2m5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m28s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m20s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m37s
Pre-commit / pre-commit (push) Successful in 3m50s
Rename `AWS_BEDROCK_API_KEY` to `AWS_BEARER_TOKEN_BEDROCK` to align with
the naming convention used in AWS Bedrock documentation and the AWS web
console UI. This reduces confusion when developers compare LLS docs with
AWS docs.

Closes #4147
2025-11-21 09:48:05 -05:00
Ashwin Bharambe
d649c3663e
fix: enforce allowed_models during inference requests (#4197)
The `allowed_models` configuration was only being applied when listing
models via the `/v1/models` endpoint, but the actual inference requests
weren't checking this restriction. This meant users could directly
request any model the provider supports by specifying it in their
inference call, completely bypassing the intended cost controls.

The fix adds validation to all three inference methods (chat
completions, completions, and embeddings) that checks the requested
model against the allowed_models list before making the provider API
call.

### Test plan

Added unit tests
2025-11-19 14:49:44 -08:00
Ian Miller
0757d5a917
feat(responses)!: implement support for OpenAI compatible prompts in Responses API (#3965)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR is responsible for providing actual implementation of OpenAI
compatible prompts in Responses API. This is the follow up PR with
actual implementation after introducing #3942

The need of this functionality was initiated in #3514.

> Note, https://github.com/llamastack/llama-stack/pull/3514 is divided
on three separate PRs. Current PR is the third of three.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #3321

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Manual testing, CI workflow with added unit tests

Comprehensive manual testing with new implementation:

**Test Prompts with Images with text on them in Responses API:**

I used this image for testing purposes: [iphone 17
image](https://github.com/user-attachments/assets/9e2ee821-e394-4bbd-b1c8-d48a3fa315de)

1. Upload an image:

```
curl -X POST http://localhost:8321/v1/files \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/Users/ianmiller/iphone.jpeg" \
  -F "purpose=assistants"
```


`{"object":"file","id":"file-d6d375f238e14f21952cc40246bc8504","bytes":556241,"created_at":1761750049,"expires_at":1793286049,"filename":"iphone.jpeg","purpose":"assistants"}%`

2. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are a product analysis expert. Analyze the following product:\n\nProduct Name: {{product_name}}\nDescription: {{description}}\n\nImage: {{product_photo}}\n\nProvide a detailed analysis including quality assessment, target audience, and pricing recommendations.",
    "variables": ["product_name", "description", "product_photo"]
  }'
```

`{"prompt":"You are a product analysis expert. Analyze the following
product:\n\nProduct Name: {{product_name}}\nDescription:
{{description}}\n\nImage: {{product_photo}}\n\nProvide a detailed
analysis including quality assessment, target audience, and pricing
recommendations.","version":1,"prompt_id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":["product_name","description","product_photo"],"is_default":false}%`


3. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Please analyze this product",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62",
      "version": "1",
      "variables": {
        "product_name": {
          "type": "input_text",
          "text": "iPhone 17 Pro Max"
        },
         "product_photo": {
          "type": "input_image",
          "file_id": "file-d6d375f238e14f21952cc40246bc8504",
          "detail": "high"
        }
      }
    }
  }'
```


`{"created_at":1761750427,"error":null,"id":"resp_f897f914-e3b8-4783-8223-3ed0d32fcbc6","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"###
Product Analysis: iPhone 17 Pro Max\n\n**Quality Assessment:**\n\n-
**Display & Design:**\n - The 6.9-inch display is large, ideal for
streaming and productivity.\n - Anti-reflective technology and 120Hz
refresh rate enhance viewing experience, providing smoother visuals and
reducing glare.\n - Titanium frame suggests a premium build, offering
durability and a sleek appearance.\n\n- **Performance:**\n - The Apple
A19 Pro chip promises significant performance improvements, likely
leading to faster processing and efficient multitasking.\n - 12GB RAM is
substantial for a smartphone, ensuring smooth operation for demanding
apps and games.\n\n- **Camera System:**\n - The triple 48MP camera setup
(wide, ultra-wide, telephoto) is designed for versatile photography
needs, capturing high-resolution photos and videos.\n - The 24MP front
camera will appeal to selfie enthusiasts and content creators needing
quality front-facing shots.\n\n- **Connectivity:**\n - Wi-Fi 7 support
indicates future-proof wireless capabilities, providing faster and more
reliable internet connectivity.\n\n**Target Audience:**\n\n- **Tech
Enthusiasts:** Individuals interested in cutting-edge technology and
performance.\n- **Content Creators:** Users who need a robust camera
system for photo and video production.\n- **Luxury Consumers:** Those
who prefer premium materials and top-of-the-line specs.\n-
**Professionals:** Users who require efficient multitasking and
productivity features.\n\n**Pricing Recommendations:**\n\n- Given the
premium specifications, a higher price point is expected. Consider
pricing competitively within the high-end smartphone market while
justifying cost through unique features like the titanium frame and
advanced connectivity options.\n- Positioning around the $1,200 to
$1,500 range would align with expectations for top-tier devices,
catering to its target audience while ensuring
profitability.\n\nOverall, the iPhone 17 Pro Max showcases a blend of
innovative features and premium design, aimed at users seeking high
performance and superior
aesthetics.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_66f4d844-4d9e-4102-80fc-eb75b34b6dbd","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":{"product_name":{"text":"iPhone
17 Pro
Max","type":"input_text"},"product_photo":{"detail":"high","type":"input_image","file_id":"file-d6d375f238e14f21952cc40246bc8504","image_url":null}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":830,"output_tokens":394,"total_tokens":1224,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}%`

**Test Prompts with PDF files in Responses API:**

I used this PDF file for testing purposes:
[invoicesample.pdf](https://github.com/user-attachments/files/22958943/invoicesample.pdf)

1. Upload PDF:

```
curl -X POST http://localhost:8321/v1/files \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/Users/ianmiller/invoicesample.pdf" \
  -F "purpose=assistants"
```


`{"object":"file","id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","bytes":149568,"created_at":1761750730,"expires_at":1793286730,"filename":"invoicesample.pdf","purpose":"assistants"}%`


2. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are an accounting and financial analysis expert. Analyze the following invoice document:\n\nInvoice Document: {{invoice_doc}}\n\nProvide a comprehensive analysis",
    "variables": ["invoice_doc"]
  }'
```

`{"prompt":"You are an accounting and financial analysis expert. Analyze
the following invoice document:\n\nInvoice Document:
{{invoice_doc}}\n\nProvide a comprehensive
analysis","version":1,"prompt_id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":["invoice_doc"],"is_default":false}%`


3. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Please provide a detailed analysis of this invoice",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc",
      "version": "1",
      "variables": {
        "invoice_doc": {
          "type": "input_file",
          "file_id": "file-7fbb1043a4bb468cab60ffe4b8631d8e",
          "filename": "invoicesample.pdf"
        }
      }
    }
  }'
```


`{"created_at":1761750881,"error":null,"id":"resp_da866913-db06-4702-8000-174daed9dbbb","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"Here's
a detailed analysis of the invoice provided:\n\n### Seller
Information\n- **Business Name:** The invoice features a logo with
\"Sunny Farm\" indicating the business identity.\n- **Address:** 123
Somewhere St, Melbourne VIC 3000\n- **Contact Information:** Phone
number (03) 1234 5678\n\n### Buyer Information\n- **Name:** Denny
Gunawan\n- **Address:** 221 Queen St, Melbourne VIC 3000\n\n###
Transaction Details\n- **Invoice Number:** #20130304\n- **Date of
Transaction:** Not explicitly mentioned, likely inferred from the
invoice number or needs clarification.\n\n### Items Purchased\n1.
**Apple**\n - Price: $5.00/kg\n - Quantity: 1 kg\n - Subtotal:
$5.00\n\n2. **Orange**\n - Price: $1.99/kg\n - Quantity: 2 kg\n -
Subtotal: $3.98\n\n3. **Watermelon**\n - Price: $1.69/kg\n - Quantity: 3
kg\n - Subtotal: $5.07\n\n4. **Mango**\n - Price: $9.56/kg\n - Quantity:
2 kg\n - Subtotal: $19.12\n\n5. **Peach**\n - Price: $2.99/kg\n -
Quantity: 1 kg\n - Subtotal: $2.99\n\n### Financial Summary\n-
**Subtotal for Items:** $36.00\n- **GST (Goods and Services Tax):** 10%
of $36.00, which amounts to $3.60\n- **Total Amount Due:** $39.60\n\n###
Notes\n- The invoice includes a placeholder text: \"Lorem ipsum dolor
sit amet...\" which is typically used as filler text. This might
indicate a section intended for terms, conditions, or additional notes
that haven’t been completed.\n\n### Visual and Design Elements\n- The
invoice uses a simple and clear layout, featuring the business logo
prominently and stating essential information such as contact and
transaction details in a structured manner.\n- There is a \"Thank You\"
note at the bottom, which adds a professional and courteous
touch.\n\n### Considerations\n- Ensure the date of the transaction is
clear if there are any future references needed.\n- Replace filler text
with relevant terms and conditions or any special instructions
pertaining to the transaction.\n\nThis invoice appears standard,
representing a small business transaction with clearly itemized products
and applicable
taxes.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_39f3b39e-4684-4444-8e4d-e7395f88c9dc","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":{"invoice_doc":{"type":"input_file","file_data":null,"file_id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","file_url":null,"filename":"invoicesample.pdf"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":529,"output_tokens":513,"total_tokens":1042,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}%`

**Test simple text Prompt in Responses API:**

1. Create prompt:

```
 curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.",
    "variables": ["name", "company", "role", "tone"]
  }'
```

`{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is
{{role}} at {{company}}. Remember, {{name}}, to be
{{tone}}.","version":1,"prompt_id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":["name","company","role","tone"],"is_default":false}%`

2. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "What is the capital of Ireland?",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef",
      "version": "1",
      "variables": {
        "name": {
          "type": "input_text",
          "text": "Alice"
        },
        "company": {
          "type": "input_text",
          "text": "Dummy Company"
        },
        "role": {
          "type": "input_text",
          "text": "Geography expert"
        },
        "tone": {
          "type": "input_text",
          "text": "professional and helpful"
        }
      }
    }
  }'

```


`{"created_at":1761751097,"error":null,"id":"resp_1b037b95-d9ae-4ad0-8e76-d953897ecaef","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"The
capital of Ireland is
Dublin.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_8e7c72b6-2aa2-4da6-8e57-da4e12fa3ce2","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":{"name":{"text":"Alice","type":"input_text"},"company":{"text":"Dummy
Company","type":"input_text"},"role":{"text":"Geography
expert","type":"input_text"},"tone":{"text":"professional and
helpful","type":"input_text"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":47,"output_tokens":7,"total_tokens":54,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}%`
2025-11-19 11:48:11 -08:00
Ashwin Bharambe
8852666982
chore: remove dead code from openai_compat utility (#4194)
Removes a bunch of dead code from `openai_compat.py`
2025-11-19 11:23:33 -08:00
Shabana Baig
72ea95e2e0
fix: Fix max_tool_calls for openai provider and add integration tests for the max_tool_calls feat (#4190)
# Problem

OpenAI gpt-4 returned an error when built-in and mcp calls were skipped
due to max_tool_calls parameter. Following is from the server log:
```
RuntimeError: OpenAI response failed: Error code: 400 - {'error': {'message': "An assistant message with       
'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids  
did not have response messages: call_Yi9V1QNpN73dJCAgP2Arcjej", 'type': 'invalid_request_error', 'param':      
'messages', 'code': None}}
```

# What does this PR do?

- Fixes error returned by openai/gpt when calls were skipped due to
max_tool_calls. We now return a tool message that explicitly mentions
that the call is skipped.
- Adds integration tests as a follow-up to
PR#[4062](https://github.com/llamastack/llama-stack/pull/4062)

<!-- If resolving an issue, uncomment and update the line below -->
Part 2 for issue
#[3563](https://github.com/llamastack/llama-stack/issues/3563)

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

- Added integration tests
- Added new recordings

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-19 10:27:56 -08:00
Anik
4e9633f7c3
feat: Make Safety API an optional dependency for meta-reference agents provider (#4169)
# What does this PR do?

Change Safety API from required to optional dependency, following the
established pattern used for other optional dependencies in Llama Stack.
    
The provider now starts successfully without Safety API configured.
Requests that explicitly include guardrails will receive a clear error
message when Safety API is unavailable.
    
This enables local development and testing without Safety API while
maintaining clear error messages when guardrail features are requested.
    
Closes #4165
    
Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

1. New unit tests added in
`tests/unit/providers/agents/meta_reference/test_safety_optional.py`

2. Integration tests performed with the files in
https://gist.github.com/anik120/c33cef497ec7085e1fe2164e0705b8d6

 (i) test with `test_integration_no_safety_fail.yaml`:
 
Config WITHOUT Safety API, should fail with helpful error since
`required_safety_api` is `true` by default
```
$ uv run llama stack run test_integration_no_safety_fail.yaml 2>&1 | grep -B 5 -A 15 "ValueError.*Safety\|Safety API is 
  required"
File "/Users/anbhatta/go/src/github.com/llamastack/llama-stack/src/llama_stack/providers/inline/agents/meta_reference
  /__init__.py", line 27, in get_provider_impl
      raise ValueError(
      ...<9 lines>...
      )
  ValueError: Safety API is required but not configured.

  To run without safety checks, explicitly set in your configuration:
    providers:
      agents:
        - provider_id: meta-reference
          provider_type: inline::meta-reference
          config:
            require_safety_api: false

  Warning: This disables all safety guardrails for this agents provider.
```

(ii) test with `test_integration_no_safety_works.yaml`

Config WITHOUT Safety API, **but** `require_safety_api=false` is
explicitly set, should succeed

```
$ uv run llama stack run test_integration_no_safety_works.yaml
 INFO     2025-11-16 09:49:10,044 llama_stack.cli.stack.run:169 cli: Using run configuration:                           
   
           /Users/anbhatta/go/src/github.com/llamastack/llama-stack/test_integration_no_safety_works.yaml                
   
  INFO     2025-11-16 09:49:10,052 llama_stack.cli.stack.run:228 cli: HTTPS enabled with certificates:

             Key: None

             Cert: None

  .
  .
  .
  INFO     2025-11-16 09:49:38,528 llama_stack.core.stack:495 core: starting registry refresh task

  INFO     2025-11-16 09:49:38,534 uvicorn.error:62 uncategorized: Application startup complete.

  INFO     2025-11-16 09:49:38,535 uvicorn.error:216 uncategorized: Uvicorn running on http://0.0.0.0:8321 (Press CTRL+C
```


Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>

Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>
2025-11-19 10:04:24 -08:00
Charlie Doern
d5cd0eea14
feat!: standardize base_url for inference (#4177)
# What does this PR do?

Completes #3732 by removing runtime URL transformations and requiring
users to provide full URLs in configuration. All providers now use
'base_url' consistently and respect the exact URL provided without
appending paths like /v1 or /openai/v1 at runtime.

BREAKING CHANGE: Users must update configs to include full URL paths
(e.g., http://localhost:11434/v1 instead of http://localhost:11434).

Closes #3732 

## Test Plan

Existing tests should pass even with the URL changes, due to default
URLs being altered.

Add unit test to enforce URL standardization across remote inference
providers (verifies all use 'base_url' field with HttpUrl | None type)

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-19 08:44:28 -08:00
Ashwin Bharambe
bd5ad2963e
refactor(storage): make { kvstore, sqlstore } as llama stack "internal" APIs (#4181)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test llama stack list-deps / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.13) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
Python Package Build Test / build (3.12) (push) Failing after 7s
Test llama stack list-deps / show-single-provider (push) Successful in 28s
Test llama stack list-deps / list-deps-from-config (push) Successful in 33s
Test External API and Providers / test-external (venv) (push) Failing after 33s
Vector IO Integration Tests / test-matrix (push) Failing after 43s
Test llama stack list-deps / list-deps (push) Failing after 34s
Test Llama Stack Build / build-single-provider (push) Successful in 46s
Test Llama Stack Build / build (push) Successful in 55s
UI Tests / ui-tests (22) (push) Successful in 1m17s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 1m37s
Unit Tests / unit-tests (3.12) (push) Failing after 1m32s
Unit Tests / unit-tests (3.13) (push) Failing after 2m12s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m21s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m46s
Pre-commit / pre-commit (push) Successful in 3m7s
These primitives (used both by the Stack as well as provider
implementations) can be thought of fruitfully as internal-only APIs
which can themselves have multiple implementations. We use the new
`llama_stack_api.internal` namespace for this.

In addition: the change moves kv/sql store impls, configs, and
dependency helpers under `core/storage`

## Testing

`pytest tests/unit/utils/test_authorized_sqlstore.py`, other existing CI
2025-11-18 13:15:16 -08:00
Anastas Stoyanovsky
a3580e6bc0
feat!: Wire through parallel_tool_calls to Responses API (#4124)
# What does this PR do?
Initial PR against #4123
Adds `parallel_tool_calls` spec to Responses API and basic initial
implementation where no more than one function call is generated when
set to `False`.

## Test Plan
* Unit tests have been added to verify no more than one function call is
generated.
* A followup PR will verify passing through `parallel_tool_calls` to
providers.
* A followup PR will address verification and/or implementation of
incremental function calling across multiple conversational turns.

---------

Signed-off-by: Anastas Stoyanovsky <astoyano@redhat.com>
2025-11-18 11:25:08 -08:00
Omar Abdelwahab
fe91d331ef
fix: Remove authorization from provider data (#4161)
# What does this PR do?
- Remove backward compatibility for authorization in mcp_headers
- Enforce authorization must use dedicated parameter  
- Add validation error if Authorization found in provider_data headers
- Update test_mcp.py to use authorization parameter
- Update test_mcp_json_schema.py to use authorization parameter
- Update test_tools_with_schemas.py to use authorization parameter
- Update documentation to show the change in the authorization approach

Breaking Change:
- Authorization can no longer be passed via mcp_headers in provider_data
- Users must use the dedicated 'authorization' parameter instead
- Clear error message guides users to the new approach"

## Test Plan
CI

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-17 12:16:35 -08:00
slekkala1
f596f850bf
fix: Propagate the runtime error message to user (#4150)
# What does this PR do?
For Runtime Exception the error is not propagated to the user and can be
opaque.
Before fix:
`ERROR - Error processing message: Error code: 500 - {'detail':
'Internal server error: An unexpected error occurred.'}
`
After fix:
`[ERROR] Error code: 404 - {'detail': "Model
'claude-sonnet-4-5-20250929' not found. Use 'client.models.list()' to
list available Models."}
`

(Ran into this few times, while working with OCI + LLAMAStack and Sabre:
Agentic framework integrations with LLAMAStack)

## Test Plan
CI
2025-11-14 13:14:49 -08:00
Omar Abdelwahab
eb545034ab
fix: MCP authorization parameter implementation (#4052)
# What does this PR do?
Adding a user-facing `authorization ` parameter to MCP tool definitions
that allows users to explicitly configure credentials per MCP server,
addressing GitHub Issue #4034 in a secure manner.


## Test Plan
tests/integration/responses/test_mcp_authentication.py

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-14 08:54:42 -08:00
Charlie Doern
a078f089d9
fix: rename llama_stack_api dir (#4155)
Some checks failed
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Python Package Build Test / build (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Test llama stack list-deps / generate-matrix (push) Successful in 29s
Test Llama Stack Build / build-single-provider (push) Successful in 33s
Test llama stack list-deps / list-deps-from-config (push) Successful in 32s
UI Tests / ui-tests (22) (push) Successful in 39s
Test Llama Stack Build / build (push) Successful in 39s
Test llama stack list-deps / show-single-provider (push) Successful in 46s
Python Package Build Test / build (3.13) (push) Failing after 44s
Test External API and Providers / test-external (venv) (push) Failing after 44s
Vector IO Integration Tests / test-matrix (push) Failing after 56s
Test llama stack list-deps / list-deps (push) Failing after 47s
Unit Tests / unit-tests (3.12) (push) Failing after 1m42s
Unit Tests / unit-tests (3.13) (push) Failing after 1m55s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m0s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m42s
Pre-commit / pre-commit (push) Successful in 5m17s
# What does this PR do?

the directory structure was src/llama-stack-api/llama_stack_api

instead it should just be src/llama_stack_api to match the other
packages.

update the structure and pyproject/linting config

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-13 15:04:36 -08:00
Charlie Doern
840ad75fe9
feat: split API and provider specs into separate llama-stack-api pkg (#3895)
# What does this PR do?

Extract API definitions and provider specifications into a standalone
llama-stack-api package that can be published to PyPI independently of
the main llama-stack server.


see: https://github.com/llamastack/llama-stack/pull/2978 and
https://github.com/llamastack/llama-stack/pull/2978#issuecomment-3145115942

Motivation

External providers currently import from llama-stack, which overrides
the installed version and causes dependency conflicts. This separation
allows external providers to:

- Install only the type definitions they need without server
dependencies
- Avoid version conflicts with the installed llama-stack package
- Be versioned and released independently

This enables us to re-enable external provider module tests that were
previously blocked by these import conflicts.

Changes

- Created llama-stack-api package with minimal dependencies (pydantic,
jsonschema)
- Moved APIs, providers datatypes, strong_typing, and schema_utils
- Updated all imports from llama_stack.* to llama_stack_api.*
- Configured local editable install for development workflow
- Updated linting and type-checking configuration for both packages

Next Steps

- Publish llama-stack-api to PyPI
- Update external provider dependencies
- Re-enable external provider module tests


Pre-cursor PRs to this one:

- #4093 
- #3954 
- #4064 

These PRs moved key pieces _out_ of the Api pkg, limiting the scope of
change here.


relates to #3237 

## Test Plan

Package builds successfully and can be imported independently. All
pre-commit hooks pass with expected exclusions maintained.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-13 11:51:17 -08:00
Ashwin Bharambe
fcf649b97a
feat(storage): share sql/kv instances and add upsert support (#4140)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 2s
Python Package Build Test / build (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Python Package Build Test / build (3.13) (push) Failing after 17s
Test Llama Stack Build / build-single-provider (push) Successful in 31s
Test External API and Providers / test-external (venv) (push) Failing after 32s
Vector IO Integration Tests / test-matrix (push) Failing after 45s
Test Llama Stack Build / build (push) Successful in 47s
UI Tests / ui-tests (22) (push) Successful in 1m42s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m8s
Unit Tests / unit-tests (3.13) (push) Failing after 2m7s
Unit Tests / unit-tests (3.12) (push) Failing after 2m28s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m32s
Pre-commit / pre-commit (push) Successful in 3m20s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3m33s
A few changes to the storage layer to ensure we reduce unnecessary
contention arising out of our design choices (and letting the database
layer do its correct thing):

- SQL stores now share a single `SqlAlchemySqlStoreImpl` per backend,
and `kvstore_impl` caches instances per `(backend, namespace)`. This
avoids spawning multiple SQLite connections for the same file, reducing
lock contention and aligning the cache story for all backends.

- Added an async upsert API (with SQLite/Postgres dialect inserts) and
routed it through `AuthorizedSqlStore`, then switched conversations and
responses to call it. Using native `ON CONFLICT DO UPDATE` eliminates
the insert-then-update retry window that previously caused long WAL lock
retries.

### Test Plan

Existing tests, added a unit test for `upsert()`
2025-11-12 12:14:26 -08:00
Ashwin Bharambe
492f79ca9b
fix: harden storage semantics (#4118)
Fixes issues in the storage system by guaranteeing immediate durability
for responses and ensuring background writers stay alive. Three related
fixes:

* Responses to the OpenAI-compatible API now write directly to
Postgres/SQLite inside the request instead of detouring through an async
queue that might never drain; this restores the expected
read-after-write behavior and removes the "response not found" races
reported by users.

* The access-control shim was stamping owner_principal/access_attributes
as SQL NULL, which Postgres interprets as non-public rows; fixing it to
use the empty-string/JSON-null pattern means conversations and responses
stored without an authenticated user stay queryable (matching SQLite).

* The inference-store queue remains for batching, but its worker tasks
now start lazily on the live event loop so server startup doesn't cancel
them—writes keep flowing even when the stack is launched via llama stack
run.

Closes #4115 

### Test Plan

Added a matrix entry to test our "base" suite against Postgres as the
store.
2025-11-12 10:35:39 -08:00
Francisco Arceo
eb3f9ac278
feat: allow returning embeddings and metadata from /vector_stores/ methods; disallow changing Provider ID (#4046)
# What does this PR do?

- Updates `/vector_stores/{vector_store_id}/files/{file_id}/content` to
allow returning `embeddings` and `metadata` using the `extra_query`
    -  Updates the UI accordingly to display them.

- Update UI to support CRUD operations in the Vector Stores section and
adds a new modal exposing the functionality.

- Updates Vector Store update to fail if a user tries to update Provider
ID (which doesn't make sense to allow)

```python
In  [1]: client.vector_stores.files.content(
    vector_store_id=vector_store.id, 
    file_id=file.id, 
    extra_query={"include_embeddings": True, "include_metadata": True}
)
Out [1]: FileContentResponse(attributes={}, content=[Content(text='This is a test document to check if embeddings are generated properly.\n', type='text', embedding=[0.33760684728622437, ...,], chunk_metadata={'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'source': None, 'created_timestamp': 1762053437, 'updated_timestamp': 1762053437, 'chunk_window': '0-13', 'chunk_tokenizer': 'DEFAULT_TIKTOKEN_TOKENIZER', 'chunk_embedding_model': 'sentence-transformers/nomic
-ai/nomic-embed-text-v1.5', 'chunk_embedding_dimension': 768, 'content_token_count': 13, 'metadata_token_count': 9}, metadata={'filename': 'test-embedding.txt', 'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'token_count': 13, 'metadata_token_count': 9})], file_id='file-27291dbc679642ac94ffac6d2810c339', filename='test-embedding.txt')
```

Screenshots of UI are displayed below:

### List Vector Store with Added "Create New Vector Store"
<img width="1912" height="491" alt="Screenshot 2025-11-06 at 10 47
25 PM"
src="https://github.com/user-attachments/assets/a3a3ddd9-758d-4005-ac9c-5047f03916f3"
/>

### Create New Vector Store
<img width="1918" height="1048" alt="Screenshot 2025-11-06 at 10 47
49 PM"
src="https://github.com/user-attachments/assets/b4dc0d31-696f-4e68-b109-27915090f158"
/>

### Edit Vector Store
<img width="1916" height="1355" alt="Screenshot 2025-11-06 at 10 48
32 PM"
src="https://github.com/user-attachments/assets/ec879c63-4cf7-489f-bb1e-57ccc7931414"
/>


### Vector Store Files Contents page (with Embeddings)
<img width="1914" height="849" alt="Screenshot 2025-11-06 at 11 54
32 PM"
src="https://github.com/user-attachments/assets/3095520d-0e90-41f7-83bd-652f6c3fbf27"
/>

### Vector Store Files Contents Details page (with Embeddings)
<img width="1916" height="1221" alt="Screenshot 2025-11-06 at 11 55
00 PM"
src="https://github.com/user-attachments/assets/e71dbdc5-5b49-472b-a43a-5785f58d196c"
/>

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
Tests added for Middleware extension and Provider failures.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-11-12 09:59:48 -08:00
Charlie Doern
43adc23ef6
refactor: remove dead inference API code and clean up imports (#4093)
# What does this PR do?

Delete ~2,000 lines of dead code from the old bespoke inference API that
was replaced by OpenAI-only API. This includes removing unused type
conversion functions, dead provider methods, and event_logger.py.

Clean up imports across the codebase to remove references to deleted
types. This eliminates unnecessary
code and dependencies, helping isolate the API package as a
self-contained module.

This is the last interdependency between the .api package and "exterior"
packages, meaning that now every other package in llama stack imports
the API, not the other way around.

## Test Plan

this is a structural change, no tests needed.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-10 15:29:24 -08:00
Shabana Baig
433438cfc0
feat: Implement the 'max_tool_calls' parameter for the Responses API (#4062)
# Problem
Responses API uses max_tool_calls parameter to limit the number of tool
calls that can be generated in a response. Currently, LLS implementation
of the Responses API does not support this parameter.

# What does this PR do?
This pull request adds the max_tool_calls field to the response object
definition and updates the inline provider. it also ensures that:

- the total number of calls to built-in and mcp tools do not exceed
max_tool_calls
- an error is thrown if max_tool_calls < 1 (behavior seen with the
OpenAI Responses API, but we can change this if needed)

Closes #[3563](https://github.com/llamastack/llama-stack/issues/3563)

## Test Plan
- Tested manually for change in model response w.r.t supplied
max_tool_calls field.
- Added integration tests to test invalid max_tool_calls parameter.
- Added integration tests to check max_tool_calls parameter with
built-in and function tools.
- Added integration tests to check max_tool_calls parameter in the
returned response object.
- Recorded OpenAI Responses API behavior using a sample script:
https://github.com/s-akhtar-baig/llama-stack-examples/blob/main/responses/src/max_tool_calls.py

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-10 13:21:27 -08:00
Dennis Kennetz
209a78b618
feat: add oci genai service as chat inference provider (#3876)
# What does this PR do?
Adds OCI GenAI PaaS models for openai chat completion endpoints.

## Test Plan
In an OCI tenancy with access to GenAI PaaS, perform the following
steps:

1. Ensure you have IAM policies in place to use service (check docs
included in this PR)
2. For local development, [setup OCI
cli](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm)
and configure the CLI with your region, tenancy, and auth
[here](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliconfigure.htm)
3. Once configured, go through llama-stack setup and run llama-stack
(uses config based auth) like:
```bash
OCI_AUTH_TYPE=config_file \
OCI_CLI_PROFILE=CHICAGO \
OCI_REGION=us-chicago-1 \
OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..aaaaaaaa5...5a \
llama stack run oci
```
4. Hit the `models` endpoint to list models after server is running:
```bash
curl http://localhost:8321/v1/models | jq
...
{
      "identifier": "meta.llama-4-scout-17b-16e-instruct",
      "provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.am...q",
      "provider_id": "oci",
      "type": "model",
      "metadata": {
        "display_name": "meta.llama-4-scout-17b-16e-instruct",
        "capabilities": [
          "CHAT"
        ],
        "oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.a...q"
      },
      "model_type": "llm"
},
   ...
```
5. Use the "display_name" field to use the model in a
`/chat/completions` request:
```bash
# Streaming result
curl -X POST http://localhost:8321/v1/chat/completions   -H "Content-Type: application/json"   -d '{
        "model": "meta.llama-4-scout-17b-16e-instruct",
       "stream": true,
       "temperature": 0.9,
      "messages": [
         {
           "role": "system",
           "content": "You are a funny comedian. You can be crass."
         },
          {
           "role": "user",
          "content": "Tell me a funny joke about programming."
         }
       ]
}'

# Non-streaming result
curl -X POST http://localhost:8321/v1/chat/completions   -H "Content-Type: application/json"   -d '{
        "model": "meta.llama-4-scout-17b-16e-instruct",
       "stream": false,
       "temperature": 0.9,
      "messages": [
         {
           "role": "system",
           "content": "You are a funny comedian. You can be crass."
         },
          {
           "role": "user",
          "content": "Tell me a funny joke about programming."
         }
       ]
}'
```
6. Try out other models from the `/models` endpoint.
2025-11-10 16:16:24 -05:00
ehhuang
d4ecbfd092
fix(vector store)!: fix file content API (#4105)
# What does this PR do?
- changed to match
https://app.stainless.com/api/spec/documented/openai/openapi.documented.yml

## Test Plan
updated test CI
2025-11-10 10:16:35 -08:00
Juan Pérez de Algaba
6147321083
fix: Vector store persistence across server restarts (#3977)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 17s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s
Integration Tests (Replay) / generate-matrix (push) Successful in 21s
Unit Tests / unit-tests (3.12) (push) Failing after 18s
Pre-commit / pre-commit (push) Failing after 23s
Test External API and Providers / test-external (venv) (push) Failing after 22s
API Conformance Tests / check-schema-compatibility (push) Successful in 30s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 20s
UI Tests / ui-tests (22) (push) Successful in 1m10s
# What does this PR do?

This PR fixes a bug in LlamaStack 0.3.0 where vector stores created via
the OpenAI-compatible API (`POST /v1/vector_stores`) would fail with
`VectorStoreNotFoundError` after server restart when attempting
operations like `vector_io.insert()` or `vector_io.query()`.

The bug affected **6 vector IO providers**: `pgvector`, `sqlite_vec`,
`chroma`, `milvus`, `qdrant`, and `weaviate`.

Created with the assistance of: claude-4.5-sonnet

## Root Cause

All affected providers had a broken
`_get_and_cache_vector_store_index()` method that:
1. Did not load existing vector stores from persistent storage during
initialization
2. Attempted to use `vector_store_table` (which was either `None` or a
`KVStore` without the required `get_vector_store()` method)
3. Could not reload vector stores after server restart or cache miss

## Solution

This PR implements a consistent pattern across all 6 providers:

1. **Load vector stores during initialization** - Pre-populate the cache
from KV store on startup
2. **Fix lazy loading** - Modified `_get_and_cache_vector_store_index()`
to load directly from KV store instead of relying on
`vector_store_table`
3. **Remove broken dependency** - Eliminated reliance on the
`vector_store_table` pattern

## Testing steps

### 1.1 Configure the stack

Create or use an existing configuration with a vector IO provider.

**Example `run.yaml`:**

```yaml
vector_io_store:
  - provider_id: pgvector
    provider_type: remote::pgvector
    config:
      host: localhost
      port: 5432
      db: llamastack
      user: llamastack
      password: llamastack

inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config:
      model: sentence-transformers/all-MiniLM-L6-v2
```

### 1.2 Start the server

```bash
llama stack run run.yaml --port 5000
```

Wait for the server to fully start. You should see:

```
INFO: Started server process
INFO: Application startup complete
```

---

## Step 2: Create a Vector Store

### 2.1 Create via API

```bash
curl -X POST http://localhost:5000/v1/vector_stores \
  -H "Content-Type: application/json" \
  -d '{
    "name": "test-persistence-store",
    "extra_body": {
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "embedding_dimension": 384,
      "provider_id": "pgvector"
    }
  }' | jq
```

### 2.2 Expected Response

```json
{
  "id": "vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
  "object": "vector_store",
  "name": "test-persistence-store",
  "status": "completed",
  "created_at": 1730304000,
  "file_counts": {
    "total": 0,
    "completed": 0,
    "in_progress": 0,
    "failed": 0,
    "cancelled": 0
  },
  "usage_bytes": 0
}
```

**Save the `id` field** (e.g.,
`vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d`) — you’ll need it for the next
steps.

---

## Step 3: Insert Data (Before Restart)

### 3.1 Insert chunks into the vector store

```bash
export VS_ID="vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"

curl -X POST http://localhost:5000/vector-io/insert \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"chunks\": [
      {
        \"content\": \"Python is a high-level programming language known for its readability.\",
        \"metadata\": {\"source\": \"doc1\", \"page\": 1}
      },
      {
        \"content\": \"Machine learning enables computers to learn from data without explicit programming.\",
        \"metadata\": {\"source\": \"doc2\", \"page\": 1}
      },
      {
        \"content\": \"Neural networks are inspired by biological neurons in the brain.\",
        \"metadata\": {\"source\": \"doc3\", \"page\": 1}
      }
    ]
  }"
```

### 3.2 Expected Response

Status: **200 OK**  
Response: *Empty or success confirmation*

---

## Step 4: Query Data (Before Restart – Baseline)

### 4.1 Query the vector store

```bash
curl -X POST http://localhost:5000/vector-io/query \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"query\": \"What is machine learning?\"
  }" | jq
```

### 4.2 Expected Response

```json
{
  "chunks": [
    {
      "content": "Machine learning enables computers to learn from data without explicit programming.",
      "metadata": {"source": "doc2", "page": 1}
    },
    {
      "content": "Neural networks are inspired by biological neurons in the brain.",
      "metadata": {"source": "doc3", "page": 1}
    }
  ],
  "scores": [0.85, 0.72]
}
```

**Checkpoint:** Works correctly before restart.

---

## Step 5: Restart the Server (Critical Test)

### 5.1 Stop the server

In the terminal where it’s running:

```
Ctrl + C
```

Wait for:

```
Shutting down...
```

### 5.2 Restart the server

```bash
llama stack run run.yaml --port 5000
```

Wait for:

```
INFO: Started server process
INFO: Application startup complete
```

The vector store cache is now empty, but data should persist.

---

## Step 6: Verify Vector Store Exists (After Restart)

### 6.1 List vector stores

```bash
curl http://localhost:5000/v1/vector_stores | jq
```

### 6.2 Expected Response

```json
{
  "object": "list",
  "data": [
    {
      "id": "vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
      "name": "test-persistence-store",
      "status": "completed"
    }
  ]
}
```

**Checkpoint:** Vector store should be listed.

---

## Step 7: Insert Data (After Restart – THE BUG TEST)

### 7.1 Insert new chunks

```bash
curl -X POST http://localhost:5000/vector-io/insert \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"chunks\": [
      {
        \"content\": \"This chunk was inserted AFTER the server restart.\",
        \"metadata\": {\"source\": \"post-restart\", \"test\": true}
      }
    ]
  }"
```

### 7.2 Expected Results

**With Fix (Correct):**
```
Status: 200 OK
Response: Success
```

 **Without Fix (Bug):**
```json
{
  "detail": "VectorStoreNotFoundError: Vector Store 'vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d' not found."
}
```

 **Critical Test:** If insertion succeeds, the fix works.

---

## Step 8: Query Data (After Restart – Verification)

### 8.1 Query all data

```bash
curl -X POST http://localhost:5000/vector-io/query \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"query\": \"restart\"
  }" | jq
```

### 8.2 Expected Response

```json
{
  "chunks": [
    {
      "content": "This chunk was inserted AFTER the server restart.",
      "metadata": {"source": "post-restart", "test": true}
    }
  ],
  "scores": [0.95]
}
```

**Checkpoint:** Both old and new data are queryable.

---

## Step 9: Multiple Restart Test (Extra Verification)

### 9.1 Restart again

```bash
Ctrl + C
llama stack run run.yaml --port 5000
```

### 9.2 Query after restart

```bash
curl -X POST http://localhost:5000/vector-io/query \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"query\": \"programming\"
  }" | jq
```

**Expected:** Works correctly across multiple restarts.

---------

Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-11-09 00:05:00 -05:00
Ashwin Bharambe
f49cb0b717
chore: Stack server no longer depends on llama-stack-client (#4094)
This dependency has been bothering folks for a long time (cc @leseb). We
really needed it due to "library client" which is primarily used for our
tests and is not a part of the Stack server. Anyone who needs to use the
library client can certainly install `llama-stack-client` in their
environment to make that work.

Updated the notebook references to install `llama-stack-client`
additionally when setting things up.
2025-11-07 09:54:09 -08:00
Lê Nam Khánh
68c976a2d8
docs: fix typos in some files (#4101)
This PR fixes typos in the file file using codespell.
2025-11-07 16:07:46 +01:00
Sumanth Kamenani
e894e36eea
feat: add OpenAI-compatible Bedrock provider (#3748)
Some checks failed
Pre-commit / pre-commit (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 9s
UI Tests / ui-tests (22) (push) Successful in 48s
Implements AWS Bedrock inference provider using OpenAI-compatible
endpoint for Llama models available through Bedrock.

Closes: #3410


## What does this PR do?

Adds AWS Bedrock as an inference provider using the OpenAI-compatible
endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the
standard llama-stack inference API.

The implementation uses LiteLLM's OpenAI client under the hood, so it
gets all the OpenAI compatibility features. The provider handles
per-request API key overrides via headers.

## Test Plan

**Tested the following scenarios:**
- Non-streaming completion - basic request/response flow
- Streaming completion - SSE streaming with chunked responses
- Multi-turn conversations - context retention across turns
- Tool calling - function calling with proper tool_calls format

# Bedrock OpenAI-Compatible Provider - Test Results


**Model:** `bedrock-inference/openai.gpt-oss-20b-1:0`


---

## Test 1: Model Listing

**Request:**
```http
GET /v1/models HTTP/1.1
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": [
    {"identifier": "bedrock-inference/openai.gpt-oss-20b-1:0", ...},
    {"identifier": "bedrock-inference/openai.gpt-oss-40b-1:0", ...}
  ]
}
```

---

## Test 2: Non-Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "choices": [{
    "finish_reason": "stop",
    "message": {"content": "...Hello from Bedrock"}
  }],
  "usage": {"prompt_tokens": 79, "completion_tokens": 50, "total_tokens": 129}
}
```

---

## Test 3: Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Count from 1 to 5"}],
  "stream": true
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: text/event-stream

[6 SSE chunks received]
Final content: "1, 2, 3, 4, 5"
```

---

## Test 4: Error Handling - Invalid Model

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "invalid-model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 404 Not Found
Content-Type: application/json

{
  "detail": "Model 'invalid-model-id' not found. Use 'client.models.list()' to list available Models."
}
```

---

## Test 5: Multi-Turn Conversation

**Request 1:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "My name is Alice"}]
}
```

**Response 1:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Nice to meet you, Alice! How can I help you today?"}
  }]
}
```

**Request 2 (with history):**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "user", "content": "My name is Alice"},
    {"role": "assistant", "content": "...Nice to meet you, Alice!..."},
    {"role": "user", "content": "What is my name?"}
  ]
}
```

**Response 2:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Your name is Alice."}
  }],
  "usage": {"prompt_tokens": 183, "completion_tokens": 42}
}
```

**Context retained across turns**

---

## Test 6: System Messages

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "system", "content": "You are Shakespeare. Respond only in Shakespearean English."},
    {"role": "user", "content": "Tell me about the weather"}
  ]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "Lo! I heed thy request..."}
  }],
  "usage": {"completion_tokens": 813}
}
```


---

## Test 7: Tool Calling

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}
    }
  }]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "tool_calls": [{
        "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"}
      }]
    }
  }]
}
```

---

## Test 8: Sampling Parameters

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "Say hello"}],
  "temperature": 0.7,
  "top_p": 0.9
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! 👋 How can I help you today?"}
  }]
}
```

---

## Test 9: Authentication Error Handling

### Subtest A: Invalid API Key

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "invalid-fake-key-12345"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```

---

### Subtest B: Empty API Key (Fallback to Config)

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": ""}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! How can I assist you today?"}
  }]
}
```

 **Fell back to config key**

---

### Subtest C: Malformed Token

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```
2025-11-06 17:18:18 -08:00
Ashwin Bharambe
bef1b044bd
refactor(passthrough): use AsyncOpenAI instead of AsyncLlamaStackClient (#4085)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 48s
We'd like to remove the dependence of `llama-stack` on
`llama-stack-client`. This is a necessary step.

A few small cleanups
- Enables `embeddings` now also
- Remove ModelRegistryHelper dependency (unused)
- Consolidate to auth_credential field via RemoteInferenceProviderConfig
- Implement list_models() to fetch from downstream /v1/models

## Test Plan

Tested using this script
https://gist.github.com/ashwinb/6356463d10f989c0682ab3bff8589581

Output:
```
Listing models from downstream server...
Available models: ['passthrough/ollama/nomic-embed-text:latest', 'passthrough/ollama/all-minilm:l6-v2', 'passthrough/ollama/llama3.2-vision:11b', 'passthrough/ollama/llama3.2-vision:latest', 'passthrough/ollama/llama-guard3:1b', 'passthrough/o
llama/llama3.2:1b', 'passthrough/ollama/all-minilm:latest', 'passthrough/ollama/llama3.2:3b', 'passthrough/ollama/llama3.2:3b-instruct-fp16', 'passthrough/bedrock/meta.llama3-1-8b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-70b-instruct
-v1:0', 'passthrough/bedrock/meta.llama3-1-405b-instruct-v1:0', 'passthrough/sentence-transformers/nomic-ai/nomic-embed-text-v1.5']

Using LLM model: passthrough/ollama/llama3.2-vision:11b

Making inference request...

Response: 4.

--- Testing streaming ---
Streamed response: ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='1', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='', index=0, logprobs=None)], created=1762381674, m
odel='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
...
5ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1762381674, model='passthrou
gh/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
```
2025-11-05 18:15:11 -08:00
ehhuang
b335419faa
fix: actualize chunking strategy in vector store create API (#4086)
# What does this PR do?

- when create vector store is called without chunk strategy, we actually
the strategy used so that the value is persisted instead of
strategy='None'

## Test Plan
updated tests
2025-11-05 15:47:54 -08:00
ehhuang
9d5c34af27
fix!: BREAKING CHANGE: vector_store: search API response fix (#4080)
# What does this PR do?
- search_query in the vector store search API should be a list,
according to https://github.com/openai/openai-openapi


## Test Plan
modified tests


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/4080).
* #4086
* __->__ #4080
2025-11-05 15:01:48 -08:00
ehhuang
84a84ee85c
fix: last_id when listing files in vector store (#4079)
# What does this PR do?
the last_id should be the id of the last item in the returned list, not
the unfiltered list.

## Test Plan
fixed test
2025-11-05 14:10:10 -08:00
Wojciech-Rebisz
07c28cd519
fix: Avoid model_limits KeyError (#4060)
# What does this PR do?
It avoids model_limit KeyError while trying to get embedding models for
Watsonx

<!-- If resolving an issue, uncomment and update the line below -->
Closes https://github.com/llamastack/llama-stack/issues/4059

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Start server with watsonx distro:
```bash
llama stack list-deps watsonx | xargs -L1 uv pip install
uv run llama stack run watsonx
```
Run 
```python
client = LlamaStackClient(base_url=base_url)
client.models.list()
```
Check if there is any embedding model available (currently there is not
a single one)
2025-11-05 10:34:40 -08:00
Ashwin Bharambe
0c49a53c97
chore(api)!: remove tool_runtime.rag_tool from the API surface (#4067)
RAG aka file search is implemented via the Responses API by specifying
the file-search tool. The backend implementation remains unchanged. This
PR merely removes the directly exposed API surface which allowed users
to directly perform searches from the client.

This facility is now available via the `client.vector_store.search()`
OpenAI compatible API.
2025-11-04 14:50:54 -08:00
Ashwin Bharambe
a8a8aa56c0
chore!: remove the agents (sessions and turns) API (#4055)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 1m10s
- Removes the deprecated agents (sessions and turns) API that was marked
alpha in 0.3.0
- Cleans up unused imports and orphaned types after the API removal
- Removes `SessionNotFoundError` and `AgentTurnInputType` which are no
longer needed

The agents API is completely superseded by the Responses + Conversations
APIs, and the client SDK Agent class already uses those implementations.

Corresponding client-side PR:
https://github.com/llamastack/llama-stack-client-python/pull/295
2025-11-04 09:38:39 -08:00
Nathan Weinberg
62b3ad349a
fix: return to hardcoded model IDs for Vertex AI (#4041)
# What does this PR do?
partial revert of b67aef2

Vertex AI doesn't offer an endpoint for listing models from Google's
Model Garden

Return to hardcoded values until such an endpoint is available

Closes #3988 

## Test Plan
Server side, set up your Vertex AI env vars (`VERTEX_AI_PROJECT`,
`VERTEX_AI_LOCATION`, and `GOOGLE_APPLICATION_CREDENTIALS`) and run the
starter distribution
```bash
$ llama stack list-deps starter | xargs -L1 uv pip install
$ llama stack run starter
```

Client side, formerly broken cURL requests now working
```bash
$ curl http://127.0.0.1:8321/v1/models | jq '.data | map(select(.provider_id == "vertexai"))'
[
  {
    "identifier": "vertexai/vertex_ai/gemini-2.0-flash",
    "provider_resource_id": "vertex_ai/gemini-2.0-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-flash",
    "provider_resource_id": "vertex_ai/gemini-2.5-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-pro",
    "provider_resource_id": "vertex_ai/gemini-2.5-pro",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  }
]
$ curl -fsS http://127.0.0.1:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"vertexai/vertex_a
i/gemini-2.5-flash\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"max_tokens\": 128, \"temperature\": 0.0}" | jq 
{                                                                                                                                    
  "id": "p8oIaYiQF8_PptQPo-GH8QQ",                                                                                                   
  "choices": [                                                                                                                       
    {                                                                                                                                
      "finish_reason": "stop",                                                                                                       
      "index": 0,                                                                                                                    
      "logprobs": null,                                                                                                              
      "message": {                                                                                                                   
        "content": "Hello there! How can I help you today?",                                                                         
        "refusal": null,                                                                                                             
        "role": "assistant",                                                                                                         
        "annotations": null,                                                                                                         
        "audio": null,                                                                                                               
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
...
```

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-11-03 17:38:16 -08:00
Ashwin Bharambe
2381714904
fix: enable SQLite WAL mode to prevent database locking errors (#4048)
Fixes race condition causing "database is locked" errors during
concurrent writes to SQLite, particularly in streaming responses with
guardrails where multiple inference calls write simultaneously.

Enable Write-Ahead Logging (WAL) mode for SQLite which allows multiple
concurrent readers and one writer without blocking. Set busy_timeout to
5s so SQLite retries instead of failing immediately. Remove the logic
that disabled write queues for SQLite since WAL mode eliminates the
locking issues that prompted disabling them.

Fixes: test_output_safety_guardrails_safe_content[stream=True] flake
2025-11-03 15:27:41 -08:00
Matthew Farrellee
1263448de2
fix: allowed_models config did not filter models (#4030)
# What does this PR do?

closes #4022 

## Test Plan

ci w/ new tests

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 11:43:39 -08:00
Sébastien Han
3dbff6bf3f
fix: help mypy & fix precommit on main (#4037)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 2s
Pre-commit / pre-commit (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 10s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
API Conformance Tests / check-schema-compatibility (push) Successful in 21s
UI Tests / ui-tests (22) (push) Successful in 1m15s
# What does this PR do?

Add type to help mypy figure out.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-03 05:39:50 -05:00
Jiayi Ni
fa7699d2c3
feat: Add rerank API for NVIDIA Inference Provider (#3329)
# What does this PR do?
Add rerank API for NVIDIA Inference Provider.

<!-- If resolving an issue, uncomment and update the line below -->
Closes #3278 

## Test Plan
Unit test:
```
pytest tests/unit/providers/nvidia/test_rerank_inference.py
```

Integration test: 
```
pytest -s -v tests/integration/inference/test_rerank.py   --stack-config="inference=nvidia"   --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3   --env NVIDIA_API_KEY=""   --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
2025-10-30 21:42:09 -07:00
Ashwin Bharambe
174ef162b3
fix(mypy): add fast and full mypy modes (#3975)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 3s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Pre-commit / pre-commit (push) Failing after 2s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test llama stack list-deps / show-single-provider (push) Failing after 4s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test llama stack list-deps / generate-matrix (push) Successful in 5s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
Test Llama Stack Build / build (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
UI Tests / ui-tests (22) (push) Successful in 38s
`mypy` became very slow for the common path. This can make local
pre-commit runs very slow. Let's restore that.

- restore fast mirrors-mypy hook for local runs
- add optional mypy-full hook and docs so devs can match CI
- run full mypy in CI with a hint when failures occur

### Test Plan
- uv run pre-commit run mypy --all-files
- uv run pre-commit run mypy-full --hook-stage manual --all-files
- uv run --group dev --group type_checking mypy
2025-10-29 19:02:32 -07:00
Charlie Doern
e8ecc99524
fix!: remove chunk_id property from Chunk class (#3954)
# What does this PR do?

chunk_id in the Chunk class executes actual logic to compute a chunk ID.
This sort of logic should not live in the API spec.

Instead, the providers should be in charge of calling generate_chunk_id,
and pass it to `Chunk`.

this removes the incorrect dependency between Provider impl and API impl

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-29 18:59:59 -07:00
Ashwin Bharambe
da8f014b96
feat(models): list models available via provider_data header (#3968)
## Summary

When users provide API keys via `X-LlamaStack-Provider-Data` header,
`models.list()` now returns models they can access from those providers,
not just pre-registered models from the registry.

This complements the routing fix from f88416ef8 which enabled inference
calls with `provider_id/model_id` format for unregistered models. Users
can now discover which models are available to them before making
inference requests.

The implementation reuses
`NeedsRequestProviderData.get_request_provider_data()` to validate
credentials, then dynamically fetches models from providers without
caching them since they're user-specific. Registry models take
precedence to respect any pre-configured aliases.

## Test Script

```python
#!/usr/bin/env python3
import json
import os
from openai import OpenAI

# Test 1: Without provider_data header
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="dummy")
models = client.models.list()
anthropic_without = [m.id for m in models.data if m.id and "anthropic" in m.id]
print(f"Without header: {len(models.data)} models, {len(anthropic_without)} anthropic")

# Test 2: With provider_data header containing Anthropic API key
anthropic_api_key = os.environ["ANTHROPIC_API_KEY"]
client_with_key = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="dummy",
    default_headers={
        "X-LlamaStack-Provider-Data": json.dumps({"anthropic_api_key": anthropic_api_key})
    }
)
models_with_key = client_with_key.models.list()
anthropic_with = [m.id for m in models_with_key.data if m.id and "anthropic" in m.id]
print(f"With header: {len(models_with_key.data)} models, {len(anthropic_with)} anthropic")
print(f"Anthropic models: {anthropic_with}")

assert len(anthropic_with) > len(anthropic_without), "Should have more anthropic models with API key"
print("\n✓ Test passed!")
```

Run with a stack that has Anthropic provider configured (but without API
key in config):
```bash
ANTHROPIC_API_KEY=sk-ant-... python test_provider_data_models.py
```
2025-10-29 14:03:03 -07:00
Ashwin Bharambe
c9d4b6c54f
chore(mypy): part-04 resolve mypy errors in meta_reference agents (#3969)
## Summary
Fixes all mypy type errors in `providers/inline/agents/meta_reference/`
and removes exclusions from pyproject.toml.

## Changes
- Fix type annotations for Safety API message parameters
(OpenAIMessageParam)
- Add Action enum usage in access control checks
- Correct method signatures to match API supertype (parameter ordering)
- Handle optional return types with proper None checks
- Remove 3 meta_reference exclusions from mypy config

**Files fixed:** 25 errors across 3 files (safety.py, persistence.py,
agents.py)
2025-10-29 13:37:28 -07:00
Ashwin Bharambe
a4f97559d1
fix(mypy): part-03 completely resolve meta reference responses impl typing issues (#3951)
## Summary
Resolves all mypy errors in meta reference agent OpenAI responses
implementation by adding proper type narrowing, None checks, and
Sequence type support.

## Changes
- Fixed streaming.py, openai_responses.py, utils.py, tool_executor.py,
agent_instance.py
- Added Sequence type support to schema generator (ensures correct JSON
schema generation)
- Applied union type narrowing and None checks throughout

## Test plan
- All modified files pass mypy type checking (0 errors)
- Schema generator produces correct `type: array` for Sequence types

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-29 08:07:15 -07:00
Ashwin Bharambe
e5c27dbcbf
fix(mypy): part-02 resolve OpenAI compatibility layer type issues (#3947)
## Summary

Fixes 111 mypy type errors in OpenAI compatibility layer (PR3 in mypy
remediation series).

**Changes:**
- `litellm_openai_mixin.py`: Added type annotations, None checks for
tool_config/model_store access
- `openai_compat.py`: Added None checks throughout, fixed TypedDict
expansions, proper type conversions for messages/tool_calls

**Result:** 23 → 1 errors in litellm file, 88 → 0 errors in
openai_compat file

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-29 08:06:40 -07:00
Ashwin Bharambe
ce31aa1704
fix(mypy-cleanup): part-01 resolve meta reference agent type issues (126 errors) (#3945)
Error fixes in Agents implementation (`meta-reference` provider) --
adding proper type annotations and using type narrowing for optional
attributes. Essentially a bunch of `if x and x_foo := getattr(x, "foo")`
instead of `x.foo` directly

Part of ongoing mypy remediation effort.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-29 07:54:30 -07:00
ehhuang
1f9d48cd54
feat: openai files provider (#3946)
# What does this PR do?
- Adds OpenAI files provider 
- Note that file content retrieval is pretty limited by `purpose`
https://community.openai.com/t/file-uploads-error-why-can-t-i-download-files-with-purpose-user-data/1357013?utm_source=chatgpt.com

## Test Plan
Modify run yaml to use openai files provider:
```
  files:
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY:=}
      metadata_store:
        backend: sql_default
        table_name: openai_files_metadata

# Then run files tests
❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern test_files
```
2025-10-28 16:25:03 -07:00
Ashwin Bharambe
f88416ef87
fix(inference): enable routing of models with provider_data alone (#3928)
This PR enables routing of fully qualified model IDs of the form
`provider_id/model_id` even when the models are not registered with the
Stack.

Here's the situation: assume a remote inference provider which works
only when users provide their own API keys via
`X-LlamaStack-Provider-Data` header. By definition, we cannot list
models and hence update our routing registry. But because we _require_ a
provider ID in the models now, we can identify which provider to route
to and let that provider decide.

Note that we still try to look up our registry since it may have a
pre-registered alias. Just that we don't outright fail when we are not
able to look it up.

Also, updated inference router so that the responses have the _exact_
model that the request had.

## Test Plan

Added an integration test

Closes #3929

---------

Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
2025-10-28 11:16:37 -07:00
Ashwin Bharambe
94b0592240
fix(mypy): add type stubs and fix typing issues (#3938)
Adds type stubs and fixes mypy errors for better type coverage.

Changes:
- Added type_checking dependency group with type stubs (torchtune, trl,
etc.)
- Added lm-format-enforcer to pre-commit hook
- Created HFAutoModel Protocol for type-safe HuggingFace model handling
- Added mypy.overrides for untyped libraries (torchtune, fairscale,
etc.)
- Fixed type issues in post-training providers, databricks, and
api_recorder

Note: ~1,200 errors remain in excluded files (see pyproject.toml exclude
list).

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 11:00:09 -07:00
Ashwin Bharambe
1d385b5b75
fix(mypy): resolve OpenAI SDK and provider type issues (#3936)
## Summary
- Fix OpenAI SDK NotGiven/Omit type mismatches in embeddings calls
- Fix incorrect OpenAIChatCompletionChunk import in vllm provider
- Refactor to avoid type:ignore comments by using conditional kwargs

## Changes
**openai_mixin.py (9 errors fixed):**
- Build kwargs conditionally for embeddings.create() to avoid
NotGiven/Omit mismatch
- Only include parameters when they have actual values (not None)

**gemini.py (9 errors fixed):**
- Apply same conditional kwargs pattern
- Add missing Any import

**vllm.py (2 errors fixed):**
- Use correct OpenAIChatCompletionChunk from llama_stack.apis.inference
- Remove incorrect alias from openai package

## Technical Notes
The OpenAI SDK has a type system quirk where `NOT_GIVEN` has type
`NotGiven` but parameter signatures expect `Omit`. By only passing
parameters with actual values, we avoid this mismatch entirely without
needing `# type: ignore` comments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 10:54:29 -07:00
Ashwin Bharambe
d009dc29f7
fix(mypy): resolve provider utility and testing type issues (#3935)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 3s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 7s
UI Tests / ui-tests (22) (push) Successful in 51s
Pre-commit / pre-commit (push) Successful in 2m0s
Fixes mypy type errors in provider utilities and testing infrastructure:
- `mcp.py`: Cast incompatible client types, wrap image data properly
- `batches.py`: Rename walrus variable to avoid shadowing
- `api_recorder.py`: Use cast for Pydantic field annotation

No functional changes.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 10:37:27 -07:00