Commit graph

941 commits

Author SHA1 Message Date
Sébastien Han
30cab02083
chore: refactor Batches protocol to use request models
This commit refactors the Batches protocol to use Pydantic request
models for both create_batch and list_batches methods, improving
consistency, readability, and maintainability.

- create_batch now accepts a single CreateBatchRequest parameter instead
  of individual arguments. This aligns the protocol with FastAPI’s
  request model pattern, allowing the router to pass the request object
  directly without unpacking parameters. Provider implementations now
  access fields via request.input_file_id, request.endpoint, etc.

- list_batches now accepts a single ListBatchesRequest parameter,
  replacing individual query parameters. The model includes after and
  limit fields with proper OpenAPI descriptions. FastAPI automatically
  parses query parameters into the model for GET requests, keeping
  router code clean. Provider implementations access fields via
  request.after and request.limit.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-20 16:00:34 +01:00
Sébastien Han
eb3cab1eec
feat: Implement FastAPI router system
This commit introduces a new FastAPI router-based system for defining
API endpoints, enabling a migration path away from the legacy @webmethod
decorator system. The implementation includes router infrastructure,
migration of the Batches API as the first example, and updates
to server, OpenAPI generation, and inspection systems to
support both routing approaches.

The router infrastructure consists of a router registry system
that allows APIs to register FastAPI router factories, which
are then automatically discovered and included in the server
application. Standard error responses are centralized in
router_utils to ensure consistent OpenAPI specification
generation with proper $ref references to component responses.

The Batches API has been migrated to demonstrate the new
pattern. The protocol definition and models remain in
llama_stack_api/batches, maintaining clear separation between
API contracts and server implementation. The FastAPI router
implementation lives in
llama_stack/core/server/routers/batches, following the
established pattern where API contracts are defined in
llama_stack_api and server routing logic lives in
llama_stack/core/server.

The server now checks for registered routers before falling
back to the legacy webmethod-based route discovery, ensuring
backward compatibility during the migration period. The
OpenAPI generator has been updated to handle both router-based
and webmethod-based routes, correctly extracting metadata from
FastAPI route decorators and Pydantic Field descriptions. The
inspect endpoint now includes routes from both systems, with
proper filtering for deprecated routes and API levels.

Response descriptions are now explicitly defined in router decorators,
ensuring the generated OpenAPI specification matches the
previous format. Error responses use $ref references to
component responses (BadRequest400, TooManyRequests429, etc.)
as required by the specification. This is neat and will allow us to
remove a lot of boiler plate code from our generator once the
migration is done.

This implementation provides a foundation for incrementally migrating
other APIs to the router system while maintaining full backward
compatibility with existing webmethod-based APIs.

Closes: https://github.com/llamastack/llama-stack/issues/4188
Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-19 17:07:24 +01:00
Sébastien Han
97f535c4f1
feat(openapi): switch to fastapi-based generator (#3944)
Some checks failed
Pre-commit / pre-commit (push) Successful in 3m27s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test llama stack list-deps / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test llama stack list-deps / show-single-provider (push) Successful in 25s
Test External API and Providers / test-external (venv) (push) Failing after 34s
Vector IO Integration Tests / test-matrix (push) Failing after 43s
Test Llama Stack Build / build (push) Successful in 37s
Test Llama Stack Build / build-single-provider (push) Successful in 48s
Test llama stack list-deps / list-deps-from-config (push) Successful in 52s
Test llama stack list-deps / list-deps (push) Failing after 52s
Python Package Build Test / build (3.13) (push) Failing after 1m2s
UI Tests / ui-tests (22) (push) Successful in 1m15s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 1m29s
Unit Tests / unit-tests (3.12) (push) Failing after 1m45s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 1m54s
Unit Tests / unit-tests (3.13) (push) Failing after 2m13s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m20s
# What does this PR do?
This replaces the legacy "pyopenapi + strong_typing" pipeline with a
FastAPI-backed generator that has an explicit schema registry inside
`llama_stack_api`. The key changes:

1. **New generator architecture.** FastAPI now builds the OpenAPI schema
directly from the real routes, while helper modules
(`schema_collection`, `endpoints`, `schema_transforms`, etc.)
post-process the result. The old pyopenapi stack and its strong_typing
helpers are removed entirely, so we no longer rely on fragile AST
analysis or top-level import side effects.

2. **Schema registry in `llama_stack_api`.** `schema_utils.py` keeps a
`SchemaInfo` record for every `@json_schema_type`, `register_schema`,
and dynamically created request model. The OpenAPI generator and other
tooling query this registry instead of scanning the package tree,
producing deterministic names (e.g., `{MethodName}Request`), capturing
all optional/nullable fields, and making schema discovery testable. A
new unit test covers the registry behavior.

3. **Regenerated specs + CI alignment.** All docs/Stainless specs are
regenerated from the new pipeline, so optional/nullable fields now match
reality (expect the API Conformance workflow to report breaking
changes—this PR establishes the new baseline). The workflow itself is
back to the stock oasdiff invocation so future regressions surface
normally.

*Conformance will be RED on this PR; we choose to accept the
deviations.*

## Test Plan
- `uv run pytest tests/unit/server/test_schema_registry.py`
- `uv run python -m scripts.openapi_generator.main docs/static`

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-14 15:53:53 -08:00
Omar Abdelwahab
eb545034ab
fix: MCP authorization parameter implementation (#4052)
# What does this PR do?
Adding a user-facing `authorization ` parameter to MCP tool definitions
that allows users to explicitly configure credentials per MCP server,
addressing GitHub Issue #4034 in a secure manner.


## Test Plan
tests/integration/responses/test_mcp_authentication.py

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-14 08:54:42 -08:00
Ashwin Bharambe
2441ca9389
fix(api): ensure openapi spec has deprecated routes (#4156)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Integration Tests (Replay) / generate-matrix (push) Successful in 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test llama stack list-deps / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 19s
Python Package Build Test / build (3.13) (push) Failing after 17s
Test External API and Providers / test-external (venv) (push) Failing after 30s
Test llama stack list-deps / list-deps-from-config (push) Successful in 36s
Test Llama Stack Build / build-single-provider (push) Successful in 40s
Test llama stack list-deps / show-single-provider (push) Successful in 48s
Vector IO Integration Tests / test-matrix (push) Failing after 55s
Test Llama Stack Build / build (push) Successful in 48s
UI Tests / ui-tests (22) (push) Successful in 54s
Test llama stack list-deps / list-deps (push) Failing after 1m34s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m6s
Unit Tests / unit-tests (3.13) (push) Failing after 2m38s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m38s
Unit Tests / unit-tests (3.12) (push) Failing after 2m44s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m50s
Pre-commit / pre-commit (push) Successful in 3m51s
Deprecated doesn't mean it's "gone", it just means it is "going away" in
the next major version of the package.
2025-11-13 13:16:02 -08:00
Charlie Doern
840ad75fe9
feat: split API and provider specs into separate llama-stack-api pkg (#3895)
# What does this PR do?

Extract API definitions and provider specifications into a standalone
llama-stack-api package that can be published to PyPI independently of
the main llama-stack server.


see: https://github.com/llamastack/llama-stack/pull/2978 and
https://github.com/llamastack/llama-stack/pull/2978#issuecomment-3145115942

Motivation

External providers currently import from llama-stack, which overrides
the installed version and causes dependency conflicts. This separation
allows external providers to:

- Install only the type definitions they need without server
dependencies
- Avoid version conflicts with the installed llama-stack package
- Be versioned and released independently

This enables us to re-enable external provider module tests that were
previously blocked by these import conflicts.

Changes

- Created llama-stack-api package with minimal dependencies (pydantic,
jsonschema)
- Moved APIs, providers datatypes, strong_typing, and schema_utils
- Updated all imports from llama_stack.* to llama_stack_api.*
- Configured local editable install for development workflow
- Updated linting and type-checking configuration for both packages

Next Steps

- Publish llama-stack-api to PyPI
- Update external provider dependencies
- Re-enable external provider module tests


Pre-cursor PRs to this one:

- #4093 
- #3954 
- #4064 

These PRs moved key pieces _out_ of the Api pkg, limiting the scope of
change here.


relates to #3237 

## Test Plan

Package builds successfully and can be imported independently. All
pre-commit hooks pass with expected exclusions maintained.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-13 11:51:17 -08:00
Francisco Arceo
4442b24de7
chore: Fix docs so can be deployed (#4149)
# What does this PR do?
Building/Deploying docs is failing here:
5530320962 (step):8:49

Needs the playground file. Updated it to reflect current admin status.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-11-13 09:15:32 -08:00
Akram Ben Aissi
9eb81439d2
docs: Add comprehensive Files API and Vector Store integration doc (#3279)
docs: Add comprehensive Files API and Vector Store integration
documentation

- Add Files API documentation with OpenAI-compatible endpoints
- Create comprehensive guide for OpenAI-compatible file operations
- Reorganize documentation structure: move file operations to files/
directory
- Add vector store provider documentation for Milvus, SQLite-vec, FAISS
- Clean up redundant files and improve navigation
- Update cross-references and eliminate documentation duplication
- Support for release 0.2.14 FileResponse and Vector Store API features

# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
2025-11-13 08:50:06 -05:00
Derek Higgins
356f37b1ba
docs: clarify model identification uses provider_model_id not model_id (#4128)
Updated documentation to accurately reflect current behavior where
models are identified as provider_id/provider_model_id in the system.

Changes:
o Clarify that model_id is for configuration purposes only o Explain
models are accessed as provider_id/provider_model_id o Remove outdated
aliasing example that suggested model_id could be used
  as a custom identifier

This corrects the documentation which previously suggested model_id
could be used to create friendly aliases, which is not how the code
actually works.

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-11-12 10:13:26 -08:00
Francisco Arceo
eb3f9ac278
feat: allow returning embeddings and metadata from /vector_stores/ methods; disallow changing Provider ID (#4046)
# What does this PR do?

- Updates `/vector_stores/{vector_store_id}/files/{file_id}/content` to
allow returning `embeddings` and `metadata` using the `extra_query`
    -  Updates the UI accordingly to display them.

- Update UI to support CRUD operations in the Vector Stores section and
adds a new modal exposing the functionality.

- Updates Vector Store update to fail if a user tries to update Provider
ID (which doesn't make sense to allow)

```python
In  [1]: client.vector_stores.files.content(
    vector_store_id=vector_store.id, 
    file_id=file.id, 
    extra_query={"include_embeddings": True, "include_metadata": True}
)
Out [1]: FileContentResponse(attributes={}, content=[Content(text='This is a test document to check if embeddings are generated properly.\n', type='text', embedding=[0.33760684728622437, ...,], chunk_metadata={'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'source': None, 'created_timestamp': 1762053437, 'updated_timestamp': 1762053437, 'chunk_window': '0-13', 'chunk_tokenizer': 'DEFAULT_TIKTOKEN_TOKENIZER', 'chunk_embedding_model': 'sentence-transformers/nomic
-ai/nomic-embed-text-v1.5', 'chunk_embedding_dimension': 768, 'content_token_count': 13, 'metadata_token_count': 9}, metadata={'filename': 'test-embedding.txt', 'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'token_count': 13, 'metadata_token_count': 9})], file_id='file-27291dbc679642ac94ffac6d2810c339', filename='test-embedding.txt')
```

Screenshots of UI are displayed below:

### List Vector Store with Added "Create New Vector Store"
<img width="1912" height="491" alt="Screenshot 2025-11-06 at 10 47
25 PM"
src="https://github.com/user-attachments/assets/a3a3ddd9-758d-4005-ac9c-5047f03916f3"
/>

### Create New Vector Store
<img width="1918" height="1048" alt="Screenshot 2025-11-06 at 10 47
49 PM"
src="https://github.com/user-attachments/assets/b4dc0d31-696f-4e68-b109-27915090f158"
/>

### Edit Vector Store
<img width="1916" height="1355" alt="Screenshot 2025-11-06 at 10 48
32 PM"
src="https://github.com/user-attachments/assets/ec879c63-4cf7-489f-bb1e-57ccc7931414"
/>


### Vector Store Files Contents page (with Embeddings)
<img width="1914" height="849" alt="Screenshot 2025-11-06 at 11 54
32 PM"
src="https://github.com/user-attachments/assets/3095520d-0e90-41f7-83bd-652f6c3fbf27"
/>

### Vector Store Files Contents Details page (with Embeddings)
<img width="1916" height="1221" alt="Screenshot 2025-11-06 at 11 55
00 PM"
src="https://github.com/user-attachments/assets/e71dbdc5-5b49-472b-a43a-5785f58d196c"
/>

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
Tests added for Middleware extension and Provider failures.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-11-12 09:59:48 -08:00
ehhuang
71b328fc4b
chore(ui): add npm package and dockerfile (#4100)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 2s
Integration Tests (Replay) / generate-matrix (push) Successful in 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 53s
# What does this PR do?
- sets up package.json for npm `llama-stack-ui` package (will update
llama-stack-ops)
- adds dockerfile for UI docker image

## Test Plan
npx:
npm build && npm pack
LLAMA_STACK_UI_PORT=8322 npx
/Users/erichuang/projects/ui/src/llama_stack_ui/llama-stack-ui-0.4.0-alpha.2.tgz

docker:
cd src/llama_stack_ui
docker build . -f Dockerfile  --tag test_ui --no-cache

❯ docker run -p 8322:8322 \
      -e LLAMA_STACK_UI_PORT=8322 \
      test_ui:latest
2025-11-11 10:40:31 -08:00
paulengineer
e5a55f3677
docs: use 'uv pip' to avoid pitfalls of using 'pip' in virtual environment (#4122)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Python Package Build Test / build (3.12) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 2s
Pre-commit / pre-commit (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 25s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2s
UI Tests / ui-tests (22) (push) Successful in 53s
# What does this PR do?
In the **Detailed Tutorial**, at **Step 3**, the **Install with venv**
option creates a new virtual environment `client`, activates it then
attempts to install the llama-stack-client using pip.
```
uv venv client --python 3.12
source client/bin/activate
pip install llama-stack-client    <- this is the problematic line
```
However, the pip command will likely fail because the `uv venv` command
doesn't, by default, include adding the pip command to the virtual
environment that is created. The pip command will error either because
pip doesn't exist at all, or, if the pip command does exist outside of
the virtual environment, return a different error message. The latter
may be unclear to the user why it is failing.

This PR changes 'pip' to 'uv pip', allowing the install action to
function in the virtual environment as intended, and without the need
for pip to be installed.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
1. Use linux or WSL (virtual environments on Windows use `Scripts`
folder instead of `bin` [virtualenv
#993ba13](993ba1316a)
which doesn't align with the tutorial)
2. Clone the `llama-stack` repo
3. Run the following and verify success:
```
uv venv client --python 3.12
source client/bin/activate
```
5. Run the updated command:
```
uv pip install llama-stack-client
```
6. Observe the console output confirms that the virtual environment
`client` was used:

> Using Python 3.12.3 environment at: **client**
2025-11-11 07:49:03 -05:00
Nathan Weinberg
97ccfb5e62
refactor: inspect routes now shows all non-deprecated APIs (#4116)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Pre-commit / pre-commit (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 4s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Test llama stack list-deps / show-single-provider (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 21s
UI Tests / ui-tests (22) (push) Successful in 46s
# What does this PR do?
the inspect API lacked any mechanism to get all
non-deprecated APIs (v1, v1alpha, v1beta)
change default to this behavior

'v1' filter can be used for user' wanting a list
of stable APIs

## Test Plan
1. pull the PR
2. launch a LLS server
3. run `curl http://beanlab3.bss.redhat.com:8321/v1/inspect/routes`
4. note there are APIs for `v1`, `v1alpha`, and `v1beta` but no
deprecated APIs

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-11-10 15:57:17 -08:00
Shabana Baig
433438cfc0
feat: Implement the 'max_tool_calls' parameter for the Responses API (#4062)
# Problem
Responses API uses max_tool_calls parameter to limit the number of tool
calls that can be generated in a response. Currently, LLS implementation
of the Responses API does not support this parameter.

# What does this PR do?
This pull request adds the max_tool_calls field to the response object
definition and updates the inline provider. it also ensures that:

- the total number of calls to built-in and mcp tools do not exceed
max_tool_calls
- an error is thrown if max_tool_calls < 1 (behavior seen with the
OpenAI Responses API, but we can change this if needed)

Closes #[3563](https://github.com/llamastack/llama-stack/issues/3563)

## Test Plan
- Tested manually for change in model response w.r.t supplied
max_tool_calls field.
- Added integration tests to test invalid max_tool_calls parameter.
- Added integration tests to check max_tool_calls parameter with
built-in and function tools.
- Added integration tests to check max_tool_calls parameter in the
returned response object.
- Recorded OpenAI Responses API behavior using a sample script:
https://github.com/s-akhtar-baig/llama-stack-examples/blob/main/responses/src/max_tool_calls.py

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-10 13:21:27 -08:00
Dennis Kennetz
209a78b618
feat: add oci genai service as chat inference provider (#3876)
# What does this PR do?
Adds OCI GenAI PaaS models for openai chat completion endpoints.

## Test Plan
In an OCI tenancy with access to GenAI PaaS, perform the following
steps:

1. Ensure you have IAM policies in place to use service (check docs
included in this PR)
2. For local development, [setup OCI
cli](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm)
and configure the CLI with your region, tenancy, and auth
[here](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliconfigure.htm)
3. Once configured, go through llama-stack setup and run llama-stack
(uses config based auth) like:
```bash
OCI_AUTH_TYPE=config_file \
OCI_CLI_PROFILE=CHICAGO \
OCI_REGION=us-chicago-1 \
OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..aaaaaaaa5...5a \
llama stack run oci
```
4. Hit the `models` endpoint to list models after server is running:
```bash
curl http://localhost:8321/v1/models | jq
...
{
      "identifier": "meta.llama-4-scout-17b-16e-instruct",
      "provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.am...q",
      "provider_id": "oci",
      "type": "model",
      "metadata": {
        "display_name": "meta.llama-4-scout-17b-16e-instruct",
        "capabilities": [
          "CHAT"
        ],
        "oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.a...q"
      },
      "model_type": "llm"
},
   ...
```
5. Use the "display_name" field to use the model in a
`/chat/completions` request:
```bash
# Streaming result
curl -X POST http://localhost:8321/v1/chat/completions   -H "Content-Type: application/json"   -d '{
        "model": "meta.llama-4-scout-17b-16e-instruct",
       "stream": true,
       "temperature": 0.9,
      "messages": [
         {
           "role": "system",
           "content": "You are a funny comedian. You can be crass."
         },
          {
           "role": "user",
          "content": "Tell me a funny joke about programming."
         }
       ]
}'

# Non-streaming result
curl -X POST http://localhost:8321/v1/chat/completions   -H "Content-Type: application/json"   -d '{
        "model": "meta.llama-4-scout-17b-16e-instruct",
       "stream": false,
       "temperature": 0.9,
      "messages": [
         {
           "role": "system",
           "content": "You are a funny comedian. You can be crass."
         },
          {
           "role": "user",
          "content": "Tell me a funny joke about programming."
         }
       ]
}'
```
6. Try out other models from the `/models` endpoint.
2025-11-10 16:16:24 -05:00
Ashwin Bharambe
fadf17daf3
feat(api)!: deprecate register/unregister resource APIs (#4099)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Pre-commit / pre-commit (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 1m10s
Mark all register_* / unregister_* APIs as deprecated across models,
shields, tool groups, datasets, benchmarks, and scoring functions. This
is the first step toward moving resource mutations to an `/admin`
namespace as outlined in
https://github.com/llamastack/llama-stack/issues/3809#issuecomment-3492931585.

The deprecation flag will be reflected in the OpenAPI schema to warn API
users that these endpoints are being phased out. Next step will be
implementing the `/admin` route namespace for these resource management
operations.

- `register_model` / `unregister_model`
- `register_shield` / `unregister_shield`
- `register_tool_group` / `unregister_toolgroup`
- `register_dataset` / `unregister_dataset`
- `register_benchmark` / `unregister_benchmark`
- `register_scoring_function` / `unregister_scoring_function`
2025-11-10 10:36:33 -08:00
ehhuang
d4ecbfd092
fix(vector store)!: fix file content API (#4105)
# What does this PR do?
- changed to match
https://app.stainless.com/api/spec/documented/openai/openapi.documented.yml

## Test Plan
updated test CI
2025-11-10 10:16:35 -08:00
Vaishnavi Hire
4341c4c2ac
docs: Add Llama Stack Operator docs (#3983)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Add documentation for llama-stack-k8s-operator under kubernetes
deployment guide.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
2025-11-10 15:29:15 +01:00
Aakanksha Duggal
b83184f7ef
feat(responses)!: Add web_search_2025_08_26 to the WebSearchToolTypes (#4103)
# What does this PR do?
Resolves #4102 

1. Added `web_search_2025_08_26` to the `WebSearchToolTypes` list and
the `OpenAIResponseInputToolWebSearch.type` Literal union
2. No changes needed to tool execution logic - all `web_search` types
map to the same underlying tool
3. Backward compatibility is maintained - existing `web_search`,
`web_search_preview`, and `web_search_preview_2025_03_11` types continue
to work
4. Added an integration test case using {"type":
"web_search_2025_08_26"} to verify it works correctly
5. Updated `docs/docs/providers/openai_responses_limitations.mdx` to
reflect that `web_search_2025_08_26` is now supported.
6. Removed incorrect references to `MOD1/MOD2/MOD3` (which don't exist
in the codebase)


<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

---------

Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-07 10:01:12 -08:00
Ashwin Bharambe
f49cb0b717
chore: Stack server no longer depends on llama-stack-client (#4094)
This dependency has been bothering folks for a long time (cc @leseb). We
really needed it due to "library client" which is primarily used for our
tests and is not a part of the Stack server. Anyone who needs to use the
library client can certainly install `llama-stack-client` in their
environment to make that work.

Updated the notebook references to install `llama-stack-client`
additionally when setting things up.
2025-11-07 09:54:09 -08:00
Sumanth Kamenani
e894e36eea
feat: add OpenAI-compatible Bedrock provider (#3748)
Some checks failed
Pre-commit / pre-commit (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 9s
UI Tests / ui-tests (22) (push) Successful in 48s
Implements AWS Bedrock inference provider using OpenAI-compatible
endpoint for Llama models available through Bedrock.

Closes: #3410


## What does this PR do?

Adds AWS Bedrock as an inference provider using the OpenAI-compatible
endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the
standard llama-stack inference API.

The implementation uses LiteLLM's OpenAI client under the hood, so it
gets all the OpenAI compatibility features. The provider handles
per-request API key overrides via headers.

## Test Plan

**Tested the following scenarios:**
- Non-streaming completion - basic request/response flow
- Streaming completion - SSE streaming with chunked responses
- Multi-turn conversations - context retention across turns
- Tool calling - function calling with proper tool_calls format

# Bedrock OpenAI-Compatible Provider - Test Results


**Model:** `bedrock-inference/openai.gpt-oss-20b-1:0`


---

## Test 1: Model Listing

**Request:**
```http
GET /v1/models HTTP/1.1
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": [
    {"identifier": "bedrock-inference/openai.gpt-oss-20b-1:0", ...},
    {"identifier": "bedrock-inference/openai.gpt-oss-40b-1:0", ...}
  ]
}
```

---

## Test 2: Non-Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "choices": [{
    "finish_reason": "stop",
    "message": {"content": "...Hello from Bedrock"}
  }],
  "usage": {"prompt_tokens": 79, "completion_tokens": 50, "total_tokens": 129}
}
```

---

## Test 3: Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Count from 1 to 5"}],
  "stream": true
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: text/event-stream

[6 SSE chunks received]
Final content: "1, 2, 3, 4, 5"
```

---

## Test 4: Error Handling - Invalid Model

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "invalid-model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 404 Not Found
Content-Type: application/json

{
  "detail": "Model 'invalid-model-id' not found. Use 'client.models.list()' to list available Models."
}
```

---

## Test 5: Multi-Turn Conversation

**Request 1:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "My name is Alice"}]
}
```

**Response 1:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Nice to meet you, Alice! How can I help you today?"}
  }]
}
```

**Request 2 (with history):**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "user", "content": "My name is Alice"},
    {"role": "assistant", "content": "...Nice to meet you, Alice!..."},
    {"role": "user", "content": "What is my name?"}
  ]
}
```

**Response 2:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Your name is Alice."}
  }],
  "usage": {"prompt_tokens": 183, "completion_tokens": 42}
}
```

**Context retained across turns**

---

## Test 6: System Messages

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "system", "content": "You are Shakespeare. Respond only in Shakespearean English."},
    {"role": "user", "content": "Tell me about the weather"}
  ]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "Lo! I heed thy request..."}
  }],
  "usage": {"completion_tokens": 813}
}
```


---

## Test 7: Tool Calling

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}
    }
  }]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "tool_calls": [{
        "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"}
      }]
    }
  }]
}
```

---

## Test 8: Sampling Parameters

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "Say hello"}],
  "temperature": 0.7,
  "top_p": 0.9
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! 👋 How can I help you today?"}
  }]
}
```

---

## Test 9: Authentication Error Handling

### Subtest A: Invalid API Key

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "invalid-fake-key-12345"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```

---

### Subtest B: Empty API Key (Fallback to Config)

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": ""}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! How can I assist you today?"}
  }]
}
```

 **Fell back to config key**

---

### Subtest C: Malformed Token

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```
2025-11-06 17:18:18 -08:00
Ashwin Bharambe
a2c4c12384
chore(ui): remove the Streamlit UI (#4097) 2025-11-06 15:51:57 -08:00
Ashwin Bharambe
bef1b044bd
refactor(passthrough): use AsyncOpenAI instead of AsyncLlamaStackClient (#4085)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 48s
We'd like to remove the dependence of `llama-stack` on
`llama-stack-client`. This is a necessary step.

A few small cleanups
- Enables `embeddings` now also
- Remove ModelRegistryHelper dependency (unused)
- Consolidate to auth_credential field via RemoteInferenceProviderConfig
- Implement list_models() to fetch from downstream /v1/models

## Test Plan

Tested using this script
https://gist.github.com/ashwinb/6356463d10f989c0682ab3bff8589581

Output:
```
Listing models from downstream server...
Available models: ['passthrough/ollama/nomic-embed-text:latest', 'passthrough/ollama/all-minilm:l6-v2', 'passthrough/ollama/llama3.2-vision:11b', 'passthrough/ollama/llama3.2-vision:latest', 'passthrough/ollama/llama-guard3:1b', 'passthrough/o
llama/llama3.2:1b', 'passthrough/ollama/all-minilm:latest', 'passthrough/ollama/llama3.2:3b', 'passthrough/ollama/llama3.2:3b-instruct-fp16', 'passthrough/bedrock/meta.llama3-1-8b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-70b-instruct
-v1:0', 'passthrough/bedrock/meta.llama3-1-405b-instruct-v1:0', 'passthrough/sentence-transformers/nomic-ai/nomic-embed-text-v1.5']

Using LLM model: passthrough/ollama/llama3.2-vision:11b

Making inference request...

Response: 4.

--- Testing streaming ---
Streamed response: ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='1', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='', index=0, logprobs=None)], created=1762381674, m
odel='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
...
5ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1762381674, model='passthrou
gh/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
```
2025-11-05 18:15:11 -08:00
Roy Belio
c672a5d792
feat: ability to use postgres as store for starter distro (#4076)
## What does this PR do?

The starter distribution now comes with all the required packages to
support persistent stores—like the agent store, metadata, and
inference—using PostgreSQL. Users can enable PostgreSQL support by
setting the `ENABLE_POSTGRES_STORE=1` environment variable.

This PR consolidates the functionality from the removed `postgres-demo`
distribution into the starter distribution, reducing maintenance
overhead.

**Closes: #2619**  
**Supersedes: #2851** (rebased and updated)

## Changes Made

1. **Added PostgreSQL support to starter distribution**
   - New `run-with-postgres-store.yaml` configuration
- Automatic config switching via `ENABLE_POSTGRES_STORE` environment
variable
   - Removed separate `postgres-demo` distribution

2. **Updated to new build system**
   - Integrated postgres switching logic into Containerfile entrypoint
   - Uses new `storage_backends` and `storage_stores` API
   - Properly configured both PostgreSQL KV store and SQL store

3. **Updated dependencies**
   - Added `psycopg2-binary` and `asyncpg` to starter distribution
   - All postgres-related dependencies automatically included

## How to Use

### With Docker (PostgreSQL):
```bash
docker run \
  -e ENABLE_POSTGRES_STORE=1 \
  -e POSTGRES_HOST=your_postgres_host \
  -e POSTGRES_PORT=5432 \
  -e POSTGRES_DB=llamastack \
  -e POSTGRES_USER=llamastack \
  -e POSTGRES_PASSWORD=llamastack \
  -e OPENAI_API_KEY=your_key \
  llamastack/distribution-starter
```

### PostgreSQL environment variables:
- `POSTGRES_HOST`: Postgres host (default: `localhost`)
- `POSTGRES_PORT`: Postgres port (default: `5432`)
- `POSTGRES_DB`: Postgres database name (default: `llamastack`)
- `POSTGRES_USER`: Postgres username (default: `llamastack`)
- `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`)

## Test Plan

All pre-commit hooks pass (mypy, ruff, distro-codegen)  
`llama stack list-deps starter` confirms psycopg2-binary is included  
Storage configuration correctly uses PostgreSQL backends  
Container builds successfully with postgres support  

## Credits

Original work by @leseb in #2851. Rebased and updated by @r-bit-rry to
work with latest main.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Sébastien Han @leseb

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-11-05 15:37:06 -08:00
ehhuang
9d5c34af27
fix!: BREAKING CHANGE: vector_store: search API response fix (#4080)
# What does this PR do?
- search_query in the vector store search API should be a list,
according to https://github.com/openai/openai-openapi


## Test Plan
modified tests


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/4080).
* #4086
* __->__ #4080
2025-11-05 15:01:48 -08:00
ehhuang
95b0493fae
chore: move src/llama_stack/ui to src/llama_stack_ui (#4068)
# What does this PR do?
This better separates UI from backend code, which was a point of
confusion often for our beloved AI friends.


## Test Plan
CI
2025-11-04 15:21:49 -08:00
Ashwin Bharambe
5850e3473f fix: remove straggler openapi HTML file 2025-11-04 14:54:33 -08:00
Ashwin Bharambe
0c49a53c97
chore(api)!: remove tool_runtime.rag_tool from the API surface (#4067)
RAG aka file search is implemented via the Responses API by specifying
the file-search tool. The backend implementation remains unchanged. This
PR merely removes the directly exposed API surface which allowed users
to directly perform searches from the client.

This facility is now available via the `client.vector_store.search()`
OpenAI compatible API.
2025-11-04 14:50:54 -08:00
Ashwin Bharambe
a8a8aa56c0
chore!: remove the agents (sessions and turns) API (#4055)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 1m10s
- Removes the deprecated agents (sessions and turns) API that was marked
alpha in 0.3.0
- Cleans up unused imports and orphaned types after the API removal
- Removes `SessionNotFoundError` and `AgentTurnInputType` which are no
longer needed

The agents API is completely superseded by the Responses + Conversations
APIs, and the client SDK Agent class already uses those implementations.

Corresponding client-side PR:
https://github.com/llamastack/llama-stack-client-python/pull/295
2025-11-04 09:38:39 -08:00
Ashwin Bharambe
053fc0ac39
chore!: remove all deprecated routes (including /openai/v1/ ones) (#4054)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 1m13s
This PR removes all routes which we had marked deprecated for the 0.3.0
release.

This includes:
- all the `/v1/openai/v1/` routes (the corresponding /v1 routes still
exist of course)
- the /agents API (which is superseded completely by Responses +
Conversations)
- several alpha routes which had a "v1" route to aide transitioning to
"v1alpha"

This is the corresponding client-python change:
https://github.com/llamastack/llama-stack-client-python/pull/294
2025-11-03 19:00:59 -08:00
Ashwin Bharambe
cb40da210f
fix: update tests for OpenAI-style models endpoint (#4053)
The llama-stack-client now uses /`v1/openai/v1/models` which returns
OpenAI-compatible model objects with 'id' and 'custom_metadata' fields
instead of the Resource-style 'identifier' field. Updated api_recorder
to handle the new endpoint and modified tests to access model metadata
appropriately. Deleted stale model recordings for re-recording.

**NOTE: CI will be red on this one since it is dependent on
https://github.com/llamastack/llama-stack-client-python/pull/291/files
landing. I verified locally that it is green.**
2025-11-03 17:30:08 -08:00
Sébastien Han
4a5ef65286
chore!: remove SDG API (#4035)
# What does this PR do?

This API hasn't received any traction and close to zero interest from
the community. Let's revisit in the future if things change.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 16:12:06 -08:00
Ashwin Bharambe
44096512b5
feat: add custom_metadata to OpenAIModel to unify /v1/models with /v1/openai/v1/models (#4051)
We need to remove `/v1/openai/v1` paths shortly. There is one trouble --
our current `/v1/openai/v1/models` endpoint provides different data than
`/v1/models`. Unfortunately our tests target the latter (llama-stack
customized) behavior. We need to get to true OpenAI compatibility.

This is step 1: adding `custom_metadata` field to `OpenAIModel` that
includes all the extra stuff we add in the native `/v1/models` response.
This can be extracted on the consumer end by look at
`__pydantic_extra__` or other similar fields.

This PR:
- Adds `custom_metadata` field to `OpenAIModel` class in
`src/llama_stack/apis/models/models.py`
- Modified `openai_list_models()` in
`src/llama_stack/core/routing_tables/models.py` to populate
custom_metadata

Next Steps
1. Update stainless client to use `/v1/openai/v1/models` instead of
`/v1/models`
2. Migrate tests to read from `custom_metadata`
3. Remove `/v1/openai/v1/` prefix entirely and consolidate to single
`/v1/models` endpoint
2025-11-03 15:56:07 -08:00
Sébastien Han
d4aa348b60
chore: remove HTML generation for openapi spec (#4039)
# What does this PR do?

This seems to be an ancient artifact when we were using readthedocs? Now
docusaurus read the specs directly.

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-03 18:03:40 +01:00
raghotham
62603d25c2
chore(api)!: /v1/inspect only lists v1 apis by default (#3948)
# What does this PR do?
Allow filtering for v1alpha, v1beta, deprecated and v1. Backward
incompatible change since by default it only returns v1 apis now.

## Test Plan
added unit test
2025-10-31 11:55:46 -07:00
Jiayi Ni
fa7699d2c3
feat: Add rerank API for NVIDIA Inference Provider (#3329)
# What does this PR do?
Add rerank API for NVIDIA Inference Provider.

<!-- If resolving an issue, uncomment and update the line below -->
Closes #3278 

## Test Plan
Unit test:
```
pytest tests/unit/providers/nvidia/test_rerank_inference.py
```

Integration test: 
```
pytest -s -v tests/integration/inference/test_rerank.py   --stack-config="inference=nvidia"   --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3   --env NVIDIA_API_KEY=""   --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
2025-10-30 21:42:09 -07:00
Sébastien Han
b4ea05ada9
chore: add batches to openapi schema (#3980)
# What does this PR do?

While working on https://github.com/llamastack/llama-stack/pull/3944 I
realized that the batches API wasn't generated.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-30 07:08:35 -07:00
Charlie Doern
e8ecc99524
fix!: remove chunk_id property from Chunk class (#3954)
# What does this PR do?

chunk_id in the Chunk class executes actual logic to compute a chunk ID.
This sort of logic should not live in the API spec.

Instead, the providers should be in charge of calling generate_chunk_id,
and pass it to `Chunk`.

this removes the incorrect dependency between Provider impl and API impl

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-29 18:59:59 -07:00
Omar Abdelwahab
e6b27db30a
docs: A getting started notebook featuring simple agent examples. (#3955)
# What does this PR do?
Getting started notebook featuring simple agent examples.

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com>
2025-10-29 14:13:34 -04:00
Nathan Weinberg
b90c6a2c8b
fix(docs): remove leftover telemetry sidebar section (#3961)
Leftover telemetry section was preventing `npm run build` from
completing successfully

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-10-29 11:20:13 -04:00
ehhuang
1f9d48cd54
feat: openai files provider (#3946)
# What does this PR do?
- Adds OpenAI files provider 
- Note that file content retrieval is pretty limited by `purpose`
https://community.openai.com/t/file-uploads-error-why-can-t-i-download-files-with-purpose-user-data/1357013?utm_source=chatgpt.com

## Test Plan
Modify run yaml to use openai files provider:
```
  files:
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY:=}
      metadata_store:
        backend: sql_default
        table_name: openai_files_metadata

# Then run files tests
❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern test_files
```
2025-10-28 16:25:03 -07:00
raghotham
feabcdd67b
docs: add documentation on how to use custom run yaml in docker (#3949)
as title

test plan:

```yaml
# custom-ollama-run.yaml
version: 2
image_name: starter
external_providers_dir: /.llama/providers.d
apis:
- inference
- vector_io
- files
- safety
- tool_runtime
- agents

providers:
  inference:
  # Single Ollama provider for all models
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: ${env.OLLAMA_URL:=http://localhost:11434}

  vector_io:
  - provider_id: faiss
    provider_type: inline::faiss
    config:
      persistence:
        namespace: vector_io::faiss
        backend: kv_default

  files:
  - provider_id: meta-reference-files
    provider_type: inline::localfs
    config:
      storage_dir: /.llama/files
      metadata_store:
        table_name: files_metadata
        backend: sql_default

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence:
        agent_state:
          namespace: agents
          backend: kv_default
        responses:
          table_name: responses
          backend: sql_default
          max_write_queue_size: 10000
          num_writers: 4

storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: /.llama/kvstore.db
    sql_default:
      type: sql_sqlite
      db_path: /.llama/sql_store.db
  stores:
    metadata:
      namespace: registry
      backend: kv_default
    inference:
      table_name: inference_store
      backend: sql_default
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      table_name: openai_conversations
      backend: sql_default

registered_resources:
  models:
  # All models use the same 'ollama' provider
  - model_id: llama3.2-vision:latest
    provider_id: ollama
    provider_model_id: llama3.2-vision:latest
    model_type: llm
  - model_id: llama3.2:3b
    provider_id: ollama
    provider_model_id: llama3.2:3b
    model_type: llm
  # Embedding models
  - model_id: nomic-embed-text-v2-moe
    provider_id: ollama
    provider_model_id: toshk0/nomic-embed-text-v2-moe:Q6_K
    model_type: embedding
    metadata:
      embedding_dimension: 768
  shields: []
  vector_dbs: []
  datasets: []
  scoring_fns: []
  benchmarks: []
  tool_groups: []

server:
  port: 8321

telemetry:
  enabled: true

vector_stores:
  default_provider_id: faiss
  default_embedding_model:
    provider_id: ollama
    model_id: toshk0/nomic-embed-text-v2-moe:Q6_K
```

```bash
docker run
     -it
     --pull always
     -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT
     -v ~/.llama:/root/.llama
     -v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml
     -e RUN_CONFIG_PATH=/app/custom-run.yaml
     -e OLLAMA_URL=http://host.docker.internal:11434/
     llamastack/distribution-starter:0.3.0
     --port $LLAMA_STACK_PORT
```
2025-10-28 16:05:44 -07:00
Ian Miller
5598f61e12
feat(responses)!: introduce OpenAI compatible prompts to Responses API (#3942)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR is responsible for making changes to Responses API scheme to
introduce OpenAI compatible prompts there. Change to the API only,
therefore currently no implementation at all. However, the follow up PR
with actual implementation will be submitted after current PR lands.

The need of this functionality was initiated in #3514. 

> Note, #3514 is divided on three separate PRs. Current PR is the second
of three.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
CI
2025-10-28 09:31:27 -07:00
ehhuang
b7dd3f5c56
chore!: BREAKING CHANGE: vector_db_id -> vector_store_id (#3923)
# What does this PR do?


## Test Plan
CI
vector_io tests will fail until next client sync

passed with
https://github.com/llamastack/llama-stack-client-python/pull/286 checked
out locally
2025-10-27 14:26:06 -07:00
IAN MILLER
98a5047f9d
feat(prompts): attach prompts to storage stores in run configs (#3893)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR is responsible for attaching prompts to storage stores in run
configs. It allows to specify prompts as stores in different
distributions. The need of this functionality was initiated in #3514

> Note, #3514 is divided on three separate PRs. Current PR is the first
of three.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Manual testing and updated CI unit tests

Prerequisites:

1. `uv run --with llama-stack llama stack list-deps starter | xargs -L1
uv pip install`

2. `llama stack run starter `

```
INFO     2025-10-23 15:36:17,387 llama_stack.cli.stack.run:100 cli: Using run configuration:                            
         /Users/ianmiller/llama-stack/llama_stack/distributions/starter/run.yaml                                        
INFO     2025-10-23 15:36:17,423 llama_stack.cli.stack.run:157 cli: HTTPS enabled with certificates:                    
           Key: None                                                                                                    
           Cert: None                                                                                                   
INFO     2025-10-23 15:36:17,424 llama_stack.cli.stack.run:159 cli: Listening on ['::', '0.0.0.0']:8321                 
INFO     2025-10-23 15:36:17,749 llama_stack.core.server.server:521 core::server: Run configuration:                    
INFO     2025-10-23 15:36:17,756 llama_stack.core.server.server:524 core::server: apis:                                 
         - agents                                                                                                       
         - batches                                                                                                      
         - datasetio                                                                                                    
         - eval                                                                                                         
         - files                                                                                                        
         - inference                                                                                                    
         - post_training                                                                                                
         - safety                                                                                                       
         - scoring                                                                                                      
         - tool_runtime                                                                                                 
         - vector_io                                                                                                    
         image_name: starter                                                                                            
         providers:                                                                                                     
           agents:                                                                                                      
           - config:                                                                                                    
               persistence:                                                                                             
                 agent_state:                                                                                           
                   backend: kv_default                                                                                  
                   namespace: agents                                                                                    
                 responses:                                                                                             
                   backend: sql_default                                                                                 
                   max_write_queue_size: 10000                                                                          
                   num_writers: 4                                                                                       
                   table_name: responses                                                                                
             provider_id: meta-reference                                                                                
             provider_type: inline::meta-reference                                                                      
           batches:                                                                                                     
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: batches                                                                                     
             provider_id: reference                                                                                     
             provider_type: inline::reference                                                                           
           datasetio:                                                                                                   
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: datasetio::huggingface                                                                      
             provider_id: huggingface                                                                                   
             provider_type: remote::huggingface                                                                         
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: datasetio::localfs                                                                          
             provider_id: localfs                                                                                       
             provider_type: inline::localfs                                                                             
           eval:                                                                                                        
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: eval                                                                                        
             provider_id: meta-reference                                                                                
             provider_type: inline::meta-reference                                                                      
           files:                                                                                                       
           - config:                                                                                                    
               metadata_store:                                                                                          
                 backend: sql_default                                                                                   
                 table_name: files_metadata                                                                             
               storage_dir: /Users/ianmiller/.llama/distributions/starter/files                                         
             provider_id: meta-reference-files                                                                          
             provider_type: inline::localfs                                                                             
           inference:                                                                                                   
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.fireworks.ai/inference/v1                                                               
             provider_id: fireworks                                                                                     
             provider_type: remote::fireworks                                                                           
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.together.xyz/v1                                                                         
             provider_id: together                                                                                      
             provider_type: remote::together                                                                            
           - config: {}                                                                                                 
             provider_id: bedrock                                                                                       
             provider_type: remote::bedrock                                                                             
           - config:                                                                                                    
               api_key: '********'                                                                                      
               base_url: https://api.openai.com/v1                                                                      
             provider_id: openai                                                                                        
             provider_type: remote::openai                                                                              
           - config:                                                                                                    
               api_key: '********'                                                                                      
             provider_id: anthropic                                                                                     
             provider_type: remote::anthropic                                                                           
           - config:                                                                                                    
               api_key: '********'                                                                                      
             provider_id: gemini                                                                                        
             provider_type: remote::gemini                                                                              
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.groq.com                                                                                
             provider_id: groq                                                                                          
             provider_type: remote::groq                                                                                
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.sambanova.ai/v1                                                                         
             provider_id: sambanova                                                                                     
             provider_type: remote::sambanova                                                                           
           - config: {}                                                                                                 
             provider_id: sentence-transformers                                                                         
             provider_type: inline::sentence-transformers                                                               
           post_training:                                                                                               
           - config:                                                                                                    
               checkpoint_format: meta                                                                                  
             provider_id: torchtune-cpu                                                                                 
             provider_type: inline::torchtune-cpu                                                                       
           safety:                                                                                                      
           - config:                                                                                                    
               excluded_categories: []                                                                                  
             provider_id: llama-guard                                                                                   
             provider_type: inline::llama-guard                                                                         
           - config: {}                                                                                                 
             provider_id: code-scanner                                                                                  
             provider_type: inline::code-scanner                                                                        
           scoring:                                                                                                     
           - config: {}                                                                                                 
             provider_id: basic                                                                                         
             provider_type: inline::basic                                                                               
           - config: {}                                                                                                 
             provider_id: llm-as-judge                                                                                  
             provider_type: inline::llm-as-judge                                                                        
           - config:                                                                                                    
               openai_api_key: '********'                                                                               
             provider_id: braintrust                                                                                    
             provider_type: inline::braintrust                                                                          
           tool_runtime:                                                                                                
           - config:                                                                                                    
               api_key: '********'                                                                                      
               max_results: 3                                                                                           
             provider_id: brave-search                                                                                  
             provider_type: remote::brave-search                                                                        
           - config:                                                                                                    
               api_key: '********'                                                                                      
               max_results: 3                                                                                           
             provider_id: tavily-search                                                                                 
             provider_type: remote::tavily-search                                                                       
           - config: {}                                                                                                 
             provider_id: rag-runtime                                                                                   
             provider_type: inline::rag-runtime                                                                         
           - config: {}                                                                                                 
             provider_id: model-context-protocol                                                                        
             provider_type: remote::model-context-protocol                                                              
           vector_io:                                                                                                   
           - config:                                                                                                    
               persistence:                                                                                             
                 backend: kv_default                                                                                    
                 namespace: vector_io::faiss                                                                            
             provider_id: faiss                                                                                         
             provider_type: inline::faiss                                                                               
           - config:                                                                                                    
               db_path: /Users/ianmiller/.llama/distributions/starter/sqlite_vec.db                                     
               persistence:                                                                                             
                 backend: kv_default                                                                                    
                 namespace: vector_io::sqlite_vec                                                                       
             provider_id: sqlite-vec                                                                                    
             provider_type: inline::sqlite-vec                                                                          
         registered_resources:                                                                                          
           benchmarks: []                                                                                               
           datasets: []                                                                                                 
           models: []                                                                                                   
           scoring_fns: []                                                                                              
           shields: []                                                                                                  
           tool_groups:                                                                                                 
           - provider_id: tavily-search                                                                                 
             toolgroup_id: builtin::websearch                                                                           
           - provider_id: rag-runtime                                                                                   
             toolgroup_id: builtin::rag                                                                                 
           vector_stores: []                                                                                            
         server:                                                                                                        
           port: 8321                                                                                                   
         storage:                                                                                                       
           backends:                                                                                                    
             kv_default:                                                                                                
               db_path: /Users/ianmiller/.llama/distributions/starter/kvstore.db                                        
               type: kv_sqlite                                                                                          
             sql_default:                                                                                               
               db_path: /Users/ianmiller/.llama/distributions/starter/sql_store.db                                      
               type: sql_sqlite                                                                                         
           stores:                                                                                                      
             conversations:                                                                                             
               backend: sql_default                                                                                     
               table_name: openai_conversations                                                                         
             inference:                                                                                                 
               backend: sql_default                                                                                     
               max_write_queue_size: 10000                                                                              
               num_writers: 4                                                                                           
               table_name: inference_store                                                                              
             metadata:                                                                                                  
               backend: kv_default                                                                                      
               namespace: registry                                                                                      
             prompts:                                                                                                   
               backend: kv_default                                                                                      
               namespace: prompts                                                                                       
         telemetry:                                                                                                     
           enabled: true                                                                                                
         vector_stores:                                                                                                 
           default_embedding_model:                                                                                     
             model_id: nomic-ai/nomic-embed-text-v1.5                                                                   
             provider_id: sentence-transformers                                                                         
           default_provider_id: faiss                                                                                   
         version: 2                                                                                                     
                                                                                                                        
INFO     2025-10-23 15:36:20,032 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue        
         disabled for SQLite to avoid concurrency issues                                                                
WARNING  2025-10-23 15:36:20,422 llama_stack.providers.inline.telemetry.meta_reference.telemetry:84 telemetry:          
         OTEL_EXPORTER_OTLP_ENDPOINT is not set, skipping telemetry                                                     
INFO     2025-10-23 15:36:22,379 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils:               
         OpenAIInferenceAdapter.list_provider_model_ids() returned 105 models                                           
INFO     2025-10-23 15:36:22,703 uvicorn.error:84 uncategorized: Started server process [17328]                         
INFO     2025-10-23 15:36:22,704 uvicorn.error:48 uncategorized: Waiting for application startup.                       
INFO     2025-10-23 15:36:22,706 llama_stack.core.server.server:179 core::server: Starting up Llama Stack server        
         (version: 0.3.0)                                                                                               
INFO     2025-10-23 15:36:22,707 llama_stack.core.stack:470 core: starting registry refresh task                        
INFO     2025-10-23 15:36:22,708 uvicorn.error:62 uncategorized: Application startup complete.                          
INFO     2025-10-23 15:36:22,708 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321      
         (Press CTRL+C to quit)   
```
As you can see, prompts are attached to stores in config

Testing:

1. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \                 
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.",
    "variables": ["name", "company", "role", "tone"]
  }'
```

`{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is
{{role}} at {{company}}. Remember, {{name}}, to be
{{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}%
`

2. Get prompt:

`curl -X GET
http://localhost:8321/v1/prompts/pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f`

`{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is
{{role}} at {{company}}. Remember, {{name}}, to be
{{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}%
`

3. Query sqlite KV storage to check created prompt:

```
sqlite> .mode column
sqlite> .headers on
sqlite> SELECT * FROM kvstore WHERE key LIKE 'prompts:v1:%';
key                                                           value                                                         expiration
------------------------------------------------------------  ------------------------------------------------------------  ----------
prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e  {"prompt_id": "pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab            
163f:1                                                        5f6e163f", "prompt": "Hello {{name}}! You are working at {{c            
                                                              ompany}}. Your role is {{role}} at {{company}}. Remember, {{            
                                                              name}}, to be {{tone}}.", "version": 1, "variables": ["name"            
                                                              , "company", "role", "tone"], "is_default": false}                      

prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e  1                                                                       
163f:default                                                                                                                          
sqlite> 
```
2025-10-27 11:12:12 -07:00
Luis Tomas Bolivar
63422e5b36
fix!: Enhance response API support to not fail with tool calling (#3385)
Some checks failed
Python Package Build Test / build (3.12) (push) Failing after 8s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 6s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 10s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Unit Tests / unit-tests (3.12) (push) Failing after 19s
Test External API and Providers / test-external (venv) (push) Failing after 1m3s
Vector IO Integration Tests / test-matrix (push) Failing after 1m6s
API Conformance Tests / check-schema-compatibility (push) Successful in 1m17s
UI Tests / ui-tests (22) (push) Successful in 1m18s
Pre-commit / pre-commit (push) Successful in 3m5s
# What does this PR do?
Introduces two main fixes to enhance the stability of Responses API when
dealing with tool calling responses and structured outputs.

### Changes Made

1. It added OpenAIResponseOutputMessageMCPCall and ListTools to
OpenAIResponseInput but
https://github.com/llamastack/llama-stack/pull/3810 got merge that did
the same in a different way. Still this PR does it in a way that keep
the sync between OpenAIResponsesOutput and the allowed objects in
OpenAIResponseInput.

2. Add protection in case self.ctx.response_format does not have type
attribute

BREAKING CHANGE: OpenAIResponseInput now uses OpenAIResponseOutput union
type.
This is semantically equivalent - all previously accepted types are
still supported
via the OpenAIResponseOutput union. This improves type consistency and
maintainability.
2025-10-27 09:33:02 -07:00
Luis Tomas Bolivar
f18b5eb537
fix: Avoid BadRequestError due to invalid max_tokens (#3667)
This patch ensures if max tokens is not defined, then is set to None
instead of 0 when calling openai_chat_completion. This way some
providers (like gemini) that cannot handle the `max_tokens = 0` will not
fail

Issue: #3666
2025-10-27 09:27:21 -07:00
ehhuang
509676641a
chore: update run configs (#3902)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m34s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
# What does this PR do?
telemetry was deprecated


## Test Plan
2025-10-24 15:03:06 -07:00
ehhuang
2a1a813308
chore: update docs for telemetry api removal (#3900)
# What does this PR do?
Telemetry is no longer an API/provider.

## Test Plan
2025-10-24 13:57:28 -07:00
Francisco Arceo
4566eebe05
feat: Add static file import system for docs (#3882)
# What does this PR do?

Add static file import system for docs

- Use `remark-code-import` plugin to embed code at build time
- Support importing Python code with syntax highlighting using
`raw-loader` + `ReactMarkdown`

One caveat is that currently when embedding markdown with code used the
syntax highlighting isn't behaving but I'll investigate that in a follow
up.

## Test Plan

Python Example:
<img width="1372" height="995" alt="Screenshot 2025-10-23 at 9 22 18 PM"
src="https://github.com/user-attachments/assets/656d2c78-4d9b-45a4-bd5e-3f8490352b85"
/>

Markdown example:
<img width="1496" height="1070" alt="Screenshot 2025-10-23 at 9 22
38 PM"
src="https://github.com/user-attachments/assets/6c0a07ec-ff7c-45aa-b05f-8c46acd4445c"
/>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-24 14:01:33 -04:00