# What does this PR do?
Building/Deploying docs is failing here:
5530320962 (step):8:49
Needs the playground file. Updated it to reflect current admin status.
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Fixed a bug where models with no provider_model_id were incorrectly
filtered from the startup config display. The function was checking
multiple fields when it should only filter items with an explicitly
disabled provider_id.
Changes:
- Modified remove_disabled_providers to only check the provider_id field
- Changed condition from checking multiple fields with None to only
checking provider_id for "__disabled__", None or empty string
- Added comprehensive unit tests
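The filtering rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual llama-stack implementation: the recursive walk, the `_DROP` sentinel, and the sample config shape are assumptions made for the example. Only the rule itself (drop entries whose provider_id is "__disabled__", None, or an empty string, and ignore every other field) comes from the commit message.

```python
from typing import Any

# provider_id values that mark an entry as disabled (from the commit message).
_DISABLED_VALUES = ("__disabled__", None, "")

# Hypothetical sentinel so that legitimate None values (for example a missing
# provider_model_id) are never confused with "drop this entry".
_DROP = object()


def _clean(obj: Any) -> Any:
    if isinstance(obj, dict):
        # Only the provider_id field decides whether the entry is dropped.
        if "provider_id" in obj and obj["provider_id"] in _DISABLED_VALUES:
            return _DROP
        cleaned = {}
        for key, value in obj.items():
            result = _clean(value)
            if result is not _DROP:
                cleaned[key] = result
        return cleaned
    if isinstance(obj, list):
        return [c for item in obj if (c := _clean(item)) is not _DROP]
    return obj


def remove_disabled_providers(config: Any) -> Any:
    """Return a copy of config without explicitly disabled provider entries."""
    cleaned = _clean(config)
    return {} if cleaned is _DROP else cleaned


config = {
    "models": [
        {"model_id": "a", "provider_id": "ollama", "provider_model_id": None},
        {"model_id": "b", "provider_id": "__disabled__", "provider_model_id": "x"},
        {"model_id": "c", "provider_id": None},
        {"model_id": "d", "provider_id": ""},
    ]
}
filtered = remove_disabled_providers(config)
```

In this sketch the first model survives even though its provider_model_id is None, which is the case the bug fix is about.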
Closes: #4131
Signed-off-by: Derek Higgins <derekh@redhat.com>
We would like to run all OpenAI compatibility tests using only the
openai-client library. This is most friendly for contributors since they
can run tests without needing to update the client-sdks (which is
getting easier but is still a long pole).
This is the first step in enabling that: not using the "library client" for
any of the Responses tests. This seems like a reasonable trade-off since
using an embeddable library client for Responses (or any
OpenAI-compatible) behavior seems to be uncommon. To do this, we
needed to enable the MCP tests (which only worked in library client mode)
for server mode.
docs: Add comprehensive Files API and Vector Store integration
documentation
- Add Files API documentation with OpenAI-compatible endpoints
- Create comprehensive guide for OpenAI-compatible file operations
- Reorganize documentation structure: move file operations to files/
directory
- Add vector store provider documentation for Milvus, SQLite-vec, FAISS
- Clean up redundant files and improve navigation
- Update cross-references and eliminate documentation duplication
- Support for release 0.2.14 FileResponse and Vector Store API features
# What does this PR do?
Add authorization parameter to list_runtime_tools() method to support
MCP servers that require authentication for listing tools.
Changes:
- Updated ToolRuntime protocol to include authorization parameter on list_runtime_tools()
- Updated all provider implementations (MCP, Tavily, Brave, Bing, Wolfram Alpha)
- Updated router and routing table to pass authorization through
- Updated API recorder patched methods to include authorization parameter
This enables authenticated tool listing for enterprise MCP deployments
where IT administrators pre-configure connectors requiring authentication.
Note: Client SDK will need to be regenerated from updated OpenAPI spec
to support passing this parameter from client code. Tests will pass once
client SDK is updated.
Updated all tool runtime provider implementations to remove the authorization
parameter from list_runtime_tools():
- tavily_search.py
- brave_search.py
- wolfram_alpha.py
- bing_search.py
These providers were missing in the previous commit. Tool listing typically
doesn't require authentication - only invoke_tool() needs the authorization
parameter for authenticated tool execution.
This ensures all tool runtime providers have consistent signatures matching
the updated protocol definition.
The authorization parameter should only be on invoke_tool(), not on
list_runtime_tools(). Tool listing typically doesn't require authentication,
and the client SDK doesn't have this parameter yet.
Changes:
1. Removed authorization parameter from ToolRuntime.list_runtime_tools() protocol method
2. Updated all implementations to remove the authorization parameter:
- MCPProviderImpl.list_runtime_tools()
- ToolRuntimeRouter.list_runtime_tools()
- ToolGroupsRoutingTable.list_tools() and _index_tools()
3. Updated test to remove authorization from list_tools() call
This ensures compatibility with the llama-stack-client SDK which doesn't
support authorization on list_tools() yet. Only invoke_tool() requires
and accepts the authorization parameter for authenticated tool execution.
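As a rough sketch of the resulting shape, a protocol where only invoke_tool() carries the authorization parameter might look like the following. The exact parameter lists and return types are simplified assumptions for illustration (the real ToolRuntime protocol is not reproduced here), and `EchoToolRuntime` is a hypothetical provider.

```python
import asyncio
from typing import Any, Dict, List, Optional, Protocol


class ToolRuntime(Protocol):
    # Listing tools takes no authorization parameter in this version.
    async def list_runtime_tools(self, tool_group_id: Optional[str] = None) -> List[str]: ...

    # Invocation accepts an optional authorization token.
    async def invoke_tool(
        self,
        tool_name: str,
        kwargs: Dict[str, Any],
        authorization: Optional[str] = None,
    ) -> Dict[str, Any]: ...


class EchoToolRuntime:
    """Hypothetical provider whose signatures match the protocol above."""

    async def list_runtime_tools(self, tool_group_id: Optional[str] = None) -> List[str]:
        return ["echo"]

    async def invoke_tool(
        self,
        tool_name: str,
        kwargs: Dict[str, Any],
        authorization: Optional[str] = None,
    ) -> Dict[str, Any]:
        # A real provider would forward the token to the remote tool server.
        return {"tool": tool_name, "args": kwargs, "authenticated": authorization is not None}


runtime: ToolRuntime = EchoToolRuntime()
tools = asyncio.run(runtime.list_runtime_tools())
result = asyncio.run(runtime.invoke_tool("echo", {"text": "hi"}, authorization="token"))
```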
Removing the debug logging that was added to diagnose signature mismatch errors.
The logging served its purpose - it helped us identify that the error was coming
from api_recorder.py patched methods, not the actual provider implementations.
With the root cause now fixed in api_recorder.py, this debug logging is no longer
needed and can be safely removed to keep the code clean.
Now that we've fixed the actual root cause (api_recorder.py missing the
authorization parameter), we can revert all the CI workarounds that were
added during troubleshooting:
Removed changes:
- Cache clearing (venv, pycache, UV cache)
- PYTHONDONTWRITEBYTECODE environment variable
- --no-install-project flag
- Force reinstalling llama-stack
- Installing ci-tests distribution dependencies via llama CLI
- Final bytecode cache cleanup
These were all based on incorrect diagnosis (missing dependencies or module
caching) and are no longer needed. The real fix was updating api_recorder.py
to include the authorization parameter in patched tool runtime methods.
Restoring the simpler, original CI setup that just runs 'uv sync --all-groups'.
The ACTUAL root cause of the signature mismatch errors was found!
The api_recorder.py module patches tool runtime invoke_tool methods for test
recording/replay, but the patched methods were missing the new 'authorization'
parameter. The debug logging revealed:
Object method: patched_tavily_invoke_tool (from api_recorder module)
Object method's module: llama_stack.testing.api_recorder
Changes made:
1. Updated _patched_tool_invoke_method() to accept authorization parameter
2. Updated patched_tavily_invoke_tool() signature to include authorization
3. Added debug logging to resolver to help identify similar issues in the future
This fix ensures that when tests run in record/replay mode, the patched methods
preserve the full signature including the authorization parameter, allowing the
protocol compliance checks to pass.
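To illustrate why a patched method can trip a signature check, here is a minimal sketch. The `signatures_match` helper is a simplified stand-in for the resolver's protocol compliance check, and the class and function names are hypothetical; only the failure mode (a patch that drops the authorization parameter) comes from the description above.

```python
import inspect


class ToolRuntimeProtocol:
    async def invoke_tool(self, tool_name, kwargs, authorization=None): ...


class Provider:
    async def invoke_tool(self, tool_name, kwargs, authorization=None):
        return {"tool": tool_name}


def signatures_match(protocol_cls, obj, method_name):
    """Compare parameter names of the protocol method and the object's method."""
    expected = inspect.signature(getattr(protocol_cls, method_name))
    actual = inspect.signature(getattr(type(obj), method_name))
    return list(expected.parameters) == list(actual.parameters)


original = Provider.invoke_tool


async def stale_patch(self, tool_name, kwargs):
    # Missing the authorization parameter, like the old recorder patch.
    return await original(self, tool_name, kwargs)


async def fixed_patch(self, tool_name, kwargs, authorization=None):
    # Preserves the full signature and forwards the new parameter.
    return await original(self, tool_name, kwargs, authorization=authorization)


provider = Provider()

Provider.invoke_tool = stale_patch
stale_ok = signatures_match(ToolRuntimeProtocol, provider, "invoke_tool")

Provider.invoke_tool = fixed_patch
fixed_ok = signatures_match(ToolRuntimeProtocol, provider, "invoke_tool")
```

The provider source is correct in both cases; only the patch installed at test time changes what the check sees, which is why the earlier cache-related diagnoses did not help.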
Adding comprehensive debug logging to understand what's causing the persistent
signature mismatch errors in CI. The logging will show:
- Provider class name and module
- Both protocol and object signatures
- The actual method object
- The method's source module
This will help us identify if the issue is:
1. A cached module being loaded
2. A parent class overriding the method
3. Some other source of the wrong signature
Once we see the debug output, we can pinpoint the exact root cause.
The signature mismatch error persists because 'uv sync' installs and potentially imports
the llama-stack package, caching provider modules in memory BEFORE we do the editable
install with fresh source code.
This fix adds the --no-install-project flag to 'uv sync', which:
1. Installs all dependencies but skips installing the project itself
2. Prevents Python from importing and caching provider modules
3. Ensures the subsequent 'uv pip install -e .' loads fresh source code
This should finally resolve the persistent signature mismatch errors in CI where
the protocol has 'authorization' parameter but provider implementations appear not to.
The previous commit tried to run 'llama stack list-deps' directly, but the 'llama' command
wasn't in PATH yet since the virtual environment hadn't been activated.
This fix uses 'uv run llama' instead, which executes the command within the uv virtual
environment context, ensuring the llama CLI is accessible.
The CI integration tests were failing with a signature mismatch error, but the root cause was missing dependencies (specifically the 'together' package). The signature mismatch was a misleading error that occurred because the provider modules failed to load properly due to missing dependencies.
This fix adds a step to install all ci-tests distribution dependencies using:
llama stack list-deps ci-tests | xargs -L1 uv pip install
This ensures all required provider dependencies are installed before running tests.
The issue was timing - we were clearing cache before installations,
but uv sync/pip install were creating new .pyc files. This commit:
1. Adds PYTHONDONTWRITEBYTECODE=1 to prevent .pyc generation
2. Clears bytecode cache AFTER all installations complete
3. Ensures no stale .pyc files exist before tests run
For editable installs (-e .), Python loads from source directory,
so clearing cache after installation ensures the resolver sees the
latest method signatures with the authorization parameter.
The GitHub Actions cache was restoring a cached virtual environment
(.venv) with old code. This commit clears all caching layers:
1. Removes cached .venv directory (the main culprit)
2. Clears Python bytecode cache (.pyc files)
3. Clears UV cache directory
This forces uv sync to create a completely fresh virtual environment
with the latest source code changes, ensuring the authorization
parameter is picked up across all tool runtime providers.
The previous approach of removing uv.lock caused dependency resolution
failures. The real issue is the UV_CACHE_DIR that contains pre-built
wheels with old code. This commit:
1. Keeps uv.lock (it's part of the project)
2. Clears UV_CACHE_DIR (where compiled wheels are cached)
3. Forces uv to rebuild wheels from source
This ensures the latest source code changes are picked up without
breaking dependency resolution.
The uv.lock file contains cached dependency resolutions that prevent
source code changes from being picked up. By removing it before uv sync,
we force a fresh resolution and rebuild of dependencies.
This should fix the 73 CI test failures where the resolver was loading
stale method signatures without the authorization parameter.
The real issue was stale .pyc bytecode files in __pycache__ directories.
These cached files contained the old method signatures without the
authorization parameter, causing signature mismatch errors even though
the source .py files were correct.
Now clearing all __pycache__ directories and .pyc files before the
force-reinstall to ensure Python loads fresh bytecode from the updated
source files.
The CI was using a cached/stale version of the package that didn't
include our authorization parameter changes. Add explicit force
reinstall step to ensure the latest source code is used.
The auto-routing layer was missing the authorization parameter:
- ToolRuntimeRouter.invoke_tool() now accepts and passes authorization
- ToolRuntimeRouter.list_runtime_tools() now accepts and passes authorization
- ToolGroupsRoutingTable.list_tools() now accepts and forwards authorization
- ToolGroupsRoutingTable._index_tools() now accepts and uses authorization
This fixes the '__autorouted__' provider signature mismatch error in CI.
All ToolRuntime provider implementations now have 'authorization' parameter.
Verified locally that signatures are correct after fresh pip install.
CI note: Ensure pip install -e . runs to pick up latest code changes.
Fixed syntax errors in test files that were introduced by batch sed replacement:
- test_tools_with_schemas.py: Removed leftover broken comments and closing brace
- test_mcp_json_schema.py: Removed all instances of broken comment blocks
The sed command left remnants that broke Python syntax.
Updated all ToolRuntime provider implementations to match the protocol signature:
- BraveSearchToolRuntimeImpl
- TavilySearchToolRuntimeImpl
- BingSearchToolRuntimeImpl
- WolframAlphaToolRuntimeImpl
- MemoryToolRuntimeImpl
This fixes the signature mismatch error in CI where protocol had 'authorization' parameter but implementations didn't.
- Add authorization parameter to Tool Runtime API signatures (list_runtime_tools, invoke_tool)
- Update MCP provider implementation to use authorization from request body instead of provider-data
- Deprecate mcp_authorization and mcp_headers from provider-data (MCPProviderDataValidator now empty)
- Update all Tool Runtime tests to pass authorization as request body parameter
- Responses API already uses request body authorization (no changes needed)
This provides a single, consistent way to pass MCP authentication tokens across both APIs, addressing reviewer feedback about avoiding multiple configuration paths.
A few changes to the storage layer to ensure we reduce unnecessary
contention arising out of our design choices (and letting the database
layer do its correct thing):
- SQL stores now share a single `SqlAlchemySqlStoreImpl` per backend,
and `kvstore_impl` caches instances per `(backend, namespace)`. This
avoids spawning multiple SQLite connections for the same file, reducing
lock contention and aligning the cache story for all backends.
- Added an async upsert API (with SQLite/Postgres dialect inserts) and
routed it through `AuthorizedSqlStore`, then switched conversations and
responses to call it. Using native `ON CONFLICT DO UPDATE` eliminates
the insert-then-update retry window that previously caused long WAL lock
retries.
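The upsert idea can be sketched with the standard-library `sqlite3` module. This is a synchronous illustration only, not the async SQLAlchemy-based API described above; the `responses` table and its columns are hypothetical. It assumes SQLite 3.24 or newer, which is when `ON CONFLICT ... DO UPDATE` became available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (id TEXT PRIMARY KEY, body TEXT NOT NULL)")


def upsert(connection, response_id, body):
    """Insert a row, or update it in the same statement if the key exists."""
    connection.execute(
        "INSERT INTO responses (id, body) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET body = excluded.body",
        (response_id, body),
    )
    connection.commit()


upsert(conn, "resp-1", "first")
upsert(conn, "resp-1", "second")  # no separate insert-then-update round trip
rows = conn.execute("SELECT id, body FROM responses").fetchall()
```

Because the conflict is resolved inside a single statement, there is no window between a failed insert and a follow-up update in which another writer can take the lock.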
### Test Plan
Existing tests, added a unit test for `upsert()`
Fixes issues in the storage system by guaranteeing immediate durability
for responses and ensuring background writers stay alive. Three related
fixes:
* Responses to the OpenAI-compatible API now write directly to
Postgres/SQLite inside the request instead of detouring through an async
queue that might never drain; this restores the expected
read-after-write behavior and removes the "response not found" races
reported by users.
* The access-control shim was stamping owner_principal/access_attributes
as SQL NULL, which Postgres interprets as non-public rows; fixing it to
use the empty-string/JSON-null pattern means conversations and responses
stored without an authenticated user stay queryable (matching SQLite).
* The inference-store queue remains for batching, but its worker tasks
now start lazily on the live event loop so server startup doesn't cancel
them—writes keep flowing even when the stack is launched via llama stack
run.
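The lazy worker start in the last bullet can be sketched as below. This is a simplified, hypothetical queue (the class name, worker count, and in-memory `written` list are assumptions), not the inference-store code itself. The point it illustrates is that worker tasks are created on first use, inside the running event loop, rather than at construction time.

```python
import asyncio


class LazyWriteQueue:
    def __init__(self, num_workers=2):
        self._num_workers = num_workers
        self._queue = None
        self._workers = []
        self.written = []  # stands in for the real database writes

    def _ensure_workers(self):
        # Create the queue and tasks on the loop that is actually running.
        if self._workers:
            return
        self._queue = asyncio.Queue()
        self._workers = [
            asyncio.create_task(self._worker()) for _ in range(self._num_workers)
        ]

    async def _worker(self):
        while True:
            item = await self._queue.get()
            try:
                self.written.append(item)
            finally:
                self._queue.task_done()

    async def enqueue(self, item):
        self._ensure_workers()
        await self._queue.put(item)

    async def shutdown(self):
        if self._queue is not None:
            await self._queue.join()  # wait until every queued item is written
        for task in self._workers:
            task.cancel()
        await asyncio.gather(*self._workers, return_exceptions=True)
        self._workers = []


async def main():
    queue = LazyWriteQueue()  # constructing it does not start any tasks
    for i in range(5):
        await queue.enqueue(i)
    await queue.shutdown()
    return sorted(queue.written)


written = asyncio.run(main())
```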
Closes #4115
### Test Plan
Added a matrix entry to test our "base" suite against Postgres as the
store.
Updated documentation to accurately reflect current behavior where
models are identified as provider_id/provider_model_id in the system.
Changes:
- Clarify that model_id is for configuration purposes only
- Explain models are accessed as provider_id/provider_model_id
- Remove outdated aliasing example that suggested model_id could be used
as a custom identifier
This corrects the documentation which previously suggested model_id
could be used to create friendly aliases, which is not how the code
actually works.
Signed-off-by: Derek Higgins <derekh@redhat.com>
Help users find the comprehensive integration testing docs by linking to
the record-replay documentation. This clarifies that the technical
README complements the main docs.
# What does this PR do?
- Updates `/vector_stores/{vector_store_id}/files/{file_id}/content` to
allow returning `embeddings` and `metadata` via `extra_query`
- Updates the UI accordingly to display them.
- Updates the UI to support CRUD operations in the Vector Stores section and
adds a new modal exposing the functionality.
- Updates Vector Store update to fail if a user tries to update the Provider
ID (which doesn't make sense to allow)
```python
In [1]: client.vector_stores.files.content(
    vector_store_id=vector_store.id,
    file_id=file.id,
    extra_query={"include_embeddings": True, "include_metadata": True}
)
Out[1]: FileContentResponse(attributes={}, content=[Content(text='This is a test document to check if embeddings are generated properly.\n', type='text', embedding=[0.33760684728622437, ...,], chunk_metadata={'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'source': None, 'created_timestamp': 1762053437, 'updated_timestamp': 1762053437, 'chunk_window': '0-13', 'chunk_tokenizer': 'DEFAULT_TIKTOKEN_TOKENIZER', 'chunk_embedding_model': 'sentence-transformers/nomic-ai/nomic-embed-text-v1.5', 'chunk_embedding_dimension': 768, 'content_token_count': 13, 'metadata_token_count': 9}, metadata={'filename': 'test-embedding.txt', 'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'token_count': 13, 'metadata_token_count': 9})], file_id='file-27291dbc679642ac94ffac6d2810c339', filename='test-embedding.txt')
```
Screenshots of UI are displayed below:
### List Vector Store with Added "Create New Vector Store"
<img width="1912" height="491" alt="Screenshot 2025-11-06 at 10 47 25 PM" src="https://github.com/user-attachments/assets/a3a3ddd9-758d-4005-ac9c-5047f03916f3" />
### Create New Vector Store
<img width="1918" height="1048" alt="Screenshot 2025-11-06 at 10 47 49 PM" src="https://github.com/user-attachments/assets/b4dc0d31-696f-4e68-b109-27915090f158" />
### Edit Vector Store
<img width="1916" height="1355" alt="Screenshot 2025-11-06 at 10 48 32 PM" src="https://github.com/user-attachments/assets/ec879c63-4cf7-489f-bb1e-57ccc7931414" />
### Vector Store Files Contents page (with Embeddings)
<img width="1914" height="849" alt="Screenshot 2025-11-06 at 11 54 32 PM" src="https://github.com/user-attachments/assets/3095520d-0e90-41f7-83bd-652f6c3fbf27" />
### Vector Store Files Contents Details page (with Embeddings)
<img width="1916" height="1221" alt="Screenshot 2025-11-06 at 11 55 00 PM" src="https://github.com/user-attachments/assets/e71dbdc5-5b49-472b-a43a-5785f58d196c" />
## Test Plan
Tests added for Middleware extension and Provider failures.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Add explicit connection cleanup and shorter timeouts to the OpenAI client
fixtures. Fixes a CI deadlock after 25+ tests due to connection pool
exhaustion. Also adds a 60s timeout to test_conversation_context_loading
as a safety net.
## Test Plan
tests pass
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
This PR adds Stainless config to specify the Meta copyright file header
for generated files.
Doing it via config instead of custom code will reduce the probability
of git conflicts.
## Test Plan
- review preview builds
Update pypdf dependency to address vulnerabilities causing potential
denial of service through infinite loops or excessive memory usage when
handling malicious PDFs. The update remains fully backward compatible,
with no changes to the PdfReader API.
# What does this PR do?
Fixes #4120
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
# What does this PR do?
In the **Detailed Tutorial**, at **Step 3**, the **Install with venv**
option creates a new virtual environment `client`, activates it then
attempts to install the llama-stack-client using pip.
```
uv venv client --python 3.12
source client/bin/activate
pip install llama-stack-client  # <- this is the problematic line
```
However, the pip command will likely fail because `uv venv` does not, by
default, install pip into the virtual environment it creates. The pip
command will error either because pip doesn't exist at all or, if a pip
command does exist outside of the virtual environment, with a different
error message. In the latter case it may be unclear to the user why the
install is failing.
This PR changes 'pip' to 'uv pip', allowing the install action to
function in the virtual environment as intended, and without the need
for pip to be installed.
## Test Plan
1. Use Linux or WSL (virtual environments on Windows use `Scripts`
folder instead of `bin` [virtualenv
#993ba13](993ba1316a)
which doesn't align with the tutorial)
2. Clone the `llama-stack` repo
3. Run the following and verify success:
```
uv venv client --python 3.12
source client/bin/activate
```
4. Run the updated command:
```
uv pip install llama-stack-client
```
5. Observe that the console output confirms that the virtual environment
`client` was used:
> Using Python 3.12.3 environment at: **client**
# What does this PR do?
The inspect API lacked any mechanism to get all
non-deprecated APIs (v1, v1alpha, v1beta).
This PR changes the default to that behavior.
The 'v1' filter can be used by users wanting a list
of only stable APIs.
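The new default can be sketched as a simple filter. The `Route` dataclass, the `api_filter` parameter name, and the sample paths are assumptions for illustration; the actual inspect API implementation may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    path: str
    level: str  # "v1", "v1alpha", or "v1beta"
    deprecated: bool = False


def list_routes(routes: List[Route], api_filter: Optional[str] = None) -> List[Route]:
    """Default: every non-deprecated route. With a filter: only that level."""
    if api_filter is None:
        return [r for r in routes if not r.deprecated]
    return [r for r in routes if r.level == api_filter and not r.deprecated]


# Hypothetical routes, not taken from the real server.
routes = [
    Route("/v1/models", "v1"),
    Route("/v1alpha/agents", "v1alpha"),
    Route("/v1beta/datasets", "v1beta"),
    Route("/v1/old-endpoint", "v1", deprecated=True),
]

all_current = [r.path for r in list_routes(routes)]
stable_only = [r.path for r in list_routes(routes, api_filter="v1")]
```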
## Test Plan
1. pull the PR
2. launch a LLS server
3. run `curl http://beanlab3.bss.redhat.com:8321/v1/inspect/routes`
4. note there are APIs for `v1`, `v1alpha`, and `v1beta` but no
deprecated APIs
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
Delete ~2,000 lines of dead code from the old bespoke inference API that
was replaced by the OpenAI-only API. This includes removing unused type
conversion functions, dead provider methods, and event_logger.py.
Clean up imports across the codebase to remove references to deleted
types. This eliminates unnecessary code and dependencies, helping isolate
the API package as a self-contained module.
This is the last interdependency between the .api package and "exterior"
packages, meaning that now every other package in llama stack imports
the API, not the other way around.
## Test Plan
this is a structural change, no tests needed.
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# Problem
The Responses API uses the max_tool_calls parameter to limit the number of
tool calls that can be generated in a response. Currently, the LLS
implementation of the Responses API does not support this parameter.
# What does this PR do?
This pull request adds the max_tool_calls field to the response object
definition and updates the inline provider. It also ensures that:
- the total number of calls to built-in and mcp tools does not exceed
max_tool_calls
- an error is thrown if max_tool_calls < 1 (behavior seen with the
OpenAI Responses API, but we can change this if needed)
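The two guarantees above can be sketched as follows. The function names, the `already_executed` counter, and the error message are hypothetical; the inline provider's real logic is not reproduced here, and deciding which calls count toward the limit is left to the caller in this sketch.

```python
from typing import List, Optional


def validate_max_tool_calls(max_tool_calls: Optional[int]) -> None:
    """Reject values below 1; None means no limit was requested."""
    if max_tool_calls is not None and max_tool_calls < 1:
        raise ValueError(f"max_tool_calls must be >= 1, got {max_tool_calls}")


def select_tool_calls(
    requested: List[str],
    max_tool_calls: Optional[int],
    already_executed: int = 0,
) -> List[str]:
    """Return the requested calls that still fit within the limit."""
    validate_max_tool_calls(max_tool_calls)
    if max_tool_calls is None:
        return list(requested)
    remaining = max(max_tool_calls - already_executed, 0)
    return list(requested)[:remaining]


allowed = select_tool_calls(["web_search", "mcp_lookup", "web_search"], max_tool_calls=2)
exhausted = select_tool_calls(["web_search"], max_tool_calls=2, already_executed=2)
```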
Closes #[3563](https://github.com/llamastack/llama-stack/issues/3563)
## Test Plan
- Tested manually for change in model response w.r.t supplied
max_tool_calls field.
- Added integration tests to test invalid max_tool_calls parameter.
- Added integration tests to check max_tool_calls parameter with
built-in and function tools.
- Added integration tests to check max_tool_calls parameter in the
returned response object.
- Recorded OpenAI Responses API behavior using a sample script:
https://github.com/s-akhtar-baig/llama-stack-examples/blob/main/responses/src/max_tool_calls.py
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Adds OCI GenAI PaaS models for OpenAI chat completion endpoints.
## Test Plan
In an OCI tenancy with access to GenAI PaaS, perform the following
steps:
1. Ensure you have IAM policies in place to use the service (check the docs
included in this PR)
2. For local development, [set up the OCI
CLI](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm)
and configure the CLI with your region, tenancy, and auth
[here](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliconfigure.htm)
3. Once configured, go through llama-stack setup and run llama-stack
(uses config-based auth) like:
```bash
OCI_AUTH_TYPE=config_file \
OCI_CLI_PROFILE=CHICAGO \
OCI_REGION=us-chicago-1 \
OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..aaaaaaaa5...5a \
llama stack run oci
```
4. Hit the `models` endpoint to list models after server is running:
```bash
curl http://localhost:8321/v1/models | jq
...
{
"identifier": "meta.llama-4-scout-17b-16e-instruct",
"provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.am...q",
"provider_id": "oci",
"type": "model",
"metadata": {
"display_name": "meta.llama-4-scout-17b-16e-instruct",
"capabilities": [
"CHAT"
],
"oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.a...q"
},
"model_type": "llm"
},
...
```
5. Use the "display_name" field to use the model in a
`/chat/completions` request:
```bash
# Streaming result
curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "meta.llama-4-scout-17b-16e-instruct",
"stream": true,
"temperature": 0.9,
"messages": [
{
"role": "system",
"content": "You are a funny comedian. You can be crass."
},
{
"role": "user",
"content": "Tell me a funny joke about programming."
}
]
}'
# Non-streaming result
curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "meta.llama-4-scout-17b-16e-instruct",
"stream": false,
"temperature": 0.9,
"messages": [
{
"role": "system",
"content": "You are a funny comedian. You can be crass."
},
{
"role": "user",
"content": "Tell me a funny joke about programming."
}
]
}'
```
6. Try out other models from the `/models` endpoint.
Mark all register_* / unregister_* APIs as deprecated across models,
shields, tool groups, datasets, benchmarks, and scoring functions. This
is the first step toward moving resource mutations to an `/admin`
namespace as outlined in
https://github.com/llamastack/llama-stack/issues/3809#issuecomment-3492931585.
The deprecation flag will be reflected in the OpenAPI schema to warn API
users that these endpoints are being phased out. Next step will be
implementing the `/admin` route namespace for these resource management
operations.
- `register_model` / `unregister_model`
- `register_shield` / `unregister_shield`
- `register_tool_group` / `unregister_toolgroup`
- `register_dataset` / `unregister_dataset`
- `register_benchmark` / `unregister_benchmark`
- `register_scoring_function` / `unregister_scoring_function`
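
As a rough illustration of how the deprecation flag appears in an OpenAPI document, the sketch below sets the standard `deprecated` field on matching operations in a spec held as a plain dict. The helper name, the sample paths, and the use of `operationId` values to select operations are assumptions for the example, not the project's actual schema generation.

```python
from typing import Any, Dict, Set


def mark_deprecated(spec: Dict[str, Any], operation_ids: Set[str]) -> Dict[str, Any]:
    """Set deprecated=True on every operation whose operationId is listed."""
    for path_item in spec.get("paths", {}).values():
        for operation in path_item.values():
            # Path items may also hold non-operation entries such as
            # "parameters" (a list), so only dicts are considered.
            if isinstance(operation, dict) and operation.get("operationId") in operation_ids:
                operation["deprecated"] = True
    return spec


# Hypothetical fragment of a spec; the paths are illustrative only.
spec = {
    "paths": {
        "/v1/models": {
            "get": {"operationId": "list_models"},
            "post": {"operationId": "register_model"},
        },
        "/v1/models/{model_id}": {
            "delete": {"operationId": "unregister_model"},
        },
    }
}

mark_deprecated(spec, {"register_model", "unregister_model"})
```

Client generators and documentation tools that read the spec can then surface the flag as a warning without any change to the request or response shapes.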