Fixed incorrect import in test_mcp_authentication.py:
- Changed: from llama_stack import LlamaStackAsLibraryClient
- To: from llama_stack.core.library_client import LlamaStackAsLibraryClient
This aligns with the correct import pattern used in other test files.
Updates integration tests to use the new mcp_authorization field
instead of the old method of passing Authorization in mcp_headers.
Changes:
- tests/integration/tool_runtime/test_mcp.py
- tests/integration/inference/test_tools_with_schemas.py
- tests/integration/tool_runtime/test_mcp_json_schema.py (6 occurrences)
All tests now use:
provider_data = {"mcp_authorization": {uri: AUTH_TOKEN}}
Instead of the old rejected format:
provider_data = {"mcp_headers": {uri: {"Authorization": f"Bearer {AUTH_TOKEN}"}}}
This aligns with the security architecture that prevents
accidentally leaking inference tokens to MCP servers.
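For context, a minimal sketch of how a test can assemble and pass this provider_data; it is illustrative only, and the config name, MCP URI, and the `provider_data` keyword on the library client are assumptions rather than the exact test code:

```python
# Illustrative sketch only; config name, URI, and the provider_data kwarg are assumptions.
from llama_stack.core.library_client import LlamaStackAsLibraryClient

AUTH_TOKEN = "test-token"
uri = "http://localhost:8000/sse"

# New format: a dedicated per-URI token, supplied without the "Bearer " prefix.
provider_data = {"mcp_authorization": {uri: AUTH_TOKEN}}

# Old, now-rejected format, shown only for contrast:
# provider_data = {"mcp_headers": {uri: {"Authorization": f"Bearer {AUTH_TOKEN}"}}}

client = LlamaStackAsLibraryClient("starter", provider_data=provider_data)
```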
Based on user feedback, improved comments to distinguish between
the two security layers:
1. PRIMARY: Line 89 - Architectural prevention
- get_request_provider_data() only reads from request body
- Never accesses HTTP Authorization header
- This is what actually prevents inference token leakage
2. SECONDARY: Lines 97-104 - Validation prevention
- Rejects Authorization in mcp_headers dict
- Enforces using dedicated mcp_authorization field
- Prevents users from misusing the API
The previous comment was misleading: it suggested the validation is what
prevents inference token leakage, when the architecture already guarantees
that isolation.
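A schematic illustration of the two layers (not the actual code at the referenced lines; the names below are simplified):

```python
# Schematic sketch of the two security layers described above.
def get_request_provider_data(body: dict) -> dict:
    # PRIMARY: provider data is read only from the request body; the HTTP
    # Authorization header is never consulted, so the inference token can
    # never reach this code path.
    return body.get("provider_data", {})

def validate_mcp_headers(provider_data: dict) -> None:
    # SECONDARY: reject misuse of the API surface itself.
    for uri, headers in provider_data.get("mcp_headers", {}).items():
        if any(k.lower() == "authorization" for k in headers):
            raise ValueError(
                f"Authorization header not allowed in mcp_headers for {uri}; "
                "use the dedicated mcp_authorization field instead"
            )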
Adds inline documentation to help users understand:
- How to structure provider_data in HTTP requests
- Where to place mcp_headers vs mcp_authorization
- Security requirements (no Authorization in headers)
- Token format requirements (without Bearer prefix)
- Example usage with multiple MCP endpoints
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This pull request adds a new workflow that does 2 things:
1. generate [SDK preview
builds](https://www.stainless.com/docs/guides/automate-updates#set-up-automatic-preview-builds)
whenever the OpenAPI spec file is modified in a PR
2. on PR merge, generate SDK builds that will be pushed to the different
SDK repos (i.e., start the release process)
> [!NOTE]
> No repo secret `STAINLESS_API_KEY` is needed; authentication is handled
> automatically via GitHub OIDC.
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
I tested in my fork: https://github.com/stainless-api/llama-stack/pull/3
Completes the TODO for extracting authorization from a dedicated field.
What changed:
- Added mcp_authorization field to MCPProviderDataValidator
- Updated get_headers_from_request() to extract from mcp_authorization
- Authorization is now properly isolated per MCP endpoint
API usage example:
{
  "provider_data": {
    "mcp_headers": {
      "http://mcp-server.com": {
        "X-Trace-ID": "trace-123"
      }
    },
    "mcp_authorization": {
      "http://mcp-server.com": "mcp_token_xyz789"
    }
  }
}
Security guarantees:
- Authorization cannot be in mcp_headers (validation rejects it)
- Each MCP endpoint gets its own dedicated token
- No cross-service token leakage possible
Addresses reviewer concern about token isolation between services.
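For illustration, one plausible shape of the per-endpoint extraction, assuming the provider adds the `Bearer ` prefix itself; this is a sketch, not the provider's actual `get_headers_from_request()`:

```python
# Sketch of per-endpoint header assembly; real field handling may differ.
def headers_for_endpoint(provider_data: dict, uri: str) -> dict[str, str]:
    # Legitimate non-auth headers (tracing, routing, ...) pass through as-is.
    headers = dict(provider_data.get("mcp_headers", {}).get(uri, {}))
    token = provider_data.get("mcp_authorization", {}).get(uri)
    if token:
        # Tokens are supplied without the "Bearer " prefix; adding it here is an assumption.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Each endpoint only ever sees the token registered for its own URI, which is what provides the cross-service isolation.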
The remote provider now rejects Authorization headers in mcp_headers
to prevent accidentally passing inference tokens to MCP servers.
This makes the remote provider consistent with the inline provider:
- Both reject Authorization in headers dict
- Both require dedicated authorization parameter
- Prevents token leakage across service boundaries
Related changes:
- Added validation in get_headers_from_request()
- Throws ValueError if Authorization found in mcp_headers
- Added TODO for dedicated authorization field in provider_data
Per reviewer feedback, validation should be in the openai_responses.py handler,
not the streaming.py file. Moved validation logic to create_openai_response()
method which is the main entry point for response creation.
- Added validation in create_openai_response() before processing
- Removed duplicate validation from _process_mcp_tool() in streaming.py
- Validation runs early and rejects malformed requests immediately
- Maintains same security check: rejects Authorization in headers dict
Per reviewer feedback, API models should be pure data structures without
business logic. Moved the Authorization header validation from the Pydantic
@model_validator in openai_responses.py to the handler in streaming.py.
- Removed @model_validator from OpenAIResponseInputToolMCP
- Added validation at handler level in _process_mcp_tool()
- Maintains same security check: rejects Authorization in headers dict
- Follows separation of concerns: models are data, handlers have logic
- Add Field(exclude=True) to authorization parameter to prevent token leakage in responses
- Add model validator to reject Authorization header in headers dict
- Users must use dedicated 'authorization' parameter instead of headers
- Headers field is preserved for legitimate non-auth headers (tracing, routing, etc.)
This implements the security requirement that authorization params are never
returned in responses, unlike generic headers which may be echoed back.
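A self-contained illustration of these two mechanisms using pydantic v2; the class and field names are stand-ins, not the actual API model:

```python
from pydantic import BaseModel, Field, model_validator

class MCPToolConfig(BaseModel):
    server_url: str
    headers: dict[str, str] | None = None  # legitimate non-auth headers (tracing, routing)
    authorization: str | None = Field(default=None, exclude=True)  # never serialized

    @model_validator(mode="after")
    def _reject_auth_header(self):
        # Force callers to use the dedicated 'authorization' parameter.
        for key in (self.headers or {}):
            if key.lower() == "authorization":
                raise ValueError("Pass the token via 'authorization', not in headers")
        return self

cfg = MCPToolConfig(server_url="http://mcp.example", authorization="secret")
assert "authorization" not in cfg.model_dump()  # exclude=True keeps it out of responses
```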
# What does this PR do?
Resolves #4102
1. Added `web_search_2025_08_26` to the `WebSearchToolTypes` list and
the `OpenAIResponseInputToolWebSearch.type` Literal union (see the sketch
after this list)
2. No changes needed to tool execution logic - all `web_search` types
map to the same underlying tool
3. Backward compatibility is maintained - existing `web_search`,
`web_search_preview`, and `web_search_preview_2025_03_11` types continue
to work
4. Added an integration test case using {"type":
"web_search_2025_08_26"} to verify it works correctly
5. Updated `docs/docs/providers/openai_responses_limitations.mdx` to
reflect that `web_search_2025_08_26` is now supported.
6. Removed incorrect references to `MOD1/MOD2/MOD3` (which don't exist
in the codebase)
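As referenced in item 1, a minimal sketch of the type-level change; the exact definitions in the codebase differ, so treat the shapes below as assumptions:

```python
from typing import Literal
from pydantic import BaseModel

# All web_search variants map to the same underlying tool.
WebSearchToolTypes = [
    "web_search",
    "web_search_preview",
    "web_search_preview_2025_03_11",
    "web_search_2025_08_26",  # newly accepted alias
]

class OpenAIResponseInputToolWebSearch(BaseModel):
    type: Literal[
        "web_search",
        "web_search_preview",
        "web_search_preview_2025_03_11",
        "web_search_2025_08_26",
    ]
```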
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
---------
Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
This dependency has been bothering folks for a long time (cc @leseb). We
only really needed it for the "library client", which is primarily used in
our tests and is not part of the Stack server. Anyone who needs the library
client can install `llama-stack-client` in their environment to make that
work.
Updated the notebook references to additionally install `llama-stack-client`
when setting things up.
https://github.com/llamastack/llama-stack/pull/4055 cleaned the agents
implementation but while doing so it removed some tests which actually
corresponded to the responses implementation. This PR brings those tests
and associated recordings back.
(We should likely combine all responses tests into one suite, but that
is beyond the scope of this PR.)
# What does this PR do?
Remove circular dependency by moving tracing from API protocol
definitions
to router implementation layer.
This gets us closer to having a self-contained API package with no
cross-cutting dependencies on other parts of the llama-stack codebase.
To the best of our ability, `llama_stack.apis` should contain only type and
protocol definitions.
Changes:
- Create apis/common/tracing.py with marker decorator (zero core
dependencies)
- Add the _new_ `@telemetry_traceable` marker decorator to 11 protocol
classes
- Apply actual tracing in core/resolver.py in `instantiate_provider`
based on protocol marker
- Move MetricResponseMixin from core to apis (it's an API response type)
- APIs package is now self-contained with zero core dependencies
The tracing functionality remains identical: the actual trace_protocol from
core is applied to router implementations at runtime when telemetry is
enabled and the protocol carries the `__marked_for_tracing__` marker.
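A schematic sketch of the marker-plus-late-binding pattern described above (simplified; not the exact code in `apis/common/tracing.py` or `core/resolver.py`):

```python
# apis/common/tracing.py equivalent: a pure marker with zero core dependencies.
def telemetry_traceable(cls):
    cls.__marked_for_tracing__ = True
    return cls

@telemetry_traceable
class Inference:  # protocol definition stays dependency-free
    ...

# core/resolver.py equivalent: apply the real tracing decorator only at
# provider instantiation time, and only when telemetry is enabled.
def maybe_apply_tracing(protocol_cls, impl_cls, telemetry_enabled, trace_protocol):
    if telemetry_enabled and getattr(protocol_cls, "__marked_for_tracing__", False):
        return trace_protocol(impl_cls)  # real decorator lives in core, imported lazily
    return impl_cls
```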
## Test Plan
Manual integration test confirms identical behavior to main branch:
```bash
llama stack list-deps --format uv starter | sh
export OLLAMA_URL=http://localhost:11434
llama stack run starter
curl -X POST http://localhost:8321/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "ollama/gpt-oss:20b",
"messages": [{"role": "user", "content": "Say hello"}],
"max_tokens": 10}'
```
Verified identical between main and this branch:
- trace_id present in response
- metrics array with prompt_tokens, completion_tokens, total_tokens
- Server logs show trace_protocol applied to all routers
Existing telemetry integration tests (tests/integration/telemetry/) validate
trace context propagation and span attributes.
relates to #3895
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
- Introduces vLLM provider support to the record/replay testing framework
- Enables both recording and replay of vLLM API interactions alongside
existing Ollama support

The changes enable testing of vLLM functionality. vLLM tests focus on
inference capabilities, while Ollama continues to exercise the full API
surface, including vision features.
--
This is an alternative to #3128, using qwen3 instead of Llama 3.2 1B; qwen3
appears to be more capable at structured output and tool calls.
---------
Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
We'd like to remove the dependence of `llama-stack` on
`llama-stack-client`. This is a necessary step.
A few small cleanups:
- Enable `embeddings` as well
- Remove the unused ModelRegistryHelper dependency
- Consolidate to the auth_credential field via RemoteInferenceProviderConfig
- Implement list_models() to fetch models from the downstream /v1/models
endpoint (see the sketch below)
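A rough sketch of fetching models from the downstream server; the field names and Bearer-auth handling below are simplified assumptions, not the provider's exact code:

```python
import httpx

async def list_models(base_url: str, auth_credential: str | None = None) -> list[str]:
    # Downstream server exposes an OpenAI-compatible /v1/models endpoint.
    headers = {"Authorization": f"Bearer {auth_credential}"} if auth_credential else {}
    async with httpx.AsyncClient(base_url=base_url, headers=headers) as client:
        resp = await client.get("/v1/models")
        resp.raise_for_status()
        return [m["id"] for m in resp.json()["data"]]
```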
## Test Plan
Tested using this script
https://gist.github.com/ashwinb/6356463d10f989c0682ab3bff8589581
Output:
```
Listing models from downstream server...
Available models: ['passthrough/ollama/nomic-embed-text:latest', 'passthrough/ollama/all-minilm:l6-v2', 'passthrough/ollama/llama3.2-vision:11b', 'passthrough/ollama/llama3.2-vision:latest', 'passthrough/ollama/llama-guard3:1b', 'passthrough/ollama/llama3.2:1b', 'passthrough/ollama/all-minilm:latest', 'passthrough/ollama/llama3.2:3b', 'passthrough/ollama/llama3.2:3b-instruct-fp16', 'passthrough/bedrock/meta.llama3-1-8b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-70b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-405b-instruct-v1:0', 'passthrough/sentence-transformers/nomic-ai/nomic-embed-text-v1.5']
Using LLM model: passthrough/ollama/llama3.2-vision:11b
Making inference request...
Response: 4.
--- Testing streaming ---
Streamed response: ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='1', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='', index=0, logprobs=None)], created=1762381674, model='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
...
5ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1762381674, model='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
```
# What does this PR do?
- when create vector store is called without a chunking strategy, we now
persist the strategy that was actually used, instead of storing
strategy='None' (see the sketch below)
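A hypothetical sketch of the behavior; the real vector-store types and names differ, this only illustrates persisting the strategy that was actually applied:

```python
from dataclasses import dataclass

@dataclass
class AutoChunkingStrategy:
    # stand-in for the provider's default strategy object
    type: str = "auto"

@dataclass
class VectorStoreRecord:
    # stand-in for the persisted vector-store object
    name: str
    chunking_strategy: object

def create_vector_store(name: str, chunking_strategy=None) -> VectorStoreRecord:
    # Resolve the default before persisting so the stored record reflects the
    # strategy actually applied, rather than chunking_strategy=None.
    resolved = chunking_strategy if chunking_strategy is not None else AutoChunkingStrategy()
    return VectorStoreRecord(name=name, chunking_strategy=resolved)
```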
## Test Plan
updated tests
## What does this PR do?
The starter distribution now comes with all the required packages to
support persistent stores—like the agent store, metadata, and
inference—using PostgreSQL. Users can enable PostgreSQL support by
setting the `ENABLE_POSTGRES_STORE=1` environment variable.
This PR consolidates the functionality from the removed `postgres-demo`
distribution into the starter distribution, reducing maintenance
overhead.
**Closes: #2619**
**Supersedes: #2851** (rebased and updated)
## Changes Made
1. **Added PostgreSQL support to starter distribution**
- New `run-with-postgres-store.yaml` configuration
- Automatic config switching via `ENABLE_POSTGRES_STORE` environment
variable
- Removed separate `postgres-demo` distribution
2. **Updated to new build system**
- Integrated postgres switching logic into Containerfile entrypoint
- Uses new `storage_backends` and `storage_stores` API
- Properly configured both PostgreSQL KV store and SQL store
3. **Updated dependencies**
- Added `psycopg2-binary` and `asyncpg` to starter distribution
- All postgres-related dependencies automatically included
## How to Use
### With Docker (PostgreSQL):
```bash
docker run \
-e ENABLE_POSTGRES_STORE=1 \
-e POSTGRES_HOST=your_postgres_host \
-e POSTGRES_PORT=5432 \
-e POSTGRES_DB=llamastack \
-e POSTGRES_USER=llamastack \
-e POSTGRES_PASSWORD=llamastack \
-e OPENAI_API_KEY=your_key \
llamastack/distribution-starter
```
### PostgreSQL environment variables:
- `POSTGRES_HOST`: Postgres host (default: `localhost`)
- `POSTGRES_PORT`: Postgres port (default: `5432`)
- `POSTGRES_DB`: Postgres database name (default: `llamastack`)
- `POSTGRES_USER`: Postgres username (default: `llamastack`)
- `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`)
## Test Plan
- All pre-commit hooks pass (mypy, ruff, distro-codegen)
- `llama stack list-deps starter` confirms psycopg2-binary is included
- Storage configuration correctly uses PostgreSQL backends
- Container builds successfully with postgres support
## Credits
Original work by @leseb in #2851. Rebased and updated by @r-bit-rry to
work with latest main.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Sébastien Han @leseb
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
`list-deps` takes positional args OR flags like `--providers`. The issue is
that these args need to be optional, since by nature only one or the other is
specified.

This adds a check to list-deps for `if not args.providers and not
args.config`; if that is true, help is printed and we exit.
Resolves #4075
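A self-contained sketch of that guard (the real command wires this through the llama CLI's own parser and command classes; the exit code here is arbitrary):

```python
import argparse
import sys

parser = argparse.ArgumentParser(prog="llama stack list-deps")
parser.add_argument("config", nargs="?", help="config file or name of a known distro")
parser.add_argument("--providers", help="api1=provider1,api2=provider2")
args = parser.parse_args()

if not args.providers and not args.config:
    # Neither a distro/config nor --providers was given: print help and exit
    # instead of falling through to code that assumes build_config exists.
    parser.print_help()
    sys.exit(1)
```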
## Test Plan
before:
```
╰─ llama stack list-deps
Traceback (most recent call last):
File "/Users/charliedoern/projects/Documents/llama-stack/venv/bin/llama", line 10, in <module>
sys.exit(main())
^^^^^^
File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 52, in main
parser.run(args)
File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 43, in run
args.func(args)
File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/list_deps.py", line 51, in _run_stack_list_deps_command
return run_stack_list_deps_command(args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/_list_deps.py", line 135, in run_stack_list_deps_command
normal_deps, special_deps, external_provider_dependencies = get_provider_dependencies(build_config)
^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'build_config' where it is not associated with a value
```
after:
```
╰─ llama stack list-deps
usage: llama stack list-deps [-h] [--providers PROVIDERS] [--format {uv,deps-only}] [config | distro]
list the dependencies for a llama stack distribution
positional arguments:
config | distro Path to config file to use or name of known distro (llama stack list for a list). (default: None)
options:
-h, --help show this help message and exit
--providers PROVIDERS
sync dependencies for a list of providers and only those providers. This list is formatted like: api1=provider1,api2=provider2. Where there can be multiple
providers per API. (default: None)
--format {uv,deps-only}
Output format: 'uv' shows shell commands, 'deps-only' shows just the list of dependencies without `uv` (default) (default: deps-only)
```
Signed-off-by: Charlie Doern <cdoern@redhat.com>