Commit graph

21 commits

Author SHA1 Message Date
Omar Abdelwahab
9e972cf20c docs: clarify security mechanism comments in get_headers_from_request
Based on user feedback, improved comments to distinguish between
the two security layers:

1. PRIMARY: Line 89 - Architectural prevention
   - get_request_provider_data() only reads from request body
   - Never accesses HTTP Authorization header
   - This is what actually prevents inference token leakage

2. SECONDARY: Lines 97-104 - Validation prevention
   - Rejects Authorization in mcp_headers dict
   - Enforces using dedicated mcp_authorization field
   - Prevents users from misusing the API

Previous comment was misleading by suggesting the validation
prevented inference token leakage, when the architecture
already ensures that isolation.
2025-11-07 14:05:48 -08:00
Omar Abdelwahab
2295a1aad5 formatting changes 2025-11-07 14:01:54 -08:00
Omar Abdelwahab
a2098eea27 docs: add comprehensive docstring for MCPProviderDataValidator
Adds inline documentation to help users understand:
- How to structure provider_data in HTTP requests
- Where to place mcp_headers vs mcp_authorization
- Security requirements (no Authorization in headers)
- Token format requirements (without Bearer prefix)
- Example usage with multiple MCP endpoints
2025-11-07 13:50:23 -08:00
Omar Abdelwahab
ccb870c8fb precommit 2025-11-07 12:14:42 -08:00
Omar Abdelwahab
445135b8cc feat: implement dedicated mcp_authorization field for remote provider
Completes the TODO for extracting authorization from a dedicated field.

What changed:
- Added mcp_authorization field to MCPProviderDataValidator
- Updated get_headers_from_request() to extract from mcp_authorization
- Authorization is now properly isolated per MCP endpoint

API usage example:
{
  "provider_data": {
    "mcp_headers": {
      "http://mcp-server.com": {
        "X-Trace-ID": "trace-123"
      }
    },
    "mcp_authorization": {
      "http://mcp-server.com": "mcp_token_xyz789"
    }
  }
}

Security guarantees:
- Authorization cannot be in mcp_headers (validation rejects it)
- Each MCP endpoint gets its own dedicated token
- No cross-service token leakage possible
2025-11-07 11:45:47 -08:00
Omar Abdelwahab
a842c90059 security: enforce Authorization rejection in remote MCP provider
Addresses reviewer concern about token isolation between services.
The remote provider now rejects Authorization headers in mcp_headers
to prevent accidentally passing inference tokens to MCP servers.

This makes the remote provider consistent with the inline provider:
- Both reject Authorization in headers dict
- Both require dedicated authorization parameter
- Prevents token leakage across service boundaries

Related changes:
- Added validation in get_headers_from_request()
- Throws ValueError if Authorization found in mcp_headers
- Added TODO for dedicated authorization field in provider_data
2025-11-07 11:34:33 -08:00
Omar Abdelwahab
5ce48d2c6a precommit 2025-11-06 12:02:45 -08:00
Omar Abdelwahab
e8cb52683d Updated get_headers_from_request 2025-11-06 11:41:33 -08:00
Omar Abdelwahab
411b18a90f
Merge branch 'main' into add-mcp-authentication-param 2025-11-05 14:12:32 -08:00
Omar Abdelwahab
7db4ed7bbb fix: update MCP tool runtime provider to use new function signatures
Updated list_mcp_tools and invoke_mcp_tool calls to use named parameters
instead of positional arguments to match the refactored API signatures.
2025-11-05 13:21:12 -08:00
Wojciech-Rebisz
07c28cd519
fix: Avoid model_limits KeyError (#4060)
# What does this PR do?
It avoids model_limit KeyError while trying to get embedding models for
Watsonx

<!-- If resolving an issue, uncomment and update the line below -->
Closes https://github.com/llamastack/llama-stack/issues/4059

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Start server with watsonx distro:
```bash
llama stack list-deps watsonx | xargs -L1 uv pip install
uv run llama stack run watsonx
```
Run 
```python
client = LlamaStackClient(base_url=base_url)
client.models.list()
```
Check if there is any embedding model available (currently there is not
a single one)
2025-11-05 10:34:40 -08:00
Nathan Weinberg
62b3ad349a
fix: return to hardcoded model IDs for Vertex AI (#4041)
# What does this PR do?
partial revert of b67aef2

Vertex AI doesn't offer an endpoint for listing models from Google's
Model Garden

Return to hardcoded values until such an endpoint is available

Closes #3988 

## Test Plan
Server side, set up your Vertex AI env vars (`VERTEX_AI_PROJECT`,
`VERTEX_AI_LOCATION`, and `GOOGLE_APPLICATION_CREDENTIALS`) and run the
starter distribution
```bash
$ llama stack list-deps starter | xargs -L1 uv pip install
$ llama stack run starter
```

Client side, formerly broken cURL requests now working
```bash
$ curl http://127.0.0.1:8321/v1/models | jq '.data | map(select(.provider_id == "vertexai"))'
[
  {
    "identifier": "vertexai/vertex_ai/gemini-2.0-flash",
    "provider_resource_id": "vertex_ai/gemini-2.0-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-flash",
    "provider_resource_id": "vertex_ai/gemini-2.5-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-pro",
    "provider_resource_id": "vertex_ai/gemini-2.5-pro",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  }
]
$ curl -fsS http://127.0.0.1:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"vertexai/vertex_a
i/gemini-2.5-flash\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"max_tokens\": 128, \"temperature\": 0.0}" | jq 
{                                                                                                                                    
  "id": "p8oIaYiQF8_PptQPo-GH8QQ",                                                                                                   
  "choices": [                                                                                                                       
    {                                                                                                                                
      "finish_reason": "stop",                                                                                                       
      "index": 0,                                                                                                                    
      "logprobs": null,                                                                                                              
      "message": {                                                                                                                   
        "content": "Hello there! How can I help you today?",                                                                         
        "refusal": null,                                                                                                             
        "role": "assistant",                                                                                                         
        "annotations": null,                                                                                                         
        "audio": null,                                                                                                               
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
...
```

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-11-03 17:38:16 -08:00
Jiayi Ni
fa7699d2c3
feat: Add rerank API for NVIDIA Inference Provider (#3329)
# What does this PR do?
Add rerank API for NVIDIA Inference Provider.

<!-- If resolving an issue, uncomment and update the line below -->
Closes #3278 

## Test Plan
Unit test:
```
pytest tests/unit/providers/nvidia/test_rerank_inference.py
```

Integration test: 
```
pytest -s -v tests/integration/inference/test_rerank.py   --stack-config="inference=nvidia"   --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3   --env NVIDIA_API_KEY=""   --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
2025-10-30 21:42:09 -07:00
Ashwin Bharambe
da8f014b96
feat(models): list models available via provider_data header (#3968)
## Summary

When users provide API keys via `X-LlamaStack-Provider-Data` header,
`models.list()` now returns models they can access from those providers,
not just pre-registered models from the registry.

This complements the routing fix from f88416ef8 which enabled inference
calls with `provider_id/model_id` format for unregistered models. Users
can now discover which models are available to them before making
inference requests.

The implementation reuses
`NeedsRequestProviderData.get_request_provider_data()` to validate
credentials, then dynamically fetches models from providers without
caching them since they're user-specific. Registry models take
precedence to respect any pre-configured aliases.

## Test Script

```python
#!/usr/bin/env python3
import json
import os
from openai import OpenAI

# Test 1: Without provider_data header
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="dummy")
models = client.models.list()
anthropic_without = [m.id for m in models.data if m.id and "anthropic" in m.id]
print(f"Without header: {len(models.data)} models, {len(anthropic_without)} anthropic")

# Test 2: With provider_data header containing Anthropic API key
anthropic_api_key = os.environ["ANTHROPIC_API_KEY"]
client_with_key = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="dummy",
    default_headers={
        "X-LlamaStack-Provider-Data": json.dumps({"anthropic_api_key": anthropic_api_key})
    }
)
models_with_key = client_with_key.models.list()
anthropic_with = [m.id for m in models_with_key.data if m.id and "anthropic" in m.id]
print(f"With header: {len(models_with_key.data)} models, {len(anthropic_with)} anthropic")
print(f"Anthropic models: {anthropic_with}")

assert len(anthropic_with) > len(anthropic_without), "Should have more anthropic models with API key"
print("\n✓ Test passed!")
```

Run with a stack that has Anthropic provider configured (but without API
key in config):
```bash
ANTHROPIC_API_KEY=sk-ant-... python test_provider_data_models.py
```
2025-10-29 14:03:03 -07:00
ehhuang
1f9d48cd54
feat: openai files provider (#3946)
# What does this PR do?
- Adds OpenAI files provider 
- Note that file content retrieval is pretty limited by `purpose`
https://community.openai.com/t/file-uploads-error-why-can-t-i-download-files-with-purpose-user-data/1357013?utm_source=chatgpt.com

## Test Plan
Modify run yaml to use openai files provider:
```
  files:
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY:=}
      metadata_store:
        backend: sql_default
        table_name: openai_files_metadata

# Then run files tests
❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern test_files
```
2025-10-28 16:25:03 -07:00
Ashwin Bharambe
94b0592240
fix(mypy): add type stubs and fix typing issues (#3938)
Adds type stubs and fixes mypy errors for better type coverage.

Changes:
- Added type_checking dependency group with type stubs (torchtune, trl,
etc.)
- Added lm-format-enforcer to pre-commit hook
- Created HFAutoModel Protocol for type-safe HuggingFace model handling
- Added mypy.overrides for untyped libraries (torchtune, fairscale,
etc.)
- Fixed type issues in post-training providers, databricks, and
api_recorder

Note: ~1,200 errors remain in excluded files (see pyproject.toml exclude
list).

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 11:00:09 -07:00
Ashwin Bharambe
1d385b5b75
fix(mypy): resolve OpenAI SDK and provider type issues (#3936)
## Summary
- Fix OpenAI SDK NotGiven/Omit type mismatches in embeddings calls
- Fix incorrect OpenAIChatCompletionChunk import in vllm provider
- Refactor to avoid type:ignore comments by using conditional kwargs

## Changes
**openai_mixin.py (9 errors fixed):**
- Build kwargs conditionally for embeddings.create() to avoid
NotGiven/Omit mismatch
- Only include parameters when they have actual values (not None)

**gemini.py (9 errors fixed):**
- Apply same conditional kwargs pattern
- Add missing Any import

**vllm.py (2 errors fixed):**
- Use correct OpenAIChatCompletionChunk from llama_stack.apis.inference
- Remove incorrect alias from openai package

## Technical Notes
The OpenAI SDK has a type system quirk where `NOT_GIVEN` has type
`NotGiven` but parameter signatures expect `Omit`. By only passing
parameters with actual values, we avoid this mismatch entirely without
needing `# type: ignore` comments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 10:54:29 -07:00
Ashwin Bharambe
d009dc29f7
fix(mypy): resolve provider utility and testing type issues (#3935)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 3s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 7s
UI Tests / ui-tests (22) (push) Successful in 51s
Pre-commit / pre-commit (push) Successful in 2m0s
Fixes mypy type errors in provider utilities and testing infrastructure:
- `mcp.py`: Cast incompatible client types, wrap image data properly
- `batches.py`: Rename walrus variable to avoid shadowing
- `api_recorder.py`: Use cast for Pydantic field annotation

No functional changes.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 10:37:27 -07:00
ehhuang
b7dd3f5c56
chore!: BREAKING CHANGE: vector_db_id -> vector_store_id (#3923)
# What does this PR do?


## Test Plan
CI
vector_io tests will fail until next client sync

passed with
https://github.com/llamastack/llama-stack-client-python/pull/286 checked
out locally
2025-10-27 14:26:06 -07:00
Matthew Farrellee
a9b00db421
feat: add provider data keys for Cerebras, Databricks, NVIDIA, and RunPod (#3734)
# What does this PR do?

add provider-data key passing support to Cerebras, Databricks, NVIDIA
and RunPod

also, added missing tests for Fireworks, Anthropic, Gemini, SambaNova,
and vLLM

addresses #3517 

## Test Plan

ci w/ new tests

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-27 13:09:35 -07:00
Ashwin Bharambe
471b1b248b
chore(package): migrate to src/ layout (#3920)
Migrates package structure to src/ layout following Python packaging
best practices.

All code moved from `llama_stack/` to `src/llama_stack/`. Public API
unchanged - imports remain `import llama_stack.*`.

Updated build configs, pre-commit hooks, scripts, and GitHub workflows
accordingly. All hooks pass, package builds cleanly.

**Developer note**: Reinstall after pulling: `pip install -e .`
2025-10-27 12:02:21 -07:00