feat: add OpenAI-compatible Bedrock provider (#3748)
Some checks failed
Pre-commit / pre-commit (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 9s
UI Tests / ui-tests (22) (push) Successful in 48s

Implements AWS Bedrock inference provider using OpenAI-compatible
endpoint for Llama models available through Bedrock.

Closes: #3410


## What does this PR do?

Adds AWS Bedrock as an inference provider using the OpenAI-compatible
endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the
standard llama-stack inference API.

The implementation uses LiteLLM's OpenAI client under the hood, so it
gets all the OpenAI compatibility features. The provider handles
per-request API key overrides via headers.

## Test Plan

**Tested the following scenarios:**
- Non-streaming completion - basic request/response flow
- Streaming completion - SSE streaming with chunked responses
- Multi-turn conversations - context retention across turns
- Tool calling - function calling with proper tool_calls format

# Bedrock OpenAI-Compatible Provider - Test Results


**Model:** `bedrock-inference/openai.gpt-oss-20b-1:0`


---

## Test 1: Model Listing

**Request:**
```http
GET /v1/models HTTP/1.1
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": [
    {"identifier": "bedrock-inference/openai.gpt-oss-20b-1:0", ...},
    {"identifier": "bedrock-inference/openai.gpt-oss-40b-1:0", ...}
  ]
}
```

---

## Test 2: Non-Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "choices": [{
    "finish_reason": "stop",
    "message": {"content": "...Hello from Bedrock"}
  }],
  "usage": {"prompt_tokens": 79, "completion_tokens": 50, "total_tokens": 129}
}
```

---

## Test 3: Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Count from 1 to 5"}],
  "stream": true
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: text/event-stream

[6 SSE chunks received]
Final content: "1, 2, 3, 4, 5"
```

---

## Test 4: Error Handling - Invalid Model

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "invalid-model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 404 Not Found
Content-Type: application/json

{
  "detail": "Model 'invalid-model-id' not found. Use 'client.models.list()' to list available Models."
}
```

---

## Test 5: Multi-Turn Conversation

**Request 1:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "My name is Alice"}]
}
```

**Response 1:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Nice to meet you, Alice! How can I help you today?"}
  }]
}
```

**Request 2 (with history):**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "user", "content": "My name is Alice"},
    {"role": "assistant", "content": "...Nice to meet you, Alice!..."},
    {"role": "user", "content": "What is my name?"}
  ]
}
```

**Response 2:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Your name is Alice."}
  }],
  "usage": {"prompt_tokens": 183, "completion_tokens": 42}
}
```

**Context retained across turns**

---

## Test 6: System Messages

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "system", "content": "You are Shakespeare. Respond only in Shakespearean English."},
    {"role": "user", "content": "Tell me about the weather"}
  ]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "Lo! I heed thy request..."}
  }],
  "usage": {"completion_tokens": 813}
}
```


---

## Test 7: Tool Calling

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}
    }
  }]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "tool_calls": [{
        "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"}
      }]
    }
  }]
}
```

---

## Test 8: Sampling Parameters

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "Say hello"}],
  "temperature": 0.7,
  "top_p": 0.9
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! 👋 How can I help you today?"}
  }]
}
```

---

## Test 9: Authentication Error Handling

### Subtest A: Invalid API Key

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "invalid-fake-key-12345"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```

---

### Subtest B: Empty API Key (Fallback to Config)

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": ""}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! How can I assist you today?"}
  }]
}
```

 **Fell back to config key**

---

### Subtest C: Malformed Token

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```
This commit is contained in:
Sumanth Kamenani 2025-11-06 20:18:18 -05:00 committed by GitHub
parent a2c4c12384
commit e894e36eea
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
15 changed files with 309 additions and 190 deletions

View file

@ -0,0 +1,78 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from types import SimpleNamespace
from unittest.mock import AsyncMock, MagicMock
import pytest
from openai import AuthenticationError
from llama_stack.apis.inference import OpenAIChatCompletionRequestWithExtraBody
from llama_stack.providers.remote.inference.bedrock.bedrock import BedrockInferenceAdapter
from llama_stack.providers.remote.inference.bedrock.config import BedrockConfig
def test_adapter_initialization():
config = BedrockConfig(api_key="test-key", region_name="us-east-1")
adapter = BedrockInferenceAdapter(config=config)
assert adapter.config.auth_credential.get_secret_value() == "test-key"
assert adapter.config.region_name == "us-east-1"
def test_client_url_construction():
config = BedrockConfig(api_key="test-key", region_name="us-west-2")
adapter = BedrockInferenceAdapter(config=config)
assert adapter.get_base_url() == "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"
def test_api_key_from_config():
config = BedrockConfig(api_key="config-key", region_name="us-east-1")
adapter = BedrockInferenceAdapter(config=config)
assert adapter.config.auth_credential.get_secret_value() == "config-key"
def test_api_key_from_header_overrides_config():
"""Test API key from request header overrides config via client property"""
config = BedrockConfig(api_key="config-key", region_name="us-east-1")
adapter = BedrockInferenceAdapter(config=config)
adapter.provider_data_api_key_field = "aws_bedrock_api_key"
adapter.get_request_provider_data = MagicMock(return_value=SimpleNamespace(aws_bedrock_api_key="header-key"))
# The client property is where header override happens (in OpenAIMixin)
assert adapter.client.api_key == "header-key"
async def test_authentication_error_handling():
"""Test that AuthenticationError from OpenAI client is converted to ValueError with helpful message"""
config = BedrockConfig(api_key="invalid-key", region_name="us-east-1")
adapter = BedrockInferenceAdapter(config=config)
# Mock the parent class method to raise AuthenticationError
mock_response = MagicMock()
mock_response.message = "Invalid authentication credentials"
auth_error = AuthenticationError(message="Invalid authentication credentials", response=mock_response, body=None)
# Create a mock that raises the error
mock_super = AsyncMock(side_effect=auth_error)
# Patch the parent class method
original_method = BedrockInferenceAdapter.__bases__[0].openai_chat_completion
BedrockInferenceAdapter.__bases__[0].openai_chat_completion = mock_super
try:
with pytest.raises(ValueError) as exc_info:
params = OpenAIChatCompletionRequestWithExtraBody(
model="test-model", messages=[{"role": "user", "content": "test"}]
)
await adapter.openai_chat_completion(params=params)
assert "AWS Bedrock authentication failed" in str(exc_info.value)
assert "Please verify your API key" in str(exc_info.value)
finally:
# Restore original method
BedrockInferenceAdapter.__bases__[0].openai_chat_completion = original_method

View file

@ -0,0 +1,39 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from llama_stack.providers.remote.inference.bedrock.config import BedrockConfig
def test_bedrock_config_defaults_no_env(monkeypatch):
"""Test BedrockConfig defaults when env vars are not set"""
monkeypatch.delenv("AWS_BEDROCK_API_KEY", raising=False)
monkeypatch.delenv("AWS_DEFAULT_REGION", raising=False)
config = BedrockConfig()
assert config.auth_credential is None
assert config.region_name == "us-east-2"
def test_bedrock_config_reads_from_env(monkeypatch):
"""Test BedrockConfig field initialization reads from environment variables"""
monkeypatch.setenv("AWS_DEFAULT_REGION", "eu-west-1")
config = BedrockConfig()
assert config.region_name == "eu-west-1"
def test_bedrock_config_with_values():
"""Test BedrockConfig accepts explicit values via alias"""
config = BedrockConfig(api_key="test-key", region_name="us-west-2")
assert config.auth_credential.get_secret_value() == "test-key"
assert config.region_name == "us-west-2"
def test_bedrock_config_sample():
"""Test BedrockConfig sample_run_config returns correct format"""
sample = BedrockConfig.sample_run_config()
assert "api_key" in sample
assert "region_name" in sample
assert sample["api_key"] == "${env.AWS_BEDROCK_API_KEY:=}"
assert sample["region_name"] == "${env.AWS_DEFAULT_REGION:=us-east-2}"

View file

@ -4,50 +4,66 @@
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from llama_stack.providers.remote.inference.bedrock.bedrock import (
_get_region_prefix,
_to_inference_profile_id,
)
from types import SimpleNamespace
from unittest.mock import AsyncMock, PropertyMock, patch
from llama_stack.apis.inference import OpenAIChatCompletionRequestWithExtraBody
from llama_stack.providers.remote.inference.bedrock.bedrock import BedrockInferenceAdapter
from llama_stack.providers.remote.inference.bedrock.config import BedrockConfig
def test_region_prefixes():
assert _get_region_prefix("us-east-1") == "us."
assert _get_region_prefix("eu-west-1") == "eu."
assert _get_region_prefix("ap-south-1") == "ap."
assert _get_region_prefix("ca-central-1") == "us."
def test_can_create_adapter():
config = BedrockConfig(api_key="test-key", region_name="us-east-1")
adapter = BedrockInferenceAdapter(config=config)
# Test case insensitive
assert _get_region_prefix("US-EAST-1") == "us."
assert _get_region_prefix("EU-WEST-1") == "eu."
assert _get_region_prefix("Ap-South-1") == "ap."
# Test None region
assert _get_region_prefix(None) == "us."
assert adapter is not None
assert adapter.config.region_name == "us-east-1"
assert adapter.get_api_key() == "test-key"
def test_model_id_conversion():
# Basic conversion
assert (
_to_inference_profile_id("meta.llama3-1-70b-instruct-v1:0", "us-east-1") == "us.meta.llama3-1-70b-instruct-v1:0"
def test_different_aws_regions():
# just check a couple regions to verify URL construction works
config = BedrockConfig(api_key="key", region_name="us-east-1")
adapter = BedrockInferenceAdapter(config=config)
assert adapter.get_base_url() == "https://bedrock-runtime.us-east-1.amazonaws.com/openai/v1"
config = BedrockConfig(api_key="key", region_name="eu-west-1")
adapter = BedrockInferenceAdapter(config=config)
assert adapter.get_base_url() == "https://bedrock-runtime.eu-west-1.amazonaws.com/openai/v1"
async def test_basic_chat_completion():
"""Test basic chat completion works with OpenAIMixin"""
config = BedrockConfig(api_key="k", region_name="us-east-1")
adapter = BedrockInferenceAdapter(config=config)
class FakeModelStore:
async def has_model(self, model_id):
return True
async def get_model(self, model_id):
return SimpleNamespace(provider_resource_id="meta.llama3-1-8b-instruct-v1:0")
adapter.model_store = FakeModelStore()
fake_response = SimpleNamespace(
id="chatcmpl-123",
choices=[SimpleNamespace(message=SimpleNamespace(content="Hello!", role="assistant"), finish_reason="stop")],
)
# Already has prefix
assert (
_to_inference_profile_id("us.meta.llama3-1-70b-instruct-v1:0", "us-east-1")
== "us.meta.llama3-1-70b-instruct-v1:0"
)
mock_create = AsyncMock(return_value=fake_response)
# ARN should be returned unchanged
arn = "arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
assert _to_inference_profile_id(arn, "us-east-1") == arn
class FakeClient:
def __init__(self):
self.chat = SimpleNamespace(completions=SimpleNamespace(create=mock_create))
# ARN should be returned unchanged even without region
assert _to_inference_profile_id(arn) == arn
with patch.object(type(adapter), "client", new_callable=PropertyMock, return_value=FakeClient()):
params = OpenAIChatCompletionRequestWithExtraBody(
model="llama3-1-8b",
messages=[{"role": "user", "content": "hello"}],
stream=False,
)
response = await adapter.openai_chat_completion(params=params)
# Optional region parameter defaults to us-east-1
assert _to_inference_profile_id("meta.llama3-1-70b-instruct-v1:0") == "us.meta.llama3-1-70b-instruct-v1:0"
# Different regions work with optional parameter
assert (
_to_inference_profile_id("meta.llama3-1-70b-instruct-v1:0", "eu-west-1") == "eu.meta.llama3-1-70b-instruct-v1:0"
)
assert response.id == "chatcmpl-123"
assert mock_create.await_count == 1