feat: add OpenAI-compatible Bedrock provider (#3748)

Implements AWS Bedrock inference provider using OpenAI-compatible
endpoint for Llama models available through Bedrock.

Closes: #3410


## What does this PR do?

Adds AWS Bedrock as an inference provider using the OpenAI-compatible
endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the
standard llama-stack inference API.

The implementation builds on the shared `OpenAIMixin`, so it inherits the
standard OpenAI-compatibility behavior, and it supports per-request API key
overrides via the `x-llamastack-provider-data` header.
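
For example, to override the Bedrock key for a single request — a minimal sketch using the stock `openai` client; the base URL, port, and placeholder key are assumptions, not part of this PR:

```python
# Sketch: per-request Bedrock API key override via provider data.
# Assumes a llama-stack server at localhost:8321 -- adjust for your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1",  # llama-stack's OpenAI-compatible surface
    api_key="dummy",  # stack-level auth placeholder, not the Bedrock key
    default_headers={
        # This header overrides the API key from the provider config.
        "x-llamastack-provider-data": '{"aws_bedrock_api_key": "<your-bedrock-key>"}'
    },
)

resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
)
print(resp.choices[0].message.content)
```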

## Test Plan

**Tested the following scenarios** (full request/response transcripts follow):
- Non-streaming completion - basic request/response flow
- Streaming completion - SSE streaming with chunked responses
- Multi-turn conversations - context retention across turns
- Tool calling - function calling with proper tool_calls format

# Bedrock OpenAI-Compatible Provider - Test Results


**Model:** `bedrock-inference/openai.gpt-oss-20b-1:0`


---

## Test 1: Model Listing

**Request:**
```http
GET /v1/models HTTP/1.1
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": [
    {"identifier": "bedrock-inference/openai.gpt-oss-20b-1:0", ...},
    {"identifier": "bedrock-inference/openai.gpt-oss-40b-1:0", ...}
  ]
}
```
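
Client-side, the same listing can be fetched directly — a sketch with `requests`; the host/port is an assumption:

```python
import requests

# Sketch: list models registered with the stack (assumes localhost:8321).
resp = requests.get("http://localhost:8321/v1/models")
resp.raise_for_status()
for m in resp.json()["data"]:
    print(m["identifier"])  # e.g. bedrock-inference/openai.gpt-oss-20b-1:0
```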

---

## Test 2: Non-Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "choices": [{
    "finish_reason": "stop",
    "message": {"content": "...Hello from Bedrock"}
  }],
  "usage": {"prompt_tokens": 79, "completion_tokens": 50, "total_tokens": 129}
}
```
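
The equivalent call through the `openai` client (reusing the `client` from the earlier sketch) surfaces the same finish reason and token accounting:

```python
# Sketch: non-streaming completion; `client` as constructed earlier.
resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
    stream=False,
)
print(resp.choices[0].finish_reason)  # "stop"
print(resp.usage.total_tokens)        # prompt_tokens + completion_tokens
```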

---

## Test 3: Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Count from 1 to 5"}],
  "stream": true
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: text/event-stream

[6 SSE chunks received]
Final content: "1, 2, 3, 4, 5"
```
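
Consuming the stream client-side — a sketch; the guard on empty `choices` matters because a final usage-only chunk may arrive when `include_usage` is enabled, as the provider does under telemetry:

```python
# Sketch: iterate SSE chunks; `client` as constructed earlier.
stream = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Count from 1 to 5"}],
    stream=True,
)
for chunk in stream:
    # Usage-only chunks carry no choices, so guard before indexing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```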

---

## Test 4: Error Handling - Invalid Model

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "invalid-model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 404 Not Found
Content-Type: application/json

{
  "detail": "Model 'invalid-model-id' not found. Use 'client.models.list()' to list available Models."
}
```

---

## Test 5: Multi-Turn Conversation

**Request 1:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "My name is Alice"}]
}
```

**Response 1:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Nice to meet you, Alice! How can I help you today?"}
  }]
}
```

**Request 2 (with history):**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "user", "content": "My name is Alice"},
    {"role": "assistant", "content": "...Nice to meet you, Alice!..."},
    {"role": "user", "content": "What is my name?"}
  ]
}
```

**Response 2:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Your name is Alice."}
  }],
  "usage": {"prompt_tokens": 183, "completion_tokens": 42}
}
```

**Context retained across turns**
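
The retention comes purely from client-side history accumulation, as in this sketch (`client` as constructed earlier):

```python
# Sketch: the client re-sends the full transcript each turn.
model = "bedrock-inference/openai.gpt-oss-20b-1:0"
history = [{"role": "user", "content": "My name is Alice"}]

first = client.chat.completions.create(model=model, messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})
history.append({"role": "user", "content": "What is my name?"})

second = client.chat.completions.create(model=model, messages=history)
print(second.choices[0].message.content)  # "...Your name is Alice."
```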

---

## Test 6: System Messages

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "system", "content": "You are Shakespeare. Respond only in Shakespearean English."},
    {"role": "user", "content": "Tell me about the weather"}
  ]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "Lo! I heed thy request..."}
  }],
  "usage": {"completion_tokens": 813}
}
```


---

## Test 7: Tool Calling

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}
    }
  }]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "tool_calls": [{
        "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"}
      }]
    }
  }]
}
```
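
Decoding the tool call client-side — a sketch; `arguments` arrives as a JSON string and must be parsed:

```python
import json

# Sketch: request a tool call and decode its arguments; `client` as earlier.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {"location": {"type": "string"}}},
    },
}]
resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=tools,
)
choice = resp.choices[0]
if choice.finish_reason == "tool_calls":
    call = choice.message.tool_calls[0]
    args = json.loads(call.function.arguments)  # {"location": "San Francisco"}
    print(call.function.name, args)
```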

---

## Test 8: Sampling Parameters

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "Say hello"}],
  "temperature": 0.7,
  "top_p": 0.9
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! 👋 How can I help you today?"}
  }]
}
```

---

## Test 9: Authentication Error Handling

### Subtest A: Invalid API Key

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "invalid-fake-key-12345"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```

---

### Subtest B: Empty API Key (Fallback to Config)

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": ""}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! How can I assist you today?"}
  }]
}
```

**Fell back to the config key.**

---

### Subtest C: Malformed Token

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```
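
Client-side, these auth failures surface as HTTP 400s. A sketch of catching them with the `openai` client (`APIStatusError` is its generic status-code exception; host/port assumed as before):

```python
from openai import APIStatusError, OpenAI

# Sketch: a bad per-request key produces a 400 from the stack.
bad_client = OpenAI(
    base_url="http://localhost:8321/v1",
    api_key="dummy",
    default_headers={
        "x-llamastack-provider-data": '{"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"}'
    },
)
try:
    bad_client.chat.completions.create(
        model="bedrock-inference/openai.gpt-oss-20b-1:0",
        messages=[{"role": "user", "content": "Hello"}],
    )
except APIStatusError as e:
    print(e.status_code)  # 400 -- the provider maps Bedrock auth errors to ValueError
```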
---

**Commit `e894e36eea`** (parent `a2c4c12384`) — 15 changed files with 309 additions and 190 deletions. Key diffs:

**`llama_stack/providers/remote/inference/bedrock/__init__.py`**

```diff
@@ -11,7 +11,7 @@ async def get_adapter_impl(config: BedrockConfig, _deps):
     assert isinstance(config, BedrockConfig), f"Unexpected config type: {type(config)}"
-    impl = BedrockInferenceAdapter(config)
+    impl = BedrockInferenceAdapter(config=config)
     await impl.initialize()
```

**`llama_stack/providers/remote/inference/bedrock/bedrock.py`**

```diff
@@ -4,139 +4,124 @@
 # This source code is licensed under the terms described in the LICENSE file in
 # the root directory of this source tree.

-import json
-from collections.abc import AsyncIterator
+from collections.abc import AsyncIterator, Iterable

-from botocore.client import BaseClient
+from openai import AuthenticationError

 from llama_stack.apis.inference import (
-    ChatCompletionRequest,
-    Inference,
+    OpenAIChatCompletion,
+    OpenAIChatCompletionChunk,
     OpenAIChatCompletionRequestWithExtraBody,
+    OpenAICompletion,
     OpenAICompletionRequestWithExtraBody,
     OpenAIEmbeddingsRequestWithExtraBody,
     OpenAIEmbeddingsResponse,
 )
-from llama_stack.apis.inference.inference import (
-    OpenAIChatCompletion,
-    OpenAIChatCompletionChunk,
-    OpenAICompletion,
-)
-from llama_stack.providers.remote.inference.bedrock.config import BedrockConfig
-from llama_stack.providers.utils.bedrock.client import create_bedrock_client
-from llama_stack.providers.utils.inference.model_registry import (
-    ModelRegistryHelper,
-)
-from llama_stack.providers.utils.inference.openai_compat import (
-    get_sampling_strategy_options,
-)
-from llama_stack.providers.utils.inference.prompt_adapter import (
-    chat_completion_request_to_prompt,
-)
+from llama_stack.core.telemetry.tracing import get_current_span
+from llama_stack.log import get_logger
+from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin

-from .models import MODEL_ENTRIES
+from .config import BedrockConfig

-REGION_PREFIX_MAP = {
-    "us": "us.",
-    "eu": "eu.",
-    "ap": "ap.",
-}
-
-
-def _get_region_prefix(region: str | None) -> str:
-    # AWS requires region prefixes for inference profiles
-    if region is None:
-        return "us."  # default to US when we don't know
-
-    # Handle case insensitive region matching
-    region_lower = region.lower()
-    for prefix in REGION_PREFIX_MAP:
-        if region_lower.startswith(f"{prefix}-"):
-            return REGION_PREFIX_MAP[prefix]
-
-    # Fallback to US for anything we don't recognize
-    return "us."
-
-
-def _to_inference_profile_id(model_id: str, region: str = None) -> str:
-    # Return ARNs unchanged
-    if model_id.startswith("arn:"):
-        return model_id
-
-    # Return inference profile IDs that already have regional prefixes
-    if any(model_id.startswith(p) for p in REGION_PREFIX_MAP.values()):
-        return model_id
-
-    # Default to US East when no region is provided
-    if region is None:
-        region = "us-east-1"
-
-    return _get_region_prefix(region) + model_id
-
-
-class BedrockInferenceAdapter(
-    ModelRegistryHelper,
-    Inference,
-):
-    def __init__(self, config: BedrockConfig) -> None:
-        ModelRegistryHelper.__init__(self, model_entries=MODEL_ENTRIES)
-        self._config = config
-        self._client = None
-
-    @property
-    def client(self) -> BaseClient:
-        if self._client is None:
-            self._client = create_bedrock_client(self._config)
-        return self._client
-
-    async def initialize(self) -> None:
-        pass
-
-    async def shutdown(self) -> None:
-        if self._client is not None:
-            self._client.close()
-
-    async def _get_params_for_chat_completion(self, request: ChatCompletionRequest) -> dict:
-        bedrock_model = request.model
-        sampling_params = request.sampling_params
-
-        options = get_sampling_strategy_options(sampling_params)
-
-        if sampling_params.max_tokens:
-            options["max_gen_len"] = sampling_params.max_tokens
-        if sampling_params.repetition_penalty > 0:
-            options["repetition_penalty"] = sampling_params.repetition_penalty
-
-        prompt = await chat_completion_request_to_prompt(request, self.get_llama_model(request.model))
-
-        # Convert foundation model ID to inference profile ID
-        region_name = self.client.meta.region_name
-        inference_profile_id = _to_inference_profile_id(bedrock_model, region_name)
-
-        return {
-            "modelId": inference_profile_id,
-            "body": json.dumps(
-                {
-                    "prompt": prompt,
-                    **options,
-                }
-            ),
-        }
+logger = get_logger(name=__name__, category="inference::bedrock")
+
+
+class BedrockInferenceAdapter(OpenAIMixin):
+    """
+    Adapter for AWS Bedrock's OpenAI-compatible API endpoints.
+
+    Supports Llama models across regions and GPT-OSS models (us-west-2 only).
+
+    Note: Bedrock's OpenAI-compatible endpoint does not support /v1/models
+    for dynamic model discovery. Models must be pre-registered in the config.
+    """
+
+    config: BedrockConfig
+    provider_data_api_key_field: str = "aws_bedrock_api_key"
+
+    def get_base_url(self) -> str:
+        """Get base URL for OpenAI client."""
+        return f"https://bedrock-runtime.{self.config.region_name}.amazonaws.com/openai/v1"
+
+    async def list_provider_model_ids(self) -> Iterable[str]:
+        """
+        Bedrock's OpenAI-compatible endpoint does not support the /v1/models endpoint.
+        Returns empty list since models must be pre-registered in the config.
+        """
+        return []
+
+    async def check_model_availability(self, model: str) -> bool:
+        """
+        Bedrock doesn't support dynamic model listing via /v1/models.
+        Always return True to accept all models registered in the config.
+        """
+        return True

     async def openai_embeddings(
         self,
         params: OpenAIEmbeddingsRequestWithExtraBody,
     ) -> OpenAIEmbeddingsResponse:
-        raise NotImplementedError()
+        """Bedrock's OpenAI-compatible API does not support the /v1/embeddings endpoint."""
+        raise NotImplementedError(
+            "Bedrock's OpenAI-compatible API does not support /v1/embeddings endpoint. "
+            "See https://docs.aws.amazon.com/bedrock/latest/userguide/inference-chat-completions.html"
+        )

     async def openai_completion(
         self,
         params: OpenAICompletionRequestWithExtraBody,
     ) -> OpenAICompletion:
-        raise NotImplementedError("OpenAI completion not supported by the Bedrock provider")
+        """Bedrock's OpenAI-compatible API does not support the /v1/completions endpoint."""
+        raise NotImplementedError(
+            "Bedrock's OpenAI-compatible API does not support /v1/completions endpoint. "
+            "Only /v1/chat/completions is supported. "
+            "See https://docs.aws.amazon.com/bedrock/latest/userguide/inference-chat-completions.html"
+        )

     async def openai_chat_completion(
         self,
         params: OpenAIChatCompletionRequestWithExtraBody,
     ) -> OpenAIChatCompletion | AsyncIterator[OpenAIChatCompletionChunk]:
-        raise NotImplementedError("OpenAI chat completion not supported by the Bedrock provider")
+        """Override to enable streaming usage metrics and handle authentication errors."""
+        # Enable streaming usage metrics when telemetry is active
+        if params.stream and get_current_span() is not None:
+            if params.stream_options is None:
+                params.stream_options = {"include_usage": True}
+            elif "include_usage" not in params.stream_options:
+                params.stream_options = {**params.stream_options, "include_usage": True}
+
+        try:
+            logger.debug(f"Calling Bedrock OpenAI API with model={params.model}, stream={params.stream}")
+            result = await super().openai_chat_completion(params=params)
+            logger.debug(f"Bedrock API returned: {type(result).__name__ if result is not None else 'None'}")
+            if result is None:
+                logger.error(f"Bedrock OpenAI client returned None for model={params.model}, stream={params.stream}")
+                raise RuntimeError(
+                    f"Bedrock API returned no response for model '{params.model}'. "
+                    "This may indicate the model is not supported or a network/API issue occurred."
+                )
+            return result
+        except AuthenticationError as e:
+            error_msg = str(e)
+            # Check if this is a token expiration error
+            if "expired" in error_msg.lower() or "Bearer Token has expired" in error_msg:
+                logger.error(f"AWS Bedrock authentication token expired: {error_msg}")
+                raise ValueError(
+                    "AWS Bedrock authentication failed: Bearer token has expired. "
+                    "The AWS_BEDROCK_API_KEY environment variable contains an expired pre-signed URL. "
+                    "Please refresh your token by generating a new pre-signed URL with AWS credentials. "
+                    "Refer to AWS Bedrock documentation for details on OpenAI-compatible endpoints."
+                ) from e
+            else:
+                logger.error(f"AWS Bedrock authentication failed: {error_msg}")
+                raise ValueError(
+                    f"AWS Bedrock authentication failed: {error_msg}. "
+                    "Please verify your API key is correct in the provider config or x-llamastack-provider-data header. "
+                    "The API key should be a valid AWS pre-signed URL for Bedrock's OpenAI-compatible endpoint."
+                ) from e
+        except Exception as e:
+            logger.error(f"Unexpected error calling Bedrock API: {type(e).__name__}: {e}", exc_info=True)
+            raise
```
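
A quick sanity check of the new `get_base_url()` — a sketch that simply restates the f-string in the diff:

```python
# Sketch: the adapter derives a region-scoped OpenAI-compatible endpoint.
from llama_stack.providers.remote.inference.bedrock.bedrock import BedrockInferenceAdapter
from llama_stack.providers.remote.inference.bedrock.config import BedrockConfig

adapter = BedrockInferenceAdapter(config=BedrockConfig(region_name="us-west-2"))
assert adapter.get_base_url() == "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"
```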

**`llama_stack/providers/remote/inference/bedrock/config.py`**

```diff
@@ -4,8 +4,29 @@
 # This source code is licensed under the terms described in the LICENSE file in
 # the root directory of this source tree.

-from llama_stack.providers.utils.bedrock.config import BedrockBaseConfig
+import os
+
+from pydantic import BaseModel, Field
+
+from llama_stack.providers.utils.inference.model_registry import RemoteInferenceProviderConfig


-class BedrockConfig(BedrockBaseConfig):
-    pass
+class BedrockProviderDataValidator(BaseModel):
+    aws_bedrock_api_key: str | None = Field(
+        default=None,
+        description="API key for Amazon Bedrock",
+    )
+
+
+class BedrockConfig(RemoteInferenceProviderConfig):
+    region_name: str = Field(
+        default_factory=lambda: os.getenv("AWS_DEFAULT_REGION", "us-east-2"),
+        description="AWS Region for the Bedrock Runtime endpoint",
+    )
+
+    @classmethod
+    def sample_run_config(cls, **kwargs):
+        return {
+            "api_key": "${env.AWS_BEDROCK_API_KEY:=}",
+            "region_name": "${env.AWS_DEFAULT_REGION:=us-east-2}",
+        }
```
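
The region default resolves from the environment at instantiation time (`default_factory` runs per instance). A sketch, assuming the base config has no other required fields:

```python
import os

from llama_stack.providers.remote.inference.bedrock.config import BedrockConfig

os.environ.pop("AWS_DEFAULT_REGION", None)
print(BedrockConfig().region_name)  # "us-east-2" -- the fallback in default_factory

os.environ["AWS_DEFAULT_REGION"] = "us-west-2"
print(BedrockConfig().region_name)  # "us-west-2"
```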

**`llama_stack/providers/remote/inference/bedrock/models.py`** (deleted)

```diff
@@ -1,29 +0,0 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# All rights reserved.
-#
-# This source code is licensed under the terms described in the LICENSE file in
-# the root directory of this source tree.
-
-from llama_stack.models.llama.sku_types import CoreModelId
-
-from llama_stack.providers.utils.inference.model_registry import (
-    build_hf_repo_model_entry,
-)
-
-SAFETY_MODELS_ENTRIES = []
-
-# https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html
-MODEL_ENTRIES = [
-    build_hf_repo_model_entry(
-        "meta.llama3-1-8b-instruct-v1:0",
-        CoreModelId.llama3_1_8b_instruct.value,
-    ),
-    build_hf_repo_model_entry(
-        "meta.llama3-1-70b-instruct-v1:0",
-        CoreModelId.llama3_1_70b_instruct.value,
-    ),
-    build_hf_repo_model_entry(
-        "meta.llama3-1-405b-instruct-v1:0",
-        CoreModelId.llama3_1_405b_instruct.value,
-    ),
-] + SAFETY_MODELS_ENTRIES
```