Sumanth Kamenani
e894e36eea
feat: add OpenAI-compatible Bedrock provider (#3748)
Implements an AWS Bedrock inference provider using the OpenAI-compatible
endpoint for Llama models available through Bedrock.
Closes #3410
## What does this PR do?
Adds AWS Bedrock as an inference provider using the OpenAI-compatible
endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the
standard llama-stack inference API.
The implementation uses LiteLLM's OpenAI client under the hood, so it
inherits the full set of OpenAI compatibility features. The provider also
supports per-request API key overrides via the `x-llamastack-provider-data`
header.
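For illustration, a minimal sketch of that override using the standard `openai` Python client pointed at a llama-stack server (the base URL, placeholder key, and model ID are assumptions taken from the test setup below, not part of the provider code):

```python
import json

from openai import OpenAI

# Hypothetical local llama-stack endpoint; the stack handles Bedrock auth,
# so the OpenAI client key is a placeholder.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-used")

# The provider reads a per-request Bedrock key from this header and falls
# back to the key in the provider config when the header value is empty.
resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "x-llamastack-provider-data": json.dumps(
            {"aws_bedrock_api_key": "your-bedrock-key"}
        )
    },
)
print(resp.choices[0].message.content)
```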
## Test Plan
**Tested the following scenarios:**
- Non-streaming completion - basic request/response flow
- Streaming completion - SSE streaming with chunked responses
- Multi-turn conversations - context retention across turns
- Tool calling - function calling with proper tool_calls format
# Bedrock OpenAI-Compatible Provider - Test Results
**Model:** `bedrock-inference/openai.gpt-oss-20b-1:0`
---
## Test 1: Model Listing
**Request:**
```http
GET /v1/models HTTP/1.1
```
**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": [
    {"identifier": "bedrock-inference/openai.gpt-oss-20b-1:0", ...},
    {"identifier": "bedrock-inference/openai.gpt-oss-40b-1:0", ...}
  ]
}
```
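The equivalent check from Python (same assumed client setup as in the sketch above):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-used")

# Bedrock models are registered under the "bedrock-inference/" prefix.
for model in client.models.list():
    print(model.id)
```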
---
## Test 2: Non-Streaming Completion
**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
  "stream": false
}
```
**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "choices": [{
    "finish_reason": "stop",
    "message": {"content": "...Hello from Bedrock"}
  }],
  "usage": {"prompt_tokens": 79, "completion_tokens": 50, "total_tokens": 129}
}
```
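The same request via the `openai` Python client (endpoint and placeholder key are assumptions, as above):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-used")

resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
    stream=False,
)
print(resp.choices[0].message.content)
print(resp.usage)  # prompt/completion/total token counts
```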
---
## Test 3: Streaming Completion
**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Count from 1 to 5"}],
  "stream": true
}
```
**Response:**
```http
HTTP/1.1 200 OK
Content-Type: text/event-stream

[6 SSE chunks received]
Final content: "1, 2, 3, 4, 5"
```
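A sketch of consuming the stream from Python (same assumed setup); the content arrives as deltas that the client concatenates:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-used")

stream = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Count from 1 to 5"}],
    stream=True,
)
for chunk in stream:
    # Some chunks may carry no content delta; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```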
---
## Test 4: Error Handling - Invalid Model
**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "invalid-model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}
```
**Response:**
```http
HTTP/1.1 404 Not Found
Content-Type: application/json

{
  "detail": "Model 'invalid-model-id' not found. Use 'client.models.list()' to list available Models."
}
```
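From the Python client this surfaces as a typed exception; a sketch, assuming openai>=1.0 where a 404 raises `openai.NotFoundError`:

```python
import openai
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-used")

try:
    client.chat.completions.create(
        model="invalid-model-id",
        messages=[{"role": "user", "content": "Hello"}],
    )
except openai.NotFoundError as err:
    print(err)  # carries the server's "detail" message
```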
---
## Test 5: Multi-Turn Conversation
**Request 1:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "My name is Alice"}]
}
```
**Response 1:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Nice to meet you, Alice! How can I help you today?"}
  }]
}
```
**Request 2 (with history):**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "user", "content": "My name is Alice"},
    {"role": "assistant", "content": "...Nice to meet you, Alice!..."},
    {"role": "user", "content": "What is my name?"}
  ]
}
```
**Response 2:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Your name is Alice."}
  }],
  "usage": {"prompt_tokens": 183, "completion_tokens": 42}
}
```
**Context retained across turns**
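Context retention is driven entirely by the client resending history; a sketch of the two turns above (same assumed setup):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-used")
MODEL = "bedrock-inference/openai.gpt-oss-20b-1:0"

history = [{"role": "user", "content": "My name is Alice"}]
first = client.chat.completions.create(model=MODEL, messages=history)

# Feed the assistant's reply back in so the second turn has context.
history.append({"role": "assistant", "content": first.choices[0].message.content})
history.append({"role": "user", "content": "What is my name?"})

second = client.chat.completions.create(model=MODEL, messages=history)
print(second.choices[0].message.content)  # "...Your name is Alice."
```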
---
## Test 6: System Messages
**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "system", "content": "You are Shakespeare. Respond only in Shakespearean English."},
    {"role": "user", "content": "Tell me about the weather"}
  ]
}
```
**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "Lo! I heed thy request..."}
  }],
  "usage": {"completion_tokens": 813}
}
```
---
## Test 7: Tool Calling
**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}
    }
  }]
}
```
**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "tool_calls": [{
        "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"}
      }]
    }
  }]
}
```
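Reading the tool call back out in Python (same assumed setup); note that `arguments` is a JSON-encoded string, not an object:

```python
import json

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-used")

resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {"type": "object", "properties": {"location": {"type": "string"}}},
        },
    }],
)
if resp.choices[0].finish_reason == "tool_calls":
    for call in resp.choices[0].message.tool_calls:
        print(call.function.name, json.loads(call.function.arguments))
```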
---
## Test 8: Sampling Parameters
**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "Say hello"}],
  "temperature": 0.7,
  "top_p": 0.9
}
```
**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! 👋 How can I help you today?"}
  }]
}
```
---
## Test 9: Authentication Error Handling
### Subtest A: Invalid API Key
**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "invalid-fake-key-12345"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```
**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```
---
### Subtest B: Empty API Key (Fallback to Config)
**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": ""}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```
**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! How can I assist you today?"}
  }]
}
```
**Empty header key fell back to the config key**
---
### Subtest C: Malformed Token
**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```
**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```
2025-11-06 17:18:18 -08:00