Fixes#3370
AWS switched to requiring region-prefixed inference profile IDs instead
of foundation model IDs for on-demand throughput. This was causing
ValidationException errors.
Added auto-detection based on boto3 client region to convert model IDs
like meta.llama3-1-70b-instruct-v1:0 to
us.meta.llama3-1-70b-instruct-v1:0 depending on the detected region.
Also handles edge cases like ARNs, case insensitive regions, and None
regions.
Tested with this request.
```json
{
"model_id": "meta.llama3-1-8b-instruct-v1:0",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "tell me a riddle"
}
],
"sampling_params": {
"strategy": {
"type": "top_p",
"temperature": 0.7,
"top_p": 0.9
},
"max_tokens": 512
}
}
```
<img width="1488" height="878" alt="image"
src="https://github.com/user-attachments/assets/0d61beec-3869-4a31-8f37-9f554c280b88"
/>