feat!: standardize base_url for inference (#4177)

# What does this PR do?

Completes #3732 by removing runtime URL transformations and requiring
users to provide full URLs in configuration. All providers now use
'base_url' consistently and respect the exact URL provided without
appending paths like /v1 or /openai/v1 at runtime.

BREAKING CHANGE: Users must update configs to include full URL paths
(e.g., http://localhost:11434/v1 instead of http://localhost:11434).
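
As a concrete sketch of the migration (field names follow the vLLM provider diff below, the endpoint is the same example as above, and the dicts are only illustrative — your actual run config may be YAML):

```python
# Before: the provider appended a path such as /v1 to the configured URL at runtime.
old_provider_config = {"url": "http://localhost:11434"}

# After: the full URL, including its path, is given explicitly and used exactly as written.
new_provider_config = {"base_url": "http://localhost:11434/v1"}
```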

Closes #3732 

## Test Plan

Existing tests should continue to pass despite the URL changes, because the
default URLs have been updated to include the full paths.

Add a unit test that enforces URL standardization across remote inference
providers (verifies they all use a 'base_url' field typed HttpUrl | None).
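
A minimal sketch of that check (the helper and the example config class below are stand-ins for illustration; the actual test enumerates the real remote inference provider config classes):

```python
from pydantic import BaseModel, HttpUrl


def assert_base_url_standardized(config_cls: type[BaseModel]) -> None:
    """Assert that a provider config declares base_url typed HttpUrl | None."""
    field = config_cls.model_fields.get("base_url")
    assert field is not None, f"{config_cls.__name__} must define a 'base_url' field"
    assert field.annotation == (HttpUrl | None), (
        f"{config_cls.__name__}.base_url must be typed HttpUrl | None"
    )


class ExampleProviderConfig(BaseModel):  # stand-in for a real provider config class
    base_url: HttpUrl | None = None


assert_base_url_standardized(ExampleProviderConfig)
```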

Signed-off-by: Charlie Doern <cdoern@redhat.com>

Commit d5cd0eea14 (parent 91f1b352b4), authored by Charlie Doern on 2025-11-19 and committed via GitHub. 67 changed files with 282 additions and 227 deletions.


```diff
@@ -6,7 +6,7 @@
 from pathlib import Path
-from pydantic import Field, SecretStr, field_validator
+from pydantic import Field, HttpUrl, SecretStr, field_validator
 from llama_stack.providers.utils.inference.model_registry import RemoteInferenceProviderConfig
 from llama_stack_api import json_schema_type
@@ -14,7 +14,7 @@ from llama_stack_api import json_schema_type
 @json_schema_type
 class VLLMInferenceAdapterConfig(RemoteInferenceProviderConfig):
-    url: str | None = Field(
+    base_url: HttpUrl | None = Field(
         default=None,
         description="The URL for the vLLM model serving endpoint",
     )
```
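
Since the field is now typed `HttpUrl | None` rather than `str | None`, pydantic validates the URL when the config is constructed. A small self-contained sketch of that behavior (the model below is a stand-in, not the actual adapter config):

```python
from pydantic import BaseModel, HttpUrl, ValidationError


class UrlConfig(BaseModel):  # stand-in mirroring the base_url field above
    base_url: HttpUrl | None = None


# A well-formed URL is accepted, and the path it carries is preserved as-is.
ok = UrlConfig(base_url="http://localhost:8000/v1")
print(ok.base_url)  # http://localhost:8000/v1 — nothing is appended

# A malformed value now fails fast at config time instead of at request time.
try:
    UrlConfig(base_url="not-a-url")
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} validation error(s)")
```
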
```diff
@@ -48,11 +48,11 @@ class VLLMInferenceAdapterConfig(RemoteInferenceProviderConfig):
     @classmethod
     def sample_run_config(
         cls,
-        url: str = "${env.VLLM_URL:=}",
+        base_url: str = "${env.VLLM_URL:=}",
         **kwargs,
     ):
         return {
-            "url": url,
+            "base_url": base_url,
             "max_tokens": "${env.VLLM_MAX_TOKENS:=4096}",
            "api_token": "${env.VLLM_API_TOKEN:=fake}",
            "tls_verify": "${env.VLLM_TLS_VERIFY:=true}",
```