Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-12-05 10:23:44 +00:00
feat!: standardize base_url for inference
Completes #3732 by removing runtime URL transformations and requiring users to provide full URLs in configuration. All providers now use `base_url` consistently and respect the exact URL provided, without appending paths like `/v1` or `/openai/v1` at runtime.

Adds a unit test to enforce URL standardization across remote inference providers (verifies all use the `base_url` field with the `HttpUrl | None` type).

BREAKING CHANGE: Users must update configs to include full URL paths (e.g., http://localhost:11434/v1 instead of http://localhost:11434).

Signed-off-by: Charlie Doern <cdoern@redhat.com>
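The standardized field shape can be sketched as a minimal pydantic v2 model; `RemoteInferenceConfigBase` is an illustrative name, not the actual class in the repo:

```python
from pydantic import BaseModel, HttpUrl

# Minimal sketch of the standardized config field (illustrative class name).
class RemoteInferenceConfigBase(BaseModel):
    # The full endpoint URL, used exactly as configured; no /v1 or
    # /openai/v1 suffix is appended at runtime anymore.
    base_url: HttpUrl | None = None
```

Presumably each provider config then only differs in its default value, which is why the hunks below are the same mechanical rename repeated per provider.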
parent 7093978754
commit 7a9c32f737
67 changed files with 282 additions and 227 deletions
@@ -24,7 +24,7 @@ https://learn.microsoft.com/en-us/azure/ai-foundry/openai/overview
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `api_base` | `HttpUrl` | No | | Azure API base for Azure (e.g., https://your-resource-name.openai.azure.com) |
+| `base_url` | `HttpUrl \| None` | No | | Azure API base for Azure (e.g., https://your-resource-name.openai.azure.com/openai/v1) |
 | `api_version` | `str \| None` | No | | Azure API version for Azure (e.g., 2024-12-01-preview) |
 | `api_type` | `str \| None` | No | azure | Azure API type for Azure (e.g., azure) |
@@ -32,7 +32,7 @@ https://learn.microsoft.com/en-us/azure/ai-foundry/openai/overview
 ```yaml
 api_key: ${env.AZURE_API_KEY:=}
-api_base: ${env.AZURE_API_BASE:=}
+base_url: ${env.AZURE_API_BASE:=}
 api_version: ${env.AZURE_API_VERSION:=}
 api_type: ${env.AZURE_API_TYPE:=}
 ```
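For Azure both the field name (`api_base` becomes `base_url`) and the expected URL shape change: the `/openai/v1` path must now be part of the configured value itself. A hedged sketch of validating the new form, with an illustrative resource name and stand-in config class:

```python
from pydantic import BaseModel, HttpUrl

class AzureConfig(BaseModel):  # illustrative stand-in for the provider config
    base_url: HttpUrl | None = None

# The /openai/v1 path is now part of the configured value itself.
cfg = AzureConfig(base_url="https://your-resource-name.openai.azure.com/openai/v1")
print(cfg.base_url)  # stored and used verbatim; nothing is appended later
```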
@@ -17,11 +17,11 @@ Cerebras inference provider for running models on Cerebras Cloud platform.
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `base_url` | `str` | No | https://api.cerebras.ai | Base URL for the Cerebras API |
+| `base_url` | `HttpUrl \| None` | No | https://api.cerebras.ai/v1 | Base URL for the Cerebras API |

 ## Sample Configuration

 ```yaml
-base_url: https://api.cerebras.ai
+base_url: https://api.cerebras.ai/v1
 api_key: ${env.CEREBRAS_API_KEY:=}
 ```
@@ -17,11 +17,11 @@ Databricks inference provider for running models on Databricks' unified analytic
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_token` | `SecretStr \| None` | No | | The Databricks API token |
-| `url` | `str \| None` | No | | The URL for the Databricks model serving endpoint |
+| `base_url` | `HttpUrl \| None` | No | | The URL for the Databricks model serving endpoint (should include /serving-endpoints path) |

 ## Sample Configuration

 ```yaml
-url: ${env.DATABRICKS_HOST:=}
+base_url: ${env.DATABRICKS_HOST:=}
 api_token: ${env.DATABRICKS_TOKEN:=}
 ```
@@ -17,11 +17,11 @@ Fireworks AI inference provider for Llama models and other AI models on the Fire
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |
+| `base_url` | `HttpUrl \| None` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |

 ## Sample Configuration

 ```yaml
-url: https://api.fireworks.ai/inference/v1
+base_url: https://api.fireworks.ai/inference/v1
 api_key: ${env.FIREWORKS_API_KEY:=}
 ```
@@ -17,11 +17,11 @@ Groq inference provider for ultra-fast inference using Groq's LPU technology.
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://api.groq.com | The URL for the Groq AI server |
+| `base_url` | `HttpUrl \| None` | No | https://api.groq.com/openai/v1 | The URL for the Groq AI server |

 ## Sample Configuration

 ```yaml
-url: https://api.groq.com
+base_url: https://api.groq.com/openai/v1
 api_key: ${env.GROQ_API_KEY:=}
 ```
@@ -17,11 +17,11 @@ Llama OpenAI-compatible provider for using Llama models with OpenAI API format.
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `openai_compat_api_base` | `str` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |
+| `base_url` | `HttpUrl \| None` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |

 ## Sample Configuration

 ```yaml
-openai_compat_api_base: https://api.llama.com/compat/v1/
+base_url: https://api.llama.com/compat/v1/
 api_key: ${env.LLAMA_API_KEY}
 ```
@@ -17,15 +17,13 @@ NVIDIA inference provider for accessing NVIDIA NIM models and AI services.
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://integrate.api.nvidia.com | A base url for accessing the NVIDIA NIM |
+| `base_url` | `HttpUrl \| None` | No | https://integrate.api.nvidia.com/v1 | A base url for accessing the NVIDIA NIM |
 | `timeout` | `int` | No | 60 | Timeout for the HTTP requests |
-| `append_api_version` | `bool` | No | True | When set to false, the API version will not be appended to the base_url. By default, it is true. |
 | `rerank_model_to_url` | `dict[str, str]` | No | `{'nv-rerank-qa-mistral-4b:1': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking', 'nvidia/nv-rerankqa-mistral-4b-v3': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking', 'nvidia/llama-3.2-nv-rerankqa-1b-v2': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking'}` | Mapping of rerank model identifiers to their API endpoints. |

 ## Sample Configuration

 ```yaml
-url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
+base_url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com/v1}
 api_key: ${env.NVIDIA_API_KEY:=}
-append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
 ```
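The NVIDIA hunk is the one place an entire option disappears: `append_api_version` controlled the runtime `/v1` append that this commit removes, so with full URLs in config it has nothing left to do. A rough before/after sketch of the URL handling (function names are illustrative, not the provider's actual code):

```python
# Before: the provider rewrote the configured URL at runtime.
def old_endpoint(url: str, append_api_version: bool = True) -> str:
    return (url.rstrip("/") + "/v1") if append_api_version else url

# After: the configured base_url already carries its path and is used as-is.
def new_endpoint(base_url: str) -> str:
    return base_url

assert new_endpoint("https://integrate.api.nvidia.com/v1") == \
    old_endpoint("https://integrate.api.nvidia.com")
```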
@@ -16,10 +16,10 @@ Ollama inference provider for running local models through the Ollama runtime.
 |-------|------|----------|---------|-------------|
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
-| `url` | `str` | No | http://localhost:11434 | |
+| `base_url` | `HttpUrl \| None` | No | http://localhost:11434/v1 | |

 ## Sample Configuration

 ```yaml
-url: ${env.OLLAMA_URL:=http://localhost:11434}
+base_url: ${env.OLLAMA_URL:=http://localhost:11434/v1}
 ```
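Ollama is the example the BREAKING CHANGE note uses: the config must now point at the OpenAI-compatible `/v1` route itself. A minimal sketch of what a provider might do with the configured value, assuming an OpenAI-compatible client (illustrative, not the repo's actual wiring):

```python
from openai import AsyncOpenAI

# The configured URL is passed through untouched; if /v1 is missing,
# requests simply hit the wrong route instead of being silently fixed.
client = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
```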
@@ -17,7 +17,7 @@ OpenAI inference provider for accessing GPT models and other OpenAI services.
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `base_url` | `str` | No | https://api.openai.com/v1 | Base URL for OpenAI API |
+| `base_url` | `HttpUrl \| None` | No | https://api.openai.com/v1 | Base URL for OpenAI API |

 ## Sample Configuration
@@ -17,11 +17,11 @@ Passthrough inference provider for connecting to any external inference service
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | | The URL for the passthrough endpoint |
+| `base_url` | `HttpUrl \| None` | No | | The URL for the passthrough endpoint |

 ## Sample Configuration

 ```yaml
-url: ${env.PASSTHROUGH_URL}
+base_url: ${env.PASSTHROUGH_URL}
 api_key: ${env.PASSTHROUGH_API_KEY}
 ```
@@ -17,11 +17,11 @@ RunPod inference provider for running models on RunPod's cloud GPU platform.
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_token` | `SecretStr \| None` | No | | The API token |
-| `url` | `str \| None` | No | | The URL for the Runpod model serving endpoint |
+| `base_url` | `HttpUrl \| None` | No | | The URL for the Runpod model serving endpoint |

 ## Sample Configuration

 ```yaml
-url: ${env.RUNPOD_URL:=}
+base_url: ${env.RUNPOD_URL:=}
 api_token: ${env.RUNPOD_API_TOKEN}
 ```
@@ -17,11 +17,11 @@ SambaNova inference provider for running models on SambaNova's dataflow architec
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova AI server |
+| `base_url` | `HttpUrl \| None` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova AI server |

 ## Sample Configuration

 ```yaml
-url: https://api.sambanova.ai/v1
+base_url: https://api.sambanova.ai/v1
 api_key: ${env.SAMBANOVA_API_KEY:=}
 ```
@@ -16,10 +16,10 @@ Text Generation Inference (TGI) provider for HuggingFace model serving.
 |-------|------|----------|---------|-------------|
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
-| `url` | `str` | No | | The URL for the TGI serving endpoint |
+| `base_url` | `HttpUrl \| None` | No | | The URL for the TGI serving endpoint (should include /v1 path) |

 ## Sample Configuration

 ```yaml
-url: ${env.TGI_URL:=}
+base_url: ${env.TGI_URL:=}
 ```
@@ -17,11 +17,11 @@ Together AI inference provider for open-source models and collaborative AI devel
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://api.together.xyz/v1 | The URL for the Together AI server |
+| `base_url` | `HttpUrl \| None` | No | https://api.together.xyz/v1 | The URL for the Together AI server |

 ## Sample Configuration

 ```yaml
-url: https://api.together.xyz/v1
+base_url: https://api.together.xyz/v1
 api_key: ${env.TOGETHER_API_KEY:=}
 ```
@@ -17,14 +17,14 @@ Remote vLLM inference provider for connecting to vLLM servers.
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_token` | `SecretStr \| None` | No | | The API token |
-| `url` | `str \| None` | No | | The URL for the vLLM model serving endpoint |
+| `base_url` | `HttpUrl \| None` | No | | The URL for the vLLM model serving endpoint |
 | `max_tokens` | `int` | No | 4096 | Maximum number of tokens to generate. |
 | `tls_verify` | `bool \| str` | No | True | Whether to verify TLS certificates. Can be a boolean or a path to a CA certificate file. |

 ## Sample Configuration

 ```yaml
-url: ${env.VLLM_URL:=}
+base_url: ${env.VLLM_URL:=}
 max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
 api_token: ${env.VLLM_API_TOKEN:=fake}
 tls_verify: ${env.VLLM_TLS_VERIFY:=true}
@@ -17,14 +17,14 @@ IBM WatsonX inference provider for accessing AI models on IBM's WatsonX platform
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://us-south.ml.cloud.ibm.com | A base url for accessing the watsonx.ai |
+| `base_url` | `HttpUrl \| None` | No | https://us-south.ml.cloud.ibm.com | A base url for accessing the watsonx.ai |
 | `project_id` | `str \| None` | No | | The watsonx.ai project ID |
 | `timeout` | `int` | No | 60 | Timeout for the HTTP requests |

 ## Sample Configuration

 ```yaml
-url: ${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}
+base_url: ${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}
 api_key: ${env.WATSONX_API_KEY:=}
 project_id: ${env.WATSONX_PROJECT_ID:=}
 ```
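The unit test described in the commit message can be sketched roughly as follows; the config classes here are illustrative stand-ins, where the real test would enumerate every remote inference provider's config class from the repo:

```python
import pytest
from pydantic import BaseModel, HttpUrl, SecretStr

# Illustrative stand-ins for remote inference provider configs.
class OpenAIConfig(BaseModel):
    api_key: SecretStr | None = None
    base_url: HttpUrl | None = HttpUrl("https://api.openai.com/v1")

class OllamaConfig(BaseModel):
    base_url: HttpUrl | None = HttpUrl("http://localhost:11434/v1")

REMOTE_INFERENCE_CONFIGS = [OpenAIConfig, OllamaConfig]

@pytest.mark.parametrize("config_cls", REMOTE_INFERENCE_CONFIGS)
def test_uses_standardized_base_url(config_cls):
    # Every provider must expose base_url typed HttpUrl | None.
    field = config_cls.model_fields.get("base_url")
    assert field is not None, f"{config_cls.__name__} has no base_url field"
    assert field.annotation == (HttpUrl | None)
```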