mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-03 09:53:45 +00:00
docs: provider and distro codegen migration (#3531)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> - Updates provider and distro codegen to handle the new format - Migrates provider and distro files to the new format ## Test Plan - Manual testing <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* -->
This commit is contained in:
parent
45da31801c
commit
d23865757f
103 changed files with 1796 additions and 423 deletions
|
|
@ -1,42 +0,0 @@
|
|||
# Inference
|
||||
|
||||
## Overview
|
||||
|
||||
Llama Stack Inference API for generating completions, chat completions, and embeddings.
|
||||
|
||||
This API provides the raw interface to the underlying models. Two kinds of models are supported:
|
||||
- LLM models: these models generate "raw" and "chat" (conversational) completions.
|
||||
- Embedding models: these models generate embeddings to be used for semantic search.
|
||||
|
||||
This section contains documentation for all available providers for the **inference** API.
|
||||
|
||||
## Providers
|
||||
|
||||
```{toctree}
|
||||
:maxdepth: 1
|
||||
|
||||
inline_meta-reference
|
||||
inline_sentence-transformers
|
||||
remote_anthropic
|
||||
remote_azure
|
||||
remote_bedrock
|
||||
remote_cerebras
|
||||
remote_databricks
|
||||
remote_fireworks
|
||||
remote_gemini
|
||||
remote_groq
|
||||
remote_hf_endpoint
|
||||
remote_hf_serverless
|
||||
remote_llama-openai-compat
|
||||
remote_nvidia
|
||||
remote_ollama
|
||||
remote_openai
|
||||
remote_passthrough
|
||||
remote_runpod
|
||||
remote_sambanova
|
||||
remote_tgi
|
||||
remote_together
|
||||
remote_vertexai
|
||||
remote_vllm
|
||||
remote_watsonx
|
||||
```
|
||||
|
|
@ -1,32 +0,0 @@
|
|||
# inline::meta-reference
|
||||
|
||||
## Description
|
||||
|
||||
Meta's reference implementation of inference with support for various model formats and optimization techniques.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `model` | `str \| None` | No | | |
|
||||
| `torch_seed` | `int \| None` | No | | |
|
||||
| `max_seq_len` | `<class 'int'>` | No | 4096 | |
|
||||
| `max_batch_size` | `<class 'int'>` | No | 1 | |
|
||||
| `model_parallel_size` | `int \| None` | No | | |
|
||||
| `create_distributed_process_group` | `<class 'bool'>` | No | True | |
|
||||
| `checkpoint_dir` | `str \| None` | No | | |
|
||||
| `quantization` | `Bf16QuantizationConfig \| Fp8QuantizationConfig \| Int4QuantizationConfig, annotation=NoneType, required=True, discriminator='type'` | No | | |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
model: Llama3.2-3B-Instruct
|
||||
checkpoint_dir: ${env.CHECKPOINT_DIR:=null}
|
||||
quantization:
|
||||
type: ${env.QUANTIZATION_TYPE:=bf16}
|
||||
model_parallel_size: ${env.MODEL_PARALLEL_SIZE:=0}
|
||||
max_batch_size: ${env.MAX_BATCH_SIZE:=1}
|
||||
max_seq_len: ${env.MAX_SEQ_LEN:=4096}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,13 +0,0 @@
|
|||
# inline::sentence-transformers
|
||||
|
||||
## Description
|
||||
|
||||
Sentence Transformers inference provider for text embeddings and similarity search.
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
{}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,19 +0,0 @@
|
|||
# remote::anthropic
|
||||
|
||||
## Description
|
||||
|
||||
Anthropic inference provider for accessing Claude models and Anthropic's AI services.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | API key for Anthropic models |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
api_key: ${env.ANTHROPIC_API_KEY:=}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,29 +0,0 @@
|
|||
# remote::azure
|
||||
|
||||
## Description
|
||||
|
||||
|
||||
Azure OpenAI inference provider for accessing GPT models and other Azure services.
|
||||
Provider documentation
|
||||
https://learn.microsoft.com/en-us/azure/ai-foundry/openai/overview
|
||||
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `<class 'pydantic.types.SecretStr'>` | No | | Azure API key for Azure |
|
||||
| `api_base` | `<class 'pydantic.networks.HttpUrl'>` | No | | Azure API base for Azure (e.g., https://your-resource-name.openai.azure.com) |
|
||||
| `api_version` | `str \| None` | No | | Azure API version for Azure (e.g., 2024-12-01-preview) |
|
||||
| `api_type` | `str \| None` | No | azure | Azure API type for Azure (e.g., azure) |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
api_key: ${env.AZURE_API_KEY:=}
|
||||
api_base: ${env.AZURE_API_BASE:=}
|
||||
api_version: ${env.AZURE_API_VERSION:=}
|
||||
api_type: ${env.AZURE_API_TYPE:=}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,28 +0,0 @@
|
|||
# remote::bedrock
|
||||
|
||||
## Description
|
||||
|
||||
AWS Bedrock inference provider for accessing various AI models through AWS's managed service.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `aws_access_key_id` | `str \| None` | No | | The AWS access key to use. Default use environment variable: AWS_ACCESS_KEY_ID |
|
||||
| `aws_secret_access_key` | `str \| None` | No | | The AWS secret access key to use. Default use environment variable: AWS_SECRET_ACCESS_KEY |
|
||||
| `aws_session_token` | `str \| None` | No | | The AWS session token to use. Default use environment variable: AWS_SESSION_TOKEN |
|
||||
| `region_name` | `str \| None` | No | | The default AWS Region to use, for example, us-west-1 or us-west-2.Default use environment variable: AWS_DEFAULT_REGION |
|
||||
| `profile_name` | `str \| None` | No | | The profile name that contains credentials to use.Default use environment variable: AWS_PROFILE |
|
||||
| `total_max_attempts` | `int \| None` | No | | An integer representing the maximum number of attempts that will be made for a single request, including the initial attempt. Default use environment variable: AWS_MAX_ATTEMPTS |
|
||||
| `retry_mode` | `str \| None` | No | | A string representing the type of retries Boto3 will perform.Default use environment variable: AWS_RETRY_MODE |
|
||||
| `connect_timeout` | `float \| None` | No | 60.0 | The time in seconds till a timeout exception is thrown when attempting to make a connection. The default is 60 seconds. |
|
||||
| `read_timeout` | `float \| None` | No | 60.0 | The time in seconds till a timeout exception is thrown when attempting to read from a connection.The default is 60 seconds. |
|
||||
| `session_ttl` | `int \| None` | No | 3600 | The time in seconds till a session expires. The default is 3600 seconds (1 hour). |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
{}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,21 +0,0 @@
|
|||
# remote::cerebras
|
||||
|
||||
## Description
|
||||
|
||||
Cerebras inference provider for running models on Cerebras Cloud platform.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `base_url` | `<class 'str'>` | No | https://api.cerebras.ai | Base URL for the Cerebras API |
|
||||
| `api_key` | `<class 'pydantic.types.SecretStr'>` | No | | Cerebras API Key |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
base_url: https://api.cerebras.ai
|
||||
api_key: ${env.CEREBRAS_API_KEY:=}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,21 +0,0 @@
|
|||
# remote::databricks
|
||||
|
||||
## Description
|
||||
|
||||
Databricks inference provider for running models on Databricks' unified analytics platform.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | | The URL for the Databricks model serving endpoint |
|
||||
| `api_token` | `<class 'pydantic.types.SecretStr'>` | No | | The Databricks API token |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.DATABRICKS_HOST:=}
|
||||
api_token: ${env.DATABRICKS_TOKEN:=}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,22 +0,0 @@
|
|||
# remote::fireworks
|
||||
|
||||
## Description
|
||||
|
||||
Fireworks AI inference provider for Llama models and other AI models on the Fireworks platform.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
|
||||
| `url` | `<class 'str'>` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The Fireworks.ai API Key |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: https://api.fireworks.ai/inference/v1
|
||||
api_key: ${env.FIREWORKS_API_KEY:=}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,19 +0,0 @@
|
|||
# remote::gemini
|
||||
|
||||
## Description
|
||||
|
||||
Google Gemini inference provider for accessing Gemini models and Google's AI services.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | API key for Gemini models |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
api_key: ${env.GEMINI_API_KEY:=}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,21 +0,0 @@
|
|||
# remote::groq
|
||||
|
||||
## Description
|
||||
|
||||
Groq inference provider for ultra-fast inference using Groq's LPU technology.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | The Groq API key |
|
||||
| `url` | `<class 'str'>` | No | https://api.groq.com | The URL for the Groq AI server |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: https://api.groq.com
|
||||
api_key: ${env.GROQ_API_KEY:=}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,21 +0,0 @@
|
|||
# remote::hf::endpoint
|
||||
|
||||
## Description
|
||||
|
||||
HuggingFace Inference Endpoints provider for dedicated model serving.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `endpoint_name` | `<class 'str'>` | No | | The name of the Hugging Face Inference Endpoint in the format of '{namespace}/{endpoint_name}' (e.g. 'my-cool-org/meta-llama-3-1-8b-instruct-rce'). Namespace is optional and will default to the user account if not provided. |
|
||||
| `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
endpoint_name: ${env.INFERENCE_ENDPOINT_NAME}
|
||||
api_token: ${env.HF_API_TOKEN}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,21 +0,0 @@
|
|||
# remote::hf::serverless
|
||||
|
||||
## Description
|
||||
|
||||
HuggingFace Inference API serverless provider for on-demand model inference.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `huggingface_repo` | `<class 'str'>` | No | | The model ID of the model on the Hugging Face Hub (e.g. 'meta-llama/Meta-Llama-3.1-70B-Instruct') |
|
||||
| `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
huggingface_repo: ${env.INFERENCE_MODEL}
|
||||
api_token: ${env.HF_API_TOKEN}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,21 +0,0 @@
|
|||
# remote::llama-openai-compat
|
||||
|
||||
## Description
|
||||
|
||||
Llama OpenAI-compatible provider for using Llama models with OpenAI API format.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | The Llama API key |
|
||||
| `openai_compat_api_base` | `<class 'str'>` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
openai_compat_api_base: https://api.llama.com/compat/v1/
|
||||
api_key: ${env.LLAMA_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,24 +0,0 @@
|
|||
# remote::nvidia
|
||||
|
||||
## Description
|
||||
|
||||
NVIDIA inference provider for accessing NVIDIA NIM models and AI services.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | https://integrate.api.nvidia.com | A base url for accessing the NVIDIA NIM |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The NVIDIA API key, only needed of using the hosted service |
|
||||
| `timeout` | `<class 'int'>` | No | 60 | Timeout for the HTTP requests |
|
||||
| `append_api_version` | `<class 'bool'>` | No | True | When set to false, the API version will not be appended to the base_url. By default, it is true. |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
|
||||
api_key: ${env.NVIDIA_API_KEY:=}
|
||||
append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,20 +0,0 @@
|
|||
# remote::ollama
|
||||
|
||||
## Description
|
||||
|
||||
Ollama inference provider for running local models through the Ollama runtime.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | http://localhost:11434 | |
|
||||
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.OLLAMA_URL:=http://localhost:11434}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,21 +0,0 @@
|
|||
# remote::openai
|
||||
|
||||
## Description
|
||||
|
||||
OpenAI inference provider for accessing GPT models and other OpenAI services.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | API key for OpenAI models |
|
||||
| `base_url` | `<class 'str'>` | No | https://api.openai.com/v1 | Base URL for OpenAI API |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
api_key: ${env.OPENAI_API_KEY:=}
|
||||
base_url: ${env.OPENAI_BASE_URL:=https://api.openai.com/v1}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,21 +0,0 @@
|
|||
# remote::passthrough
|
||||
|
||||
## Description
|
||||
|
||||
Passthrough inference provider for connecting to any external inference service not directly supported.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | | The URL for the passthrough endpoint |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | API Key for the passthrouth endpoint |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.PASSTHROUGH_URL}
|
||||
api_key: ${env.PASSTHROUGH_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,21 +0,0 @@
|
|||
# remote::runpod
|
||||
|
||||
## Description
|
||||
|
||||
RunPod inference provider for running models on RunPod's cloud GPU platform.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `str \| None` | No | | The URL for the Runpod model serving endpoint |
|
||||
| `api_token` | `str \| None` | No | | The API token |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.RUNPOD_URL:=}
|
||||
api_token: ${env.RUNPOD_API_TOKEN}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,21 +0,0 @@
|
|||
# remote::sambanova-openai-compat
|
||||
|
||||
## Description
|
||||
|
||||
SambaNova OpenAI-compatible provider for using SambaNova models with OpenAI API format.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | The SambaNova API key |
|
||||
| `openai_compat_api_base` | `<class 'str'>` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova API server |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
openai_compat_api_base: https://api.sambanova.ai/v1
|
||||
api_key: ${env.SAMBANOVA_API_KEY:=}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,21 +0,0 @@
|
|||
# remote::sambanova
|
||||
|
||||
## Description
|
||||
|
||||
SambaNova inference provider for running models on SambaNova's dataflow architecture.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova AI server |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The SambaNova cloud API Key |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: https://api.sambanova.ai/v1
|
||||
api_key: ${env.SAMBANOVA_API_KEY:=}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,19 +0,0 @@
|
|||
# remote::tgi
|
||||
|
||||
## Description
|
||||
|
||||
Text Generation Inference (TGI) provider for HuggingFace model serving.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | | The URL for the TGI serving endpoint |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.TGI_URL:=}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,22 +0,0 @@
|
|||
# remote::together
|
||||
|
||||
## Description
|
||||
|
||||
Together AI inference provider for open-source models and collaborative AI development.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
|
||||
| `url` | `<class 'str'>` | No | https://api.together.xyz/v1 | The URL for the Together AI server |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The Together AI API Key |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: https://api.together.xyz/v1
|
||||
api_key: ${env.TOGETHER_API_KEY:=}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,40 +0,0 @@
|
|||
# remote::vertexai
|
||||
|
||||
## Description
|
||||
|
||||
Google Vertex AI inference provider enables you to use Google's Gemini models through Google Cloud's Vertex AI platform, providing several advantages:
|
||||
|
||||
• Enterprise-grade security: Uses Google Cloud's security controls and IAM
|
||||
• Better integration: Seamless integration with other Google Cloud services
|
||||
• Advanced features: Access to additional Vertex AI features like model tuning and monitoring
|
||||
• Authentication: Uses Google Cloud Application Default Credentials (ADC) instead of API keys
|
||||
|
||||
Configuration:
|
||||
- Set VERTEX_AI_PROJECT environment variable (required)
|
||||
- Set VERTEX_AI_LOCATION environment variable (optional, defaults to us-central1)
|
||||
- Use Google Cloud Application Default Credentials or service account key
|
||||
|
||||
Authentication Setup:
|
||||
Option 1 (Recommended): gcloud auth application-default login
|
||||
Option 2: Set GOOGLE_APPLICATION_CREDENTIALS to service account key path
|
||||
|
||||
Available Models:
|
||||
- vertex_ai/gemini-2.0-flash
|
||||
- vertex_ai/gemini-2.5-flash
|
||||
- vertex_ai/gemini-2.5-pro
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `project` | `<class 'str'>` | No | | Google Cloud project ID for Vertex AI |
|
||||
| `location` | `<class 'str'>` | No | us-central1 | Google Cloud location for Vertex AI |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
project: ${env.VERTEX_AI_PROJECT:=}
|
||||
location: ${env.VERTEX_AI_LOCATION:=us-central1}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,26 +0,0 @@
|
|||
# remote::vllm
|
||||
|
||||
## Description
|
||||
|
||||
Remote vLLM inference provider for connecting to vLLM servers.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `str \| None` | No | | The URL for the vLLM model serving endpoint |
|
||||
| `max_tokens` | `<class 'int'>` | No | 4096 | Maximum number of tokens to generate. |
|
||||
| `api_token` | `str \| None` | No | fake | The API token |
|
||||
| `tls_verify` | `bool \| str` | No | True | Whether to verify TLS certificates. Can be a boolean or a path to a CA certificate file. |
|
||||
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.VLLM_URL:=}
|
||||
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
|
||||
api_token: ${env.VLLM_API_TOKEN:=fake}
|
||||
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -1,24 +0,0 @@
|
|||
# remote::watsonx
|
||||
|
||||
## Description
|
||||
|
||||
IBM WatsonX inference provider for accessing AI models on IBM's WatsonX platform.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | https://us-south.ml.cloud.ibm.com | A base url for accessing the watsonx.ai |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The watsonx API key |
|
||||
| `project_id` | `str \| None` | No | | The Project ID key |
|
||||
| `timeout` | `<class 'int'>` | No | 60 | Timeout for the HTTP requests |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}
|
||||
api_key: ${env.WATSONX_API_KEY:=}
|
||||
project_id: ${env.WATSONX_PROJECT_ID:=}
|
||||
|
||||
```
|
||||
|
||||
Loading…
Add table
Add a link
Reference in a new issue