chore: Enabling Milvus for VectorIO CI

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Francisco Javier Arceo 2025-06-27 21:25:57 -04:00
parent 709eb7da33
commit c8d41d45ec
115 changed files with 2919 additions and 184 deletions


@@ -0,0 +1,32 @@
# Inference Providers
This section contains documentation for all available providers for the **inference** API. A sketch of how any one of these providers is wired into a run configuration follows the list.
- [inline::meta-reference](inline_meta-reference.md)
- [inline::sentence-transformers](inline_sentence-transformers.md)
- [inline::vllm](inline_vllm.md)
- [remote::anthropic](remote_anthropic.md)
- [remote::bedrock](remote_bedrock.md)
- [remote::cerebras](remote_cerebras.md)
- [remote::cerebras-openai-compat](remote_cerebras-openai-compat.md)
- [remote::databricks](remote_databricks.md)
- [remote::fireworks](remote_fireworks.md)
- [remote::fireworks-openai-compat](remote_fireworks-openai-compat.md)
- [remote::gemini](remote_gemini.md)
- [remote::groq](remote_groq.md)
- [remote::groq-openai-compat](remote_groq-openai-compat.md)
- [remote::hf::endpoint](remote_hf_endpoint.md)
- [remote::hf::serverless](remote_hf_serverless.md)
- [remote::llama-openai-compat](remote_llama-openai-compat.md)
- [remote::nvidia](remote_nvidia.md)
- [remote::ollama](remote_ollama.md)
- [remote::openai](remote_openai.md)
- [remote::passthrough](remote_passthrough.md)
- [remote::runpod](remote_runpod.md)
- [remote::sambanova](remote_sambanova.md)
- [remote::sambanova-openai-compat](remote_sambanova-openai-compat.md)
- [remote::tgi](remote_tgi.md)
- [remote::together](remote_together.md)
- [remote::together-openai-compat](remote_together-openai-compat.md)
- [remote::vllm](remote_vllm.md)
- [remote::watsonx](remote_watsonx.md)
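
Each provider below plugs into a stack the same way: an entry under `providers.inference` in the run configuration names the provider type and carries the provider's config block. A minimal sketch, assuming a `run.yaml` laid out the way the sample configurations on these pages suggest and using the `remote::ollama` provider documented below (the `provider_id` value is a free-form local name chosen for illustration):

```yaml
# Hypothetical run.yaml fragment wiring a single inference provider.
providers:
  inference:
  - provider_id: ollama                  # local name, free-form
    provider_type: remote::ollama        # must match an entry in the list above
    config:
      url: ${env.OLLAMA_URL:=http://localhost:11434}
      raise_on_connect_error: true
```

Swapping providers means changing `provider_type` and replacing the `config` block with the matching sample configuration from that provider's page.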


@@ -0,0 +1,32 @@
# inline::meta-reference
## Description
Meta's reference implementation of inference with support for various model formats and optimization techniques.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `model` | `str \| None` | No | | |
| `torch_seed` | `int \| None` | No | | |
| `max_seq_len` | `int` | No | 4096 | |
| `max_batch_size` | `int` | No | 1 | |
| `model_parallel_size` | `int \| None` | No | | |
| `create_distributed_process_group` | `bool` | No | True | |
| `checkpoint_dir` | `str \| None` | No | | |
| `quantization` | `Bf16QuantizationConfig \| Fp8QuantizationConfig \| Int4QuantizationConfig \| None` | No | | Discriminated union keyed on `type` |
## Sample Configuration
```yaml
model: Llama3.2-3B-Instruct
checkpoint_dir: ${env.CHECKPOINT_DIR:=null}
quantization:
  type: ${env.QUANTIZATION_TYPE:=bf16}
model_parallel_size: ${env.MODEL_PARALLEL_SIZE:=0}
max_batch_size: ${env.MAX_BATCH_SIZE:=1}
max_seq_len: ${env.MAX_SEQ_LEN:=4096}
```
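
Each `${env.VAR:=default}` reference resolves to the named environment variable when it is set and to the literal default otherwise. With none of the variables set, the sample above resolves to roughly the following (all values are taken from the defaults shown in the sample itself):

```yaml
# The sample configuration with every environment variable left unset.
model: Llama3.2-3B-Instruct
checkpoint_dir: null
quantization:
  type: bf16
model_parallel_size: 0
max_batch_size: 1
max_seq_len: 4096
```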


@@ -0,0 +1,13 @@
# inline::sentence-transformers
## Description
Sentence Transformers inference provider for text embeddings and similarity search.
## Sample Configuration
```yaml
{}
```
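
The provider itself takes no configuration; the embedding model is registered separately. A hedged sketch of a `models` entry in the run configuration, assuming the stack's usual `model_id`/`provider_id`/`model_type` fields and the widely used `all-MiniLM-L6-v2` checkpoint (both the field layout and the dimension are assumptions, not taken from this page):

```yaml
# Hypothetical models entry pairing an embedding model with this provider.
models:
- model_id: all-MiniLM-L6-v2
  provider_id: sentence-transformers
  model_type: embedding
  metadata:
    embedding_dimension: 384   # all-MiniLM-L6-v2 produces 384-dim vectors
```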


@@ -0,0 +1,29 @@
# inline::vllm
## Description
vLLM inference provider for high-performance model serving with PagedAttention and continuous batching.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `tensor_parallel_size` | `int` | No | 1 | Number of tensor parallel replicas (number of GPUs to use). |
| `max_tokens` | `int` | No | 4096 | Maximum number of tokens to generate. |
| `max_model_len` | `int` | No | 4096 | Maximum context length to use during serving. |
| `max_num_seqs` | `int` | No | 4 | Maximum parallel batch size for generation. |
| `enforce_eager` | `bool` | No | False | Whether to use eager mode for inference (otherwise CUDA graphs are used). |
| `gpu_memory_utilization` | `float` | No | 0.3 | The fraction of GPU memory that will be allocated once this provider has finished loading the model, including memory that was already allocated before loading. |
## Sample Configuration
```yaml
tensor_parallel_size: ${env.TENSOR_PARALLEL_SIZE:=1}
max_tokens: ${env.MAX_TOKENS:=4096}
max_model_len: ${env.MAX_MODEL_LEN:=4096}
max_num_seqs: ${env.MAX_NUM_SEQS:=4}
enforce_eager: ${env.ENFORCE_EAGER:=False}
gpu_memory_utilization: ${env.GPU_MEMORY_UTILIZATION:=0.3}
```
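
The defaults above target a single, modest GPU: `gpu_memory_utilization: 0.3` claims roughly 30% of one device and `tensor_parallel_size: 1` keeps the model on a single GPU. For a two-GPU host the same fields can be raised; a sketch with purely illustrative values:

```yaml
# Illustrative overrides for a two-GPU host.
tensor_parallel_size: 2        # shard the model across both GPUs
max_model_len: 8192            # serve longer contexts
max_num_seqs: 16               # allow larger generation batches
enforce_eager: false           # keep CUDA graphs enabled
gpu_memory_utilization: 0.9    # claim most of each device
```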


@@ -0,0 +1,19 @@
# remote::anthropic
## Description
Anthropic inference provider for accessing Claude models and Anthropic's AI services.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | API key for Anthropic models |
## Sample Configuration
```yaml
api_key: ${env.ANTHROPIC_API_KEY}
```


@@ -0,0 +1,28 @@
# remote::bedrock
## Description
AWS Bedrock inference provider for accessing various AI models through AWS's managed service.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `aws_access_key_id` | `str \| None` | No | | The AWS access key to use. Defaults to the environment variable AWS_ACCESS_KEY_ID |
| `aws_secret_access_key` | `str \| None` | No | | The AWS secret access key to use. Defaults to the environment variable AWS_SECRET_ACCESS_KEY |
| `aws_session_token` | `str \| None` | No | | The AWS session token to use. Defaults to the environment variable AWS_SESSION_TOKEN |
| `region_name` | `str \| None` | No | | The default AWS Region to use, for example, us-west-1 or us-west-2. Defaults to the environment variable AWS_DEFAULT_REGION |
| `profile_name` | `str \| None` | No | | The profile name that contains the credentials to use. Defaults to the environment variable AWS_PROFILE |
| `total_max_attempts` | `int \| None` | No | | The maximum number of attempts that will be made for a single request, including the initial attempt. Defaults to the environment variable AWS_MAX_ATTEMPTS |
| `retry_mode` | `str \| None` | No | | The type of retries Boto3 will perform. Defaults to the environment variable AWS_RETRY_MODE |
| `connect_timeout` | `float \| None` | No | 60 | The time in seconds until a timeout exception is thrown when attempting to make a connection. The default is 60 seconds. |
| `read_timeout` | `float \| None` | No | 60 | The time in seconds until a timeout exception is thrown when attempting to read from a connection. The default is 60 seconds. |
| `session_ttl` | `int \| None` | No | 3600 | The time in seconds until a session expires. The default is 3600 seconds (1 hour). |
## Sample Configuration
```yaml
{}
```
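
The empty sample works because every field falls back to the standard AWS environment variables (`AWS_ACCESS_KEY_ID`, `AWS_DEFAULT_REGION`, and so on) or to the shared AWS config files. When those are unavailable, the same settings can be pinned directly in the config; a sketch using only fields from the table above, with illustrative values:

```yaml
# Explicit Bedrock settings instead of AWS environment variables.
region_name: us-west-2
profile_name: bedrock-dev      # named profile in ~/.aws/credentials
total_max_attempts: 3          # initial attempt plus two retries
retry_mode: adaptive           # one of the boto3 retry modes
connect_timeout: 30
read_timeout: 120
```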


@@ -0,0 +1,21 @@
# remote::cerebras-openai-compat
## Description
Cerebras OpenAI-compatible provider for using Cerebras models with OpenAI API format.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Cerebras API key |
| `openai_compat_api_base` | `str` | No | https://api.cerebras.ai/v1 | The URL for the Cerebras API server |
## Sample Configuration
```yaml
openai_compat_api_base: https://api.cerebras.ai/v1
api_key: ${env.CEREBRAS_API_KEY}
```


@@ -0,0 +1,21 @@
# remote::cerebras
## Description
Cerebras inference provider for running models on Cerebras Cloud platform.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `base_url` | `str` | No | https://api.cerebras.ai | Base URL for the Cerebras API |
| `api_key` | `SecretStr \| None` | No | | Cerebras API Key |
## Sample Configuration
```yaml
base_url: https://api.cerebras.ai
api_key: ${env.CEREBRAS_API_KEY}
```


@@ -0,0 +1,21 @@
# remote::databricks
## Description
Databricks inference provider for running models on Databricks' unified analytics platform.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | | The URL for the Databricks model serving endpoint |
| `api_token` | `str` | No | | The Databricks API token |
## Sample Configuration
```yaml
url: ${env.DATABRICKS_URL}
api_token: ${env.DATABRICKS_API_TOKEN}
```


@@ -0,0 +1,21 @@
# remote::fireworks-openai-compat
## Description
Fireworks AI OpenAI-compatible provider for using Fireworks models with OpenAI API format.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Fireworks API key |
| `openai_compat_api_base` | `str` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks API server |
## Sample Configuration
```yaml
openai_compat_api_base: https://api.fireworks.ai/inference/v1
api_key: ${env.FIREWORKS_API_KEY}
```


@@ -0,0 +1,21 @@
# remote::fireworks
## Description
Fireworks AI inference provider for Llama models and other AI models on the Fireworks platform.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |
| `api_key` | `SecretStr \| None` | No | | The Fireworks.ai API Key |
## Sample Configuration
```yaml
url: https://api.fireworks.ai/inference/v1
api_key: ${env.FIREWORKS_API_KEY}
```


@@ -0,0 +1,19 @@
# remote::gemini
## Description
Google Gemini inference provider for accessing Gemini models and Google's AI services.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | API key for Gemini models |
## Sample Configuration
```yaml
api_key: ${env.GEMINI_API_KEY}
```


@@ -0,0 +1,21 @@
# remote::groq-openai-compat
## Description
Groq OpenAI-compatible provider for using Groq models with OpenAI API format.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Groq API key |
| `openai_compat_api_base` | `str` | No | https://api.groq.com/openai/v1 | The URL for the Groq API server |
## Sample Configuration
```yaml
openai_compat_api_base: https://api.groq.com/openai/v1
api_key: ${env.GROQ_API_KEY}
```


@@ -0,0 +1,21 @@
# remote::groq
## Description
Groq inference provider for ultra-fast inference using Groq's LPU technology.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Groq API key |
| `url` | `str` | No | https://api.groq.com | The URL for the Groq AI server |
## Sample Configuration
```yaml
url: https://api.groq.com
api_key: ${env.GROQ_API_KEY}
```


@@ -0,0 +1,21 @@
# remote::hf::endpoint
## Description
HuggingFace Inference Endpoints provider for dedicated model serving.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `endpoint_name` | `str` | No | | The name of the Hugging Face Inference Endpoint in the format of '{namespace}/{endpoint_name}' (e.g. 'my-cool-org/meta-llama-3-1-8b-instruct-rce'). Namespace is optional and will default to the user account if not provided. |
| `api_token` | `SecretStr \| None` | No | | Your Hugging Face user access token (will default to the locally saved token if not provided) |
## Sample Configuration
```yaml
endpoint_name: ${env.INFERENCE_ENDPOINT_NAME}
api_token: ${env.HF_API_TOKEN}
```


@@ -0,0 +1,21 @@
# remote::hf::serverless
## Description
HuggingFace Inference API serverless provider for on-demand model inference.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `huggingface_repo` | `str` | No | | The model ID of the model on the Hugging Face Hub (e.g. 'meta-llama/Meta-Llama-3.1-70B-Instruct') |
| `api_token` | `SecretStr \| None` | No | | Your Hugging Face user access token (will default to the locally saved token if not provided) |
## Sample Configuration
```yaml
huggingface_repo: ${env.INFERENCE_MODEL}
api_token: ${env.HF_API_TOKEN}
```


@@ -0,0 +1,21 @@
# remote::llama-openai-compat
## Description
Llama OpenAI-compatible provider for using Llama models with OpenAI API format.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Llama API key |
| `openai_compat_api_base` | `str` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |
## Sample Configuration
```yaml
openai_compat_api_base: https://api.llama.com/compat/v1/
api_key: ${env.LLAMA_API_KEY}
```


@@ -0,0 +1,24 @@
# remote::nvidia
## Description
NVIDIA inference provider for accessing NVIDIA NIM models and AI services.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | https://integrate.api.nvidia.com | The base URL for accessing the NVIDIA NIM |
| `api_key` | `SecretStr \| None` | No | | The NVIDIA API key; only needed when using the hosted service |
| `timeout` | `int` | No | 60 | Timeout for the HTTP requests |
| `append_api_version` | `bool` | No | True | When set to false, the API version will not be appended to the base URL. By default, it is true. |
## Sample Configuration
```yaml
url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
api_key: ${env.NVIDIA_API_KEY:+}
append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
```
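
Because the API key is only needed for the hosted service, pointing this provider at a self-hosted NIM is just a matter of changing the base URL. A sketch, assuming a NIM container listening on a local port (the port is illustrative):

```yaml
# Self-hosted NIM: no API key required.
url: ${env.NVIDIA_BASE_URL:=http://localhost:8000}
append_api_version: true
```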


@@ -0,0 +1,21 @@
# remote::ollama
## Description
Ollama inference provider for running local models through the Ollama runtime.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | http://localhost:11434 | |
| `raise_on_connect_error` | `bool` | No | True | |
## Sample Configuration
```yaml
url: ${env.OLLAMA_URL:=http://localhost:11434}
raise_on_connect_error: true
```


@@ -0,0 +1,19 @@
# remote::openai
## Description
OpenAI inference provider for accessing GPT models and other OpenAI services.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | API key for OpenAI models |
## Sample Configuration
```yaml
api_key: ${env.OPENAI_API_KEY}
```


@@ -0,0 +1,21 @@
# remote::passthrough
## Description
Passthrough inference provider for connecting to any external inference service not directly supported.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | | The URL for the passthrough endpoint |
| `api_key` | `SecretStr \| None` | No | | API Key for the passthrough endpoint |
## Sample Configuration
```yaml
url: ${env.PASSTHROUGH_URL}
api_key: ${env.PASSTHROUGH_API_KEY}
```


@@ -0,0 +1,21 @@
# remote::runpod
## Description
RunPod inference provider for running models on RunPod's cloud GPU platform.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str \| None` | No | | The URL for the Runpod model serving endpoint |
| `api_token` | `str \| None` | No | | The API token |
## Sample Configuration
```yaml
url: ${env.RUNPOD_URL:+}
api_token: ${env.RUNPOD_API_TOKEN:+}
```


@@ -0,0 +1,21 @@
# remote::sambanova-openai-compat
## Description
SambaNova OpenAI-compatible provider for using SambaNova models with OpenAI API format.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The SambaNova API key |
| `openai_compat_api_base` | `str` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova API server |
## Sample Configuration
```yaml
openai_compat_api_base: https://api.sambanova.ai/v1
api_key: ${env.SAMBANOVA_API_KEY}
```


@@ -0,0 +1,21 @@
# remote::sambanova
## Description
SambaNova inference provider for running models on SambaNova's dataflow architecture.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova AI server |
| `api_key` | `SecretStr \| None` | No | | The SambaNova cloud API Key |
## Sample Configuration
```yaml
url: https://api.sambanova.ai/v1
api_key: ${env.SAMBANOVA_API_KEY}
```


@@ -0,0 +1,19 @@
# remote::tgi
## Description
Text Generation Inference (TGI) provider for HuggingFace model serving.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | | The URL for the TGI serving endpoint |
## Sample Configuration
```yaml
url: ${env.TGI_URL}
```


@@ -0,0 +1,21 @@
# remote::together-openai-compat
## Description
Together AI OpenAI-compatible provider for using Together models with OpenAI API format.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Together API key |
| `openai_compat_api_base` | `str` | No | https://api.together.xyz/v1 | The URL for the Together API server |
## Sample Configuration
```yaml
openai_compat_api_base: https://api.together.xyz/v1
api_key: ${env.TOGETHER_API_KEY}
```


@@ -0,0 +1,21 @@
# remote::together
## Description
Together AI inference provider for open-source models and collaborative AI development.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | https://api.together.xyz/v1 | The URL for the Together AI server |
| `api_key` | `SecretStr \| None` | No | | The Together AI API Key |
## Sample Configuration
```yaml
url: https://api.together.xyz/v1
api_key: ${env.TOGETHER_API_KEY:+}
```


@@ -0,0 +1,25 @@
# remote::vllm
## Description
Remote vLLM inference provider for connecting to vLLM servers.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str \| None` | No | | The URL for the vLLM model serving endpoint |
| `max_tokens` | `int` | No | 4096 | Maximum number of tokens to generate. |
| `api_token` | `str \| None` | No | fake | The API token |
| `tls_verify` | `bool \| str` | No | True | Whether to verify TLS certificates. Can be a boolean or a path to a CA certificate file. |
## Sample Configuration
```yaml
url: ${env.VLLM_URL}
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
api_token: ${env.VLLM_API_TOKEN:=fake}
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
```
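
Since `tls_verify` accepts either a boolean or a path, a vLLM server fronted by an internally issued certificate can be verified against its own CA bundle rather than having verification disabled. A sketch with a hypothetical hostname and bundle path:

```yaml
# Verify TLS against an internal CA instead of the system trust store.
url: https://vllm.internal.example.com:8000/v1
api_token: ${env.VLLM_API_TOKEN:=fake}
tls_verify: /etc/ssl/certs/internal-ca.pem
```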


@@ -0,0 +1,24 @@
# remote::watsonx
## Description
IBM WatsonX inference provider for accessing AI models on IBM's WatsonX platform.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | https://us-south.ml.cloud.ibm.com | The base URL for accessing watsonx.ai |
| `api_key` | `SecretStr \| None` | No | | The watsonx API key; only needed when using the hosted service |
| `project_id` | `str \| None` | No | | The project ID; only needed when using the hosted service |
| `timeout` | `int` | No | 60 | Timeout for the HTTP requests |
## Sample Configuration
```yaml
url: ${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}
api_key: ${env.WATSONX_API_KEY:+}
project_id: ${env.WATSONX_PROJECT_ID:+}
```