Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-07-07 22:34:37 +00:00
docs: auto generated documentation for providers (#2543)
# What does this PR do?

A simple approach to get provider pages into the docs: add or update `description` fields in each provider configuration class using Pydantic's `Field`, making sure the descriptions are clear and complete, since they are used to auto-generate the provider documentation via `./scripts/distro_codegen.py` instead of editing the docs manually.

Signed-off-by: Sébastien Han <seb@redhat.com>
This commit is contained in: parent 8d8e90d78e, commit c9a49a80e8
96 changed files with 2562 additions and 65 deletions
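The mechanism is straightforward: each provider's config class carries the text that becomes the description paragraph and the Description column in the pages below. As a hedged illustration (the class and field names here are invented, not taken from this diff), a config class annotated for the generator might look like:

```python
# Illustrative sketch only: a hypothetical provider config class whose
# Field descriptions would be picked up by ./scripts/distro_codegen.py.
from pydantic import BaseModel, Field


class ExampleProviderConfig(BaseModel):
    url: str = Field(
        default="http://localhost:8000",
        description="The URL for the example model serving endpoint",
    )
    api_token: str | None = Field(
        default=None,
        description="The API token for the example service",
    )
```

Keeping these `description` strings accurate is what keeps the generated tables below accurate.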
docs/source/providers/inference/index.md (new file, 32 lines)

# Inference Providers

This section contains documentation for all available providers for the **inference** API.

- [inline::meta-reference](inline_meta-reference.md)
- [inline::sentence-transformers](inline_sentence-transformers.md)
- [inline::vllm](inline_vllm.md)
- [remote::anthropic](remote_anthropic.md)
- [remote::bedrock](remote_bedrock.md)
- [remote::cerebras](remote_cerebras.md)
- [remote::cerebras-openai-compat](remote_cerebras-openai-compat.md)
- [remote::databricks](remote_databricks.md)
- [remote::fireworks](remote_fireworks.md)
- [remote::fireworks-openai-compat](remote_fireworks-openai-compat.md)
- [remote::gemini](remote_gemini.md)
- [remote::groq](remote_groq.md)
- [remote::groq-openai-compat](remote_groq-openai-compat.md)
- [remote::hf::endpoint](remote_hf_endpoint.md)
- [remote::hf::serverless](remote_hf_serverless.md)
- [remote::llama-openai-compat](remote_llama-openai-compat.md)
- [remote::nvidia](remote_nvidia.md)
- [remote::ollama](remote_ollama.md)
- [remote::openai](remote_openai.md)
- [remote::passthrough](remote_passthrough.md)
- [remote::runpod](remote_runpod.md)
- [remote::sambanova](remote_sambanova.md)
- [remote::sambanova-openai-compat](remote_sambanova-openai-compat.md)
- [remote::tgi](remote_tgi.md)
- [remote::together](remote_together.md)
- [remote::together-openai-compat](remote_together-openai-compat.md)
- [remote::vllm](remote_vllm.md)
- [remote::watsonx](remote_watsonx.md)
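Each link above names a `provider_type`. As a rough sketch of how one of these providers is wired into a distribution's run configuration (this assumes the usual `provider_id`/`provider_type`/`config` layout of a llama-stack `run.yaml`; the ID is arbitrary):

```python
# Sketch: compose a run.yaml providers block for one inference provider.
# Assumes the provider_id/provider_type/config layout; the ID is arbitrary.
import yaml  # requires PyYAML

run_config = {
    "providers": {
        "inference": [
            {
                "provider_id": "ollama",
                "provider_type": "remote::ollama",
                "config": {"url": "${env.OLLAMA_URL:=http://localhost:11434}"},
            }
        ]
    }
}

# Emit the YAML fragment as it would appear in run.yaml.
print(yaml.safe_dump(run_config, sort_keys=False))
```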
docs/source/providers/inference/inline_meta-reference.md (new file, 32 lines)

# inline::meta-reference

## Description

Meta's reference implementation of inference with support for various model formats and optimization techniques.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `model` | `str \| None` | No | | |
| `torch_seed` | `int \| None` | No | | |
| `max_seq_len` | `int` | No | 4096 | |
| `max_batch_size` | `int` | No | 1 | |
| `model_parallel_size` | `int \| None` | No | | |
| `create_distributed_process_group` | `bool` | No | True | |
| `checkpoint_dir` | `str \| None` | No | | |
| `quantization` | `Bf16QuantizationConfig \| Fp8QuantizationConfig \| Int4QuantizationConfig \| None` | No | | Discriminated union keyed on `type` |

## Sample Configuration

```yaml
model: Llama3.2-3B-Instruct
checkpoint_dir: ${env.CHECKPOINT_DIR:=null}
quantization:
  type: ${env.QUANTIZATION_TYPE:=bf16}
model_parallel_size: ${env.MODEL_PARALLEL_SIZE:=0}
max_batch_size: ${env.MAX_BATCH_SIZE:=1}
max_seq_len: ${env.MAX_SEQ_LEN:=4096}
```
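The sample above relies on `${env.VAR:=default}` placeholders, which read like shell parameter expansion: use the environment variable if set, otherwise fall back to the inline default (and `${env.VAR:+}`, seen in later samples, expands to nothing when the variable is unset). A minimal, illustrative resolver for the `:=` form, not llama-stack's actual substitution code:

```python
# Illustrative resolver for ${env.VAR:=default} placeholders, reading them
# by analogy with shell syntax. Not llama-stack's actual implementation.
import os
import re

_PLACEHOLDER = re.compile(r"\$\{env\.(\w+):=([^}]*)\}")


def resolve_env_placeholders(text: str) -> str:
    """Replace each ${env.VAR:=default} with os.environ[VAR] or the default."""
    return _PLACEHOLDER.sub(lambda m: os.environ.get(m.group(1), m.group(2)), text)


print(resolve_env_placeholders("max_seq_len: ${env.MAX_SEQ_LEN:=4096}"))
# -> "max_seq_len: 4096" unless MAX_SEQ_LEN is set in the environment
```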
docs/source/providers/inference/inline_sentence-transformers.md (new file, 13 lines)

# inline::sentence-transformers

## Description

Sentence Transformers inference provider for text embeddings and similarity search.

## Sample Configuration

```yaml
{}
```
docs/source/providers/inference/inline_vllm.md (new file, 29 lines)

# inline::vllm

## Description

vLLM inference provider for high-performance model serving with PagedAttention and continuous batching.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `tensor_parallel_size` | `int` | No | 1 | Number of tensor parallel replicas (number of GPUs to use). |
| `max_tokens` | `int` | No | 4096 | Maximum number of tokens to generate. |
| `max_model_len` | `int` | No | 4096 | Maximum context length to use during serving. |
| `max_num_seqs` | `int` | No | 4 | Maximum parallel batch size for generation. |
| `enforce_eager` | `bool` | No | False | Whether to use eager mode for inference (otherwise CUDA graphs are used). |
| `gpu_memory_utilization` | `float` | No | 0.3 | How much GPU memory will be allocated when this provider has finished loading, including memory that was already allocated before loading. |

## Sample Configuration

```yaml
tensor_parallel_size: ${env.TENSOR_PARALLEL_SIZE:=1}
max_tokens: ${env.MAX_TOKENS:=4096}
max_model_len: ${env.MAX_MODEL_LEN:=4096}
max_num_seqs: ${env.MAX_NUM_SEQS:=4}
enforce_eager: ${env.ENFORCE_EAGER:=False}
gpu_memory_utilization: ${env.GPU_MEMORY_UTILIZATION:=0.3}
```
docs/source/providers/inference/remote_anthropic.md (new file, 19 lines)

# remote::anthropic

## Description

Anthropic inference provider for accessing Claude models and Anthropic's AI services.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | API key for Anthropic models |

## Sample Configuration

```yaml
api_key: ${env.ANTHROPIC_API_KEY}
```
docs/source/providers/inference/remote_bedrock.md (new file, 28 lines)

# remote::bedrock

## Description

AWS Bedrock inference provider for accessing various AI models through AWS's managed service.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `aws_access_key_id` | `str \| None` | No | | The AWS access key to use. Defaults to the AWS_ACCESS_KEY_ID environment variable. |
| `aws_secret_access_key` | `str \| None` | No | | The AWS secret access key to use. Defaults to the AWS_SECRET_ACCESS_KEY environment variable. |
| `aws_session_token` | `str \| None` | No | | The AWS session token to use. Defaults to the AWS_SESSION_TOKEN environment variable. |
| `region_name` | `str \| None` | No | | The default AWS Region to use, for example, us-west-1 or us-west-2. Defaults to the AWS_DEFAULT_REGION environment variable. |
| `profile_name` | `str \| None` | No | | The profile name that contains credentials to use. Defaults to the AWS_PROFILE environment variable. |
| `total_max_attempts` | `int \| None` | No | | The maximum number of attempts that will be made for a single request, including the initial attempt. Defaults to the AWS_MAX_ATTEMPTS environment variable. |
| `retry_mode` | `str \| None` | No | | The type of retries Boto3 will perform. Defaults to the AWS_RETRY_MODE environment variable. |
| `connect_timeout` | `float \| None` | No | 60 | The time in seconds until a timeout exception is thrown when attempting to make a connection. The default is 60 seconds. |
| `read_timeout` | `float \| None` | No | 60 | The time in seconds until a timeout exception is thrown when attempting to read from a connection. The default is 60 seconds. |
| `session_ttl` | `int \| None` | No | 3600 | The time in seconds until a session expires. The default is 3600 seconds (1 hour). |

## Sample Configuration

```yaml
{}
```
docs/source/providers/inference/remote_cerebras-openai-compat.md (new file, 21 lines)

# remote::cerebras-openai-compat

## Description

Cerebras OpenAI-compatible provider for using Cerebras models with OpenAI API format.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Cerebras API key |
| `openai_compat_api_base` | `str` | No | https://api.cerebras.ai/v1 | The URL for the Cerebras API server |

## Sample Configuration

```yaml
openai_compat_api_base: https://api.cerebras.ai/v1
api_key: ${env.CEREBRAS_API_KEY}
```
docs/source/providers/inference/remote_cerebras.md (new file, 21 lines)

# remote::cerebras

## Description

Cerebras inference provider for running models on Cerebras Cloud platform.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `base_url` | `str` | No | https://api.cerebras.ai | Base URL for the Cerebras API |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Cerebras API Key |

## Sample Configuration

```yaml
base_url: https://api.cerebras.ai
api_key: ${env.CEREBRAS_API_KEY}
```
docs/source/providers/inference/remote_databricks.md (new file, 21 lines)

# remote::databricks

## Description

Databricks inference provider for running models on Databricks' unified analytics platform.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | | The URL for the Databricks model serving endpoint |
| `api_token` | `str` | No | | The Databricks API token |

## Sample Configuration

```yaml
url: ${env.DATABRICKS_URL}
api_token: ${env.DATABRICKS_API_TOKEN}
```
docs/source/providers/inference/remote_fireworks-openai-compat.md (new file, 21 lines)

# remote::fireworks-openai-compat

## Description

Fireworks AI OpenAI-compatible provider for using Fireworks models with OpenAI API format.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Fireworks API key |
| `openai_compat_api_base` | `str` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks API server |

## Sample Configuration

```yaml
openai_compat_api_base: https://api.fireworks.ai/inference/v1
api_key: ${env.FIREWORKS_API_KEY}
```
docs/source/providers/inference/remote_fireworks.md (new file, 21 lines)

# remote::fireworks

## Description

Fireworks AI inference provider for Llama models and other AI models on the Fireworks platform.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The Fireworks.ai API Key |

## Sample Configuration

```yaml
url: https://api.fireworks.ai/inference/v1
api_key: ${env.FIREWORKS_API_KEY}
```
docs/source/providers/inference/remote_gemini.md (new file, 19 lines)

# remote::gemini

## Description

Google Gemini inference provider for accessing Gemini models and Google's AI services.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | API key for Gemini models |

## Sample Configuration

```yaml
api_key: ${env.GEMINI_API_KEY}
```
docs/source/providers/inference/remote_groq-openai-compat.md (new file, 21 lines)

# remote::groq-openai-compat

## Description

Groq OpenAI-compatible provider for using Groq models with OpenAI API format.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Groq API key |
| `openai_compat_api_base` | `str` | No | https://api.groq.com/openai/v1 | The URL for the Groq API server |

## Sample Configuration

```yaml
openai_compat_api_base: https://api.groq.com/openai/v1
api_key: ${env.GROQ_API_KEY}
```
docs/source/providers/inference/remote_groq.md (new file, 21 lines)

# remote::groq

## Description

Groq inference provider for ultra-fast inference using Groq's LPU technology.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Groq API key |
| `url` | `str` | No | https://api.groq.com | The URL for the Groq AI server |

## Sample Configuration

```yaml
url: https://api.groq.com
api_key: ${env.GROQ_API_KEY}
```
docs/source/providers/inference/remote_hf_endpoint.md (new file, 21 lines)

# remote::hf::endpoint

## Description

HuggingFace Inference Endpoints provider for dedicated model serving.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `endpoint_name` | `str` | No | | The name of the Hugging Face Inference Endpoint in the format of '{namespace}/{endpoint_name}' (e.g. 'my-cool-org/meta-llama-3-1-8b-instruct-rce'). Namespace is optional and will default to the user account if not provided. |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |

## Sample Configuration

```yaml
endpoint_name: ${env.INFERENCE_ENDPOINT_NAME}
api_token: ${env.HF_API_TOKEN}
```
docs/source/providers/inference/remote_hf_serverless.md (new file, 21 lines)

# remote::hf::serverless

## Description

HuggingFace Inference API serverless provider for on-demand model inference.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `huggingface_repo` | `str` | No | | The model ID of the model on the Hugging Face Hub (e.g. 'meta-llama/Meta-Llama-3.1-70B-Instruct') |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |

## Sample Configuration

```yaml
huggingface_repo: ${env.INFERENCE_MODEL}
api_token: ${env.HF_API_TOKEN}
```
docs/source/providers/inference/remote_llama-openai-compat.md (new file, 21 lines)

# remote::llama-openai-compat

## Description

Llama OpenAI-compatible provider for using Llama models with OpenAI API format.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Llama API key |
| `openai_compat_api_base` | `str` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |

## Sample Configuration

```yaml
openai_compat_api_base: https://api.llama.com/compat/v1/
api_key: ${env.LLAMA_API_KEY}
```
docs/source/providers/inference/remote_nvidia.md (new file, 24 lines)

# remote::nvidia

## Description

NVIDIA inference provider for accessing NVIDIA NIM models and AI services.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | https://integrate.api.nvidia.com | A base URL for accessing the NVIDIA NIM |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The NVIDIA API key, only needed if using the hosted service |
| `timeout` | `int` | No | 60 | Timeout for the HTTP requests |
| `append_api_version` | `bool` | No | True | When set to false, the API version will not be appended to the base_url. By default, it is true. |

## Sample Configuration

```yaml
url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
api_key: ${env.NVIDIA_API_KEY:+}
append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
```
docs/source/providers/inference/remote_ollama.md (new file, 21 lines)

# remote::ollama

## Description

Ollama inference provider for running local models through the Ollama runtime.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | http://localhost:11434 | |
| `raise_on_connect_error` | `bool` | No | True | |

## Sample Configuration

```yaml
url: ${env.OLLAMA_URL:=http://localhost:11434}
raise_on_connect_error: true
```
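Since `raise_on_connect_error` defaults to true, the provider fails fast when the Ollama server is unreachable. A quick way to sanity-check the configured `url` yourself, using Ollama's standard `/api/tags` model-listing endpoint (the env var name mirrors the sample above):

```python
# Reachability check for an Ollama server before pointing the provider at it.
# GET /api/tags is Ollama's standard endpoint for listing local models.
import os

import requests

url = os.environ.get("OLLAMA_URL", "http://localhost:11434")
resp = requests.get(f"{url}/api/tags", timeout=5)
resp.raise_for_status()
print([model["name"] for model in resp.json().get("models", [])])
```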
docs/source/providers/inference/remote_openai.md (new file, 19 lines)

# remote::openai

## Description

OpenAI inference provider for accessing GPT models and other OpenAI services.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | API key for OpenAI models |

## Sample Configuration

```yaml
api_key: ${env.OPENAI_API_KEY}
```
docs/source/providers/inference/remote_passthrough.md (new file, 21 lines)

# remote::passthrough

## Description

Passthrough inference provider for connecting to any external inference service not directly supported.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | | The URL for the passthrough endpoint |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | API Key for the passthrough endpoint |

## Sample Configuration

```yaml
url: ${env.PASSTHROUGH_URL}
api_key: ${env.PASSTHROUGH_API_KEY}
```
docs/source/providers/inference/remote_runpod.md (new file, 21 lines)

# remote::runpod

## Description

RunPod inference provider for running models on RunPod's cloud GPU platform.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str \| None` | No | | The URL for the Runpod model serving endpoint |
| `api_token` | `str \| None` | No | | The API token |

## Sample Configuration

```yaml
url: ${env.RUNPOD_URL:+}
api_token: ${env.RUNPOD_API_TOKEN:+}
```
docs/source/providers/inference/remote_sambanova-openai-compat.md (new file, 21 lines)

# remote::sambanova-openai-compat

## Description

SambaNova OpenAI-compatible provider for using SambaNova models with OpenAI API format.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The SambaNova API key |
| `openai_compat_api_base` | `str` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova API server |

## Sample Configuration

```yaml
openai_compat_api_base: https://api.sambanova.ai/v1
api_key: ${env.SAMBANOVA_API_KEY}
```
docs/source/providers/inference/remote_sambanova.md (new file, 21 lines)

# remote::sambanova

## Description

SambaNova inference provider for running models on SambaNova's dataflow architecture.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova AI server |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The SambaNova cloud API Key |

## Sample Configuration

```yaml
url: https://api.sambanova.ai/v1
api_key: ${env.SAMBANOVA_API_KEY}
```
docs/source/providers/inference/remote_tgi.md (new file, 19 lines)

# remote::tgi

## Description

Text Generation Inference (TGI) provider for HuggingFace model serving.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | | The URL for the TGI serving endpoint |

## Sample Configuration

```yaml
url: ${env.TGI_URL}
```
docs/source/providers/inference/remote_together-openai-compat.md (new file, 21 lines)

# remote::together-openai-compat

## Description

Together AI OpenAI-compatible provider for using Together models with OpenAI API format.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Together API key |
| `openai_compat_api_base` | `str` | No | https://api.together.xyz/v1 | The URL for the Together API server |

## Sample Configuration

```yaml
openai_compat_api_base: https://api.together.xyz/v1
api_key: ${env.TOGETHER_API_KEY}
```
docs/source/providers/inference/remote_together.md (new file, 21 lines)

# remote::together

## Description

Together AI inference provider for open-source models and collaborative AI development.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | https://api.together.xyz/v1 | The URL for the Together AI server |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The Together AI API Key |

## Sample Configuration

```yaml
url: https://api.together.xyz/v1
api_key: ${env.TOGETHER_API_KEY:+}
```
docs/source/providers/inference/remote_vllm.md (new file, 25 lines)

# remote::vllm

## Description

Remote vLLM inference provider for connecting to vLLM servers.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str \| None` | No | | The URL for the vLLM model serving endpoint |
| `max_tokens` | `int` | No | 4096 | Maximum number of tokens to generate. |
| `api_token` | `str \| None` | No | fake | The API token |
| `tls_verify` | `bool \| str` | No | True | Whether to verify TLS certificates. Can be a boolean or a path to a CA certificate file. |

## Sample Configuration

```yaml
url: ${env.VLLM_URL}
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
api_token: ${env.VLLM_API_TOKEN:=fake}
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
```
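Because vLLM serves an OpenAI-compatible API, the endpoint this provider targets can be smoke-tested with the plain `openai` client. A hedged sketch (it assumes `VLLM_URL` points at the server's `/v1` root, matching the sample above):

```python
# Smoke test for the OpenAI-compatible endpoint a remote::vllm provider uses.
# Assumes VLLM_URL points at the server's /v1 root, e.g. http://host:8000/v1.
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ["VLLM_URL"],
    api_key=os.environ.get("VLLM_API_TOKEN", "fake"),
)
print([model.id for model in client.models.list().data])
```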
docs/source/providers/inference/remote_watsonx.md (new file, 24 lines)

# remote::watsonx

## Description

IBM WatsonX inference provider for accessing AI models on IBM's WatsonX platform.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str` | No | https://us-south.ml.cloud.ibm.com | A base URL for accessing watsonx.ai |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The watsonx API key, only needed if using the hosted service |
| `project_id` | `str \| None` | No | | The Project ID key, only needed if using the hosted service |
| `timeout` | `int` | No | 60 | Timeout for the HTTP requests |

## Sample Configuration

```yaml
url: ${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}
api_key: ${env.WATSONX_API_KEY:+}
project_id: ${env.WATSONX_PROJECT_ID:+}
```