mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-27 15:08:06 +00:00
chore: Enabling Milvus for VectorIO CI
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
This commit is contained in:
parent
709eb7da33
commit
c8d41d45ec
115 changed files with 2919 additions and 184 deletions
32
docs/source/providers/inference/index.md
Normal file
32
docs/source/providers/inference/index.md
Normal file
|
|
@ -0,0 +1,32 @@
|
|||
# Inference Providers
|
||||
|
||||
This section contains documentation for all available providers for the **inference** API.
|
||||
|
||||
- [inline::meta-reference](inline_meta-reference.md)
|
||||
- [inline::sentence-transformers](inline_sentence-transformers.md)
|
||||
- [inline::vllm](inline_vllm.md)
|
||||
- [remote::anthropic](remote_anthropic.md)
|
||||
- [remote::bedrock](remote_bedrock.md)
|
||||
- [remote::cerebras](remote_cerebras.md)
|
||||
- [remote::cerebras-openai-compat](remote_cerebras-openai-compat.md)
|
||||
- [remote::databricks](remote_databricks.md)
|
||||
- [remote::fireworks](remote_fireworks.md)
|
||||
- [remote::fireworks-openai-compat](remote_fireworks-openai-compat.md)
|
||||
- [remote::gemini](remote_gemini.md)
|
||||
- [remote::groq](remote_groq.md)
|
||||
- [remote::groq-openai-compat](remote_groq-openai-compat.md)
|
||||
- [remote::hf::endpoint](remote_hf_endpoint.md)
|
||||
- [remote::hf::serverless](remote_hf_serverless.md)
|
||||
- [remote::llama-openai-compat](remote_llama-openai-compat.md)
|
||||
- [remote::nvidia](remote_nvidia.md)
|
||||
- [remote::ollama](remote_ollama.md)
|
||||
- [remote::openai](remote_openai.md)
|
||||
- [remote::passthrough](remote_passthrough.md)
|
||||
- [remote::runpod](remote_runpod.md)
|
||||
- [remote::sambanova](remote_sambanova.md)
|
||||
- [remote::sambanova-openai-compat](remote_sambanova-openai-compat.md)
|
||||
- [remote::tgi](remote_tgi.md)
|
||||
- [remote::together](remote_together.md)
|
||||
- [remote::together-openai-compat](remote_together-openai-compat.md)
|
||||
- [remote::vllm](remote_vllm.md)
|
||||
- [remote::watsonx](remote_watsonx.md)
|
||||
32
docs/source/providers/inference/inline_meta-reference.md
Normal file
32
docs/source/providers/inference/inline_meta-reference.md
Normal file
|
|
@ -0,0 +1,32 @@
|
|||
# inline::meta-reference
|
||||
|
||||
## Description
|
||||
|
||||
Meta's reference implementation of inference with support for various model formats and optimization techniques.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `model` | `str \| None` | No | | |
|
||||
| `torch_seed` | `int \| None` | No | | |
|
||||
| `max_seq_len` | `<class 'int'>` | No | 4096 | |
|
||||
| `max_batch_size` | `<class 'int'>` | No | 1 | |
|
||||
| `model_parallel_size` | `int \| None` | No | | |
|
||||
| `create_distributed_process_group` | `<class 'bool'>` | No | True | |
|
||||
| `checkpoint_dir` | `str \| None` | No | | |
|
||||
| `quantization` | `Bf16QuantizationConfig \| Fp8QuantizationConfig \| Int4QuantizationConfig, annotation=NoneType, required=True, discriminator='type'` | No | | |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
model: Llama3.2-3B-Instruct
|
||||
checkpoint_dir: ${env.CHECKPOINT_DIR:=null}
|
||||
quantization:
|
||||
type: ${env.QUANTIZATION_TYPE:=bf16}
|
||||
model_parallel_size: ${env.MODEL_PARALLEL_SIZE:=0}
|
||||
max_batch_size: ${env.MAX_BATCH_SIZE:=1}
|
||||
max_seq_len: ${env.MAX_SEQ_LEN:=4096}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -0,0 +1,13 @@
|
|||
# inline::sentence-transformers
|
||||
|
||||
## Description
|
||||
|
||||
Sentence Transformers inference provider for text embeddings and similarity search.
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
{}
|
||||
|
||||
```
|
||||
|
||||
29
docs/source/providers/inference/inline_vllm.md
Normal file
29
docs/source/providers/inference/inline_vllm.md
Normal file
|
|
@ -0,0 +1,29 @@
|
|||
# inline::vllm
|
||||
|
||||
## Description
|
||||
|
||||
vLLM inference provider for high-performance model serving with PagedAttention and continuous batching.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `tensor_parallel_size` | `<class 'int'>` | No | 1 | Number of tensor parallel replicas (number of GPUs to use). |
|
||||
| `max_tokens` | `<class 'int'>` | No | 4096 | Maximum number of tokens to generate. |
|
||||
| `max_model_len` | `<class 'int'>` | No | 4096 | Maximum context length to use during serving. |
|
||||
| `max_num_seqs` | `<class 'int'>` | No | 4 | Maximum parallel batch size for generation. |
|
||||
| `enforce_eager` | `<class 'bool'>` | No | False | Whether to use eager mode for inference (otherwise cuda graphs are used). |
|
||||
| `gpu_memory_utilization` | `<class 'float'>` | No | 0.3 | How much GPU memory will be allocated when this provider has finished loading, including memory that was already allocated before loading. |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
tensor_parallel_size: ${env.TENSOR_PARALLEL_SIZE:=1}
|
||||
max_tokens: ${env.MAX_TOKENS:=4096}
|
||||
max_model_len: ${env.MAX_MODEL_LEN:=4096}
|
||||
max_num_seqs: ${env.MAX_NUM_SEQS:=4}
|
||||
enforce_eager: ${env.ENFORCE_EAGER:=False}
|
||||
gpu_memory_utilization: ${env.GPU_MEMORY_UTILIZATION:=0.3}
|
||||
|
||||
```
|
||||
|
||||
19
docs/source/providers/inference/remote_anthropic.md
Normal file
19
docs/source/providers/inference/remote_anthropic.md
Normal file
|
|
@ -0,0 +1,19 @@
|
|||
# remote::anthropic
|
||||
|
||||
## Description
|
||||
|
||||
Anthropic inference provider for accessing Claude models and Anthropic's AI services.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | API key for Anthropic models |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
api_key: ${env.ANTHROPIC_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
28
docs/source/providers/inference/remote_bedrock.md
Normal file
28
docs/source/providers/inference/remote_bedrock.md
Normal file
|
|
@ -0,0 +1,28 @@
|
|||
# remote::bedrock
|
||||
|
||||
## Description
|
||||
|
||||
AWS Bedrock inference provider for accessing various AI models through AWS's managed service.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `aws_access_key_id` | `str \| None` | No | | The AWS access key to use. Default use environment variable: AWS_ACCESS_KEY_ID |
|
||||
| `aws_secret_access_key` | `str \| None` | No | | The AWS secret access key to use. Default use environment variable: AWS_SECRET_ACCESS_KEY |
|
||||
| `aws_session_token` | `str \| None` | No | | The AWS session token to use. Default use environment variable: AWS_SESSION_TOKEN |
|
||||
| `region_name` | `str \| None` | No | | The default AWS Region to use, for example, us-west-1 or us-west-2.Default use environment variable: AWS_DEFAULT_REGION |
|
||||
| `profile_name` | `str \| None` | No | | The profile name that contains credentials to use.Default use environment variable: AWS_PROFILE |
|
||||
| `total_max_attempts` | `int \| None` | No | | An integer representing the maximum number of attempts that will be made for a single request, including the initial attempt. Default use environment variable: AWS_MAX_ATTEMPTS |
|
||||
| `retry_mode` | `str \| None` | No | | A string representing the type of retries Boto3 will perform.Default use environment variable: AWS_RETRY_MODE |
|
||||
| `connect_timeout` | `float \| None` | No | 60 | The time in seconds till a timeout exception is thrown when attempting to make a connection. The default is 60 seconds. |
|
||||
| `read_timeout` | `float \| None` | No | 60 | The time in seconds till a timeout exception is thrown when attempting to read from a connection.The default is 60 seconds. |
|
||||
| `session_ttl` | `int \| None` | No | 3600 | The time in seconds till a session expires. The default is 3600 seconds (1 hour). |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
{}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -0,0 +1,21 @@
|
|||
# remote::cerebras-openai-compat
|
||||
|
||||
## Description
|
||||
|
||||
Cerebras OpenAI-compatible provider for using Cerebras models with OpenAI API format.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | The Cerebras API key |
|
||||
| `openai_compat_api_base` | `<class 'str'>` | No | https://api.cerebras.ai/v1 | The URL for the Cerebras API server |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
openai_compat_api_base: https://api.cerebras.ai/v1
|
||||
api_key: ${env.CEREBRAS_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
21
docs/source/providers/inference/remote_cerebras.md
Normal file
21
docs/source/providers/inference/remote_cerebras.md
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
# remote::cerebras
|
||||
|
||||
## Description
|
||||
|
||||
Cerebras inference provider for running models on Cerebras Cloud platform.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `base_url` | `<class 'str'>` | No | https://api.cerebras.ai | Base URL for the Cerebras API |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Cerebras API Key |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
base_url: https://api.cerebras.ai
|
||||
api_key: ${env.CEREBRAS_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
21
docs/source/providers/inference/remote_databricks.md
Normal file
21
docs/source/providers/inference/remote_databricks.md
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
# remote::databricks
|
||||
|
||||
## Description
|
||||
|
||||
Databricks inference provider for running models on Databricks' unified analytics platform.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | | The URL for the Databricks model serving endpoint |
|
||||
| `api_token` | `<class 'str'>` | No | | The Databricks API token |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.DATABRICKS_URL}
|
||||
api_token: ${env.DATABRICKS_API_TOKEN}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -0,0 +1,21 @@
|
|||
# remote::fireworks-openai-compat
|
||||
|
||||
## Description
|
||||
|
||||
Fireworks AI OpenAI-compatible provider for using Fireworks models with OpenAI API format.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | The Fireworks API key |
|
||||
| `openai_compat_api_base` | `<class 'str'>` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks API server |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
openai_compat_api_base: https://api.fireworks.ai/inference/v1
|
||||
api_key: ${env.FIREWORKS_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
21
docs/source/providers/inference/remote_fireworks.md
Normal file
21
docs/source/providers/inference/remote_fireworks.md
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
# remote::fireworks
|
||||
|
||||
## Description
|
||||
|
||||
Fireworks AI inference provider for Llama models and other AI models on the Fireworks platform.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The Fireworks.ai API Key |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: https://api.fireworks.ai/inference/v1
|
||||
api_key: ${env.FIREWORKS_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
19
docs/source/providers/inference/remote_gemini.md
Normal file
19
docs/source/providers/inference/remote_gemini.md
Normal file
|
|
@ -0,0 +1,19 @@
|
|||
# remote::gemini
|
||||
|
||||
## Description
|
||||
|
||||
Google Gemini inference provider for accessing Gemini models and Google's AI services.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | API key for Gemini models |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
api_key: ${env.GEMINI_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
21
docs/source/providers/inference/remote_groq-openai-compat.md
Normal file
21
docs/source/providers/inference/remote_groq-openai-compat.md
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
# remote::groq-openai-compat
|
||||
|
||||
## Description
|
||||
|
||||
Groq OpenAI-compatible provider for using Groq models with OpenAI API format.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | The Groq API key |
|
||||
| `openai_compat_api_base` | `<class 'str'>` | No | https://api.groq.com/openai/v1 | The URL for the Groq API server |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
openai_compat_api_base: https://api.groq.com/openai/v1
|
||||
api_key: ${env.GROQ_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
21
docs/source/providers/inference/remote_groq.md
Normal file
21
docs/source/providers/inference/remote_groq.md
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
# remote::groq
|
||||
|
||||
## Description
|
||||
|
||||
Groq inference provider for ultra-fast inference using Groq's LPU technology.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | The Groq API key |
|
||||
| `url` | `<class 'str'>` | No | https://api.groq.com | The URL for the Groq AI server |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: https://api.groq.com
|
||||
api_key: ${env.GROQ_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
21
docs/source/providers/inference/remote_hf_endpoint.md
Normal file
21
docs/source/providers/inference/remote_hf_endpoint.md
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
# remote::hf::endpoint
|
||||
|
||||
## Description
|
||||
|
||||
HuggingFace Inference Endpoints provider for dedicated model serving.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `endpoint_name` | `<class 'str'>` | No | PydanticUndefined | The name of the Hugging Face Inference Endpoint in the format of '{namespace}/{endpoint_name}' (e.g. 'my-cool-org/meta-llama-3-1-8b-instruct-rce'). Namespace is optional and will default to the user account if not provided. |
|
||||
| `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
endpoint_name: ${env.INFERENCE_ENDPOINT_NAME}
|
||||
api_token: ${env.HF_API_TOKEN}
|
||||
|
||||
```
|
||||
|
||||
21
docs/source/providers/inference/remote_hf_serverless.md
Normal file
21
docs/source/providers/inference/remote_hf_serverless.md
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
# remote::hf::serverless
|
||||
|
||||
## Description
|
||||
|
||||
HuggingFace Inference API serverless provider for on-demand model inference.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `huggingface_repo` | `<class 'str'>` | No | PydanticUndefined | The model ID of the model on the Hugging Face Hub (e.g. 'meta-llama/Meta-Llama-3.1-70B-Instruct') |
|
||||
| `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
huggingface_repo: ${env.INFERENCE_MODEL}
|
||||
api_token: ${env.HF_API_TOKEN}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -0,0 +1,21 @@
|
|||
# remote::llama-openai-compat
|
||||
|
||||
## Description
|
||||
|
||||
Llama OpenAI-compatible provider for using Llama models with OpenAI API format.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | The Llama API key |
|
||||
| `openai_compat_api_base` | `<class 'str'>` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
openai_compat_api_base: https://api.llama.com/compat/v1/
|
||||
api_key: ${env.LLAMA_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
24
docs/source/providers/inference/remote_nvidia.md
Normal file
24
docs/source/providers/inference/remote_nvidia.md
Normal file
|
|
@ -0,0 +1,24 @@
|
|||
# remote::nvidia
|
||||
|
||||
## Description
|
||||
|
||||
NVIDIA inference provider for accessing NVIDIA NIM models and AI services.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | https://integrate.api.nvidia.com | A base url for accessing the NVIDIA NIM |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The NVIDIA API key, only needed of using the hosted service |
|
||||
| `timeout` | `<class 'int'>` | No | 60 | Timeout for the HTTP requests |
|
||||
| `append_api_version` | `<class 'bool'>` | No | True | When set to false, the API version will not be appended to the base_url. By default, it is true. |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
|
||||
api_key: ${env.NVIDIA_API_KEY:+}
|
||||
append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
|
||||
|
||||
```
|
||||
|
||||
21
docs/source/providers/inference/remote_ollama.md
Normal file
21
docs/source/providers/inference/remote_ollama.md
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
# remote::ollama
|
||||
|
||||
## Description
|
||||
|
||||
Ollama inference provider for running local models through the Ollama runtime.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | http://localhost:11434 | |
|
||||
| `raise_on_connect_error` | `<class 'bool'>` | No | True | |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.OLLAMA_URL:=http://localhost:11434}
|
||||
raise_on_connect_error: true
|
||||
|
||||
```
|
||||
|
||||
19
docs/source/providers/inference/remote_openai.md
Normal file
19
docs/source/providers/inference/remote_openai.md
Normal file
|
|
@ -0,0 +1,19 @@
|
|||
# remote::openai
|
||||
|
||||
## Description
|
||||
|
||||
OpenAI inference provider for accessing GPT models and other OpenAI services.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | API key for OpenAI models |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
api_key: ${env.OPENAI_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
21
docs/source/providers/inference/remote_passthrough.md
Normal file
21
docs/source/providers/inference/remote_passthrough.md
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
# remote::passthrough
|
||||
|
||||
## Description
|
||||
|
||||
Passthrough inference provider for connecting to any external inference service not directly supported.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | | The URL for the passthrough endpoint |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | API Key for the passthrouth endpoint |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.PASSTHROUGH_URL}
|
||||
api_key: ${env.PASSTHROUGH_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
21
docs/source/providers/inference/remote_runpod.md
Normal file
21
docs/source/providers/inference/remote_runpod.md
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
# remote::runpod
|
||||
|
||||
## Description
|
||||
|
||||
RunPod inference provider for running models on RunPod's cloud GPU platform.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `str \| None` | No | | The URL for the Runpod model serving endpoint |
|
||||
| `api_token` | `str \| None` | No | | The API token |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.RUNPOD_URL:+}
|
||||
api_token: ${env.RUNPOD_API_TOKEN:+}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -0,0 +1,21 @@
|
|||
# remote::sambanova-openai-compat
|
||||
|
||||
## Description
|
||||
|
||||
SambaNova OpenAI-compatible provider for using SambaNova models with OpenAI API format.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | The SambaNova API key |
|
||||
| `openai_compat_api_base` | `<class 'str'>` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova API server |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
openai_compat_api_base: https://api.sambanova.ai/v1
|
||||
api_key: ${env.SAMBANOVA_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
21
docs/source/providers/inference/remote_sambanova.md
Normal file
21
docs/source/providers/inference/remote_sambanova.md
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
# remote::sambanova
|
||||
|
||||
## Description
|
||||
|
||||
SambaNova inference provider for running models on SambaNova's dataflow architecture.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova AI server |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The SambaNova cloud API Key |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: https://api.sambanova.ai/v1
|
||||
api_key: ${env.SAMBANOVA_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
19
docs/source/providers/inference/remote_tgi.md
Normal file
19
docs/source/providers/inference/remote_tgi.md
Normal file
|
|
@ -0,0 +1,19 @@
|
|||
# remote::tgi
|
||||
|
||||
## Description
|
||||
|
||||
Text Generation Inference (TGI) provider for HuggingFace model serving.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | PydanticUndefined | The URL for the TGI serving endpoint |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.TGI_URL}
|
||||
|
||||
```
|
||||
|
||||
|
|
@ -0,0 +1,21 @@
|
|||
# remote::together-openai-compat
|
||||
|
||||
## Description
|
||||
|
||||
Together AI OpenAI-compatible provider for using Together models with OpenAI API format.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | `str \| None` | No | | The Together API key |
|
||||
| `openai_compat_api_base` | `<class 'str'>` | No | https://api.together.xyz/v1 | The URL for the Together API server |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
openai_compat_api_base: https://api.together.xyz/v1
|
||||
api_key: ${env.TOGETHER_API_KEY}
|
||||
|
||||
```
|
||||
|
||||
21
docs/source/providers/inference/remote_together.md
Normal file
21
docs/source/providers/inference/remote_together.md
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
# remote::together
|
||||
|
||||
## Description
|
||||
|
||||
Together AI inference provider for open-source models and collaborative AI development.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | https://api.together.xyz/v1 | The URL for the Together AI server |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The Together AI API Key |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: https://api.together.xyz/v1
|
||||
api_key: ${env.TOGETHER_API_KEY:+}
|
||||
|
||||
```
|
||||
|
||||
25
docs/source/providers/inference/remote_vllm.md
Normal file
25
docs/source/providers/inference/remote_vllm.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
# remote::vllm
|
||||
|
||||
## Description
|
||||
|
||||
Remote vLLM inference provider for connecting to vLLM servers.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `str \| None` | No | | The URL for the vLLM model serving endpoint |
|
||||
| `max_tokens` | `<class 'int'>` | No | 4096 | Maximum number of tokens to generate. |
|
||||
| `api_token` | `str \| None` | No | fake | The API token |
|
||||
| `tls_verify` | `bool \| str` | No | True | Whether to verify TLS certificates. Can be a boolean or a path to a CA certificate file. |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.VLLM_URL}
|
||||
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
|
||||
api_token: ${env.VLLM_API_TOKEN:=fake}
|
||||
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
|
||||
|
||||
```
|
||||
|
||||
24
docs/source/providers/inference/remote_watsonx.md
Normal file
24
docs/source/providers/inference/remote_watsonx.md
Normal file
|
|
@ -0,0 +1,24 @@
|
|||
# remote::watsonx
|
||||
|
||||
## Description
|
||||
|
||||
IBM WatsonX inference provider for accessing AI models on IBM's WatsonX platform.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | https://us-south.ml.cloud.ibm.com | A base url for accessing the watsonx.ai |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The watsonx API key, only needed of using the hosted service |
|
||||
| `project_id` | `str \| None` | No | | The Project ID key, only needed of using the hosted service |
|
||||
| `timeout` | `<class 'int'>` | No | 60 | Timeout for the HTTP requests |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
url: ${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}
|
||||
api_key: ${env.WATSONX_API_KEY:+}
|
||||
project_id: ${env.WATSONX_PROJECT_ID:+}
|
||||
|
||||
```
|
||||
|
||||
Loading…
Add table
Add a link
Reference in a new issue