Merge branch 'main' into feature/dpo-training

Nehanth Narendrula 2025-07-29 14:57:22 -04:00 committed by GitHub
commit b68b818539
265 changed files with 10254 additions and 7796 deletions


@ -7,7 +7,16 @@ Llama Stack supports external providers that live outside of the main codebase.
## Configuration
To enable external providers, you need to configure the `external_providers_dir` in your Llama Stack configuration. This directory should contain your external provider specifications:
To enable external providers, you need to add a `module` entry to your `build.yaml`, allowing Llama Stack to install the required package corresponding to the external provider.
An example entry in your `build.yaml` should look like:
```yaml
- provider_type: remote::ramalama
  module: ramalama_stack
```
Additionally, you can configure the `external_providers_dir` in your Llama Stack configuration. This method is being deprecated in favor of the `module` method. If you use this method, the external provider directory should contain your external provider specifications:
```yaml
external_providers_dir: ~/.llama/providers.d/
@ -112,6 +121,31 @@ container_image: custom-vector-store:latest # optional
## Required Implementation
### All Providers
All providers must expose a `get_provider_spec` function in their `provider` module. This is a standardized entry point that Llama Stack expects and uses to retrieve information such as the config class. The `get_provider_spec` function returns a structure identical to the `adapter` entry. An example function may look like:
```python
from llama_stack.providers.datatypes import (
    ProviderSpec,
    Api,
    AdapterSpec,
    remote_provider_spec,
)

def get_provider_spec() -> ProviderSpec:
    return remote_provider_spec(
        api=Api.inference,
        adapter=AdapterSpec(
            adapter_type="ramalama",
            pip_packages=["ramalama>=0.8.5", "pymilvus"],
            config_class="ramalama_stack.config.RamalamaImplConfig",
            module="ramalama_stack",
        ),
    )
```
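For intuition, here is a minimal sketch of how such an entry point can be loaded dynamically. The module path follows the `provider` module convention described above and assumes the package from the earlier `build.yaml` example is installed; the actual loading logic inside Llama Stack may differ.
```python
import importlib

# Illustrative only: import the provider package's `provider` module and call
# its get_provider_spec() entry point, roughly as Llama Stack does when it
# resolves a `module` entry from build.yaml.
provider_module = importlib.import_module("ramalama_stack.provider")
spec = provider_module.get_provider_spec()
print(spec)
```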
### Remote Providers
Remote providers must expose a `get_adapter_impl()` function in their module that takes two arguments:
@ -155,7 +189,7 @@ Version: 0.1.0
Location: /path/to/venv/lib/python3.10/site-packages
```
## Example: Custom Ollama Provider
## Example using `external_providers_dir`: Custom Ollama Provider
Here's a complete example of creating and using a custom Ollama provider:
@ -206,6 +240,34 @@ external_providers_dir: ~/.llama/providers.d/
The provider will now be available in Llama Stack with the type `remote::custom_ollama`.
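Once the stack is running, you can confirm the provider was registered. Below is a minimal sketch using the Python client; the base URL assumes a default local deployment, and the printed attributes are illustrative.
```python
from llama_stack_client import LlamaStackClient

# Assumes a Llama Stack server running locally on its default port.
client = LlamaStackClient(base_url="http://localhost:8321")

# List all registered providers and look for remote::custom_ollama.
for provider in client.providers.list():
    print(provider.api, provider.provider_id, provider.provider_type)
```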
## Example using `module`: ramalama-stack
[ramalama-stack](https://github.com/containers/ramalama-stack) is a recognized external provider that supports installation via `module`.
To install Llama Stack with this external provider, a user can provide the following `build.yaml`:
```yaml
version: 2
distribution_spec:
  description: Use (an external) Ramalama server for running LLM inference
  container_image: null
  providers:
    inference:
    - provider_type: remote::ramalama
      module: ramalama_stack==0.3.0a0
image_type: venv
image_name: null
external_providers_dir: null
additional_pip_packages:
- aiosqlite
- sqlalchemy[asyncio]
```
No steps other than `llama stack build` and `llama stack run` are required. The build process will use `module` to install all of the provider dependencies, retrieve the spec, and so on.
The provider will now be available in Llama Stack with the type `remote::ramalama`.
## Best Practices
1. **Package Naming**: Use the prefix `llama-stack-provider-` for your provider packages to make them easily identifiable.
@ -229,9 +291,10 @@ information. Execute the test for the Provider type you are developing.
If your external provider isn't being loaded:
1. Check that `module` points to a published pip package with a top level `provider` module including `get_provider_spec`.
1. Check that the `external_providers_dir` path is correct and accessible.
2. Verify that the YAML files are properly formatted.
3. Ensure all required Python packages are installed.
4. Check the Llama Stack server logs for any error messages - turn on debug logging to get more
information using `LLAMA_STACK_LOGGING=all=debug`.
5. Verify that the provider package is installed in your Python environment.
5. Verify that the provider package is installed in your Python environment if using `external_providers_dir`.
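For the last point, a quick sanity check might look like the sketch below; the package name is a placeholder for your provider's actual module.
```python
import importlib.util

# Placeholder module name: substitute the provider package you installed.
if importlib.util.find_spec("llama_stack_provider_ollama") is None:
    print("Provider package is not installed in this Python environment")
```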


@ -4,17 +4,13 @@ This section contains documentation for all available providers for the **infere
- [inline::meta-reference](inline_meta-reference.md)
- [inline::sentence-transformers](inline_sentence-transformers.md)
- [inline::vllm](inline_vllm.md)
- [remote::anthropic](remote_anthropic.md)
- [remote::bedrock](remote_bedrock.md)
- [remote::cerebras](remote_cerebras.md)
- [remote::cerebras-openai-compat](remote_cerebras-openai-compat.md)
- [remote::databricks](remote_databricks.md)
- [remote::fireworks](remote_fireworks.md)
- [remote::fireworks-openai-compat](remote_fireworks-openai-compat.md)
- [remote::gemini](remote_gemini.md)
- [remote::groq](remote_groq.md)
- [remote::groq-openai-compat](remote_groq-openai-compat.md)
- [remote::hf::endpoint](remote_hf_endpoint.md)
- [remote::hf::serverless](remote_hf_serverless.md)
- [remote::llama-openai-compat](remote_llama-openai-compat.md)
@ -24,9 +20,7 @@ This section contains documentation for all available providers for the **infere
- [remote::passthrough](remote_passthrough.md)
- [remote::runpod](remote_runpod.md)
- [remote::sambanova](remote_sambanova.md)
- [remote::sambanova-openai-compat](remote_sambanova-openai-compat.md)
- [remote::tgi](remote_tgi.md)
- [remote::together](remote_together.md)
- [remote::together-openai-compat](remote_together-openai-compat.md)
- [remote::vllm](remote_vllm.md)
- [remote::watsonx](remote_watsonx.md)


@ -1,29 +0,0 @@
# inline::vllm
## Description
vLLM inference provider for high-performance model serving with PagedAttention and continuous batching.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `tensor_parallel_size` | `<class 'int'>` | No | 1 | Number of tensor parallel replicas (number of GPUs to use). |
| `max_tokens` | `<class 'int'>` | No | 4096 | Maximum number of tokens to generate. |
| `max_model_len` | `<class 'int'>` | No | 4096 | Maximum context length to use during serving. |
| `max_num_seqs` | `<class 'int'>` | No | 4 | Maximum parallel batch size for generation. |
| `enforce_eager` | `<class 'bool'>` | No | False | Whether to use eager mode for inference (otherwise cuda graphs are used). |
| `gpu_memory_utilization` | `<class 'float'>` | No | 0.3 | How much GPU memory will be allocated when this provider has finished loading, including memory that was already allocated before loading. |
## Sample Configuration
```yaml
tensor_parallel_size: ${env.TENSOR_PARALLEL_SIZE:=1}
max_tokens: ${env.MAX_TOKENS:=4096}
max_model_len: ${env.MAX_MODEL_LEN:=4096}
max_num_seqs: ${env.MAX_NUM_SEQS:=4}
enforce_eager: ${env.ENFORCE_EAGER:=False}
gpu_memory_utilization: ${env.GPU_MEMORY_UTILIZATION:=0.3}
```


@ -13,7 +13,7 @@ Anthropic inference provider for accessing Claude models and Anthropic's AI serv
## Sample Configuration
```yaml
api_key: ${env.ANTHROPIC_API_KEY}
api_key: ${env.ANTHROPIC_API_KEY:=}
```
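The `:=` form falls back to a default value (here, an empty string) when the environment variable is unset. As a rough, illustrative analogue of that substitution in Python:
```python
import os

# Rough analogue of `${env.ANTHROPIC_API_KEY:=}`: use "" when the variable is unset.
api_key = os.environ.get("ANTHROPIC_API_KEY", "")
```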


@ -15,7 +15,7 @@ Cerebras inference provider for running models on Cerebras Cloud platform.
```yaml
base_url: https://api.cerebras.ai
api_key: ${env.CEREBRAS_API_KEY}
api_key: ${env.CEREBRAS_API_KEY:=}
```


@ -14,8 +14,8 @@ Databricks inference provider for running models on Databricks' unified analytic
## Sample Configuration
```yaml
url: ${env.DATABRICKS_URL}
api_token: ${env.DATABRICKS_API_TOKEN}
url: ${env.DATABRICKS_URL:=}
api_token: ${env.DATABRICKS_API_TOKEN:=}
```


@ -8,6 +8,7 @@ Fireworks AI inference provider for Llama models and other AI models on the Fire
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `url` | `<class 'str'>` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The Fireworks.ai API Key |
@ -15,7 +16,7 @@ Fireworks AI inference provider for Llama models and other AI models on the Fire
```yaml
url: https://api.fireworks.ai/inference/v1
api_key: ${env.FIREWORKS_API_KEY}
api_key: ${env.FIREWORKS_API_KEY:=}
```


@ -13,7 +13,7 @@ Google Gemini inference provider for accessing Gemini models and Google's AI ser
## Sample Configuration
```yaml
api_key: ${env.GEMINI_API_KEY}
api_key: ${env.GEMINI_API_KEY:=}
```


@ -15,7 +15,7 @@ Groq inference provider for ultra-fast inference using Groq's LPU technology.
```yaml
url: https://api.groq.com
api_key: ${env.GROQ_API_KEY}
api_key: ${env.GROQ_API_KEY:=}
```


@ -9,6 +9,7 @@ Ollama inference provider for running local models through the Ollama runtime.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `<class 'str'>` | No | http://localhost:11434 | |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically |
## Sample Configuration


@ -9,11 +9,13 @@ OpenAI inference provider for accessing GPT models and other OpenAI services.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | API key for OpenAI models |
| `base_url` | `<class 'str'>` | No | https://api.openai.com/v1 | Base URL for OpenAI API |
## Sample Configuration
```yaml
api_key: ${env.OPENAI_API_KEY}
api_key: ${env.OPENAI_API_KEY:=}
base_url: ${env.OPENAI_BASE_URL:=https://api.openai.com/v1}
```


@ -15,7 +15,7 @@ SambaNova OpenAI-compatible provider for using SambaNova models with OpenAI API
```yaml
openai_compat_api_base: https://api.sambanova.ai/v1
api_key: ${env.SAMBANOVA_API_KEY}
api_key: ${env.SAMBANOVA_API_KEY:=}
```


@ -15,7 +15,7 @@ SambaNova inference provider for running models on SambaNova's dataflow architec
```yaml
url: https://api.sambanova.ai/v1
api_key: ${env.SAMBANOVA_API_KEY}
api_key: ${env.SAMBANOVA_API_KEY:=}
```


@ -13,7 +13,7 @@ Text Generation Inference (TGI) provider for HuggingFace model serving.
## Sample Configuration
```yaml
url: ${env.TGI_URL}
url: ${env.TGI_URL:=}
```


@ -8,6 +8,7 @@ Together AI inference provider for open-source models and collaborative AI devel
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `url` | `<class 'str'>` | No | https://api.together.xyz/v1 | The URL for the Together AI server |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The Together AI API Key |
@ -15,7 +16,7 @@ Together AI inference provider for open-source models and collaborative AI devel
```yaml
url: https://api.together.xyz/v1
api_key: ${env.TOGETHER_API_KEY}
api_key: ${env.TOGETHER_API_KEY:=}
```


@ -12,11 +12,12 @@ Remote vLLM inference provider for connecting to vLLM servers.
| `max_tokens` | `<class 'int'>` | No | 4096 | Maximum number of tokens to generate. |
| `api_token` | `str \| None` | No | fake | The API token |
| `tls_verify` | `bool \| str` | No | True | Whether to verify TLS certificates. Can be a boolean or a path to a CA certificate file. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically |
## Sample Configuration
```yaml
url: ${env.VLLM_URL}
url: ${env.VLLM_URL:=}
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
api_token: ${env.VLLM_API_TOKEN:=fake}
tls_verify: ${env.VLLM_TLS_VERIFY:=true}


@ -15,7 +15,7 @@ SambaNova's safety provider for content moderation and safety filtering.
```yaml
url: https://api.sambanova.ai/v1
api_key: ${env.SAMBANOVA_API_KEY}
api_key: ${env.SAMBANOVA_API_KEY:=}
```


@ -42,11 +42,15 @@ See [Chroma's documentation](https://docs.trychroma.com/docs/overview/introducti
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `db_path` | `<class 'str'>` | No | PydanticUndefined | |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend |
## Sample Configuration
```yaml
db_path: ${env.CHROMADB_PATH}
kvstore:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/chroma_inline_registry.db
```


@ -41,11 +41,15 @@ See [Chroma's documentation](https://docs.trychroma.com/docs/overview/introducti
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `str \| None` | No | PydanticUndefined | |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend |
## Sample Configuration
```yaml
url: ${env.CHROMADB_URL}
kvstore:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/chroma_remote_registry.db
```


@ -17,7 +17,7 @@ That means you'll get fast and efficient vector retrieval.
To use PGVector in your Llama Stack project, follow these steps:
1. Install the necessary dependencies.
2. Configure your Llama Stack project to use Faiss.
2. Configure your Llama Stack project to use pgvector (e.g. `remote::pgvector`).
3. Start storing and querying vectors.
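Once those steps are complete, registering a vector database against this provider might look like the sketch below; the client call, provider ID, embedding model, and dimension are illustrative assumptions.
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a vector database backed by the remote::pgvector provider.
client.vector_dbs.register(
    vector_db_id="my_documents",
    provider_id="pgvector",
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)
```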
## Installation