mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-23 00:49:42 +00:00
Merge branch 'main' into update-api-docs
This commit is contained in:
commit
cd16c72cdf
136 changed files with 4279 additions and 1465 deletions
392
docs/source/apis/external.md
Normal file
392
docs/source/apis/external.md
Normal file
|
|
@ -0,0 +1,392 @@
|
|||
# External APIs
|
||||
|
||||
Llama Stack supports external APIs that live outside of the main codebase. This allows you to:
|
||||
- Create and maintain your own APIs independently
|
||||
- Share APIs with others without contributing to the main codebase
|
||||
- Keep API-specific code separate from the core Llama Stack code
|
||||
|
||||
## Configuration
|
||||
|
||||
To enable external APIs, you need to configure the `external_apis_dir` in your Llama Stack configuration. This directory should contain your external API specifications:
|
||||
|
||||
```yaml
|
||||
external_apis_dir: ~/.llama/apis.d/
|
||||
```
|
||||
|
||||
## Directory Structure
|
||||
|
||||
The external APIs directory should follow this structure:
|
||||
|
||||
```
|
||||
apis.d/
|
||||
custom_api1.yaml
|
||||
custom_api2.yaml
|
||||
```
|
||||
|
||||
Each YAML file in these directories defines an API specification.
|
||||
|
||||
## API Specification
|
||||
|
||||
Here's an example of an external API specification for a weather API:
|
||||
|
||||
```yaml
|
||||
module: weather
|
||||
api_dependencies:
|
||||
- inference
|
||||
protocol: WeatherAPI
|
||||
name: weather
|
||||
pip_packages:
|
||||
- llama-stack-api-weather
|
||||
```
|
||||
|
||||
### API Specification Fields
|
||||
|
||||
- `module`: Python module containing the API implementation
|
||||
- `protocol`: Name of the protocol class for the API
|
||||
- `name`: Name of the API
|
||||
- `pip_packages`: List of pip packages to install the API, typically a single package
|
||||
|
||||
## Required Implementation
|
||||
|
||||
External APIs must expose a `available_providers()` function in their module that returns a list of provider names:
|
||||
|
||||
```python
|
||||
# llama_stack_api_weather/api.py
|
||||
from llama_stack.providers.datatypes import Api, InlineProviderSpec, ProviderSpec
|
||||
|
||||
|
||||
def available_providers() -> list[ProviderSpec]:
|
||||
return [
|
||||
InlineProviderSpec(
|
||||
api=Api.weather,
|
||||
provider_type="inline::darksky",
|
||||
pip_packages=[],
|
||||
module="llama_stack_provider_darksky",
|
||||
config_class="llama_stack_provider_darksky.DarkSkyWeatherImplConfig",
|
||||
),
|
||||
]
|
||||
```
|
||||
|
||||
A Protocol class like so:
|
||||
|
||||
```python
|
||||
# llama_stack_api_weather/api.py
|
||||
from typing import Protocol
|
||||
|
||||
from llama_stack.schema_utils import webmethod
|
||||
|
||||
|
||||
class WeatherAPI(Protocol):
|
||||
"""
|
||||
A protocol for the Weather API.
|
||||
"""
|
||||
|
||||
@webmethod(route="/locations", method="GET")
|
||||
async def get_available_locations() -> dict[str, list[str]]:
|
||||
"""
|
||||
Get the available locations.
|
||||
"""
|
||||
...
|
||||
```
|
||||
|
||||
## Example: Custom API
|
||||
|
||||
Here's a complete example of creating and using a custom API:
|
||||
|
||||
1. First, create the API package:
|
||||
|
||||
```bash
|
||||
mkdir -p llama-stack-api-weather
|
||||
cd llama-stack-api-weather
|
||||
mkdir src/llama_stack_api_weather
|
||||
git init
|
||||
uv init
|
||||
```
|
||||
|
||||
2. Edit `pyproject.toml`:
|
||||
|
||||
```toml
|
||||
[project]
|
||||
name = "llama-stack-api-weather"
|
||||
version = "0.1.0"
|
||||
description = "Weather API for Llama Stack"
|
||||
readme = "README.md"
|
||||
requires-python = ">=3.10"
|
||||
dependencies = ["llama-stack", "pydantic"]
|
||||
|
||||
[build-system]
|
||||
requires = ["setuptools"]
|
||||
build-backend = "setuptools.build_meta"
|
||||
|
||||
[tool.setuptools.packages.find]
|
||||
where = ["src"]
|
||||
include = ["llama_stack_api_weather", "llama_stack_api_weather.*"]
|
||||
```
|
||||
|
||||
3. Create the initial files:
|
||||
|
||||
```bash
|
||||
touch src/llama_stack_api_weather/__init__.py
|
||||
touch src/llama_stack_api_weather/api.py
|
||||
```
|
||||
|
||||
```python
|
||||
# llama-stack-api-weather/src/llama_stack_api_weather/__init__.py
|
||||
"""Weather API for Llama Stack."""
|
||||
|
||||
from .api import WeatherAPI, available_providers
|
||||
|
||||
__all__ = ["WeatherAPI", "available_providers"]
|
||||
```
|
||||
|
||||
4. Create the API implementation:
|
||||
|
||||
```python
|
||||
# llama-stack-api-weather/src/llama_stack_api_weather/weather.py
|
||||
from typing import Protocol
|
||||
|
||||
from llama_stack.providers.datatypes import (
|
||||
AdapterSpec,
|
||||
Api,
|
||||
ProviderSpec,
|
||||
RemoteProviderSpec,
|
||||
)
|
||||
from llama_stack.schema_utils import webmethod
|
||||
|
||||
|
||||
def available_providers() -> list[ProviderSpec]:
|
||||
return [
|
||||
RemoteProviderSpec(
|
||||
api=Api.weather,
|
||||
provider_type="remote::kaze",
|
||||
config_class="llama_stack_provider_kaze.KazeProviderConfig",
|
||||
adapter=AdapterSpec(
|
||||
adapter_type="kaze",
|
||||
module="llama_stack_provider_kaze",
|
||||
pip_packages=["llama_stack_provider_kaze"],
|
||||
config_class="llama_stack_provider_kaze.KazeProviderConfig",
|
||||
),
|
||||
),
|
||||
]
|
||||
|
||||
|
||||
class WeatherProvider(Protocol):
|
||||
"""
|
||||
A protocol for the Weather API.
|
||||
"""
|
||||
|
||||
@webmethod(route="/weather/locations", method="GET")
|
||||
async def get_available_locations() -> dict[str, list[str]]:
|
||||
"""
|
||||
Get the available locations.
|
||||
"""
|
||||
...
|
||||
```
|
||||
|
||||
5. Create the API specification:
|
||||
|
||||
```yaml
|
||||
# ~/.llama/apis.d/weather.yaml
|
||||
module: llama_stack_api_weather
|
||||
name: weather
|
||||
pip_packages: ["llama-stack-api-weather"]
|
||||
protocol: WeatherProvider
|
||||
|
||||
```
|
||||
|
||||
6. Install the API package:
|
||||
|
||||
```bash
|
||||
uv pip install -e .
|
||||
```
|
||||
|
||||
7. Configure Llama Stack to use external APIs:
|
||||
|
||||
```yaml
|
||||
version: "2"
|
||||
image_name: "llama-stack-api-weather"
|
||||
apis:
|
||||
- weather
|
||||
providers: {}
|
||||
external_apis_dir: ~/.llama/apis.d
|
||||
```
|
||||
|
||||
The API will now be available at `/v1/weather/locations`.
|
||||
|
||||
## Example: custom provider for the weather API
|
||||
|
||||
1. Create the provider package:
|
||||
|
||||
```bash
|
||||
mkdir -p llama-stack-provider-kaze
|
||||
cd llama-stack-provider-kaze
|
||||
uv init
|
||||
```
|
||||
|
||||
2. Edit `pyproject.toml`:
|
||||
|
||||
```toml
|
||||
[project]
|
||||
name = "llama-stack-provider-kaze"
|
||||
version = "0.1.0"
|
||||
description = "Kaze weather provider for Llama Stack"
|
||||
readme = "README.md"
|
||||
requires-python = ">=3.10"
|
||||
dependencies = ["llama-stack", "pydantic", "aiohttp"]
|
||||
|
||||
[build-system]
|
||||
requires = ["setuptools"]
|
||||
build-backend = "setuptools.build_meta"
|
||||
|
||||
[tool.setuptools.packages.find]
|
||||
where = ["src"]
|
||||
include = ["llama_stack_provider_kaze", "llama_stack_provider_kaze.*"]
|
||||
```
|
||||
|
||||
3. Create the initial files:
|
||||
|
||||
```bash
|
||||
touch src/llama_stack_provider_kaze/__init__.py
|
||||
touch src/llama_stack_provider_kaze/kaze.py
|
||||
```
|
||||
|
||||
4. Create the provider implementation:
|
||||
|
||||
|
||||
Initialization function:
|
||||
|
||||
```python
|
||||
# llama-stack-provider-kaze/src/llama_stack_provider_kaze/__init__.py
|
||||
"""Kaze weather provider for Llama Stack."""
|
||||
|
||||
from .config import KazeProviderConfig
|
||||
from .kaze import WeatherKazeAdapter
|
||||
|
||||
__all__ = ["KazeProviderConfig", "WeatherKazeAdapter"]
|
||||
|
||||
|
||||
async def get_adapter_impl(config: KazeProviderConfig, _deps):
|
||||
from .kaze import WeatherKazeAdapter
|
||||
|
||||
impl = WeatherKazeAdapter(config)
|
||||
await impl.initialize()
|
||||
return impl
|
||||
```
|
||||
|
||||
Configuration:
|
||||
|
||||
```python
|
||||
# llama-stack-provider-kaze/src/llama_stack_provider_kaze/config.py
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
|
||||
class KazeProviderConfig(BaseModel):
|
||||
"""Configuration for the Kaze weather provider."""
|
||||
|
||||
base_url: str = Field(
|
||||
"https://api.kaze.io/v1",
|
||||
description="Base URL for the Kaze weather API",
|
||||
)
|
||||
```
|
||||
|
||||
Main implementation:
|
||||
|
||||
```python
|
||||
# llama-stack-provider-kaze/src/llama_stack_provider_kaze/kaze.py
|
||||
from llama_stack_api_weather.api import WeatherProvider
|
||||
|
||||
from .config import KazeProviderConfig
|
||||
|
||||
|
||||
class WeatherKazeAdapter(WeatherProvider):
|
||||
"""Kaze weather provider implementation."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
config: KazeProviderConfig,
|
||||
) -> None:
|
||||
self.config = config
|
||||
|
||||
async def initialize(self) -> None:
|
||||
pass
|
||||
|
||||
async def get_available_locations(self) -> dict[str, list[str]]:
|
||||
"""Get available weather locations."""
|
||||
return {"locations": ["Paris", "Tokyo"]}
|
||||
```
|
||||
|
||||
5. Create the provider specification:
|
||||
|
||||
```yaml
|
||||
# ~/.llama/providers.d/remote/weather/kaze.yaml
|
||||
adapter:
|
||||
adapter_type: kaze
|
||||
pip_packages: ["llama_stack_provider_kaze"]
|
||||
config_class: llama_stack_provider_kaze.config.KazeProviderConfig
|
||||
module: llama_stack_provider_kaze
|
||||
optional_api_dependencies: []
|
||||
```
|
||||
|
||||
6. Install the provider package:
|
||||
|
||||
```bash
|
||||
uv pip install -e .
|
||||
```
|
||||
|
||||
7. Configure Llama Stack to use the provider:
|
||||
|
||||
```yaml
|
||||
# ~/.llama/run-byoa.yaml
|
||||
version: "2"
|
||||
image_name: "llama-stack-api-weather"
|
||||
apis:
|
||||
- weather
|
||||
providers:
|
||||
weather:
|
||||
- provider_id: kaze
|
||||
provider_type: remote::kaze
|
||||
config: {}
|
||||
external_apis_dir: ~/.llama/apis.d
|
||||
external_providers_dir: ~/.llama/providers.d
|
||||
server:
|
||||
port: 8321
|
||||
```
|
||||
|
||||
8. Run the server:
|
||||
|
||||
```bash
|
||||
python -m llama_stack.distribution.server.server --yaml-config ~/.llama/run-byoa.yaml
|
||||
```
|
||||
|
||||
9. Test the API:
|
||||
|
||||
```bash
|
||||
curl -sSf http://127.0.0.1:8321/v1/weather/locations
|
||||
{"locations":["Paris","Tokyo"]}%
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Package Naming**: Use a clear and descriptive name for your API package.
|
||||
|
||||
2. **Version Management**: Keep your API package versioned and compatible with the Llama Stack version you're using.
|
||||
|
||||
3. **Dependencies**: Only include the minimum required dependencies in your API package.
|
||||
|
||||
4. **Documentation**: Include clear documentation in your API package about:
|
||||
- Installation requirements
|
||||
- Configuration options
|
||||
- API endpoints and usage
|
||||
- Any limitations or known issues
|
||||
|
||||
5. **Testing**: Include tests in your API package to ensure it works correctly with Llama Stack.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
If your external API isn't being loaded:
|
||||
|
||||
1. Check that the `external_apis_dir` path is correct and accessible.
|
||||
2. Verify that the YAML files are properly formatted.
|
||||
3. Ensure all required Python packages are installed.
|
||||
4. Check the Llama Stack server logs for any error messages - turn on debug logging to get more information using `LLAMA_STACK_LOGGING=all=debug`.
|
||||
5. Verify that the API package is installed in your Python environment.
|
||||
|
|
@ -10,9 +10,11 @@ A Llama Stack API is described as a collection of REST endpoints. We currently s
|
|||
- **Eval**: generate outputs (via Inference or Agents) and perform scoring
|
||||
- **VectorIO**: perform operations on vector stores, such as adding documents, searching, and deleting documents
|
||||
- **Telemetry**: collect telemetry data from the system
|
||||
- **Post Training**: fine-tune a model
|
||||
- **Tool Runtime**: interact with various tools and protocols
|
||||
- **Responses**: generate responses from an LLM using this OpenAI compatible API.
|
||||
|
||||
We are working on adding a few more APIs to complete the application lifecycle. These will include:
|
||||
- **Batch Inference**: run inference on a dataset of inputs
|
||||
- **Batch Agents**: run agents on a dataset of inputs
|
||||
- **Post Training**: fine-tune a model
|
||||
- **Synthetic Data Generation**: generate synthetic data for model development
|
||||
|
|
|
|||
|
|
@ -504,6 +504,47 @@ created by users sharing a team with them:
|
|||
description: any user has read access to any resource created by a user with the same team
|
||||
```
|
||||
|
||||
#### API Endpoint Authorization with Scopes
|
||||
|
||||
In addition to resource-based access control, Llama Stack supports endpoint-level authorization using OAuth 2.0 style scopes. When authentication is enabled, specific API endpoints require users to have particular scopes in their authentication token.
|
||||
|
||||
**Scope-Gated APIs:**
|
||||
The following APIs are currently gated by scopes:
|
||||
|
||||
- **Telemetry API** (scope: `telemetry.read`):
|
||||
- `POST /telemetry/traces` - Query traces
|
||||
- `GET /telemetry/traces/{trace_id}` - Get trace by ID
|
||||
- `GET /telemetry/traces/{trace_id}/spans/{span_id}` - Get span by ID
|
||||
- `POST /telemetry/spans/{span_id}/tree` - Get span tree
|
||||
- `POST /telemetry/spans` - Query spans
|
||||
- `POST /telemetry/metrics/{metric_name}` - Query metrics
|
||||
|
||||
**Authentication Configuration:**
|
||||
|
||||
For **JWT/OAuth2 providers**, scopes should be included in the JWT's claims:
|
||||
```json
|
||||
{
|
||||
"sub": "user123",
|
||||
"scope": "telemetry.read",
|
||||
"aud": "llama-stack"
|
||||
}
|
||||
```
|
||||
|
||||
For **custom authentication providers**, the endpoint must return user attributes including the `scopes` array:
|
||||
```json
|
||||
{
|
||||
"principal": "user123",
|
||||
"attributes": {
|
||||
"scopes": ["telemetry.read"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Behavior:**
|
||||
- Users without the required scope receive a 403 Forbidden response
|
||||
- When authentication is disabled, scope checks are bypassed
|
||||
- Endpoints without `required_scope` work normally for all authenticated users
|
||||
|
||||
### Quota Configuration
|
||||
|
||||
The `quota` section allows you to enable server-side request throttling for both
|
||||
|
|
|
|||
|
|
@ -1,3 +1,6 @@
|
|||
---
|
||||
orphan: true
|
||||
---
|
||||
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
|
||||
# NVIDIA Distribution
|
||||
|
||||
|
|
|
|||
|
|
@ -7,7 +7,17 @@ Llama Stack supports external providers that live outside of the main codebase.
|
|||
|
||||
## Configuration
|
||||
|
||||
To enable external providers, you need to configure the `external_providers_dir` in your Llama Stack configuration. This directory should contain your external provider specifications:
|
||||
To enable external providers, you need to add `module` into your build yaml, allowing Llama Stack to install the required package corresponding to the external provider.
|
||||
|
||||
an example entry in your build.yaml should look like:
|
||||
|
||||
```
|
||||
- provider_id: ramalama
|
||||
provider_type: remote::ramalama
|
||||
module: ramalama_stack
|
||||
```
|
||||
|
||||
Additionally you can configure the `external_providers_dir` in your Llama Stack configuration. This method is in the process of being deprecated in favor of the `module` method. If using this method, the external provider directory should contain your external provider specifications:
|
||||
|
||||
```yaml
|
||||
external_providers_dir: ~/.llama/providers.d/
|
||||
|
|
@ -112,6 +122,31 @@ container_image: custom-vector-store:latest # optional
|
|||
|
||||
## Required Implementation
|
||||
|
||||
## All Providers
|
||||
|
||||
All providers must contain a `get_provider_spec` function in their `provider` module. This is a standardized structure that Llama Stack expects and is necessary for getting things such as the config class. The `get_provider_spec` method returns a structure identical to the `adapter`. An example function may look like:
|
||||
|
||||
```python
|
||||
from llama_stack.providers.datatypes import (
|
||||
ProviderSpec,
|
||||
Api,
|
||||
AdapterSpec,
|
||||
remote_provider_spec,
|
||||
)
|
||||
|
||||
|
||||
def get_provider_spec() -> ProviderSpec:
|
||||
return remote_provider_spec(
|
||||
api=Api.inference,
|
||||
adapter=AdapterSpec(
|
||||
adapter_type="ramalama",
|
||||
pip_packages=["ramalama>=0.8.5", "pymilvus"],
|
||||
config_class="ramalama_stack.config.RamalamaImplConfig",
|
||||
module="ramalama_stack",
|
||||
),
|
||||
)
|
||||
```
|
||||
|
||||
### Remote Providers
|
||||
|
||||
Remote providers must expose a `get_adapter_impl()` function in their module that takes two arguments:
|
||||
|
|
@ -155,7 +190,7 @@ Version: 0.1.0
|
|||
Location: /path/to/venv/lib/python3.10/site-packages
|
||||
```
|
||||
|
||||
## Example: Custom Ollama Provider
|
||||
## Example using `external_providers_dir`: Custom Ollama Provider
|
||||
|
||||
Here's a complete example of creating and using a custom Ollama provider:
|
||||
|
||||
|
|
@ -206,6 +241,35 @@ external_providers_dir: ~/.llama/providers.d/
|
|||
|
||||
The provider will now be available in Llama Stack with the type `remote::custom_ollama`.
|
||||
|
||||
|
||||
## Example using `module`: ramalama-stack
|
||||
|
||||
[ramalama-stack](https://github.com/containers/ramalama-stack) is a recognized external provider that supports installation via module.
|
||||
|
||||
To install Llama Stack with this external provider a user can provider the following build.yaml:
|
||||
|
||||
```yaml
|
||||
version: 2
|
||||
distribution_spec:
|
||||
description: Use (an external) Ramalama server for running LLM inference
|
||||
container_image: null
|
||||
providers:
|
||||
inference:
|
||||
- provider_id: ramalama
|
||||
provider_type: remote::ramalama
|
||||
module: ramalama_stack==0.3.0a0
|
||||
image_type: venv
|
||||
image_name: null
|
||||
external_providers_dir: null
|
||||
additional_pip_packages:
|
||||
- aiosqlite
|
||||
- sqlalchemy[asyncio]
|
||||
```
|
||||
|
||||
No other steps are required other than `llama stack build` and `llama stack run`. The build process will use `module` to install all of the provider dependencies, retrieve the spec, etc.
|
||||
|
||||
The provider will now be available in Llama Stack with the type `remote::ramalama`.
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Package Naming**: Use the prefix `llama-stack-provider-` for your provider packages to make them easily identifiable.
|
||||
|
|
@ -229,9 +293,10 @@ information. Execute the test for the Provider type you are developing.
|
|||
|
||||
If your external provider isn't being loaded:
|
||||
|
||||
1. Check that `module` points to a published pip package with a top level `provider` module including `get_provider_spec`.
|
||||
1. Check that the `external_providers_dir` path is correct and accessible.
|
||||
2. Verify that the YAML files are properly formatted.
|
||||
3. Ensure all required Python packages are installed.
|
||||
4. Check the Llama Stack server logs for any error messages - turn on debug logging to get more
|
||||
information using `LLAMA_STACK_LOGGING=all=debug`.
|
||||
5. Verify that the provider package is installed in your Python environment.
|
||||
5. Verify that the provider package is installed in your Python environment if using `external_providers_dir`.
|
||||
|
|
|
|||
|
|
@ -8,6 +8,7 @@ Fireworks AI inference provider for Llama models and other AI models on the Fire
|
|||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
|
||||
| `url` | `<class 'str'>` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The Fireworks.ai API Key |
|
||||
|
||||
|
|
|
|||
|
|
@ -9,8 +9,7 @@ Ollama inference provider for running local models through the Ollama runtime.
|
|||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `url` | `<class 'str'>` | No | http://localhost:11434 | |
|
||||
| `refresh_models` | `<class 'bool'>` | No | False | refresh and re-register models periodically |
|
||||
| `refresh_models_interval` | `<class 'int'>` | No | 300 | interval in seconds to refresh models |
|
||||
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
|
|
|
|||
|
|
@ -8,6 +8,7 @@ Together AI inference provider for open-source models and collaborative AI devel
|
|||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
|
||||
| `url` | `<class 'str'>` | No | https://api.together.xyz/v1 | The URL for the Together AI server |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The Together AI API Key |
|
||||
|
||||
|
|
|
|||
|
|
@ -13,7 +13,6 @@ Remote vLLM inference provider for connecting to vLLM servers.
|
|||
| `api_token` | `str \| None` | No | fake | The API token |
|
||||
| `tls_verify` | `bool \| str` | No | True | Whether to verify TLS certificates. Can be a boolean or a path to a CA certificate file. |
|
||||
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically |
|
||||
| `refresh_models_interval` | `<class 'int'>` | No | 300 | Interval in seconds to refresh models |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue