docs: concepts and building_applications migration (#3534)

# What does this PR do?

- Migrates the remaining documentation sections to the new documentation format

<!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->

<!-- Closes #[issue-number] -->

## Test Plan

- Partial migration

<!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* -->
This commit is contained in:
Alexey Rybak 2025-09-24 14:05:30 -07:00 committed by GitHub
parent 05ff4c4420
commit c71ce8df61
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
82 changed files with 2535 additions and 1237 deletions

View file

@ -0,0 +1,101 @@
---
title: API Stability Leveling
description: Understanding API stability levels and versioning in Llama Stack
sidebar_label: API Stability
sidebar_position: 4
---
# Llama Stack API Stability Leveling
In order to provide a stable experience in Llama Stack, the various APIs need different stability levels indicating the level of support, backwards compatability, and overall production readiness.
## Different Levels
### v1alpha
- Little to no expectation of support between versions
- Breaking changes are permitted
- Datatypes and parameters can break
- Routes can be added and removed
#### Graduation Criteria
- an API can graduate from `v1alpha` to `v1beta` if the team has identified the extent of the non-optional routes and the shape of their parameters/return types for the API eg. `/v1/openai/chat/completions`. Optional types can change.
- CRUD must stay stable once in `v1beta`. This is a commitment to backward compatibility, guaranteeing that most code you write against the v1beta version will not break during future updates. We may make additive changes (like adding a new, optional field to a response), but we will not make breaking changes (like renaming an existing "modelName" field to "name", changing an ID's data type from an integer to a string, or altering an endpoint URL).
- for OpenAI APIs, a comparison to the OpenAI spec for the specific API can be done to ensure completeness.
### v1beta
- API routes remain consistent between versions
- Parameters and return types are not ensured between versions
- API, besides minor fixes and adjustments, should be _almost_ v1. Changes should not be drastic.
#### Graduation Criteria
- an API can graduate from `v1beta` to `v1` if the API surface and datatypes are complete as identified by the team. The parameters and return types that are mandatory for each route are stable. All aspects of graduating from `v1alpha1` to `v1beta` apply as well.
- Optional parameters, routes, or parts of the return type can be added after graduating to `v1`
### v1 (stable)
- Considered stable
- Backwards compatible between Z-streams
- Y-stream breaking changes must go through the proper approval and announcement process.
- Datatypes for a route and its return types cannot change between Z-streams
- Y-stream datatype changes should be sparing, unless the changes are additional net-new parameters
- Must have proper conformance testing as outlined in https://github.com/llamastack/llama-stack/issues/3237
### v2+ (Major Versions)
Introducing a new major version like `/v2` is a significant and disruptive event that should be treated as a last resort. It is reserved for essential changes to a stable `/v1` API that are fundamentally backward-incompatible and cannot be implemented through additive, non-breaking changes or breaking changes across X/Y-Stream releases (x.y.z).
If a `/v2` version is deemed absolutely necessary, it must adhere to the following protocol to ensure a sane and predictable transition for users:
#### Lifecycle Progression
A new major version must follow the same stability lifecycle as `/v1`. It will be introduced as `/v2alpha`, mature to `/v2beta`, and finally become stable as `/v2`.
#### Coexistence:
The new `/v2` API must be introduced alongside the existing `/v1` API and run in parallel. It must not replace the `/v1` API immediately.
#### Deprecation Policy:
When a `/v2` API is introduced, a clear and generous deprecation policy for the `/v1` API must be published simultaneously. This policy must outline the timeline for the eventual removal of the `/v1` API, giving users ample time to migrate.
### API Stability vs. Provider Stability
The leveling introduced in this document relates to the stability of the API and not specifically the providers within the API.
Providers can iterate as much as they want on functionality as long as they work within the bounds of an API. If they need to change the API, then the API should not be `/v1`, or those breaking changes can only happen on a y-stream release basis.
### Approval and Announcement Process for Breaking Changes
- **PR Labeling**: Any pull request that introduces a breaking API change must be clearly labeled with `breaking-change`.
- **PR Title/Commit**: Any pull request that introduces a breaking API change must contain `BREAKING CHANGE` in the title and commit footer. Alternatively, the commit can include `!`, eg. `feat(api)!: title goes here` This is outlined in the [conventional commits documentation](https://www.conventionalcommits.org/en/v1.0.0/#specification)
- **Maintainer Review**: At least one maintainer must explicitly acknowledge the breaking change during review by applying the `breaking-change` label. An approval must come with this label or the acknowledgement this label has already been applied.
- **Announcement**: Breaking changes require inclusion in release notes and, if applicable, a separate communication (e.g., Discord, Github Issues, or GitHub Discussions) prior to release.
If a PR has proper approvals, labels, and commit/title hygiene, the failing API conformance tests will be bypassed.
## Enforcement
### Migration of API routes under `/v1alpha`, `/v1beta`, and `/v1`
Instead of placing every API under `/v1`, any API that is not fully stable or complete should go under `/v1alpha` or `/v1beta`. For example, at the time of this writing, `post_training` belongs here, as well as any OpenAI-compatible API whose surface does not exactly match the upstream OpenAI API it mimics.
This migration is crucial as we get Llama Stack in the hands of users who intend to productize various APIs. A clear view of what is stable and what is actively being developed will enable users to pick and choose various APIs to build their products on.
This migration will be a breaking change for any API moving out of `/v1`. Ideally, this should happen before 0.3.0 and especially 1.0.0.
### `x-stability` tags in the OpenAPI spec for oasdiff
`x-stability` tags allow tools like oasdiff to enforce different rules for different stability levels; these tags should match the routes: [oasdiff stability](https://github.com/oasdiff/oasdiff/blob/main/docs/STABILITY.md)
### Testing
The testing of each stable API is already outlined in [issue #3237](https://github.com/llamastack/llama-stack/issues/3237) and is being worked on. These sorts of conformance tests should apply primarily to `/v1` APIs only, with `/v1alpha` and `/v1beta` having any tests the maintainers see fit as well as basic testing to ensure the routing works properly.
### New APIs going forward
Any subsequently introduced APIs should be introduced as `/v1alpha`

View file

@ -0,0 +1,19 @@
---
title: API Providers
description: Understanding remote vs inline provider implementations
sidebar_label: API Providers
sidebar_position: 2
---
# API Providers
The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations for the same API. Examples for these include:
- LLM inference providers (e.g., Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, etc.),
- Vector databases (e.g., ChromaDB, Weaviate, Qdrant, Milvus, FAISS, PGVector, etc.),
- Safety providers (e.g., Meta's Llama Guard, AWS Bedrock Guardrails, etc.)
Providers come in two flavors:
- **Remote**: the provider runs as a separate service external to the Llama Stack codebase. Llama Stack contains a small amount of adapter code.
- **Inline**: the provider is fully specified and implemented within the Llama Stack codebase. It may be a simple wrapper around an existing library, or a full fledged implementation within Llama Stack.
Most importantly, Llama Stack always strives to provide at least one fully inline provider for each API so you can iterate on a fully featured environment locally.

View file

@ -0,0 +1,398 @@
---
title: External APIs
description: Understanding external APIs in Llama Stack
sidebar_label: External APIs
sidebar_position: 4
---
# External APIs
Llama Stack supports external APIs that live outside of the main codebase. This allows you to:
- Create and maintain your own APIs independently
- Share APIs with others without contributing to the main codebase
- Keep API-specific code separate from the core Llama Stack code
## Configuration
To enable external APIs, you need to configure the `external_apis_dir` in your Llama Stack configuration. This directory should contain your external API specifications:
```yaml
external_apis_dir: ~/.llama/apis.d/
```
## Directory Structure
The external APIs directory should follow this structure:
```
apis.d/
custom_api1.yaml
custom_api2.yaml
```
Each YAML file in these directories defines an API specification.
## API Specification
Here's an example of an external API specification for a weather API:
```yaml
module: weather
api_dependencies:
- inference
protocol: WeatherAPI
name: weather
pip_packages:
- llama-stack-api-weather
```
### API Specification Fields
- `module`: Python module containing the API implementation
- `protocol`: Name of the protocol class for the API
- `name`: Name of the API
- `pip_packages`: List of pip packages to install the API, typically a single package
## Required Implementation
External APIs must expose a `available_providers()` function in their module that returns a list of provider names:
```python
# llama_stack_api_weather/api.py
from llama_stack.providers.datatypes import Api, InlineProviderSpec, ProviderSpec
def available_providers() -> list[ProviderSpec]:
return [
InlineProviderSpec(
api=Api.weather,
provider_type="inline::darksky",
pip_packages=[],
module="llama_stack_provider_darksky",
config_class="llama_stack_provider_darksky.DarkSkyWeatherImplConfig",
),
]
```
A Protocol class like so:
```python
# llama_stack_api_weather/api.py
from typing import Protocol
from llama_stack.schema_utils import webmethod
class WeatherAPI(Protocol):
"""
A protocol for the Weather API.
"""
@webmethod(route="/locations", method="GET")
async def get_available_locations() -> dict[str, list[str]]:
"""
Get the available locations.
"""
...
```
## Example: Custom API
Here's a complete example of creating and using a custom API:
1. First, create the API package:
```bash
mkdir -p llama-stack-api-weather
cd llama-stack-api-weather
mkdir src/llama_stack_api_weather
git init
uv init
```
2. Edit `pyproject.toml`:
```toml
[project]
name = "llama-stack-api-weather"
version = "0.1.0"
description = "Weather API for Llama Stack"
readme = "README.md"
requires-python = ">=3.12"
dependencies = ["llama-stack", "pydantic"]
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
[tool.setuptools.packages.find]
where = ["src"]
include = ["llama_stack_api_weather", "llama_stack_api_weather.*"]
```
3. Create the initial files:
```bash
touch src/llama_stack_api_weather/__init__.py
touch src/llama_stack_api_weather/api.py
```
```python
# llama-stack-api-weather/src/llama_stack_api_weather/__init__.py
"""Weather API for Llama Stack."""
from .api import WeatherAPI, available_providers
__all__ = ["WeatherAPI", "available_providers"]
```
4. Create the API implementation:
```python
# llama-stack-api-weather/src/llama_stack_api_weather/weather.py
from typing import Protocol
from llama_stack.providers.datatypes import (
AdapterSpec,
Api,
ProviderSpec,
RemoteProviderSpec,
)
from llama_stack.schema_utils import webmethod
def available_providers() -> list[ProviderSpec]:
return [
RemoteProviderSpec(
api=Api.weather,
provider_type="remote::kaze",
config_class="llama_stack_provider_kaze.KazeProviderConfig",
adapter=AdapterSpec(
adapter_type="kaze",
module="llama_stack_provider_kaze",
pip_packages=["llama_stack_provider_kaze"],
config_class="llama_stack_provider_kaze.KazeProviderConfig",
),
),
]
class WeatherProvider(Protocol):
"""
A protocol for the Weather API.
"""
@webmethod(route="/weather/locations", method="GET")
async def get_available_locations() -> dict[str, list[str]]:
"""
Get the available locations.
"""
...
```
5. Create the API specification:
```yaml
# ~/.llama/apis.d/weather.yaml
module: llama_stack_api_weather
name: weather
pip_packages: ["llama-stack-api-weather"]
protocol: WeatherProvider
```
6. Install the API package:
```bash
uv pip install -e .
```
7. Configure Llama Stack to use external APIs:
```yaml
version: "2"
image_name: "llama-stack-api-weather"
apis:
- weather
providers: {}
external_apis_dir: ~/.llama/apis.d
```
The API will now be available at `/v1/weather/locations`.
## Example: custom provider for the weather API
1. Create the provider package:
```bash
mkdir -p llama-stack-provider-kaze
cd llama-stack-provider-kaze
uv init
```
2. Edit `pyproject.toml`:
```toml
[project]
name = "llama-stack-provider-kaze"
version = "0.1.0"
description = "Kaze weather provider for Llama Stack"
readme = "README.md"
requires-python = ">=3.12"
dependencies = ["llama-stack", "pydantic", "aiohttp"]
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
[tool.setuptools.packages.find]
where = ["src"]
include = ["llama_stack_provider_kaze", "llama_stack_provider_kaze.*"]
```
3. Create the initial files:
```bash
touch src/llama_stack_provider_kaze/__init__.py
touch src/llama_stack_provider_kaze/kaze.py
```
4. Create the provider implementation:
Initialization function:
```python
# llama-stack-provider-kaze/src/llama_stack_provider_kaze/__init__.py
"""Kaze weather provider for Llama Stack."""
from .config import KazeProviderConfig
from .kaze import WeatherKazeAdapter
__all__ = ["KazeProviderConfig", "WeatherKazeAdapter"]
async def get_adapter_impl(config: KazeProviderConfig, _deps):
from .kaze import WeatherKazeAdapter
impl = WeatherKazeAdapter(config)
await impl.initialize()
return impl
```
Configuration:
```python
# llama-stack-provider-kaze/src/llama_stack_provider_kaze/config.py
from pydantic import BaseModel, Field
class KazeProviderConfig(BaseModel):
"""Configuration for the Kaze weather provider."""
base_url: str = Field(
"https://api.kaze.io/v1",
description="Base URL for the Kaze weather API",
)
```
Main implementation:
```python
# llama-stack-provider-kaze/src/llama_stack_provider_kaze/kaze.py
from llama_stack_api_weather.api import WeatherProvider
from .config import KazeProviderConfig
class WeatherKazeAdapter(WeatherProvider):
"""Kaze weather provider implementation."""
def __init__(
self,
config: KazeProviderConfig,
) -> None:
self.config = config
async def initialize(self) -> None:
pass
async def get_available_locations(self) -> dict[str, list[str]]:
"""Get available weather locations."""
return {"locations": ["Paris", "Tokyo"]}
```
5. Create the provider specification:
```yaml
# ~/.llama/providers.d/remote/weather/kaze.yaml
adapter:
adapter_type: kaze
pip_packages: ["llama_stack_provider_kaze"]
config_class: llama_stack_provider_kaze.config.KazeProviderConfig
module: llama_stack_provider_kaze
optional_api_dependencies: []
```
6. Install the provider package:
```bash
uv pip install -e .
```
7. Configure Llama Stack to use the provider:
```yaml
# ~/.llama/run-byoa.yaml
version: "2"
image_name: "llama-stack-api-weather"
apis:
- weather
providers:
weather:
- provider_id: kaze
provider_type: remote::kaze
config: {}
external_apis_dir: ~/.llama/apis.d
external_providers_dir: ~/.llama/providers.d
server:
port: 8321
```
8. Run the server:
```bash
python -m llama_stack.core.server.server --yaml-config ~/.llama/run-byoa.yaml
```
9. Test the API:
```bash
curl -sSf http://127.0.0.1:8321/v1/weather/locations
{"locations":["Paris","Tokyo"]}%
```
## Best Practices
1. **Package Naming**: Use a clear and descriptive name for your API package.
2. **Version Management**: Keep your API package versioned and compatible with the Llama Stack version you're using.
3. **Dependencies**: Only include the minimum required dependencies in your API package.
4. **Documentation**: Include clear documentation in your API package about:
- Installation requirements
- Configuration options
- API endpoints and usage
- Any limitations or known issues
5. **Testing**: Include tests in your API package to ensure it works correctly with Llama Stack.
## Troubleshooting
If your external API isn't being loaded:
1. Check that the `external_apis_dir` path is correct and accessible.
2. Verify that the YAML files are properly formatted.
3. Ensure all required Python packages are installed.
4. Check the Llama Stack server logs for any error messages - turn on debug logging to get more information using `LLAMA_STACK_LOGGING=all=debug`.
5. Verify that the API package is installed in your Python environment.

View file

@ -0,0 +1,28 @@
---
title: APIs
description: Available REST APIs and planned capabilities in Llama Stack
sidebar_label: APIs
sidebar_position: 1
---
# APIs
A Llama Stack API is described as a collection of REST endpoints. We currently support the following APIs:
- **Inference**: run inference with a LLM
- **Safety**: apply safety policies to the output at a Systems (not only model) level
- **Agents**: run multi-step agentic workflows with LLMs with tool usage, memory (RAG), etc.
- **DatasetIO**: interface with datasets and data loaders
- **Scoring**: evaluate outputs of the system
- **Eval**: generate outputs (via Inference or Agents) and perform scoring
- **VectorIO**: perform operations on vector stores, such as adding documents, searching, and deleting documents
- **Telemetry**: collect telemetry data from the system
- **Post Training**: fine-tune a model
- **Tool Runtime**: interact with various tools and protocols
- **Responses**: generate responses from an LLM using this OpenAI compatible API.
We are working on adding a few more APIs to complete the application lifecycle. These will include:
- **Batch Inference**: run inference on a dataset of inputs
- **Batch Agents**: run agents on a dataset of inputs
- **Synthetic Data Generation**: generate synthetic data for model development
- **Batches**: OpenAI-compatible batch management for inference

View file

@ -0,0 +1,74 @@
---
title: Llama Stack Architecture
description: Understanding Llama Stack's service-oriented design and benefits
sidebar_label: Architecture
sidebar_position: 2
---
# Llama Stack architecture
Llama Stack allows you to build different layers of distributions for your AI workloads using various SDKs and API providers.
<img src="/img/llama-stack.png" alt="Llama Stack" width="400" />
## Benefits of Llama stack
### Current challenges in custom AI applications
Building production AI applications today requires solving multiple challenges:
**Infrastructure Complexity**
- Running large language models efficiently requires specialized infrastructure.
- Different deployment scenarios (local development, cloud, edge) need different solutions.
- Moving from development to production often requires significant rework.
**Essential Capabilities**
- Safety guardrails and content filtering are necessary in an enterprise setting.
- Just model inference is not enough - Knowledge retrieval and RAG capabilities are required.
- Nearly any application needs composable multi-step workflows.
- Without monitoring, observability and evaluation, you end up operating in the dark.
**Lack of Flexibility and Choice**
- Directly integrating with multiple providers creates tight coupling.
- Different providers have different APIs and abstractions.
- Changing providers requires significant code changes.
### Our Solution: A Universal Stack
Llama Stack addresses these challenges through a service-oriented, API-first approach:
**Develop Anywhere, Deploy Everywhere**
- Start locally with CPU-only setups
- Move to GPU acceleration when needed
- Deploy to cloud or edge without code changes
- Same APIs and developer experience everywhere
**Production-Ready Building Blocks**
- Pre-built safety guardrails and content filtering
- Built-in RAG and agent capabilities
- Comprehensive evaluation toolkit
- Full observability and monitoring
**True Provider Independence**
- Swap providers without application changes
- Mix and match best-in-class implementations
- Federation and fallback support
- No vendor lock-in
**Robust Ecosystem**
- Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies).
- Ecosystem offers tailored infrastructure, software, and services for deploying a variety of models.
## Our Philosophy
- **Service-Oriented**: REST APIs enforce clean interfaces and enable seamless transitions across different environments.
- **Composability**: Every component is independent but works together seamlessly
- **Production Ready**: Built for real-world applications, not just demos
- **Turnkey Solutions**: Easy to deploy built in solutions for popular deployment scenarios
With Llama Stack, you can focus on building your application while we handle the infrastructure complexity, essential capabilities, and provider integrations.

View file

@ -0,0 +1,16 @@
---
title: Distributions
description: Pre-packaged provider configurations for different deployment scenarios
sidebar_label: Distributions
sidebar_position: 5
---
# Distributions
While there is a lot of flexibility to mix-and-match providers, often users will work with a specific set of providers (hardware support, contractual obligations, etc.) We therefore need to provide a _convenient shorthand_ for such collections. We call this shorthand a **Llama Stack Distribution** or a **Distro**. One can think of it as specific pre-packaged versions of the Llama Stack. Here are some examples:
**Remotely Hosted Distro**: These are the simplest to consume from a user perspective. You can simply obtain the API key for these providers, point to a URL and have _all_ Llama Stack APIs working out of the box. Currently, [Fireworks](https://fireworks.ai/) and [Together](https://together.xyz/) provide such easy-to-consume Llama Stack distributions.
**Locally Hosted Distro**: You may want to run Llama Stack on your own hardware. Typically though, you still need to use Inference via an external service. You can use providers like HuggingFace TGI, Fireworks, Together, etc. for this purpose. Or you may have access to GPUs and can run a [vLLM](https://github.com/vllm-project/vllm) or [NVIDIA NIM](https://build.nvidia.com/nim?filters=nimType%3Anim_type_run_anywhere&q=llama) instance. If you "just" have a regular desktop machine, you can use [Ollama](https://ollama.com/) for inference. To provide convenient quick access to these options, we provide a number of such pre-configured locally-hosted Distros.
**On-device Distro**: To run Llama Stack directly on an edge device (mobile phone or a tablet), we provide Distros for [iOS](/docs/distributions/ondevice_distro/ios_sdk) and [Android](/docs/distributions/ondevice_distro/android_sdk)

View file

@ -0,0 +1,43 @@
# Core Concepts
Given Llama Stack's service-oriented philosophy, a few concepts and workflows arise which may not feel completely natural in the LLM landscape, especially if you are coming with a background in other frameworks.
## Documentation Structure
This section covers the fundamental concepts of Llama Stack:
- **[Architecture](./architecture.md)** - Learn about Llama Stack's architectural design and principles
- **[APIs](./apis/index.mdx)** - Understanding the core APIs and their stability levels
- [API Overview](./apis/index.mdx) - Core APIs available in Llama Stack
- [API Providers](./apis/api_providers.mdx) - How providers implement APIs
- [API Stability Leveling](./apis/api_leveling.mdx) - API stability and versioning
- **[Distributions](./distributions.md)** - Pre-configured deployment packages
- **[Resources](./resources.md)** - Understanding Llama Stack resources and their lifecycle
- **[External Integration](./external.md)** - Integrating with external services and providers
## Getting Started
If you're new to Llama Stack, we recommend starting with:
1. **[Architecture](./architecture.md)** - Understand the overall system design
2. **[APIs](./apis/index.mdx)** - Learn about the available APIs and their purpose
3. **[Distributions](./distributions.md)** - Choose a pre-configured setup for your use case
Each concept builds upon the previous ones to give you a comprehensive understanding of how Llama Stack works and how to use it effectively.---
title: Core Concepts
description: Understanding Llama Stack's service-oriented philosophy and key concepts
sidebar_label: Overview
sidebar_position: 1
---
# Core Concepts
Given Llama Stack's service-oriented philosophy, a few concepts and workflows arise which may not feel completely natural in the LLM landscape, especially if you are coming with a background in other frameworks.
This section covers the key concepts you need to understand to work effectively with Llama Stack:
- **[Architecture](./architecture)** - Llama Stack's service-oriented design and benefits
- **[APIs](./apis)** - Available REST APIs and planned capabilities
- **[API Providers](./api_providers)** - Remote vs inline provider implementations
- **[Distributions](./distributions)** - Pre-packaged provider configurations
- **[Resources](./resources)** - Resource federation and registration

View file

@ -0,0 +1,26 @@
---
title: Resources
description: Resource federation and registration in Llama Stack
sidebar_label: Resources
sidebar_position: 6
---
# Resources
Some of these APIs are associated with a set of **Resources**. Here is the mapping of APIs to resources:
- **Inference**, **Eval** and **Post Training** are associated with `Model` resources.
- **Safety** is associated with `Shield` resources.
- **Tool Runtime** is associated with `ToolGroup` resources.
- **DatasetIO** is associated with `Dataset` resources.
- **VectorIO** is associated with `VectorDB` resources.
- **Scoring** is associated with `ScoringFunction` resources.
- **Eval** is associated with `Model` and `Benchmark` resources.
Furthermore, we allow these resources to be **federated** across multiple providers. For example, you may have some Llama models served by Fireworks while others are served by AWS Bedrock. Regardless, they will all work seamlessly with the same uniform Inference API provided by Llama Stack.
:::tip Registering Resources
Given this architecture, it is necessary for the Stack to know which provider to use for a given resource. This means you need to explicitly _register_ resources (including models) before you can use them with the associated APIs.
:::