mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-10-22 08:17:18 +00:00
docs: concepts and building_applications migration (#3534)
# What does this PR do?

- Migrates the remaining documentation sections to the new documentation format

## Test Plan

- Partial migration
parent
05ff4c4420
commit
c71ce8df61
82 changed files with 2535 additions and 1237 deletions
101
docs/docs/concepts/apis/api_leveling.mdx
Normal file
@ -0,0 +1,101 @@
---
title: API Stability Leveling
description: Understanding API stability levels and versioning in Llama Stack
sidebar_label: API Stability
sidebar_position: 4
---

# Llama Stack API Stability Leveling

To provide a stable experience in Llama Stack, each API is assigned a stability level that indicates its level of support, backwards compatibility, and overall production readiness.

## Different Levels

### v1alpha

- Little to no expectation of support between versions
- Breaking changes are permitted
- Datatypes and parameters can break
- Routes can be added and removed

#### Graduation Criteria

- An API can graduate from `v1alpha` to `v1beta` once the team has identified the extent of the non-optional routes and the shape of their parameters and return types, e.g. `/v1/openai/chat/completions`. Optional types can change.
- CRUD must stay stable once in `v1beta`. This is a commitment to backward compatibility, guaranteeing that most code you write against the `v1beta` version will not break during future updates. We may make additive changes (like adding a new, optional field to a response), but we will not make breaking changes (like renaming an existing "modelName" field to "name", changing an ID's data type from an integer to a string, or altering an endpoint URL).
- For OpenAI APIs, a comparison to the OpenAI spec for the specific API can be done to ensure completeness.
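
The difference between additive and breaking changes described above can be sketched in code. This is a minimal illustration and not part of Llama Stack: `classify_change` is a hypothetical helper that compares two flat `{field: type}` mappings and assumes every newly added field is optional.

```python
# Illustrative only: schemas are flat {field_name: type_name} mappings,
# and every newly added field is assumed to be optional.
def classify_change(old: dict[str, str], new: dict[str, str]) -> str:
    """Return "breaking", "additive", or "unchanged" for a schema change."""
    for field, type_name in old.items():
        if field not in new:
            return "breaking"  # an existing field was removed or renamed
        if new[field] != type_name:
            return "breaking"  # an existing field changed its data type
    return "additive" if len(new) > len(old) else "unchanged"


old = {"id": "integer", "modelName": "string"}
print(classify_change(old, {**old, "owner": "string"}))  # additive
print(classify_change(old, {"id": "integer", "name": "string"}))  # breaking
print(classify_change(old, {"id": "string", "modelName": "string"}))  # breaking
```

The second and third calls mirror the examples in the text: renaming `modelName` to `name`, and changing an ID from an integer to a string.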
### v1beta

- API routes remain consistent between versions
- Parameters and return types are not guaranteed to stay the same between versions
- The API, besides minor fixes and adjustments, should be _almost_ v1. Changes should not be drastic.

#### Graduation Criteria

- An API can graduate from `v1beta` to `v1` if the API surface and datatypes are complete as identified by the team. The parameters and return types that are mandatory for each route are stable. All aspects of graduating from `v1alpha` to `v1beta` apply as well.
- Optional parameters, routes, or parts of the return type can be added after graduating to `v1`

### v1 (stable)

- Considered stable
- Backwards compatible between Z-streams
- Y-stream breaking changes must go through the proper approval and announcement process.
- Datatypes for a route and its return types cannot change between Z-streams
- Y-stream datatype changes should be sparing, unless the changes are additional net-new parameters
- Must have proper conformance testing as outlined in https://github.com/llamastack/llama-stack/issues/3237
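
As a rough illustration of the Z-stream and Y-stream rules above, the sketch below names which part of an `x.y.z` version changed between two releases. The helper names `stream_of` and `breaking_change_allowed` are hypothetical, and the sketch assumes plain three-part numeric versions.

```python
def stream_of(old: str, new: str) -> str:
    """Name the highest x.y.z component that differs between two versions."""
    old_parts = [int(part) for part in old.split(".")]
    new_parts = [int(part) for part in new.split(".")]
    for name, before, after in zip(("X", "Y", "Z"), old_parts, new_parts):
        if before != after:
            return name
    return "none"


def breaking_change_allowed(old: str, new: str) -> bool:
    """Whether a `/v1` API may carry an approved breaking change in this release.

    Z-stream releases must stay backwards compatible; Y-stream (and X-stream)
    releases may break, subject to the approval and announcement process.
    """
    return stream_of(old, new) in ("X", "Y")


print(stream_of("0.2.3", "0.2.4"), breaking_change_allowed("0.2.3", "0.2.4"))  # Z False
print(stream_of("0.2.3", "0.3.0"), breaking_change_allowed("0.2.3", "0.3.0"))  # Y True
```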
### v2+ (Major Versions)

Introducing a new major version like `/v2` is a significant and disruptive event that should be treated as a last resort. It is reserved for essential changes to a stable `/v1` API that are fundamentally backward-incompatible and cannot be implemented through additive, non-breaking changes or breaking changes across X/Y-Stream releases (x.y.z).

If a `/v2` version is deemed absolutely necessary, it must adhere to the following protocol to ensure a sane and predictable transition for users:

#### Lifecycle Progression

A new major version must follow the same stability lifecycle as `/v1`. It will be introduced as `/v2alpha`, mature to `/v2beta`, and finally become stable as `/v2`.
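
The lifecycle can be read directly off a route prefix. The sketch below is only an illustration of that naming scheme: `parse_prefix` and `graduate` are hypothetical helpers, not Llama Stack functions.

```python
import re


def parse_prefix(route: str) -> tuple[int, str]:
    """Split a route such as `/v2alpha/...` into (major version, stability level)."""
    match = re.match(r"^/v(\d+)(alpha|beta)?(?:/|$)", route)
    if match is None:
        raise ValueError(f"no version prefix in {route!r}")
    # A prefix without an alpha/beta suffix denotes the stable level.
    return int(match.group(1)), match.group(2) or "stable"


def graduate(prefix: str) -> str:
    """Return the prefix an API moves to when it graduates one level."""
    major, level = parse_prefix(prefix)
    if level == "alpha":
        return f"/v{major}beta"
    if level == "beta":
        return f"/v{major}"
    raise ValueError(f"{prefix!r} is already stable")


print(parse_prefix("/v1/weather/locations"))  # (1, 'stable')
print(graduate("/v2alpha"))  # /v2beta
print(graduate("/v2beta"))  # /v2
```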
#### Coexistence

The new `/v2` API must be introduced alongside the existing `/v1` API and run in parallel. It must not replace the `/v1` API immediately.

#### Deprecation Policy

When a `/v2` API is introduced, a clear and generous deprecation policy for the `/v1` API must be published simultaneously. This policy must outline the timeline for the eventual removal of the `/v1` API, giving users ample time to migrate.

### API Stability vs. Provider Stability

The leveling introduced in this document relates to the stability of the API and not specifically the providers within the API.

Providers can iterate as much as they want on functionality as long as they work within the bounds of an API. If they need to change the API, then the API should not be `/v1`, or those breaking changes can only happen on a Y-stream release basis.

### Approval and Announcement Process for Breaking Changes

- **PR Labeling**: Any pull request that introduces a breaking API change must be clearly labeled with `breaking-change`.
- **PR Title/Commit**: Any pull request that introduces a breaking API change must contain `BREAKING CHANGE` in the title and commit footer. Alternatively, the commit can include `!`, e.g. `feat(api)!: title goes here`. This is outlined in the [conventional commits documentation](https://www.conventionalcommits.org/en/v1.0.0/#specification).
- **Maintainer Review**: At least one maintainer must explicitly acknowledge the breaking change during review by applying the `breaking-change` label. An approval must come with this label or with an acknowledgement that the label has already been applied.
- **Announcement**: Breaking changes require inclusion in release notes and, if applicable, a separate communication (e.g., Discord, GitHub Issues, or GitHub Discussions) prior to release.
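
The title and commit rule above can be checked mechanically. The following is a minimal sketch, assuming the markers defined by Conventional Commits 1.0.0 (a `!` before the colon in the header, or a `BREAKING CHANGE:` footer). `is_marked_breaking` is a hypothetical helper and does not describe how the project's CI actually performs this check.

```python
import re

# Header such as "feat(api)!: title goes here": type, optional scope, then "!".
BANG_HEADER = re.compile(r"^[a-zA-Z]+(\([^)]*\))?!: ")
# Footer token; Conventional Commits treats BREAKING-CHANGE as a synonym.
BREAKING_FOOTER = re.compile(r"^BREAKING[ -]CHANGE: ", re.MULTILINE)


def is_marked_breaking(commit_message: str) -> bool:
    """Return True if the commit message flags a breaking change."""
    header = commit_message.split("\n", 1)[0]
    return bool(BANG_HEADER.match(header) or BREAKING_FOOTER.search(commit_message))


print(is_marked_breaking("feat(api)!: title goes here"))  # True
print(is_marked_breaking("feat: rename field\n\nBREAKING CHANGE: modelName is now name"))  # True
print(is_marked_breaking("fix: correct a typo"))  # False
```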
If a PR has proper approvals, labels, and commit/title hygiene, the failing API conformance tests will be bypassed.

## Enforcement

### Migration of API routes under `/v1alpha`, `/v1beta`, and `/v1`

Instead of placing every API under `/v1`, any API that is not fully stable or complete should go under `/v1alpha` or `/v1beta`. For example, at the time of this writing, `post_training` belongs here, as well as any OpenAI-compatible API whose surface does not exactly match the upstream OpenAI API it mimics.

This migration is crucial as we get Llama Stack in the hands of users who intend to productize various APIs. A clear view of what is stable and what is actively being developed will enable users to pick and choose various APIs to build their products on.

This migration will be a breaking change for any API moving out of `/v1`. Ideally, this should happen before 0.3.0 and especially 1.0.0.

### `x-stability` tags in the OpenAPI spec for oasdiff

`x-stability` tags allow tools like oasdiff to enforce different rules for different stability levels. Each tag should match the stability level of the route it annotates. See the [oasdiff stability documentation](https://github.com/oasdiff/oasdiff/blob/main/docs/STABILITY.md).
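
A consistency check between tags and routes might look like the sketch below. It works on an OpenAPI document already loaded into a Python `dict`. Several details are assumptions: the extension key is taken to be `x-stability-level` (confirm the exact spelling in the oasdiff documentation linked above), the helper names are hypothetical, and the two sample paths are made up for the example.

```python
import re

# Assumed extension key; confirm the exact name in the oasdiff documentation.
STABILITY_KEY = "x-stability-level"
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def expected_level(path: str) -> str:
    """Derive the stability level from a route prefix such as `/v1alpha`."""
    match = re.match(r"^/v\d+(alpha|beta)?(?:/|$)", path)
    if match is None:
        raise ValueError(f"no version prefix in {path!r}")
    return match.group(1) or "stable"


def mismatched_operations(spec: dict) -> list[str]:
    """List operations whose stability tag disagrees with their route prefix."""
    problems = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            tagged = operation.get(STABILITY_KEY)
            if tagged != expected_level(path):
                problems.append(f"{method.upper()} {path}: tagged {tagged!r}")
    return problems


# Hypothetical sample spec: the second route is tagged inconsistently.
spec = {
    "paths": {
        "/v1alpha/post-training/jobs": {"get": {STABILITY_KEY: "alpha"}},
        "/v1/weather/locations": {"get": {STABILITY_KEY: "beta"}},
    }
}
print(mismatched_operations(spec))
```

In this sketch an operation with no tag at all is also reported, since `None` never equals the expected level. Whether untagged operations should instead default to a particular level is a policy choice the sketch does not settle.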
### Testing

The testing of each stable API is already outlined in [issue #3237](https://github.com/llamastack/llama-stack/issues/3237) and is being worked on. These sorts of conformance tests should apply primarily to `/v1` APIs only, with `/v1alpha` and `/v1beta` having any tests the maintainers see fit as well as basic testing to ensure the routing works properly.

### New APIs going forward

Any subsequently introduced APIs should be introduced as `/v1alpha`.
19
docs/docs/concepts/apis/api_providers.mdx
Normal file
@ -0,0 +1,19 @@
---
title: API Providers
description: Understanding remote vs inline provider implementations
sidebar_label: API Providers
sidebar_position: 2
---

# API Providers

The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations for the same API. Examples include:
- LLM inference providers (e.g., Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, etc.),
- Vector databases (e.g., ChromaDB, Weaviate, Qdrant, Milvus, FAISS, PGVector, etc.),
- Safety providers (e.g., Meta's Llama Guard, AWS Bedrock Guardrails, etc.)

Providers come in two flavors:
- **Remote**: the provider runs as a separate service external to the Llama Stack codebase. Llama Stack contains a small amount of adapter code.
- **Inline**: the provider is fully specified and implemented within the Llama Stack codebase. It may be a simple wrapper around an existing library, or a full-fledged implementation within Llama Stack.

Most importantly, Llama Stack always strives to provide at least one fully inline provider for each API so you can iterate on a fully featured environment locally.
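
The two flavors show up in provider type strings such as `inline::darksky` and `remote::kaze`, which appear in the external API examples in this commit. As a small sketch, the hypothetical helper below (not a Llama Stack function) splits such a string into its flavor and provider name.

```python
def split_provider_type(provider_type: str) -> tuple[str, str]:
    """Split a string like "remote::kaze" into ("remote", "kaze")."""
    flavor, separator, name = provider_type.partition("::")
    if not separator or flavor not in ("inline", "remote") or not name:
        raise ValueError(f"unexpected provider type: {provider_type!r}")
    return flavor, name


print(split_provider_type("remote::kaze"))  # ('remote', 'kaze')
print(split_provider_type("inline::darksky"))  # ('inline', 'darksky')
```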
398
docs/docs/concepts/apis/external.mdx
Normal file
@ -0,0 +1,398 @@
---
title: External APIs
description: Understanding external APIs in Llama Stack
sidebar_label: External APIs
sidebar_position: 4
---

# External APIs

Llama Stack supports external APIs that live outside of the main codebase. This allows you to:
- Create and maintain your own APIs independently
- Share APIs with others without contributing to the main codebase
- Keep API-specific code separate from the core Llama Stack code

## Configuration

To enable external APIs, you need to configure the `external_apis_dir` in your Llama Stack configuration. This directory should contain your external API specifications:

```yaml
external_apis_dir: ~/.llama/apis.d/
```

## Directory Structure

The external APIs directory should follow this structure:

```
apis.d/
  custom_api1.yaml
  custom_api2.yaml
```

Each YAML file in this directory defines an API specification.

## API Specification

Here's an example of an external API specification for a weather API:

```yaml
module: weather
api_dependencies:
  - inference
protocol: WeatherAPI
name: weather
pip_packages:
  - llama-stack-api-weather
```

### API Specification Fields

- `module`: Python module containing the API implementation
- `api_dependencies`: List of other Llama Stack APIs that this API depends on (`inference` in the example above)
- `protocol`: Name of the protocol class for the API
- `name`: Name of the API
- `pip_packages`: List of pip packages to install the API, typically a single package
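
Before pointing Llama Stack at a specification file, it can help to check that the fields are present. The sketch below assumes the specification has already been parsed into a Python `dict` and that `module`, `protocol`, `name`, and `pip_packages` are all required. That requirement is an assumption made for the example, and `missing_fields` is a hypothetical helper.

```python
# Assumption for this sketch: these four fields are treated as required.
REQUIRED_FIELDS = ("module", "protocol", "name", "pip_packages")


def missing_fields(spec: dict) -> list[str]:
    """Return the required fields that are absent from a parsed API spec."""
    return [field for field in REQUIRED_FIELDS if field not in spec]


weather_spec = {
    "module": "weather",
    "api_dependencies": ["inference"],
    "protocol": "WeatherAPI",
    "name": "weather",
    "pip_packages": ["llama-stack-api-weather"],
}
print(missing_fields(weather_spec))  # []
print(missing_fields({"name": "weather"}))  # ['module', 'protocol', 'pip_packages']
```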
## Required Implementation

External APIs must expose an `available_providers()` function in their module that returns a list of provider specifications:

```python
# llama_stack_api_weather/api.py
from llama_stack.providers.datatypes import Api, InlineProviderSpec, ProviderSpec


def available_providers() -> list[ProviderSpec]:
    return [
        InlineProviderSpec(
            api=Api.weather,
            provider_type="inline::darksky",
            pip_packages=[],
            module="llama_stack_provider_darksky",
            config_class="llama_stack_provider_darksky.DarkSkyWeatherImplConfig",
        ),
    ]
```

They must also define a `Protocol` class for the API, like so:

```python
# llama_stack_api_weather/api.py
from typing import Protocol

from llama_stack.schema_utils import webmethod


class WeatherAPI(Protocol):
    """
    A protocol for the Weather API.
    """

    @webmethod(route="/locations", method="GET")
    async def get_available_locations(self) -> dict[str, list[str]]:
        """
        Get the available locations.
        """
        ...
```
## Example: Custom API

Here's a complete example of creating and using a custom API:

1. First, create the API package:

```bash
mkdir -p llama-stack-api-weather
cd llama-stack-api-weather
mkdir -p src/llama_stack_api_weather
git init
uv init
```

2. Edit `pyproject.toml`:

```toml
[project]
name = "llama-stack-api-weather"
version = "0.1.0"
description = "Weather API for Llama Stack"
readme = "README.md"
requires-python = ">=3.12"
dependencies = ["llama-stack", "pydantic"]

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["src"]
include = ["llama_stack_api_weather", "llama_stack_api_weather.*"]
```

3. Create the initial files:

```bash
touch src/llama_stack_api_weather/__init__.py
touch src/llama_stack_api_weather/api.py
```

```python
# llama-stack-api-weather/src/llama_stack_api_weather/__init__.py
"""Weather API for Llama Stack."""

from .api import WeatherProvider, available_providers

__all__ = ["WeatherProvider", "available_providers"]
```

4. Create the API implementation:

```python
# llama-stack-api-weather/src/llama_stack_api_weather/api.py
from typing import Protocol

from llama_stack.providers.datatypes import (
    AdapterSpec,
    Api,
    ProviderSpec,
    RemoteProviderSpec,
)
from llama_stack.schema_utils import webmethod


def available_providers() -> list[ProviderSpec]:
    return [
        RemoteProviderSpec(
            api=Api.weather,
            provider_type="remote::kaze",
            config_class="llama_stack_provider_kaze.KazeProviderConfig",
            adapter=AdapterSpec(
                adapter_type="kaze",
                module="llama_stack_provider_kaze",
                pip_packages=["llama_stack_provider_kaze"],
                config_class="llama_stack_provider_kaze.KazeProviderConfig",
            ),
        ),
    ]


class WeatherProvider(Protocol):
    """
    A protocol for the Weather API.
    """

    @webmethod(route="/weather/locations", method="GET")
    async def get_available_locations(self) -> dict[str, list[str]]:
        """
        Get the available locations.
        """
        ...
```

5. Create the API specification:

```yaml
# ~/.llama/apis.d/weather.yaml
module: llama_stack_api_weather
name: weather
pip_packages: ["llama-stack-api-weather"]
protocol: WeatherProvider
```

6. Install the API package:

```bash
uv pip install -e .
```

7. Configure Llama Stack to use external APIs:

```yaml
version: "2"
image_name: "llama-stack-api-weather"
apis:
  - weather
providers: {}
external_apis_dir: ~/.llama/apis.d
```

The API will now be available at `/v1/weather/locations`.

## Example: custom provider for the weather API

1. Create the provider package:

```bash
mkdir -p llama-stack-provider-kaze
cd llama-stack-provider-kaze
mkdir -p src/llama_stack_provider_kaze
uv init
```

2. Edit `pyproject.toml`:

```toml
[project]
name = "llama-stack-provider-kaze"
version = "0.1.0"
description = "Kaze weather provider for Llama Stack"
readme = "README.md"
requires-python = ">=3.12"
dependencies = ["llama-stack", "pydantic", "aiohttp"]

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["src"]
include = ["llama_stack_provider_kaze", "llama_stack_provider_kaze.*"]
```

3. Create the initial files:

```bash
touch src/llama_stack_provider_kaze/__init__.py
touch src/llama_stack_provider_kaze/config.py
touch src/llama_stack_provider_kaze/kaze.py
```

4. Create the provider implementation:

Initialization function:

```python
# llama-stack-provider-kaze/src/llama_stack_provider_kaze/__init__.py
"""Kaze weather provider for Llama Stack."""

from .config import KazeProviderConfig
from .kaze import WeatherKazeAdapter

__all__ = ["KazeProviderConfig", "WeatherKazeAdapter"]


async def get_adapter_impl(config: KazeProviderConfig, _deps):
    from .kaze import WeatherKazeAdapter

    impl = WeatherKazeAdapter(config)
    await impl.initialize()
    return impl
```

Configuration:

```python
# llama-stack-provider-kaze/src/llama_stack_provider_kaze/config.py
from pydantic import BaseModel, Field


class KazeProviderConfig(BaseModel):
    """Configuration for the Kaze weather provider."""

    base_url: str = Field(
        "https://api.kaze.io/v1",
        description="Base URL for the Kaze weather API",
    )
```

Main implementation:

```python
# llama-stack-provider-kaze/src/llama_stack_provider_kaze/kaze.py
from llama_stack_api_weather.api import WeatherProvider

from .config import KazeProviderConfig


class WeatherKazeAdapter(WeatherProvider):
    """Kaze weather provider implementation."""

    def __init__(
        self,
        config: KazeProviderConfig,
    ) -> None:
        self.config = config

    async def initialize(self) -> None:
        pass

    async def get_available_locations(self) -> dict[str, list[str]]:
        """Get available weather locations."""
        return {"locations": ["Paris", "Tokyo"]}
```

5. Create the provider specification:

```yaml
# ~/.llama/providers.d/remote/weather/kaze.yaml
adapter:
  adapter_type: kaze
  pip_packages: ["llama_stack_provider_kaze"]
  config_class: llama_stack_provider_kaze.config.KazeProviderConfig
  module: llama_stack_provider_kaze
optional_api_dependencies: []
```

6. Install the provider package:

```bash
uv pip install -e .
```

7. Configure Llama Stack to use the provider:

```yaml
# ~/.llama/run-byoa.yaml
version: "2"
image_name: "llama-stack-api-weather"
apis:
  - weather
providers:
  weather:
    - provider_id: kaze
      provider_type: remote::kaze
      config: {}
external_apis_dir: ~/.llama/apis.d
external_providers_dir: ~/.llama/providers.d
server:
  port: 8321
```

8. Run the server:

```bash
python -m llama_stack.core.server.server --yaml-config ~/.llama/run-byoa.yaml
```

9. Test the API:

```bash
curl -sSf http://127.0.0.1:8321/v1/weather/locations
{"locations":["Paris","Tokyo"]}
```
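
The same check can be made from Python using only the standard library. This is a minimal sketch that assumes the server from the steps above is listening on `127.0.0.1:8321`. `locations_url` and `fetch_locations` are hypothetical helper names, not part of any Llama Stack client.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def locations_url(base_url: str) -> str:
    """Build the URL of the weather locations route for a given server."""
    return urljoin(base_url.rstrip("/") + "/", "v1/weather/locations")


def fetch_locations(base_url: str = "http://127.0.0.1:8321") -> list[str]:
    """Call the weather API and return the list under the "locations" key."""
    with urlopen(locations_url(base_url), timeout=10) as response:
        payload = json.load(response)
    return payload["locations"]
```

With the server running, `fetch_locations()` should return the same list that the `curl` call prints.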

## Best Practices

1. **Package Naming**: Use a clear and descriptive name for your API package.

2. **Version Management**: Keep your API package versioned and compatible with the Llama Stack version you're using.

3. **Dependencies**: Only include the minimum required dependencies in your API package.

4. **Documentation**: Include clear documentation in your API package about:
   - Installation requirements
   - Configuration options
   - API endpoints and usage
   - Any limitations or known issues

5. **Testing**: Include tests in your API package to ensure it works correctly with Llama Stack.

## Troubleshooting

If your external API isn't being loaded:

1. Check that the `external_apis_dir` path is correct and accessible.
2. Verify that the YAML files are properly formatted.
3. Ensure all required Python packages are installed.
4. Check the Llama Stack server logs for error messages. Set `LLAMA_STACK_LOGGING=all=debug` to turn on debug logging for more detail.
5. Verify that the API package is installed in your Python environment.
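
The first and last checks in the list above can be scripted. The sketch below is a hypothetical helper, not a Llama Stack command: `diagnose` verifies that the directory exists and contains `.yaml` files, and that a top-level module (for example `llama_stack_api_weather`) can be found in the current Python environment.

```python
import importlib.util
from pathlib import Path


def diagnose(external_apis_dir: str, module_name: str) -> list[str]:
    """Return human-readable problems found with an external API setup."""
    problems = []
    directory = Path(external_apis_dir).expanduser()
    if not directory.is_dir():
        problems.append(f"{directory} is not a directory")
    elif not any(directory.glob("*.yaml")):
        problems.append(f"{directory} contains no .yaml files")
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        problems.append(f"module {module_name!r} is not importable")
    return problems


for problem in diagnose("~/.llama/apis.d", "llama_stack_api_weather"):
    print(problem)
```

An empty result only means these two checks passed. It does not validate the YAML contents or the provider configuration.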
28
docs/docs/concepts/apis/index.mdx
Normal file
@ -0,0 +1,28 @@
---
title: APIs
description: Available REST APIs and planned capabilities in Llama Stack
sidebar_label: APIs
sidebar_position: 1
---

# APIs

A Llama Stack API is described as a collection of REST endpoints. We currently support the following APIs:

- **Inference**: run inference with an LLM
- **Safety**: apply safety policies to the output at a system (not only model) level
- **Agents**: run multi-step agentic workflows with LLMs with tool usage, memory (RAG), etc.
- **DatasetIO**: interface with datasets and data loaders
- **Scoring**: evaluate outputs of the system
- **Eval**: generate outputs (via Inference or Agents) and perform scoring
- **VectorIO**: perform operations on vector stores, such as adding documents, searching, and deleting documents
- **Telemetry**: collect telemetry data from the system
- **Post Training**: fine-tune a model
- **Tool Runtime**: interact with various tools and protocols
- **Responses**: generate responses from an LLM using this OpenAI-compatible API.

We are working on adding a few more APIs to complete the application lifecycle. These will include:
- **Batch Inference**: run inference on a dataset of inputs
- **Batch Agents**: run agents on a dataset of inputs
- **Synthetic Data Generation**: generate synthetic data for model development
- **Batches**: OpenAI-compatible batch management for inference
74
docs/docs/concepts/architecture.mdx
Normal file
@ -0,0 +1,74 @@
---
title: Llama Stack Architecture
description: Understanding Llama Stack's service-oriented design and benefits
sidebar_label: Architecture
sidebar_position: 2
---

# Llama Stack Architecture

Llama Stack allows you to build different layers of distributions for your AI workloads using various SDKs and API providers.

<img src="/img/llama-stack.png" alt="Llama Stack" width="400" />

## Benefits of Llama Stack

### Current challenges in custom AI applications

Building production AI applications today requires solving multiple challenges:

**Infrastructure Complexity**

- Running large language models efficiently requires specialized infrastructure.
- Different deployment scenarios (local development, cloud, edge) need different solutions.
- Moving from development to production often requires significant rework.

**Essential Capabilities**

- Safety guardrails and content filtering are necessary in an enterprise setting.
- Model inference alone is not enough: knowledge retrieval and RAG capabilities are also required.
- Nearly any application needs composable multi-step workflows.
- Without monitoring, observability, and evaluation, you end up operating in the dark.

**Lack of Flexibility and Choice**

- Directly integrating with multiple providers creates tight coupling.
- Different providers have different APIs and abstractions.
- Changing providers requires significant code changes.

### Our Solution: A Universal Stack

Llama Stack addresses these challenges through a service-oriented, API-first approach:

**Develop Anywhere, Deploy Everywhere**
- Start locally with CPU-only setups
- Move to GPU acceleration when needed
- Deploy to cloud or edge without code changes
- Same APIs and developer experience everywhere

**Production-Ready Building Blocks**
- Pre-built safety guardrails and content filtering
- Built-in RAG and agent capabilities
- Comprehensive evaluation toolkit
- Full observability and monitoring

**True Provider Independence**
- Swap providers without application changes
- Mix and match best-in-class implementations
- Federation and fallback support
- No vendor lock-in

**Robust Ecosystem**
- Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies).
- The ecosystem offers tailored infrastructure, software, and services for deploying a variety of models.

## Our Philosophy

- **Service-Oriented**: REST APIs enforce clean interfaces and enable seamless transitions across different environments.
- **Composability**: Every component is independent but works together seamlessly.
- **Production Ready**: Built for real-world applications, not just demos.
- **Turnkey Solutions**: Easy-to-deploy, built-in solutions for popular deployment scenarios.

With Llama Stack, you can focus on building your application while we handle the infrastructure complexity, essential capabilities, and provider integrations.
16
docs/docs/concepts/distributions.mdx
Normal file
@ -0,0 +1,16 @@
---
title: Distributions
description: Pre-packaged provider configurations for different deployment scenarios
sidebar_label: Distributions
sidebar_position: 5
---

# Distributions

While there is a lot of flexibility to mix and match providers, users will often work with a specific set of providers (because of hardware support, contractual obligations, etc.). We therefore need to provide a _convenient shorthand_ for such collections. We call this shorthand a **Llama Stack Distribution** or a **Distro**. You can think of a Distro as a specific pre-packaged version of Llama Stack. Here are some examples:

**Remotely Hosted Distro**: These are the simplest to consume from a user perspective. You can simply obtain the API key for these providers, point to a URL and have _all_ Llama Stack APIs working out of the box. Currently, [Fireworks](https://fireworks.ai/) and [Together](https://together.xyz/) provide such easy-to-consume Llama Stack distributions.

**Locally Hosted Distro**: You may want to run Llama Stack on your own hardware. Typically though, you still need to use Inference via an external service. You can use providers like HuggingFace TGI, Fireworks, Together, etc. for this purpose. Or you may have access to GPUs and can run a [vLLM](https://github.com/vllm-project/vllm) or [NVIDIA NIM](https://build.nvidia.com/nim?filters=nimType%3Anim_type_run_anywhere&q=llama) instance. If you "just" have a regular desktop machine, you can use [Ollama](https://ollama.com/) for inference. To provide convenient quick access to these options, we provide a number of such pre-configured locally-hosted Distros.

**On-device Distro**: To run Llama Stack directly on an edge device (mobile phone or a tablet), we provide Distros for [iOS](/docs/distributions/ondevice_distro/ios_sdk) and [Android](/docs/distributions/ondevice_distro/android_sdk).
43
docs/docs/concepts/index.mdx
Normal file
@ -0,0 +1,43 @@
---
title: Core Concepts
description: Understanding Llama Stack's service-oriented philosophy and key concepts
sidebar_label: Overview
sidebar_position: 1
---

# Core Concepts

Given Llama Stack's service-oriented philosophy, a few concepts and workflows arise which may not feel completely natural in the LLM landscape, especially if you are coming with a background in other frameworks.

## Documentation Structure

This section covers the fundamental concepts of Llama Stack:

- **[Architecture](./architecture.mdx)** - Learn about Llama Stack's architectural design and principles
- **[APIs](./apis/index.mdx)** - Understanding the core APIs and their stability levels
  - [API Overview](./apis/index.mdx) - Core APIs available in Llama Stack
  - [API Providers](./apis/api_providers.mdx) - How providers implement APIs
  - [API Stability Leveling](./apis/api_leveling.mdx) - API stability and versioning
- **[Distributions](./distributions.mdx)** - Pre-configured deployment packages
- **[Resources](./resources.mdx)** - Understanding Llama Stack resources and their lifecycle
- **[External Integration](./external.md)** - Integrating with external services and providers

## Getting Started

If you're new to Llama Stack, we recommend starting with:

1. **[Architecture](./architecture.mdx)** - Understand the overall system design
2. **[APIs](./apis/index.mdx)** - Learn about the available APIs and their purpose
3. **[Distributions](./distributions.mdx)** - Choose a pre-configured setup for your use case

Each concept builds upon the previous ones to give you a comprehensive understanding of how Llama Stack works and how to use it effectively.

This section covers the key concepts you need to understand to work effectively with Llama Stack:

- **[Architecture](./architecture)** - Llama Stack's service-oriented design and benefits
- **[APIs](./apis)** - Available REST APIs and planned capabilities
- **[API Providers](./apis/api_providers)** - Remote vs inline provider implementations
- **[Distributions](./distributions)** - Pre-packaged provider configurations
- **[Resources](./resources)** - Resource federation and registration
26
docs/docs/concepts/resources.mdx
Normal file
@ -0,0 +1,26 @@
---
title: Resources
description: Resource federation and registration in Llama Stack
sidebar_label: Resources
sidebar_position: 6
---

# Resources

Some Llama Stack APIs are associated with a set of **Resources**. Here is the mapping of APIs to resources:

- **Inference**, **Eval** and **Post Training** are associated with `Model` resources.
- **Safety** is associated with `Shield` resources.
- **Tool Runtime** is associated with `ToolGroup` resources.
- **DatasetIO** is associated with `Dataset` resources.
- **VectorIO** is associated with `VectorDB` resources.
- **Scoring** is associated with `ScoringFunction` resources.
- **Eval** is associated with `Model` and `Benchmark` resources.

Furthermore, we allow these resources to be **federated** across multiple providers. For example, you may have some Llama models served by Fireworks while others are served by AWS Bedrock. Regardless, they will all work seamlessly with the same uniform Inference API provided by Llama Stack.

:::tip Registering Resources

Given this architecture, it is necessary for the Stack to know which provider to use for a given resource. This means you need to explicitly _register_ resources (including models) before you can use them with the associated APIs.

:::
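
To see why registration is needed when resources are federated, consider the toy routing table below. It is purely illustrative and does not reflect how Llama Stack implements registration: the `ResourceRegistry` class, the model identifiers, and the provider identifiers are all made up for the example.

```python
class ResourceRegistry:
    """Toy lookup table from (resource type, resource id) to a provider id."""

    def __init__(self) -> None:
        self._provider_for: dict[tuple[str, str], str] = {}

    def register(self, resource_type: str, resource_id: str, provider_id: str) -> None:
        """Record which provider serves a given resource."""
        self._provider_for[(resource_type, resource_id)] = provider_id

    def provider_for(self, resource_type: str, resource_id: str) -> str:
        """Look up the provider for a resource, failing if it was never registered."""
        try:
            return self._provider_for[(resource_type, resource_id)]
        except KeyError:
            raise LookupError(f"{resource_type} {resource_id!r} is not registered") from None


registry = ResourceRegistry()
# Hypothetical identifiers: two models served by two different providers.
registry.register("model", "example-llama-a", "fireworks")
registry.register("model", "example-llama-b", "bedrock")

print(registry.provider_for("model", "example-llama-a"))  # fireworks
print(registry.provider_for("model", "example-llama-b"))  # bedrock
```

A request for an unregistered model raises `LookupError` in this sketch, which mirrors the point of the tip above: without registration there is no way to decide which provider should handle the call.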