docs: Updated documentation and configuration to make things easier for the unfamiliar

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Francisco Javier Arceo 2025-03-31 13:08:22 -04:00
parent 9b478f3756
commit 2847216efb
10 changed files with 69 additions and 32 deletions


@@ -1,4 +1,4 @@
-# Building AI Applications
+# Building AI Applications (Examples)
Llama Stack provides all the building blocks needed to create sophisticated AI applications.


@@ -16,6 +16,7 @@ from docutils import nodes
from pathlib import Path
import requests
import json
+from datetime import datetime
# Read version from pyproject.toml
with Path(__file__).parent.parent.parent.joinpath("pyproject.toml").open("rb") as f:
@@ -28,7 +29,7 @@ with Path(__file__).parent.parent.parent.joinpath("pyproject.toml").open("rb") a
llama_stack_version_link = f"<a href='{llama_stack_version_url}'>release notes</a>"
project = "llama-stack"
-copyright = "2025, Meta"
+copyright = f"{datetime.now().year}, Meta"
author = "Meta"
# -- General configuration ---------------------------------------------------
@@ -104,6 +105,8 @@ source_suffix = {
# html_theme = "alabaster"
html_theme_options = {
    "canonical_url": "https://github.com/meta-llama/llama-stack",
+    'collapse_navigation': False,
    # "style_nav_header_background": "#c3c9d4",
}


@@ -1,4 +1,4 @@
-# Configuring a Stack
+# Configuring a "Stack"
The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution:
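For orientation, a minimal sketch of what such a run configuration can look like is shown below; the keys and values here are illustrative assumptions, not the exact file shipped with the Ollama distribution:

```yaml
# Illustrative sketch of a run configuration wiring the inference API to a
# local Ollama server; field names and values are assumptions for this example.
version: '2'
apis:
- inference
providers:
  inference:
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: http://localhost:11434   # assumed default Ollama endpoint
models:
- model_id: meta-llama/Llama-3.2-3B-Instruct
  provider_id: ollama
```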


@@ -1,10 +1,12 @@
# Using Llama Stack as a Library
-If you are planning to use an external service for Inference (even Ollama or TGI counts as external), it is often easier to use Llama Stack as a library. This avoids the overhead of setting up a server.
+## Setup Llama Stack without a Server
+If you are planning to use an external service for Inference (even Ollama or TGI counts as external), it is often easier to use Llama Stack as a library.
+This avoids the overhead of setting up a server.
```bash
# setup
uv pip install llama-stack
-llama stack build --template together --image-type venv
+llama stack build --template ollama --image-type venv
```
```python
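# Minimal sketch of the library-client pattern (not verbatim from the docs);
# the import path and model listing below are assumptions for illustration.
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

# The template name matches the build step above.
client = LlamaStackAsLibraryClient("ollama")
client.initialize()

# List the models the in-process stack knows about.
for model in client.models.list():
    print(model.identifier)
```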


@@ -1,32 +1,18 @@
-# Starting a Llama Stack Server
+# Distributions Overview
-You can run a Llama Stack server in one of the following ways:
+A distribution is a pre-packaged set of Llama Stack components that can be deployed together.
-**As a Library**:
-This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (eg. fireworks, together, groq, etc.) See [Using Llama Stack as a Library](importing_as_library)
-**Container**:
-Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
-**Conda**:
-If you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
-**Kubernetes**:
-If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
+This section provides an overview of the distributions available in Llama Stack.
```{toctree}
-:maxdepth: 1
+:maxdepth: 3
-:hidden:
importing_as_library
configuration
+list_of_distributions
kubernetes_deployment
+building_distro
+on_device_distro
+remote_hosted_distro
+self_hosted_distro
```


@@ -1,6 +1,9 @@
# Kubernetes Deployment Guide
-Instead of starting the Llama Stack and vLLM servers locally. We can deploy them in a Kubernetes cluster. In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
+Instead of starting the Llama Stack and vLLM servers locally, we can deploy them in a Kubernetes cluster.
+### Prerequisites
+In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
First, create a local Kubernetes cluster via Kind:
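For reference, creating a Kind cluster is a single command; the cluster name below is an illustrative choice, not necessarily the one used in the full guide:

```bash
# Create a throwaway local cluster for the demo; the name is an arbitrary example.
kind create cluster --name llama-stack-demo
```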
@@ -33,6 +36,7 @@ data:
  token: $(HF_TOKEN)
```
Next, start the vLLM server as a Kubernetes Deployment and Service:
```bash
@@ -127,6 +131,7 @@ EOF
podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t llama-stack-run-k8s /tmp/test-vllm-llama-stack
```
+### Deploying Llama Stack Server in Kubernetes
We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:
@@ -187,6 +192,7 @@ spec:
EOF
```
+### Verifying the Deployment
We can check that the LlamaStack server has started:
```bash
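# Sketch of one way to verify startup: follow the server Pod's logs.
# The label selector is an assumption about how the Pod is labeled;
# adjust it to match your manifest.
kubectl logs -l app=llama-stack -f
```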


@@ -1,4 +1,4 @@
-# List of Distributions
+# Available List of Distributions
Here is a list of distributions you can use to start a Llama Stack server that are provided out of the box.


@@ -0,0 +1,32 @@
+# Starting a Llama Stack Server
+You can run a Llama Stack server in one of the following ways:
+**As a Library**:
+This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and are relying on an external inference service (e.g. Fireworks, Together, Groq). See [Using Llama Stack as a Library](importing_as_library).
+**Container**:
+Another simple way to start interacting with Llama Stack is to spin up a container (via Docker or Podman) that is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
+**Conda**:
+If you have a custom or advanced setup, or you are developing on Llama Stack, you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
+**Kubernetes**:
+If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally, see the [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
+```{toctree}
+:maxdepth: 1
+:hidden:
+importing_as_library
+configuration
+kubernetes_deployment
+```
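As a rough sketch of the `llama stack build` / `llama stack run` flow mentioned under **Conda** above (the template name, image type, and port are illustrative assumptions, not taken from this commit):

```bash
# Build a stack from a starter template into a conda environment (names are examples).
llama stack build --template ollama --image-type conda

# Run the resulting stack; the port is an assumed default, shown for illustration.
llama stack run ollama --port 8321
```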


@@ -1,10 +1,11 @@
# Quick Start
-In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to test a simple RAG agent.
+In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to build a simple [RAG (Retrieval Augmented Generation)](../building_applications/rag.md) agent.
A Llama Stack agent is a simple integrated system that can perform tasks by combining a Llama model for reasoning with tools (e.g., RAG, web search, code execution, etc.) for taking actions.
In Llama Stack, we provide a server exposing multiple APIs. These APIs are backed by implementations from different providers. For this guide, we will use [Ollama](https://ollama.com/) as the inference provider.
+Ollama is an LLM runtime that allows you to run Llama models locally.
### 1. Start Ollama
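For reference, starting Ollama with a Llama model is typically a single command; the model tag and keep-alive value below are illustrative, not necessarily the ones used in the full guide:

```bash
# Pull (if needed) and run a Llama model locally, keeping it loaded for an hour.
ollama run llama3.2:3b-instruct-fp16 --keepalive 60m
```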
@@ -24,7 +25,7 @@ If you do not have ollama, you can install it from [here](https://ollama.com/dow
### 2. Pick a client environment
-Llama Stack has a service-oriented architecture, so every interaction with the Stack happens through an REST interface. You can interact with the Stack in two ways:
+Llama Stack has a service-oriented architecture, so every interaction with the Stack happens through a REST interface. You can interact with the Stack in two ways:
* Install the `llama-stack-client` PyPI package and point `LlamaStackClient` to a local or remote Llama Stack server.
* Or, install the `llama-stack` PyPI package and use the Stack as a library using `LlamaStackAsLibraryClient`.
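A minimal sketch of the first option, assuming a server is already running locally (the base URL and port are assumptions for illustration):

```python
from llama_stack_client import LlamaStackClient

# Point the client at a running Llama Stack server; 8321 is an assumed local port.
client = LlamaStackClient(base_url="http://localhost:8321")

# List the models registered with the server.
for model in client.models.list():
    print(model.identifier)
```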


@@ -6,6 +6,7 @@ Llama Stack {{ llama_stack_version }} is now available! See the {{ llama_stack_v
# Llama Stack
+## What is Llama Stack?
Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. It provides a unified set of APIs with implementations from leading service providers, enabling seamless transitions between development and production environments. More specifically, it provides
@@ -22,6 +23,12 @@ Llama Stack defines and standardizes the core building blocks needed to bring ge
Our goal is to provide pre-packaged implementations (aka "distributions") which can be run in a variety of deployment environments. LlamaStack can assist you in your entire app development lifecycle - start iterating on local, mobile or desktop and seamlessly transition to on-prem or public cloud deployments. At every point in this transition, the same set of APIs and the same developer experience is available.
+## How does Llama Stack work?
+Llama Stack consists of a [server](./distributions/index.md) (with multiple pluggable API [providers](./providers/index.md)) and [client SDKs](#available-sdks) meant to
+be used in your applications. The server can be run in a variety of environments, including local (inline)
+development, on-premises, and cloud. The client SDKs are available for Python, Swift, Node, and
+Kotlin.
## Quick Links
- New to Llama Stack? Start with the [Introduction](introduction/index) to understand our motivation and vision.