docs: Updated documentation and configuration to make things easier for the unfamiliar

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Francisco Javier Arceo 2025-03-31 13:08:22 -04:00
parent 9b478f3756
commit 2847216efb
10 changed files with 69 additions and 32 deletions


@@ -1,4 +1,4 @@
-# Building AI Applications
+# Building AI Applications (Examples)
Llama Stack provides all the building blocks needed to create sophisticated AI applications.


@@ -16,6 +16,7 @@ from docutils import nodes
from pathlib import Path
import requests
import json
+from datetime import datetime
# Read version from pyproject.toml
with Path(__file__).parent.parent.parent.joinpath("pyproject.toml").open("rb") as f:
@@ -28,7 +29,7 @@ with Path(__file__).parent.parent.parent.joinpath("pyproject.toml").open("rb") a
llama_stack_version_link = f"<a href='{llama_stack_version_url}'>release notes</a>"
project = "llama-stack"
-copyright = "2025, Meta"
+copyright = f"{datetime.now().year}, Meta"
author = "Meta"
# -- General configuration ---------------------------------------------------
@@ -104,6 +105,8 @@ source_suffix = {
# html_theme = "alabaster"
html_theme_options = {
    "canonical_url": "https://github.com/meta-llama/llama-stack",
+    'collapse_navigation': False,
    # "style_nav_header_background": "#c3c9d4",
}


@@ -1,4 +1,4 @@
-# Configuring a Stack
+# Configuring a "Stack"
The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution:
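For orientation, a minimal sketch of what such a run configuration can look like is shown below; the keys and values here are illustrative assumptions, not the exact file shipped with the Ollama distribution:

```yaml
# Illustrative sketch of a run configuration wiring the inference API to a
# local Ollama server; field names and values are assumptions for this example.
version: '2'
apis:
- inference
providers:
  inference:
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: http://localhost:11434   # assumed default Ollama endpoint
models:
- model_id: meta-llama/Llama-3.2-3B-Instruct
  provider_id: ollama
```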


@@ -1,10 +1,12 @@
# Using Llama Stack as a Library
-If you are planning to use an external service for Inference (even Ollama or TGI counts as external), it is often easier to use Llama Stack as a library. This avoids the overhead of setting up a server.
+## Setup Llama Stack without a Server
+If you are planning to use an external service for Inference (even Ollama or TGI counts as external), it is often easier to use Llama Stack as a library.
+This avoids the overhead of setting up a server.
```bash
# setup
uv pip install llama-stack
-llama stack build --template together --image-type venv
+llama stack build --template ollama --image-type venv
```
```python
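# Minimal sketch of the library-client pattern (not verbatim from the docs);
# the import path and model listing below are assumptions for illustration.
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

# The template name matches the build step above.
client = LlamaStackAsLibraryClient("ollama")
client.initialize()

# List the models the in-process stack knows about.
for model in client.models.list():
    print(model.identifier)
```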


@@ -1,32 +1,18 @@
-# Starting a Llama Stack Server
+# Distributions Overview
-You can run a Llama Stack server in one of the following ways:
+A distribution is a pre-packaged set of Llama Stack components that can be deployed together.
-**As a Library**:
-This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (eg. fireworks, together, groq, etc.) See [Using Llama Stack as a Library](importing_as_library)
-**Container**:
-Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
-**Conda**:
-If you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
-**Kubernetes**:
-If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
+This section provides an overview of the distributions available in Llama Stack.
```{toctree}
-:maxdepth: 1
+:maxdepth: 3
-:hidden:
importing_as_library
configuration
+list_of_distributions
kubernetes_deployment
+building_distro
+on_device_distro
+remote_hosted_distro
+self_hosted_distro
```


@@ -1,6 +1,9 @@
# Kubernetes Deployment Guide
-Instead of starting the Llama Stack and vLLM servers locally. We can deploy them in a Kubernetes cluster. In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
+Instead of starting the Llama Stack and vLLM servers locally, we can deploy them in a Kubernetes cluster.
+### Prerequisites
+In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
First, create a local Kubernetes cluster via Kind:
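For reference, creating a Kind cluster is a single command; the cluster name below is an illustrative choice, not necessarily the one used in the full guide:

```bash
# Create a throwaway local cluster for the demo; the name is an arbitrary example.
kind create cluster --name llama-stack-demo
```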
@@ -33,6 +36,7 @@ data:
  token: $(HF_TOKEN)
```
Next, start the vLLM server as a Kubernetes Deployment and Service:
```bash
@@ -127,6 +131,7 @@ EOF
podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t llama-stack-run-k8s /tmp/test-vllm-llama-stack
```
+### Deploying Llama Stack Server in Kubernetes
We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:
@@ -187,6 +192,7 @@ spec:
EOF
```
+### Verifying the Deployment
We can check that the LlamaStack server has started:
```bash
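# Sketch of one way to verify startup: follow the server Pod's logs.
# The label selector is an assumption about how the Pod is labeled;
# adjust it to match your manifest.
kubectl logs -l app=llama-stack -f
```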


@@ -1,4 +1,4 @@
-# List of Distributions
+# Available List of Distributions
Here is a list of distributions you can use to start a Llama Stack server that are provided out of the box.


@@ -0,0 +1,32 @@
+# Starting a Llama Stack Server
+You can run a Llama Stack server in one of the following ways:
+**As a Library**:
+This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and are relying on an external inference service (e.g. Fireworks, Together, Groq). See [Using Llama Stack as a Library](importing_as_library).
+**Container**:
+Another simple way to start interacting with Llama Stack is to spin up a container (via Docker or Podman) that is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
+**Conda**:
+If you have a custom or advanced setup, or you are developing on Llama Stack, you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
+**Kubernetes**:
+If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally, see the [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
+```{toctree}
+:maxdepth: 1
+:hidden:
+importing_as_library
+configuration
+kubernetes_deployment
+```
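As a rough sketch of the `llama stack build` / `llama stack run` flow mentioned under **Conda** above (the template name, image type, and port are illustrative assumptions, not taken from this commit):

```bash
# Build a stack from a starter template into a conda environment (names are examples).
llama stack build --template ollama --image-type conda

# Run the resulting stack; the port is an assumed default, shown for illustration.
llama stack run ollama --port 8321
```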


@@ -1,10 +1,11 @@
# Quick Start
-In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to test a simple RAG agent.
+In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to build a simple [RAG (Retrieval Augmented Generation)](../building_applications/rag.md) agent.
A Llama Stack agent is a simple integrated system that can perform tasks by combining a Llama model for reasoning with tools (e.g., RAG, web search, code execution, etc.) for taking actions.
In Llama Stack, we provide a server exposing multiple APIs. These APIs are backed by implementations from different providers. For this guide, we will use [Ollama](https://ollama.com/) as the inference provider.
+Ollama is an LLM runtime that allows you to run Llama models locally.
### 1. Start Ollama
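For reference, starting Ollama with a Llama model is typically a single command; the model tag and keep-alive value below are illustrative, not necessarily the ones used in the full guide:

```bash
# Pull (if needed) and run a Llama model locally, keeping it loaded for an hour.
ollama run llama3.2:3b-instruct-fp16 --keepalive 60m
```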
@@ -24,7 +25,7 @@ If you do not have ollama, you can install it from [here](https://ollama.com/dow
### 2. Pick a client environment
-Llama Stack has a service-oriented architecture, so every interaction with the Stack happens through an REST interface. You can interact with the Stack in two ways:
+Llama Stack has a service-oriented architecture, so every interaction with the Stack happens through a REST interface. You can interact with the Stack in two ways:
* Install the `llama-stack-client` PyPI package and point `LlamaStackClient` to a local or remote Llama Stack server.
* Or, install the `llama-stack` PyPI package and use the Stack as a library using `LlamaStackAsLibraryClient`.
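A minimal sketch of the first option, assuming a server is already running locally (the base URL and port are assumptions for illustration):

```python
from llama_stack_client import LlamaStackClient

# Point the client at a running Llama Stack server; 8321 is an assumed local port.
client = LlamaStackClient(base_url="http://localhost:8321")

# List the models registered with the server.
for model in client.models.list():
    print(model.identifier)
```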


@@ -6,6 +6,7 @@ Llama Stack {{ llama_stack_version }} is now available! See the {{ llama_stack_v
# Llama Stack
+## What is Llama Stack?
Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. It provides a unified set of APIs with implementations from leading service providers, enabling seamless transitions between development and production environments. More specifically, it provides
@@ -22,6 +23,12 @@ Llama Stack defines and standardizes the core building blocks needed to bring ge
Our goal is to provide pre-packaged implementations (aka "distributions") which can be run in a variety of deployment environments. LlamaStack can assist you in your entire app development lifecycle - start iterating on local, mobile or desktop and seamlessly transition to on-prem or public cloud deployments. At every point in this transition, the same set of APIs and the same developer experience is available.
+## How does Llama Stack work?
+Llama Stack consists of a [server](./distributions/index.md) (with multiple pluggable API [providers](./providers/index.md)) and [client SDKs](#available-sdks) meant to
+be used in your applications. The server can be run in a variety of environments, including local (inline)
+development, on-premises, and cloud. The client SDKs are available for Python, Swift, Node, and
+Kotlin.
## Quick Links
- New to Llama Stack? Start with the [Introduction](introduction/index) to understand our motivation and vision.