diff --git a/docs/source/building_applications/index.md b/docs/source/building_applications/index.md
index fa1542676..abe548971 100644
--- a/docs/source/building_applications/index.md
+++ b/docs/source/building_applications/index.md
@@ -1,4 +1,4 @@
-# Building AI Applications
+# Building AI Applications (Examples)
 
 Llama Stack provides all the building blocks needed to create sophisticated AI applications.
 
diff --git a/docs/source/conf.py b/docs/source/conf.py
index fa91a346c..6099caad7 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -16,6 +16,7 @@ from docutils import nodes
 from pathlib import Path
 import requests
 import json
+from datetime import datetime
 
 # Read version from pyproject.toml
 with Path(__file__).parent.parent.parent.joinpath("pyproject.toml").open("rb") as f:
@@ -28,7 +29,7 @@ with Path(__file__).parent.parent.parent.joinpath("pyproject.toml").open("rb") a
 llama_stack_version_link = f"release notes"
 
 project = "llama-stack"
-copyright = "2025, Meta"
+copyright = f"{datetime.now().year}, Meta"
 author = "Meta"
 
 # -- General configuration ---------------------------------------------------
@@ -104,6 +105,8 @@ source_suffix = {
 
 # html_theme = "alabaster"
 html_theme_options = {
     "canonical_url": "https://github.com/meta-llama/llama-stack",
+    "collapse_navigation": False,
+    # "style_nav_header_background": "#c3c9d4",
 }
 
diff --git a/docs/source/distributions/configuration.md b/docs/source/distributions/configuration.md
index 0f766dcd5..6cd5e161f 100644
--- a/docs/source/distributions/configuration.md
+++ b/docs/source/distributions/configuration.md
@@ -1,4 +1,4 @@
-# Configuring a Stack
+# Configuring a "Stack"
 
 The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution:
 
diff --git a/docs/source/distributions/importing_as_library.md b/docs/source/distributions/importing_as_library.md
index 496574c03..29a5669b3 100644
--- a/docs/source/distributions/importing_as_library.md
+++ b/docs/source/distributions/importing_as_library.md
@@ -1,10 +1,12 @@
 # Using Llama Stack as a Library
 
-If you are planning to use an external service for Inference (even Ollama or TGI counts as external), it is often easier to use Llama Stack as a library. This avoids the overhead of setting up a server.
+## Set Up Llama Stack without a Server
+If you are planning to use an external service for Inference (even Ollama or TGI counts as external), it is often easier to use Llama Stack as a library.
+This avoids the overhead of setting up a server.
 
 ```bash
 # setup
 uv pip install llama-stack
-llama stack build --template together --image-type venv
+llama stack build --template ollama --image-type venv
 ```
 ```python
diff --git a/docs/source/distributions/index.md b/docs/source/distributions/index.md
index 9be2e9ec5..103a6131f 100644
--- a/docs/source/distributions/index.md
+++ b/docs/source/distributions/index.md
@@ -1,32 +1,18 @@
-# Starting a Llama Stack Server
+# Distributions Overview
 
-You can run a Llama Stack server in one of the following ways:
-
-**As a Library**:
-
-This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (eg. fireworks, together, groq, etc.) See [Using Llama Stack as a Library](importing_as_library)
-
-
-**Container**:
-
-Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
-
-
-**Conda**:
-
-If you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
-
-
-**Kubernetes**:
-
-If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
+A distribution is a pre-packaged set of Llama Stack components that can be deployed together.
+This section provides an overview of the distributions available in Llama Stack.
 
 ```{toctree}
-:maxdepth: 1
-:hidden:
+:maxdepth: 3
 
 importing_as_library
 configuration
+list_of_distributions
 kubernetes_deployment
+building_distro
+on_device_distro
+remote_hosted_distro
+self_hosted_distro
 ```
 
diff --git a/docs/source/distributions/kubernetes_deployment.md b/docs/source/distributions/kubernetes_deployment.md
index 1b4467934..8ff3f0408 100644
--- a/docs/source/distributions/kubernetes_deployment.md
+++ b/docs/source/distributions/kubernetes_deployment.md
@@ -1,6 +1,9 @@
 # Kubernetes Deployment Guide
 
-Instead of starting the Llama Stack and vLLM servers locally. We can deploy them in a Kubernetes cluster. In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
+Instead of starting the Llama Stack and vLLM servers locally, we can deploy them in a Kubernetes cluster.
+
+### Prerequisites
+In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
 
 First, create a local Kubernetes cluster via Kind:
 
@@ -33,6 +36,7 @@ data:
   token: $(HF_TOKEN)
 ```
 
+
 Next, start the vLLM server as a Kubernetes Deployment and Service:
 
 ```bash
@@ -127,6 +131,7 @@ EOF
 podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t llama-stack-run-k8s /tmp/test-vllm-llama-stack
 ```
 
+### Deploying Llama Stack Server in Kubernetes
 We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:
 
@@ -187,6 +192,7 @@ spec:
 EOF
 ```
 
+### Verifying the Deployment
 We can check that the LlamaStack server has started:
 
 ```bash
diff --git a/docs/source/distributions/selection.md b/docs/source/distributions/list_of_distributions.md
similarity index 98%
rename from docs/source/distributions/selection.md
rename to docs/source/distributions/list_of_distributions.md
index 269b14bce..5f3616634 100644
--- a/docs/source/distributions/selection.md
+++ b/docs/source/distributions/list_of_distributions.md
@@ -1,4 +1,4 @@
-# List of Distributions
+# List of Available Distributions
 
 Here are a list of distributions you can use to start a Llama Stack server that are provided out of the box.
 
diff --git a/docs/source/distributions/starting_llama_stack_server.md b/docs/source/distributions/starting_llama_stack_server.md
new file mode 100644
index 000000000..9be2e9ec5
--- /dev/null
+++ b/docs/source/distributions/starting_llama_stack_server.md
@@ -0,0 +1,32 @@
+# Starting a Llama Stack Server
+
+You can run a Llama Stack server in one of the following ways:
+
+**As a Library**:
+
+This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and are instead relying on an external inference service (e.g. fireworks, together, groq, etc.). See [Using Llama Stack as a Library](importing_as_library).
+
+
+**Container**:
+
+Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](list_of_distributions) for more details.
+
+
+**Conda**:
+
+If you have a custom or advanced setup, or you are developing on Llama Stack, you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
+
+
+**Kubernetes**:
+
+If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally, see the [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
+
+
+```{toctree}
+:maxdepth: 1
+:hidden:
+
+importing_as_library
+configuration
+kubernetes_deployment
+```
diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index e8ca05d76..13cfd4d2f 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -1,10 +1,11 @@
 # Quick Start
 
-In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to test a simple RAG agent.
+In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to build a simple [RAG (Retrieval Augmented Generation)](../building_applications/rag.md) agent.
 
 A Llama Stack agent is a simple integrated system that can perform tasks by combining a Llama model for reasoning with tools (e.g., RAG, web search, code execution, etc.) for taking actions.
 
 In Llama Stack, we provide a server exposing multiple APIs. These APIs are backed by implementations from different providers. For this guide, we will use [Ollama](https://ollama.com/) as the inference provider.
+Ollama is an LLM runtime that allows you to run Llama models locally.
 
 ### 1. Start Ollama
 
@@ -24,7 +25,7 @@ If you do not have ollama, you can install it from [here](https://ollama.com/dow
 ### 2. Pick a client environment
 
-Llama Stack has a service-oriented architecture, so every interaction with the Stack happens through an REST interface. You can interact with the Stack in two ways:
+Llama Stack has a service-oriented architecture, so every interaction with the Stack happens through a REST interface. You can interact with the Stack in two ways:
 
 * Install the `llama-stack-client` PyPI package and point `LlamaStackClient` to a local or remote Llama Stack server.
 * Or, install the `llama-stack` PyPI package and use the Stack as a library using `LlamaStackAsLibraryClient`.
 
diff --git a/docs/source/index.md b/docs/source/index.md
index 659f955cb..22f4ae3fb 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -6,6 +6,7 @@ Llama Stack {{ llama_stack_version }} is now available! See the {{ llama_stack_v
 
 # Llama Stack
 
+## What is Llama Stack?
 Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. It provides a unified set of APIs with implementations from leading service providers, enabling seamless transitions between development and production environments. More specifically, it provides
 
@@ -22,6 +23,12 @@ Llama Stack defines and standardizes the core building blocks needed to bring ge
 Our goal is to provide pre-packaged implementations (aka "distributions") which can be run in a variety of deployment environments. LlamaStack can assist you in your entire app development lifecycle - start iterating on local, mobile or desktop and seamlessly transition to on-prem or public cloud deployments. At every point in this transition, the same set of APIs and the same developer experience is available.
 
+## How does Llama Stack work?
+Llama Stack consists of a [server](./distributions/index.md) (with multiple pluggable API [providers](./providers/index.md)) and [client SDKs](#available-sdks) meant to
+be used in your applications. The server can be run in a variety of environments, including local (inline)
+development, on-premises, and cloud. The client SDKs are available for Python, Swift, Node, and
+Kotlin.
+
 ## Quick Links
 
 - New to Llama Stack? Start with the [Introduction](introduction/index) to understand our motivation and vision.
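
For reference, a minimal sketch of the client setup described in the `getting_started/index.md` hunk above. The default port `8321`, the `models.list()` call, and the printed `identifier` field are assumptions based on the current `llama-stack-client` SDK; adjust them for your version and deployment.

```python
# Minimal sketch: point LlamaStackClient at a running Llama Stack server
# and list the registered models. Assumes the server is reachable on the
# default local port (8321, an assumption); change base_url for a remote deployment.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# The library-mode client (LlamaStackAsLibraryClient) described in
# importing_as_library.md exposes the same APIs without a server; its import
# path depends on the installed llama-stack version, so it is not shown here.

for model in client.models.list():
    print(model.identifier)
```

Either client environment exposes the same set of APIs, so the rest of the quick start works unchanged whether you talk to a server or use the Stack as a library.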