forked from phoenix-oss/llama-stack-mirror
docs: Updated documentation and Sphinx configuration (#1845)
# What does this PR do?
The goal of this PR is to make the pages easier to navigate by surfacing the child pages on the navbar, updating some of the copy, and moving some of the files around.

Some changes:
1. Clarifying Titles
2. Restructuring "Distributions" more formally into its own page, consistent with Providers, and adding some clarity to the child pages to surface them and make them easier to navigate
3. Updated the Sphinx config to not collapse navigation by default
4. Updated the copyright year to be calculated dynamically
5. Moved `docs/source/distributions/index.md` -> `docs/source/distributions/starting_llama_stack_server.md`

Another PR for https://github.com/meta-llama/llama-stack/issues/1815

## Test Plan
Tested locally and verified the pages build (screenshots below).

## Documentation
### Before:
(screenshot)

### After:
(screenshot)

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
This commit is contained in:
parent 60430da48a
commit d495922949
10 changed files with 69 additions and 32 deletions
@@ -1,4 +1,4 @@
-# Building AI Applications
+# Building AI Applications (Examples)
 
 Llama Stack provides all the building blocks needed to create sophisticated AI applications.
 

@ -16,6 +16,7 @@ from docutils import nodes
|
|||
from pathlib import Path
|
||||
import requests
|
||||
import json
|
||||
from datetime import datetime
|
||||
|
||||
# Read version from pyproject.toml
|
||||
with Path(__file__).parent.parent.parent.joinpath("pyproject.toml").open("rb") as f:
|
||||
|
@@ -28,7 +29,7 @@ with Path(__file__).parent.parent.parent.joinpath("pyproject.toml").open("rb") as f:
 llama_stack_version_link = f"<a href='{llama_stack_version_url}'>release notes</a>"
 
 project = "llama-stack"
-copyright = "2025, Meta"
+copyright = f"{datetime.now().year}, Meta"
 author = "Meta"
 
 # -- General configuration ---------------------------------------------------
@@ -104,6 +105,8 @@ source_suffix = {
 # html_theme = "alabaster"
 html_theme_options = {
     "canonical_url": "https://github.com/meta-llama/llama-stack",
+    'collapse_navigation': False,
+
     # "style_nav_header_background": "#c3c9d4",
 }
 

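For reference, the Sphinx settings this commit touches reduce to the following minimal sketch of `docs/source/conf.py`; every line below is taken from the diff above, with the rest of the file omitted:

```python
# Minimal sketch of the conf.py settings touched by this commit (all other settings omitted).
from datetime import datetime

project = "llama-stack"
copyright = f"{datetime.now().year}, Meta"  # copyright year is now computed at build time
author = "Meta"

html_theme_options = {
    "canonical_url": "https://github.com/meta-llama/llama-stack",
    "collapse_navigation": False,  # keep child pages expanded in the navbar
}
```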
@@ -1,4 +1,4 @@
-# Configuring a Stack
+# Configuring a "Stack"
 
 The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution:
 

@@ -1,10 +1,12 @@
 # Using Llama Stack as a Library
 
-If you are planning to use an external service for Inference (even Ollama or TGI counts as external), it is often easier to use Llama Stack as a library. This avoids the overhead of setting up a server.
+## Setup Llama Stack without a Server
+If you are planning to use an external service for Inference (even Ollama or TGI counts as external), it is often easier to use Llama Stack as a library.
+This avoids the overhead of setting up a server.
 ```bash
 # setup
 uv pip install llama-stack
-llama stack build --template together --image-type venv
+llama stack build --template ollama --image-type venv
 ```
 
 ```python

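The Python block that follows this hunk is cut off in this view. As a rough sketch only (not part of this commit), library-mode usage generally looks like the snippet below; the import path, the "ollama" template name, and the `initialize()` call are assumptions based on the library's documented usage, not lines from this diff:

```python
# Hedged sketch of the "Llama Stack as a library" flow referenced above -- not part of this diff.
# The import path and initialize() call are assumptions; double-check them against the docs.
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("ollama")  # template previously built with `llama stack build`
client.initialize()  # loads the providers in-process; no separate server needed

print([m.identifier for m in client.models.list()])
```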
@@ -1,32 +1,18 @@
-# Starting a Llama Stack Server
+# Distributions Overview
 
-You can run a Llama Stack server in one of the following ways:
-
-**As a Library**:
-
-This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (eg. fireworks, together, groq, etc.) See [Using Llama Stack as a Library](importing_as_library)
-
-
-**Container**:
-
-Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
-
-
-**Conda**:
-
-If you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
-
-
-**Kubernetes**:
-
-If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
+A distribution is a pre-packaged set of Llama Stack components that can be deployed together.
 
+This section provides an overview of the distributions available in Llama Stack.
 
 ```{toctree}
-:maxdepth: 1
-:hidden:
+:maxdepth: 3
 
 importing_as_library
 configuration
+list_of_distributions
 kubernetes_deployment
+building_distro
+on_device_distro
+remote_hosted_distro
+self_hosted_distro
 ```

@@ -1,6 +1,9 @@
 # Kubernetes Deployment Guide
 
-Instead of starting the Llama Stack and vLLM servers locally. We can deploy them in a Kubernetes cluster. In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
+Instead of starting the Llama Stack and vLLM servers locally. We can deploy them in a Kubernetes cluster.
+
+### Prerequisites
+In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
 
 First, create a local Kubernetes cluster via Kind:
 
@@ -33,6 +36,7 @@ data:
   token: $(HF_TOKEN)
 ```
 
+
 Next, start the vLLM server as a Kubernetes Deployment and Service:
 
 ```bash
@@ -127,6 +131,7 @@ EOF
 podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t llama-stack-run-k8s /tmp/test-vllm-llama-stack
 ```
 
+### Deploying Llama Stack Server in Kubernetes
 
 We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:
 
@@ -187,6 +192,7 @@ spec:
 EOF
 ```
 
+### Verifying the Deployment
 We can check that the LlamaStack server has started:
 
 ```bash

@@ -1,4 +1,4 @@
-# List of Distributions
+# Available List of Distributions
 
 Here are a list of distributions you can use to start a Llama Stack server that are provided out of the box.

docs/source/distributions/starting_llama_stack_server.md — new file, 32 lines
@@ -0,0 +1,32 @@
+# Starting a Llama Stack Server
+
+You can run a Llama Stack server in one of the following ways:
+
+**As a Library**:
+
+This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (eg. fireworks, together, groq, etc.) See [Using Llama Stack as a Library](importing_as_library)
+
+
+**Container**:
+
+Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
+
+
+**Conda**:
+
+If you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
+
+
+**Kubernetes**:
+
+If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
+
+
+```{toctree}
+:maxdepth: 1
+:hidden:
+
+importing_as_library
+configuration
+kubernetes_deployment
+```

@@ -1,10 +1,11 @@
 # Quick Start
 
-In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to test a simple RAG agent.
+In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to build a simple [RAG (Retrieval Augmented Generation)](../building_applications/rag.md) agent.
 
 A Llama Stack agent is a simple integrated system that can perform tasks by combining a Llama model for reasoning with tools (e.g., RAG, web search, code execution, etc.) for taking actions.
 
 In Llama Stack, we provide a server exposing multiple APIs. These APIs are backed by implementations from different providers. For this guide, we will use [Ollama](https://ollama.com/) as the inference provider.
+Ollama is an LLM runtime that allows you to run Llama models locally.
 
 
 ### 1. Start Ollama
@@ -24,7 +25,7 @@ If you do not have ollama, you can install it from [here](https://ollama.com/download)
 
 ### 2. Pick a client environment
 
-Llama Stack has a service-oriented architecture, so every interaction with the Stack happens through an REST interface. You can interact with the Stack in two ways:
+Llama Stack has a service-oriented architecture, so every interaction with the Stack happens through a REST interface. You can interact with the Stack in two ways:
 
 * Install the `llama-stack-client` PyPI package and point `LlamaStackClient` to a local or remote Llama Stack server.
 * Or, install the `llama-stack` PyPI package and use the Stack as a library using `LlamaStackAsLibraryClient`.

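For the first option above, pointing the client SDK at a running server looks roughly like the sketch below (not part of this diff); the base URL and port are assumptions and should match wherever your Llama Stack server is actually listening:

```python
# Hedged sketch: connect the llama-stack-client SDK to a running Llama Stack server.
# The base_url below is an assumption -- use the address/port your server was started with.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Listing models is a quick connectivity check against the server.
for model in client.models.list():
    print(model.identifier)
```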
@@ -6,6 +6,7 @@ Llama Stack {{ llama_stack_version }} is now available! See the {{ llama_stack_version_link }}
 
 # Llama Stack
 
+## What is Llama Stack?
 
 Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. It provides a unified set of APIs with implementations from leading service providers, enabling seamless transitions between development and production environments. More specifically, it provides
 
@@ -22,6 +23,12 @@ Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market.
 
 Our goal is to provide pre-packaged implementations (aka "distributions") which can be run in a variety of deployment environments. LlamaStack can assist you in your entire app development lifecycle - start iterating on local, mobile or desktop and seamlessly transition to on-prem or public cloud deployments. At every point in this transition, the same set of APIs and the same developer experience is available.
 
+## How does Llama Stack work?
+Llama Stack consists of a [server](./distributions/index.md) (with multiple pluggable API [providers](./providers/index.md)) and [client SDKs](#available-sdks) meant to
+be used in your applications. The server can be run in a variety of environments, including local (inline)
+development, on-premises, and cloud. The client SDKs are available for Python, Swift, Node, and
+Kotlin.
+
 ## Quick Links
 
 - New to Llama Stack? Start with the [Introduction](introduction/index) to understand our motivation and vision.