mirror of https://github.com/meta-llama/llama-stack.git

commit 2847216efb (parent 9b478f3756)

docs: Updated documentation and configuration to make things easier for the unfamiliar

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

10 changed files with 69 additions and 32 deletions
@@ -1,4 +1,4 @@
-# Building AI Applications
+# Building AI Applications (Examples)
 
 Llama Stack provides all the building blocks needed to create sophisticated AI applications.
 
@@ -16,6 +16,7 @@ from docutils import nodes
 from pathlib import Path
 import requests
 import json
+from datetime import datetime
 
 # Read version from pyproject.toml
 with Path(__file__).parent.parent.parent.joinpath("pyproject.toml").open("rb") as f:
@@ -28,7 +29,7 @@ with Path(__file__).parent.parent.parent.joinpath("pyproject.toml").open("rb") a
 llama_stack_version_link = f"<a href='{llama_stack_version_url}'>release notes</a>"
 
 project = "llama-stack"
-copyright = "2025, Meta"
+copyright = f"{datetime.now().year}, Meta"
 author = "Meta"
 
 # -- General configuration ---------------------------------------------------
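Taken together with the `datetime` import added in the earlier hunk, the new copyright line simply tracks the year the docs are built; a quick sketch of the effect (the printed value is illustrative, not taken from this diff):

```python
from datetime import datetime

# Mirrors the conf.py change above: the copyright string now follows the build year
# instead of a hard-coded "2025".
copyright = f"{datetime.now().year}, Meta"
print(copyright)  # e.g. "2025, Meta" when the docs are built in 2025
```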
@@ -104,6 +105,8 @@ source_suffix = {
 # html_theme = "alabaster"
 html_theme_options = {
     "canonical_url": "https://github.com/meta-llama/llama-stack",
+    'collapse_navigation': False,
 
     # "style_nav_header_background": "#c3c9d4",
 }
@@ -1,4 +1,4 @@
-# Configuring a Stack
+# Configuring a "Stack"
 
 The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution:
 
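The example file itself falls outside this hunk. As a rough, hedged sketch of what such a runtime configuration looks like and how it can be inspected, the keys below (`version`, `apis`, `providers`, the `remote::ollama` provider type and its `url`) are assumptions based on a typical Ollama setup, not contents of this diff:

```python
import yaml  # requires pyyaml

# Hypothetical, heavily simplified run configuration for an Ollama-backed stack.
# A real run.yaml has more sections; this only illustrates that the runtime
# configuration is plain YAML and can be loaded programmatically.
EXAMPLE_RUN_CONFIG = """
version: '2'
image_name: ollama
apis:
  - inference
  - agents
providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        url: http://localhost:11434
"""

config = yaml.safe_load(EXAMPLE_RUN_CONFIG)
print(config["apis"])                                        # ['inference', 'agents']
print(config["providers"]["inference"][0]["provider_type"])  # remote::ollama
```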
@@ -1,10 +1,12 @@
 # Using Llama Stack as a Library
 
-If you are planning to use an external service for Inference (even Ollama or TGI counts as external), it is often easier to use Llama Stack as a library. This avoids the overhead of setting up a server.
+## Setup Llama Stack without a Server
+If you are planning to use an external service for Inference (even Ollama or TGI counts as external), it is often easier to use Llama Stack as a library.
+This avoids the overhead of setting up a server.
 
 ```bash
 # setup
 uv pip install llama-stack
-llama stack build --template together --image-type venv
+llama stack build --template ollama --image-type venv
 ```
 
 ```python
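The hunk is truncated at the opening of the Python example, so the snippet itself is not shown. As a minimal, hedged sketch of library-mode usage, where the `llama_stack.distribution.library_client` import path and the `models.list()` call are assumptions about the client API rather than contents of this diff:

```python
# Sketch only: run the stack in-process after `llama stack build --template ollama --image-type venv`.
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("ollama")  # name of the template/distribution built above
client.initialize()  # wires up providers in-process instead of connecting to a server

# The library client exposes the same APIs as the HTTP client, e.g. listing models:
for model in client.models.list():
    print(model.identifier)
```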
@@ -1,32 +1,18 @@
-# Starting a Llama Stack Server
+# Distributions Overview
 
-You can run a Llama Stack server in one of the following ways:
+A distribution is a pre-packaged set of Llama Stack components that can be deployed together.
 
-**As a Library**:
-
-This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (eg. fireworks, together, groq, etc.) See [Using Llama Stack as a Library](importing_as_library)
-
-
-**Container**:
-
-Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
-
-
-**Conda**:
-
-If you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
-
-
-**Kubernetes**:
-
-If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
-
+This section provides an overview of the distributions available in Llama Stack.
 
 ```{toctree}
-:maxdepth: 1
+:maxdepth: 3
-:hidden:
 
 importing_as_library
 configuration
+list_of_distributions
 kubernetes_deployment
+building_distro
+on_device_distro
+remote_hosted_distro
+self_hosted_distro
 ```
@@ -1,6 +1,9 @@
 # Kubernetes Deployment Guide
 
-Instead of starting the Llama Stack and vLLM servers locally. We can deploy them in a Kubernetes cluster. In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
+Instead of starting the Llama Stack and vLLM servers locally, we can deploy them in a Kubernetes cluster.
 
+### Prerequisites
+In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
 
 First, create a local Kubernetes cluster via Kind:
@@ -33,6 +36,7 @@ data:
   token: $(HF_TOKEN)
 ```
 
 Next, start the vLLM server as a Kubernetes Deployment and Service:
 
 ```bash
@@ -127,6 +131,7 @@ EOF
 podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t llama-stack-run-k8s /tmp/test-vllm-llama-stack
 ```
 
+### Deploying Llama Stack Server in Kubernetes
 We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:
 
@@ -187,6 +192,7 @@ spec:
 EOF
 ```
 
+### Verifying the Deployment
 We can check that the LlamaStack server has started:
 
 ```bash
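The verification commands themselves sit outside this hunk (the original guide uses `kubectl`). As an alternative sketch only, not part of the commit, the same check can be scripted with the Kubernetes Python client; the `app=llama-stack` label selector is a hypothetical example, so match it to whatever labels your manifest actually sets:

```python
# Sketch: check that the Llama Stack pod is running, using the official Kubernetes
# Python client instead of kubectl. Assumes a kubeconfig pointing at the Kind cluster
# and a hypothetical "app=llama-stack" label on the server pod.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
pods = v1.list_namespaced_pod(namespace="default", label_selector="app=llama-stack")
for pod in pods.items:
    print(pod.metadata.name, pod.status.phase)  # expect "Running" once the server is up
```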
@@ -1,4 +1,4 @@
-# List of Distributions
+# Available List of Distributions
 
 Here is a list of the distributions, provided out of the box, that you can use to start a Llama Stack server.
docs/source/distributions/starting_llama_stack_server.md (new file, +32 lines)

@@ -0,0 +1,32 @@
+# Starting a Llama Stack Server
+
+You can run a Llama Stack server in one of the following ways:
+
+**As a Library**:
+
+This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (e.g. Fireworks, Together, Groq). See [Using Llama Stack as a Library](importing_as_library).
+
+
+**Container**:
+
+Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
+
+
+**Conda**:
+
+If you have a custom or an advanced setup, or you are developing on Llama Stack, you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
+
+
+**Kubernetes**:
+
+If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally, see the [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
+
+
+```{toctree}
+:maxdepth: 1
+:hidden:
+
+importing_as_library
+configuration
+kubernetes_deployment
+```
@@ -1,10 +1,11 @@
 # Quick Start
 
-In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to test a simple RAG agent.
+In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to build a simple [RAG (Retrieval Augmented Generation)](../building_applications/rag.md) agent.
 
 A Llama Stack agent is a simple integrated system that can perform tasks by combining a Llama model for reasoning with tools (e.g., RAG, web search, code execution, etc.) for taking actions.
 
 In Llama Stack, we provide a server exposing multiple APIs. These APIs are backed by implementations from different providers. For this guide, we will use [Ollama](https://ollama.com/) as the inference provider.
+Ollama is an LLM runtime that allows you to run Llama models locally.
 
 
 ### 1. Start Ollama
@@ -24,7 +25,7 @@ If you do not have ollama, you can install it from [here](https://ollama.com/dow
 
 ### 2. Pick a client environment
 
-Llama Stack has a service-oriented architecture, so every interaction with the Stack happens through an REST interface. You can interact with the Stack in two ways:
+Llama Stack has a service-oriented architecture, so every interaction with the Stack happens through a REST interface. You can interact with the Stack in two ways:
 
 * Install the `llama-stack-client` PyPI package and point `LlamaStackClient` to a local or remote Llama Stack server.
 * Or, install the `llama-stack` PyPI package and use the Stack as a library using `LlamaStackAsLibraryClient`.
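The two bullet points map onto two different client classes; a brief, hedged sketch of both setups (the `http://localhost:8321` base URL and the `models.list()` call are assumptions, not values taken from this diff):

```python
# Option 1: point the PyPI client at a running Llama Stack server.
from llama_stack_client import LlamaStackClient

http_client = LlamaStackClient(base_url="http://localhost:8321")  # assumed server address

# Option 2: use the stack as a library, with no server process at all.
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

library_client = LlamaStackAsLibraryClient("ollama")
library_client.initialize()

# Both clients expose the same REST-backed APIs, e.g. listing the registered models:
print([m.identifier for m in http_client.models.list()])
```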
@@ -6,6 +6,7 @@ Llama Stack {{ llama_stack_version }} is now available! See the {{ llama_stack_v
 
 # Llama Stack
 
+## What is Llama Stack?
 
 Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. It provides a unified set of APIs with implementations from leading service providers, enabling seamless transitions between development and production environments. More specifically, it provides
 
@@ -22,6 +23,12 @@ Llama Stack defines and standardizes the core building blocks needed to bring ge
 
 Our goal is to provide pre-packaged implementations (aka "distributions") which can be run in a variety of deployment environments. LlamaStack can assist you in your entire app development lifecycle - start iterating on local, mobile or desktop and seamlessly transition to on-prem or public cloud deployments. At every point in this transition, the same set of APIs and the same developer experience is available.
 
+## How does Llama Stack work?
+Llama Stack consists of a [server](./distributions/index.md) (with multiple pluggable API [providers](./providers/index.md)) and [client SDKs](#available-sdks) meant to
+be used in your applications. The server can be run in a variety of environments, including local (inline)
+development, on-premises, and cloud. The client SDKs are available for Python, Swift, Node, and
+Kotlin.
+
 ## Quick Links
 
 - New to Llama Stack? Start with the [Introduction](introduction/index) to understand our motivation and vision.