Getting Started with Llama Stack
This guide will walk you through the steps to set up an end-to-end workflow with Llama Stack. It focuses on building a Llama Stack distribution and starting up a Llama Stack server. See our documentation for more on Llama Stack's capabilities, or visit llama-stack-apps for example apps.
Installation
The llama CLI tool helps you manage the Llama toolchain & agentic systems. After installing the llama-stack package, the llama command should be available in your path.
You can install this repository in two ways:
- Install as a package: Install directly from PyPI with:
pip install llama-stack
- Install from source: Follow these steps to install from the source code:
mkdir -p ~/local
cd ~/local
git clone git@github.com:meta-llama/llama-stack.git
conda create -n stack python=3.10
conda activate stack
cd llama-stack
$CONDA_PREFIX/bin/pip install -e .
Refer to the CLI Reference for details on Llama CLI commands.
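To sanity-check the installation, a minimal Python snippet like the one below (standard library only; not part of Llama Stack itself) confirms that the llama CLI is on your PATH and that the llama-stack package is installed:

import shutil
from importlib.metadata import version, PackageNotFoundError

# The `llama` CLI should be discoverable on PATH after installation.
print("llama CLI:", shutil.which("llama") or "not found on PATH")

# The PyPI distribution is named `llama-stack`.
try:
    print("llama-stack version:", version("llama-stack"))
except PackageNotFoundError:
    print("llama-stack is not installed in this environment")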
Starting Up Llama Stack Server
There are two ways to start the Llama Stack server:
- Using Docker: We provide a pre-built Docker image of Llama Stack, available in the distributions folder.
Note: For GPU inference, set an environment variable pointing to the local directory that contains your model checkpoints:
export LLAMA_CHECKPOINT_DIR=~/.llama
Download Llama models with:
llama download --model-id Llama3.1-8B-Instruct
Start a Docker container with:
cd llama-stack/distributions/meta-reference-gpu
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.yaml --gpus=all distribution-meta-reference-gpu --yaml_config /root/my-run.yaml
Tip: For remote providers, use docker compose up with the scripts in the distributions folder.
- Build->Configure->Run via Conda: For development, build a Llama Stack distribution from scratch.
  - llama stack build: Enter build information interactively:
    llama stack build
  - llama stack configure: Run llama stack configure <name> using the name from the build step:
    llama stack configure my-local-stack
  - llama stack run: Start the server with:
    llama stack run my-local-stack
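Whichever route you take, you can check that the server is reachable before moving on to the clients below. This is a minimal Python sketch that only assumes the server is listening on localhost:5000, the port used throughout this guide:

import socket

# Probe the Llama Stack server started above (assumes localhost:5000,
# matching the examples in this guide).
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(2)
    status = sock.connect_ex(("localhost", 5000))

print("server reachable" if status == 0 else f"connection failed (errno {status})")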
Testing with Client
After setup, test the server with a client:
cd /path/to/llama-stack
conda activate <env>
python -m llama_stack.apis.inference.client localhost 5000
You can also send a POST request:
curl http://localhost:5000/inference/chat_completion \
-H "Content-Type: application/json" \
-d '{
"model": "Llama3.1-8B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Write me a 2-sentence poem about the moon"}
],
"sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
}'
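The same request can also be sent from Python. The sketch below simply mirrors the curl payload above using the third-party requests library (our choice for illustration; install it with pip if needed):

import requests  # third-party; not bundled with llama-stack

# Chat completion request against the local Llama Stack server,
# mirroring the curl example above.
response = requests.post(
    "http://localhost:5000/inference/chat_completion",
    json={
        "model": "Llama3.1-8B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write me a 2-sentence poem about the moon"},
        ],
        "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512},
    },
    timeout=60,
)
print(response.status_code)
print(response.text)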
For testing safety, run:
python -m llama_stack.apis.safety.client localhost 5000
Check our client SDKs for various languages: Python, Node, Swift, and Kotlin.
Advanced Guides
For more on custom Llama Stack distributions, refer to our Building a Llama Stack Distribution guide.