Getting Started with Llama Stack
This guide will walk you through the steps to set up an end-to-end workflow with Llama Stack. It focuses on building a Llama Stack distribution and starting up a Llama Stack server. See our documentation for more on Llama Stack's capabilities, or visit llama-stack-apps for example apps.
Installation
The llama CLI tool helps you manage the Llama toolchain & agentic systems. After installing the llama-stack package, the llama command should be available in your path.
You can install this repository in two ways:
- Install as a package: Install directly from PyPI with:
pip install llama-stack
- Install from source: Follow these steps to install from the source code:
mkdir -p ~/local
cd ~/local
git clone git@github.com:meta-llama/llama-stack.git
conda create -n stack python=3.10
conda activate stack
cd llama-stack
$CONDA_PREFIX/bin/pip install -e .
Refer to the CLI Reference for details on Llama CLI commands.
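To sanity-check the installation, a minimal Python snippet like the one below (standard library only; not part of Llama Stack itself) confirms that the llama CLI is on your PATH and that the llama-stack package is installed:

import shutil
from importlib.metadata import version, PackageNotFoundError

# The `llama` CLI should be discoverable on PATH after installation.
print("llama CLI:", shutil.which("llama") or "not found on PATH")

# The PyPI distribution is named `llama-stack`.
try:
    print("llama-stack version:", version("llama-stack"))
except PackageNotFoundError:
    print("llama-stack is not installed in this environment")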
Starting Up Llama Stack Server
There are two ways to start the Llama Stack server:
- Using Docker: We provide a pre-built Docker image of Llama Stack, available in the distributions folder.
Note: For GPU inference, set an environment variable pointing to the local directory that contains your model checkpoints:
export LLAMA_CHECKPOINT_DIR=~/.llama
Download Llama models with:
llama download --model-id Llama3.1-8B-Instruct
Start a Docker container with:
cd llama-stack/distributions/meta-reference-gpu
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.yaml --gpus=all distribution-meta-reference-gpu --yaml_config /root/my-run.yaml
Tip: For remote providers, use docker compose up with the scripts in the distributions folder.
- Build->Configure->Run via Conda: For development, build a Llama Stack distribution from scratch.
  - llama stack build: Enter build information interactively:
    llama stack build
  - llama stack configure: Run llama stack configure <name> using the name from the build step:
    llama stack configure my-local-stack
  - llama stack run: Start the server with:
    llama stack run my-local-stack
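Whichever route you take, you can check that the server is reachable before moving on to the clients below. This is a minimal Python sketch that only assumes the server is listening on localhost:5000, the port used throughout this guide:

import socket

# Probe the Llama Stack server started above (assumes localhost:5000,
# matching the examples in this guide).
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(2)
    status = sock.connect_ex(("localhost", 5000))

print("server reachable" if status == 0 else f"connection failed (errno {status})")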
Testing with Client
After setup, test the server with a client:
cd /path/to/llama-stack
conda activate <env>
python -m llama_stack.apis.inference.client localhost 5000
You can also send a POST request:
curl http://localhost:5000/inference/chat_completion \
-H "Content-Type: application/json" \
-d '{
"model": "Llama3.1-8B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Write me a 2-sentence poem about the moon"}
],
"sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
}'
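The same request can also be sent from Python. The sketch below simply mirrors the curl payload above using the third-party requests library (our choice for illustration; install it with pip if needed):

import requests  # third-party; not bundled with llama-stack

# Chat completion request against the local Llama Stack server,
# mirroring the curl example above.
response = requests.post(
    "http://localhost:5000/inference/chat_completion",
    json={
        "model": "Llama3.1-8B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write me a 2-sentence poem about the moon"},
        ],
        "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512},
    },
    timeout=60,
)
print(response.status_code)
print(response.text)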
For testing safety, run:
python -m llama_stack.apis.safety.client localhost 5000
Check our client SDKs for various languages: Python, Node, Swift, and Kotlin.
Advanced Guides
For more on custom Llama Stack distributions, refer to our Building a Llama Stack Distribution guide.