Quickstart
This guide will walk you through the steps to set up an end-to-end workflow with Llama Stack. It focuses on building a Llama Stack distribution and starting up a Llama Stack server. See our documentation for more on Llama Stack's capabilities, or visit llama-stack-apps for example apps.
0. Prerequisites
Feel free to skip this step if you already have the prerequisites installed.
- conda (steps to install)
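Once conda is installed, you may want a fresh environment for the steps that follow. A minimal sketch (the environment name and Python version here are arbitrary choices, not requirements from this guide):
# create and activate an isolated environment; name and Python version are placeholders
conda create -n llama-stack python=3.10 -y
conda activate llama-stack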
1. Installation
The llama CLI tool helps you manage the Llama toolchain and agentic systems. After installing the llama-stack package, the llama command should be available in your PATH.
Install as a package, directly from PyPI:
pip install llama-stack
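If the install succeeded, the llama command should now be on your PATH; a quick sanity check:
# print the CLI's help text to confirm the install worked
llama --help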
2. Download Llama models:
llama download --model-id Llama3.1-8B-Instruct
You will have to follow the instructions in the CLI to complete the download; get an instant license here: URL to license.
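Downloads are large, so it is worth confirming the checkpoint actually landed. The default download location below is an assumption based on common setups; adjust the path if your configuration differs:
# list downloaded checkpoints (default location is an assumption; yours may differ)
ls ~/.llama/checkpoints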
3. Build->Configure->Run via Conda:
For development, build a LlamaStack distribution from scratch.
llama stack build
Enter build information interactively:
llama stack build
llama stack configure
Run llama stack configure <name> using the name from the build step:
llama stack configure my-local-stack
llama stack run
Start the server with:
llama stack run my-local-stack
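Putting step 3 together, the full build-configure-run cycle looks like this (my-local-stack is the example name used above):
# build a distribution interactively, then configure and run it under the same name
llama stack build
llama stack configure my-local-stack
llama stack run my-local-stack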
4. Testing with a Client
After setup, test the server with a POST request:
curl http://localhost:5000/inference/chat_completion \
-H "Content-Type: application/json" \
-d '{
"model": "Llama3.1-8B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Write me a 2-sentence poem about the moon"}
],
"sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
}'
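The server replies with JSON; piping it through a pretty-printer makes the response easier to read. In the sketch below, request.json is a hypothetical file holding the JSON body shown above:
# -s silences curl's progress output; python -m json.tool pretty-prints the response
curl -s http://localhost:5000/inference/chat_completion \
  -H "Content-Type: application/json" \
  -d @request.json \
  | python -m json.tool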
Check our client SDKs for various languages: Python, Node, Swift, and Kotlin.
Advanced Guides
For more on custom Llama Stack distributions, refer to our Building a Llama Stack Distribution guide.
Next Steps: check out llama-stack-apps for example apps built on Llama Stack.