Unified APIs for Inference, RAG, Agents, Tools, Safety, and Telemetry
Get up and running with Llama Stack in just a few commands. Build your first RAG application locally.
```bash
# Install uv and start Ollama with a Llama 3.2 model
curl -LsSf https://astral.sh/uv/install.sh | sh
ollama run llama3.2:3b --keepalive 60m

# Install server dependencies
uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install

# Run the Llama Stack server
OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack run starter
```

```python
# Try the Python SDK (pip install llama-stack-client)
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# The starter distribution registers Ollama models under
# provider-scoped ids such as "ollama/llama3.2:3b"
response = client.chat.completions.create(
    model="ollama/llama3.2:3b",
    messages=[{"role": "user", "content": "What is machine learning?"}],
)
print(response.choices[0].message.content)
```
One consistent interface for all your AI needs: inference, safety, agents, and more.
Swap between providers without code changes. Start local, deploy anywhere.
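To make that portability concrete, here is a minimal sketch: the request code stays identical, and only the endpoint and model id change. LLAMA_STACK_URL and LLAMA_STACK_MODEL are illustrative environment variables chosen for this example, not settings the SDK defines.

```python
import os

from llama_stack_client import LlamaStackClient

# Point the same code at a local or remote Llama Stack server by
# swapping the endpoint and model id; nothing else changes.
client = LlamaStackClient(
    base_url=os.environ.get("LLAMA_STACK_URL", "http://localhost:8321")
)
response = client.chat.completions.create(
    model=os.environ.get("LLAMA_STACK_MODEL", "ollama/llama3.2:3b"),
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```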
Built-in safety, monitoring, and evaluation tools for enterprise applications.
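As one example of the built-in safety tooling, a registered shield can screen messages before they reach a model. This is a sketch under assumptions: the shield id below is illustrative, and which shields exist depends on your distribution; you can discover them with client.shields.list().

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Run a registered safety shield over a user message; "llama-guard"
# is an illustrative shield id, not one every distribution provides.
result = client.safety.run_shield(
    shield_id="llama-guard",
    messages=[{"role": "user", "content": "How do I bake a cake?"}],
    params={},
)
if result.violation:
    print("Blocked:", result.violation.user_message)
else:
    print("Message passed the shield.")
```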
SDKs for Python, Node.js, iOS, and Android, plus REST APIs for any other language.
Complete toolkit for building AI applications with Llama Stack
Official client libraries for multiple programming languages
Connect with developers building the future of AI applications