Llama Stack


Quick Start | Documentation | Colab Notebook

Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. It provides a unified set of APIs with implementations from leading service providers, enabling seamless transitions between development and production environments.

We focus on making it easy to build production applications with the Llama model family - from the latest Llama 3.3 to specialized models like Llama Guard for safety.
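
For example, the same few lines of client code can target a local server during development and a hosted one in production. Below is a minimal sketch using the Python client SDK - the base URL, port, and model id are assumptions that depend on how your server is configured:

    # Minimal sketch: query a running Llama Stack server with the Python client.
    # Assumes `pip install llama-stack-client`, a server listening on port 5000,
    # and a registered Llama 3.3 model; adjust all of these for your deployment.
    from llama_stack_client import LlamaStackClient

    client = LlamaStackClient(base_url="http://localhost:5000")

    response = client.inference.chat_completion(
        model_id="meta-llama/Llama-3.3-70B-Instruct",  # assumed model id
        messages=[{"role": "user", "content": "Write a haiku about coding."}],
    )
    print(response.completion_message.content)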

(Figure: Llama Stack overview)

Key Features

  • Unified API Layer for:

    • Inference: Run LLMs efficiently
    • Safety: Apply content filtering and safety policies
    • DatasetIO: Store and retrieve knowledge for RAG
    • Agents: Build multi-step agentic workflows (see the sketch after this list)
    • Evaluation: Test and improve model and agent quality
    • Telemetry: Collect and analyze usage data and complex agentic traces
    • Post Training (Coming Soon): Fine-tune models for specific use cases
  • Rich Provider Ecosystem

    • Local Development: Meta's Reference, Ollama, vLLM, TGI
    • Self-hosted: Chroma, pgvector, Nvidia NIM
    • Cloud: Fireworks, Together, Nvidia, AWS Bedrock, Groq, Cerebras
    • On-device: iOS and Android support
  • Built for Production

    • Pre-packaged distributions for common deployment scenarios
    • Comprehensive evaluation capabilities
    • Full observability and monitoring
    • Provider federation and fallback
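
To make the Agents bullet concrete, here is a rough sketch of a multi-step agent using the Python client SDK. The class and parameter names follow llama-stack-client-python and may differ between releases; the model id and server URL are assumptions:

    # Rough sketch: a minimal agent loop against a running Llama Stack server.
    from llama_stack_client import LlamaStackClient
    from llama_stack_client.lib.agents.agent import Agent
    from llama_stack_client.types.agent_create_params import AgentConfig

    client = LlamaStackClient(base_url="http://localhost:5000")  # assumed URL

    agent = Agent(client, AgentConfig(
        model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model id
        instructions="You are a helpful assistant.",
        enable_session_persistence=False,
    ))
    session_id = agent.create_session("demo-session")

    # Each turn streams events: inference steps, tool calls, the final answer.
    for event in agent.create_turn(
        session_id=session_id,
        messages=[{"role": "user", "content": "What is Llama Stack?"}],
    ):
        print(event)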

Supported Llama Stack Implementations

API Providers

| API Provider Builder | Environments | Agents | Inference | Memory | Safety | Telemetry |
|----------------------|--------------|--------|-----------|--------|--------|-----------|
| Meta Reference | Single Node | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| Cerebras | Hosted | | ✔️ | | | |
| Fireworks | Hosted | ✔️ | ✔️ | ✔️ | | |
| AWS Bedrock | Hosted | | ✔️ | | ✔️ | |
| Together | Hosted | ✔️ | ✔️ | | ✔️ | |
| Groq | Hosted | | ✔️ | | | |
| Ollama | Single Node | | ✔️ | | | |
| TGI | Hosted and Single Node | | ✔️ | | | |
| NVIDIA NIM | Hosted and Single Node | | ✔️ | | | |
| Chroma | Single Node | | | ✔️ | | |
| PG Vector | Single Node | | | ✔️ | | |
| PyTorch ExecuTorch | On-device iOS | ✔️ | ✔️ | | | |
| vLLM | Hosted and Single Node | | ✔️ | | | |

Distributions

A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario - you can begin with a local development setup (e.g., Ollama) and seamlessly transition to production (e.g., Fireworks) without changing your application code. Here are some of the distributions we support:

| Distribution | Llama Stack Docker | Start This Distribution |
|--------------|--------------------|-------------------------|
| Meta Reference | llamastack/distribution-meta-reference-gpu | Guide |
| Meta Reference Quantized | llamastack/distribution-meta-reference-quantized-gpu | Guide |
| Cerebras | llamastack/distribution-cerebras | Guide |
| Ollama | llamastack/distribution-ollama | Guide |
| TGI | llamastack/distribution-tgi | Guide |
| Together | llamastack/distribution-together | Guide |
| Fireworks | llamastack/distribution-fireworks | Guide |
| vLLM | llamastack/distribution-remote-vllm | Guide |
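
To illustrate the development-to-production transition described above, a hypothetical flow with the llama CLI (installed alongside this package) might look like the following; template names and flags can vary between releases:

    # Develop locally against Ollama ...
    llama stack build --template ollama --image-type conda
    llama stack run ollama

    # ... then rebuild against a hosted provider such as Fireworks, without
    # changing any application code.
    llama stack build --template fireworks --image-type conda
    llama stack run fireworks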

Installation

There are two ways to install Llama Stack:

  1. Install as a package: You can install the latest release directly from PyPI by running the following command:

    pip install llama-stack
    
  2. Install from source: If you prefer to install from the source code, make sure you have conda installed. Then, follow these steps:

     mkdir -p ~/local
     cd ~/local
     git clone git@github.com:meta-llama/llama-stack.git
    
     conda create -n stack python=3.10
     conda activate stack
    
     cd llama-stack
     pip install -e .
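
Either way, you can sanity-check the installation by invoking the bundled CLI (assuming your environment's bin directory is on your PATH):

    llama --help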
    

Documentation

Please check out our Documentation page for more details.

Llama Stack Client SDKs

| Language | Client SDK | Package |
|----------|------------|---------|
| Python | llama-stack-client-python | PyPI |
| Swift | llama-stack-client-swift | Swift Package Index |
| Node | llama-stack-client-node | NPM |
| Kotlin | llama-stack-client-kotlin | Maven |

Check out our client SDKs for connecting to a Llama Stack server in your preferred language: you can choose from the Python, Node, Swift, and Kotlin SDKs to quickly build your applications.
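
As a small sketch of what that looks like with the Python SDK - assuming a reachable server and that method names match your installed client version - you can list the models a server exposes:

    from llama_stack_client import LlamaStackClient

    client = LlamaStackClient(base_url="http://localhost:5000")  # assumed URL
    for model in client.models.list():
        print(model.identifier)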

You can find more example scripts that use these client SDKs to talk to a Llama Stack server in our llama-stack-apps repo.