
# Llama Stack

[![PyPI version](https://img.shields.io/pypi/v/llama_stack.svg)](https://pypi.org/project/llama_stack/)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-stack)](https://pypi.org/project/llama-stack/)
[![License](https://img.shields.io/pypi/l/llama_stack.svg)](https://github.com/meta-llama/llama-stack/blob/main/LICENSE)
[![Discord](https://img.shields.io/discord/1257833999603335178?color=6A7EC2&logo=discord&logoColor=ffffff)](https://discord.gg/llama-stack)
[![Unit Tests](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml?query=branch%3Amain)
[![Integration Tests](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml?query=branch%3Amain)

[**Quick Start**](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) | [**Documentation**](https://llama-stack.readthedocs.io/en/latest/index.html) | [**Colab Notebook**](./docs/getting_started.ipynb) | [**Discord**](https://discord.gg/llama-stack)

### ✨🎉 Llama 4 Support  🎉✨
We released [Version 0.2.0](https://github.com/meta-llama/llama-stack/releases/tag/v0.2.0) with support for the Llama 4 herd of models released by Meta.

<details>

<summary>👋 Click here to see how to run Llama 4 models on Llama Stack </summary>

\
*Note: you need an 8xH100 GPU host to run these models.*

```bash
pip install -U llama_stack

MODEL="Llama-4-Scout-17B-16E-Instruct"
# get the Meta download URL from llama.com
llama model download --source meta --model-id $MODEL --meta-url <META_URL>

# start a llama stack server
INFERENCE_MODEL=meta-llama/$MODEL llama stack build --run --template meta-reference-gpu

# install client to interact with the server
pip install llama-stack-client
```
### CLI
```bash
# Run a chat completion
llama-stack-client --endpoint http://localhost:8321 \
inference chat-completion \
--model-id meta-llama/$MODEL \
--message "write a haiku for meta's llama 4 models"

ChatCompletionResponse(
    completion_message=CompletionMessage(content="Whispers in code born\nLlama's gentle, wise heartbeat\nFuture's soft unfold", role='assistant', stop_reason='end_of_turn', tool_calls=[]),
    logprobs=None,
    metrics=[Metric(metric='prompt_tokens', value=21.0, unit=None), Metric(metric='completion_tokens', value=28.0, unit=None), Metric(metric='total_tokens', value=49.0, unit=None)]
)
```
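The `metrics` list in the response above reports token usage. Here is a sketch of how you might turn that list into a dictionary for logging. The `Metric` dataclass below is a local stand-in that mirrors the printed repr, not the client library's own type, and the values are copied from the sample output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metric:
    """Local stand-in mirroring the Metric repr printed above (not the SDK class)."""

    metric: str
    value: float
    unit: Optional[str] = None


def usage_summary(metrics: list[Metric]) -> dict[str, int]:
    """Map each metric name to its value, cast to int because token counts are whole numbers."""
    return {m.metric: int(m.value) for m in metrics}


# Values copied from the sample CLI output above.
metrics = [
    Metric(metric="prompt_tokens", value=21.0),
    Metric(metric="completion_tokens", value=28.0),
    Metric(metric="total_tokens", value=49.0),
]

usage = usage_summary(metrics)
print(usage)  # {'prompt_tokens': 21, 'completion_tokens': 28, 'total_tokens': 49}

# In the sample output, total_tokens equals prompt + completion: 21 + 28 = 49.
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
```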
### Python SDK
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
prompt = "Write a haiku about coding"

print(f"User> {prompt}")
response = client.inference.chat_completion(
    model_id=model_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ],
)
print(f"Assistant> {response.completion_message.content}")
```
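The example above passes the full `messages` list to `chat_completion`. Assuming each inference call keeps no conversation state of its own, a follow-up question means resending the earlier turns. The sketch below keeps that history in a plain list. `chat_turn` and `fake_complete` are hypothetical helpers. With a running server, `fake_complete` would be replaced by something like `lambda msgs: client.inference.chat_completion(model_id=model_id, messages=msgs).completion_message.content`.

```python
from typing import Callable

# One chat message in the same shape as the SDK example: {"role": ..., "content": ...}
Message = dict[str, str]


def chat_turn(history: list[Message], user_text: str, complete: Callable[[list[Message]], str]) -> str:
    """Append the user turn, call the model with the whole history, then record the reply."""
    history.append({"role": "user", "content": user_text})
    reply = complete(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def fake_complete(messages: list[Message]) -> str:
    """Hypothetical stand-in for the model call: replies with the number of user turns seen."""
    user_turns = sum(1 for m in messages if m["role"] == "user")
    return f"reply #{user_turns}"


history: list[Message] = [{"role": "system", "content": "You are a helpful assistant."}]
print(chat_turn(history, "Write a haiku about coding", fake_complete))  # reply #1
print(chat_turn(history, "Now make it rhyme", fake_complete))  # reply #2
print(len(history))  # 5: system + two user turns + two assistant replies
```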
As more providers start supporting Llama 4, you can use them in Llama Stack as well. We are adding to the list. Stay tuned!


</details>

### 🚀 One-Line Installer 🚀

To try Llama Stack locally, run:

```bash
curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/install.sh | sh
```

### Overview

Llama Stack standardizes the core building blocks that simplify AI application development. It codifies best practices across the Llama ecosystem. More specifically, it provides:

- **Unified API layer** for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
- **Plugin architecture** to support the rich ecosystem of different API implementations in various environments, including local development, on-premises, cloud, and mobile.
- **Prepackaged verified distributions** which offer a one-stop solution for developers to get started quickly and reliably in any environment.
- **Multiple developer interfaces** like CLI and SDKs for Python, TypeScript, iOS, and Android.
- **Standalone applications** as examples for how to build production-grade AI applications with Llama Stack.

```mermaid
%%{
	init: {
		'theme': 'base',
		'themeVariables': {
			'fontSize':'100px'
		}
	}
}%%
graph TD
	%% === Classes to control layout ===
	classDef inv rankSpacing:0,diagramPadding:0,nodeSpacing:1000,padding:0,opacity:0,stroke-width:0

	%% === Classes for color-coded nodes (scaled stroke width & corners) ===
	classDef darkGray fill:#ddd,stroke:#999,stroke-width:0px,rx:40px,ry:40px
	classDef agentYellow fill:#FDF3B6,stroke:#999,stroke-width:5px,rx:40px,ry:40px
	classDef violetBlock fill:#EEE8F4,stroke:#999,stroke-width:5px,rx:40px,ry:40px
	classDef lightGreen fill:#D8EDC1,stroke:#999,stroke-width:5px,rx:40px,ry:40px
	classDef telemetryGray fill:#E7E7E7,stroke:#999,stroke-width:5px,rx:40px,ry:40px
	%% === Top layer ===
	top[Llama&nbsp;Stack&nbsp;Client&nbsp;SDKs,&nbsp;CLI,&nbsp;User&nbsp;Interfaces]:::darkGray
	top ~~~ M1
	%% dummy right subgraph for alignment and balance
	subgraph R
	end
	top ~~~ R
	subgraph Middle[" "]
		subgraph M1[" "]
			Agents:::agentYellow
			PostTraining[Post&nbsp;Training]:::violetBlock
		end

		M1 ~~~ M2
		subgraph M2[" "]
			classDef lightBlue fill:#C7E4F7,stroke:#999,stroke-width:5px,rx:40px,ry:40px
			classDef lightRed fill:#F8C1B1,stroke:#999,stroke-width:5px,rx:40px,ry:40px
			classDef lightOrange fill:#F9D591,stroke:#999,stroke-width:5px,rx:40px,ry:40px
			VectorIO:::lightBlue
			Inference:::lightRed
			Evals:::lightOrange
			SyntheticData[Synthetic&nbsp;Data]:::violetBlock
		end

		M2 ~~~ M3
		subgraph M3[" "]
			Safety:::lightGreen
			BatchInference[Batch&nbsp;Inference]:::violetBlock
			BatchAgents[Batch&nbsp;Agents]:::agentYellow
		end

		M3 ~~~ M4
		subgraph M4[" "]
		end
		M4 ~~~ M5
		subgraph M5[" "]
		%% === Dashed border classes (scaled stroke/dash/corners) ===
			classDef resBlue fill:#C9DCEC,stroke:#999,stroke-width:5px,rx:40px,ry:40px,stroke-dasharray:50 50
			classDef resGray fill:#EEE,stroke:#999,stroke-width:5px,rx:40px,ry:40px,stroke-dasharray:50 50
			classDef resYellow fill:#F6E3B3,stroke:#999,stroke-width:5px,rx:40px,ry:40px,stroke-dasharray:50 50
			classDef resGreen fill:#D8EDC1,stroke:#999,stroke-width:5px,rx:40px,ry:40px,stroke-dasharray:50 50
			VectorDBs:::resBlue
			Models:::resGray
			Shields:::resGreen
			Datasets:::resYellow
		end
		M5 ~~~ M6
		subgraph M6[" "]
        _[" "]:::inv
		end
		M6 ~~~ M7
		subgraph M7[" "]
			Telemetry:::telemetryGray
		end
	end
	M7 ~~~ SP
	%% dummy left subgraphs for alignment and balance
	top ~~~ L
	subgraph L
	end
	L ~~~ L2
	subgraph L2
	end
	SP[&nbsp;&nbsp;Service&nbsp;Providers&nbsp;&nbsp;]:::darkGray
	class Top,M1,M2,M3,M4,M5,M7,R,L,L2 inv
	classDef hr height:1,width:1500,fill:#EEE,stroke:#999,stroke-width:10,stroke-dasharray:10 50
	class L,R,M6 hr
	classDef MiddleC fill:#eee,rx:40px,ry:40px
	class Middle MiddleC
```
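In the diagram, each API sits between the client SDKs and the service providers that implement it. Below is a minimal plain-Python sketch of that plugin idea: one interface, two interchangeable implementations, and application code that only depends on the interface. `InferenceProvider`, the two toy providers, and `PROVIDERS` are hypothetical names for illustration. They are not Llama Stack's actual provider interface.

```python
from typing import Protocol


class InferenceProvider(Protocol):
    """Hypothetical API contract: every inference provider exposes the same method."""

    def chat_completion(self, model_id: str, prompt: str) -> str: ...


class EchoProvider:
    """Toy provider that echoes the prompt (stands in for a local backend)."""

    def chat_completion(self, model_id: str, prompt: str) -> str:
        return f"[{model_id}] {prompt}"


class UppercaseProvider:
    """Toy provider that uppercases the prompt (stands in for a hosted backend)."""

    def chat_completion(self, model_id: str, prompt: str) -> str:
        return f"[{model_id}] {prompt.upper()}"


# Hypothetical plugin registry: provider name -> implementation of the shared interface.
PROVIDERS: dict[str, InferenceProvider] = {
    "local": EchoProvider(),
    "hosted": UppercaseProvider(),
}


def ask(provider_name: str, prompt: str) -> str:
    """Application code: calls the interface and never names a concrete provider class."""
    provider = PROVIDERS[provider_name]
    return provider.chat_completion(model_id="example-model", prompt=prompt)


print(ask("local", "hello"))  # [example-model] hello
print(ask("hosted", "hello"))  # [example-model] HELLO
```

Switching `ask("local", ...)` to `ask("hosted", ...)` changes the backend without editing `ask`. The unified API layer is meant to give applications this property.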

### Llama Stack Benefits
- **Flexible Options**: Developers can choose their preferred infrastructure without changing APIs, which keeps deployment choices open.
- **Consistent Experience**: With its unified APIs, Llama Stack makes it easier to build, test, and deploy AI applications with consistent application behavior.
- **Robust Ecosystem**: Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies) that offer tailored infrastructure, software, and services for deploying Llama models.

By reducing friction and complexity, Llama Stack empowers developers to focus on what they do best: building transformative generative AI applications.

### API Providers
Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.

| **API Provider Builder** |    **Environments**    | **Agents** | **Inference** | **Memory** | **Safety** | **Telemetry** | **Post Training** |
|:------------------------:|:----------------------:|:----------:|:-------------:|:----------:|:----------:|:-------------:|:-----------------:|
|      Meta Reference      |      Single Node       |     ✅      |       ✅       |     ✅      |     ✅      |       ✅       |               |
|        SambaNova         |         Hosted         |            |       ✅       |            |     ✅      |               |                  |
|         Cerebras         |         Hosted         |            |       ✅       |            |            |               |                  |
|        Fireworks         |         Hosted         |     ✅      |       ✅       |     ✅      |            |               |                |
|       AWS Bedrock        |         Hosted         |            |       ✅       |            |     ✅      |               |                |
|         Together         |         Hosted         |     ✅      |       ✅       |            |     ✅      |               |                |
|           Groq           |         Hosted         |            |       ✅       |            |            |               |                 |
|          Ollama          |      Single Node       |            |       ✅       |            |            |               |                 |
|           TGI            | Hosted and Single Node |            |       ✅       |            |            |               |                 |
|        NVIDIA NIM        | Hosted and Single Node |            |       ✅       |            |            |               |                 |
|          Chroma          |      Single Node       |            |               |     ✅      |            |               |                 |
|        PG Vector         |      Single Node       |            |               |     ✅      |            |               |                 |
|    PyTorch ExecuTorch    |     On-device iOS      |     ✅      |       ✅       |            |            |               |                |
|           vLLM           | Hosted and Single Node |            |       ✅       |            |            |               |                 |
|          OpenAI          |         Hosted         |            |       ✅       |            |            |               |                 |
|        Anthropic         |         Hosted         |            |       ✅       |            |            |               |                 |
|          Gemini          |         Hosted         |            |       ✅       |            |            |               |                 |
|          watsonx         |         Hosted         |            |       ✅       |            |            |               |                 |
|        HuggingFace       |       Single Node      |            |                |            |            |               |       ✅        |
|         TorchTune        |       Single Node      |            |                |            |            |               |       ✅        |
|       NVIDIA NeMo        |         Hosted         |            |                |            |            |               |       ✅        |

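The table can also be read as a mapping from each provider to the APIs it implements. The sketch below encodes a subset of the rows above and answers "which providers implement a given API?". `PROVIDER_APIS` and `providers_for` are illustrative names, and the table itself remains the source of truth.

```python
# Subset of the provider table above: provider -> APIs marked with a check.
PROVIDER_APIS: dict[str, set[str]] = {
    "Meta Reference": {"Agents", "Inference", "Memory", "Safety", "Telemetry"},
    "SambaNova": {"Inference", "Safety"},
    "Fireworks": {"Agents", "Inference", "Memory"},
    "AWS Bedrock": {"Inference", "Safety"},
    "Together": {"Agents", "Inference", "Safety"},
    "Ollama": {"Inference"},
    "Chroma": {"Memory"},
    "PG Vector": {"Memory"},
    "HuggingFace": {"Post Training"},
    "TorchTune": {"Post Training"},
}


def providers_for(api: str) -> list[str]:
    """Return the providers in the subset above that implement `api`, sorted by name."""
    return sorted(name for name, apis in PROVIDER_APIS.items() if api in apis)


print(providers_for("Safety"))
# ['AWS Bedrock', 'Meta Reference', 'SambaNova', 'Together']
print(providers_for("Memory"))
# ['Chroma', 'Fireworks', 'Meta Reference', 'PG Vector']
```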

### Distributions

A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario: you can begin with a local development setup (e.g., Ollama) and seamlessly transition to production (e.g., Fireworks) without changing your application code. Here are some of the distributions we support:

|               **Distribution**                |                                                                    **Llama Stack Docker**                                                                     |                                               **Start This Distribution**                                                |
|:---------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|
|                Meta Reference                 |           [llamastack/distribution-meta-reference-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-gpu/general)           |      [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/meta-reference-gpu.html)      |
|                   SambaNova                   |                     [llamastack/distribution-sambanova](https://hub.docker.com/repository/docker/llamastack/distribution-sambanova/general)                     |   [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/sambanova.html)   |
|                   Cerebras                    |                     [llamastack/distribution-cerebras](https://hub.docker.com/repository/docker/llamastack/distribution-cerebras/general)                     |   [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/cerebras.html)   |
|                    Ollama                     |                       [llamastack/distribution-ollama](https://hub.docker.com/repository/docker/llamastack/distribution-ollama/general)                       |            [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/ollama.html)            |
|                      TGI                      |                          [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general)                          |             [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/tgi.html)              |
|                   Together                    |                     [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general)                     |           [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/together.html)           |
|                   Fireworks                   |                    [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general)                    |          [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/fireworks.html)           |
| vLLM |                  [llamastack/distribution-remote-vllm](https://hub.docker.com/repository/docker/llamastack/distribution-remote-vllm/general)                  |         [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/remote-vllm.html)          |

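Conceptually, a distribution fixes one provider per API, and the application asks for an API rather than naming a provider. The sketch below models that idea. Before switching to a distribution, it checks whether that distribution covers every API the application needs. `Distribution`, `missing_apis`, `run_app`, and both bundles are hypothetical. They do not reflect the real contents or configuration format of any distribution listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Distribution:
    """Hypothetical distro: a name plus one chosen provider per API."""

    name: str
    providers: dict[str, str] = field(default_factory=dict)

    def provider_for(self, api: str) -> str:
        """Return the provider bundled for `api`; raises KeyError if the API is absent."""
        return self.providers[api]


def missing_apis(dist: Distribution, required: set[str]) -> set[str]:
    """Return the APIs the application needs that this distribution does not bundle."""
    return required - set(dist.providers)


# Hypothetical bundles: a local development setup and a hosted production setup.
dev = Distribution("local-dev", {"inference": "ollama", "vector_io": "chroma"})
prod = Distribution(
    "hosted-prod",
    {"inference": "fireworks", "vector_io": "pgvector", "safety": "llama-guard"},
)

REQUIRED_APIS = {"inference", "vector_io"}


def run_app(dist: Distribution) -> str:
    """Application logic: identical for every distribution, only the bundle changes."""
    missing = missing_apis(dist, REQUIRED_APIS)
    if missing:
        raise ValueError(f"{dist.name} is missing APIs: {sorted(missing)}")
    return f"{dist.name}: inference served by {dist.provider_for('inference')}"


print(run_app(dev))  # local-dev: inference served by ollama
print(run_app(prod))  # hosted-prod: inference served by fireworks
print(missing_apis(dev, {"inference", "safety"}))  # {'safety'}
```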

### Documentation

Please check out our [Documentation](https://llama-stack.readthedocs.io/en/latest/index.html) page for more details.

* CLI references
    * [llama (server-side) CLI Reference](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html): Guide for using the `llama` CLI to work with Llama models (download, study prompts), and building/starting a Llama Stack distribution.
    * [llama (client-side) CLI Reference](https://llama-stack.readthedocs.io/en/latest/references/llama_stack_client_cli_reference.html): Guide for using the `llama-stack-client` CLI, which allows you to query information about the distribution.
* Getting Started
    * [Quick guide to start a Llama Stack server](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html).
    * [Jupyter notebook](./docs/getting_started.ipynb) that walks through how to use the simple text and vision inference `llama_stack_client` APIs.
    * The complete Llama Stack lesson [Colab notebook](https://colab.research.google.com/drive/1dtVmxotBsI4cGZQNsJRYPrLiDeT0Wnwt) of the new [Llama 3.2 course on DeepLearning.AI](https://learn.deeplearning.ai/courses/introducing-multimodal-llama-3-2/lesson/8/llama-stack).
    * A [Zero-to-Hero Guide](https://github.com/meta-llama/llama-stack/tree/main/docs/zero_to_hero_guide) that guides you through all the key components of Llama Stack with code samples.
* [Contributing](CONTRIBUTING.md)
    * [Adding a new API Provider](https://llama-stack.readthedocs.io/en/latest/contributing/new_api_provider.html) walks through how to add a new API provider.

### Llama Stack Client SDKs

|  **Language** |  **Client SDK** | **Package** |
| :----: | :----: | :----: |
| Python |  [llama-stack-client-python](https://github.com/meta-llama/llama-stack-client-python) | [![PyPI version](https://img.shields.io/pypi/v/llama_stack_client.svg)](https://pypi.org/project/llama_stack_client/)
| Swift  | [llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift) | [![Swift Package Index](https://img.shields.io/endpoint?url=https%3A%2F%2Fswiftpackageindex.com%2Fapi%2Fpackages%2Fmeta-llama%2Fllama-stack-client-swift%2Fbadge%3Ftype%3Dswift-versions)](https://swiftpackageindex.com/meta-llama/llama-stack-client-swift)
| TypeScript   | [llama-stack-client-typescript](https://github.com/meta-llama/llama-stack-client-typescript) | [![NPM version](https://img.shields.io/npm/v/llama-stack-client.svg)](https://npmjs.org/package/llama-stack-client)
| Kotlin | [llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) | [![Maven version](https://img.shields.io/maven-central/v/com.llama.llamastack/llama-stack-client-kotlin)](https://central.sonatype.com/artifact/com.llama.llamastack/llama-stack-client-kotlin)

Check out our client SDKs for connecting to a Llama Stack server in your preferred language. You can choose from [Python](https://github.com/meta-llama/llama-stack-client-python), [TypeScript](https://github.com/meta-llama/llama-stack-client-typescript), [Swift](https://github.com/meta-llama/llama-stack-client-swift), and [Kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) to quickly build your applications.

You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.