# Providers Overview

The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations for the same API. Examples include:
- LLM inference providers (e.g., Meta Reference, Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, OpenAI, Anthropic, Gemini, WatsonX)
- Vector databases (e.g., FAISS, SQLite-Vec, ChromaDB, Weaviate, Qdrant, Milvus, PGVector)
- Safety providers (e.g., Meta's Llama Guard, Prompt Guard, Code Scanner, AWS Bedrock Guardrails)
- Tool Runtime providers (e.g., RAG Runtime, Brave Search)

Providers come in two flavors:
- **Remote**: the provider runs as a separate service external to the Llama Stack codebase. Llama Stack contains a small amount of adapter code.
- **Inline**: the provider is fully specified and implemented within the Llama Stack codebase. It may be a simple wrapper around an existing library, or a full-fledged implementation within Llama Stack.

Importantly, Llama Stack always strives to provide at least one fully inline provider for each API so you can iterate on a fully featured environment locally.
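
To make the two flavors concrete, here is a minimal Python sketch of the idea that calling code depends only on an API contract while the implementation behind it is swapped. Every name in it (`InferenceProvider`, `InlineEchoProvider`, `RemoteAdapterProvider`, `fake_service`) is hypothetical and chosen for illustration; none of them are Llama Stack's actual classes or interfaces.

```python
from typing import Callable, Protocol


class InferenceProvider(Protocol):
    """Hypothetical API contract; not Llama Stack's real interface."""

    def complete(self, prompt: str) -> str: ...


class InlineEchoProvider:
    """Inline flavor: the implementation runs in-process."""

    def complete(self, prompt: str) -> str:
        return f"[inline] {prompt}"


class RemoteAdapterProvider:
    """Remote flavor: thin adapter code that forwards to an external service."""

    def __init__(self, send: Callable[[dict], dict]) -> None:
        # `send` stands in for a network call to the external service.
        self._send = send

    def complete(self, prompt: str) -> str:
        return self._send({"prompt": prompt})["text"]


def run(provider: InferenceProvider, prompt: str) -> str:
    # The caller only relies on the contract, so providers are interchangeable.
    return provider.complete(prompt)


def fake_service(payload: dict) -> dict:
    # Simulated remote endpoint so the sketch runs without a network.
    return {"text": f"[remote] {payload['prompt']}"}


inline_result = run(InlineEchoProvider(), "hello")
remote_result = run(RemoteAdapterProvider(fake_service), "hello")
print(inline_result)
print(remote_result)
```

The `run` function never changes when the provider does, which is the property the swap-out goal above relies on.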

## Available Providers

Here is a comprehensive list of all available API providers in Llama Stack:

| API Provider Builder    | Environments      | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO |Tool Runtime| Scoring |
|:----------------------:|:------------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:---------:|:----------:|:-------:|
| Meta Reference         | Single Node        |   ✅   |    ✅     |    ✅    |   ✅   |    ✅     |      ✅      |  ✅  |    ✅     |      ✅    |         |
| SambaNova              | Hosted             |        |    ✅     |          |   ✅   |           |              |      |           |             |         |
| Cerebras               | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
| Fireworks              | Hosted             |   ✅   |    ✅     |    ✅    |        |           |              |      |           |             |         |
| AWS Bedrock            | Hosted             |        |    ✅     |          |   ✅   |           |              |      |           |             |         |
| Together               | Hosted             |   ✅   |    ✅     |          |   ✅   |           |              |      |           |             |         |
| Groq                   | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
| Ollama                 | Single Node        |        |    ✅     |          |        |           |              |      |           |             |         |
| TGI                    | Hosted/Single Node |        |    ✅     |          |        |           |              |      |           |             |         |
| NVIDIA NIM             | Hosted/Single Node |        |    ✅     |          |   ✅   |           |              |      |           |             |         |
| ChromaDB               | Hosted/Single Node |        |           |    ✅    |        |           |              |      |           |             |         |
| PG Vector              | Single Node        |        |           |    ✅    |        |           |              |      |           |             |         |
| vLLM                   | Single Node        |        |    ✅     |          |        |           |              |      |           |             |         |
| OpenAI                 | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
| Anthropic              | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
| Gemini                 | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
| WatsonX                | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
| HuggingFace            | Single Node        |        |           |          |        |           |      ✅      |      |    ✅     |             |         |
| TorchTune              | Single Node        |        |           |          |        |           |      ✅      |      |           |             |         |
| NVIDIA NEMO            | Hosted             |        |    ✅     |    ✅    |        |           |      ✅      |  ✅  |    ✅     |             |         |
| NVIDIA                 | Hosted             |        |           |          |        |           |      ✅      |  ✅  |    ✅     |             |         |
| FAISS                  | Single Node        |        |           |    ✅    |        |           |              |      |           |             |         |
| SQLite-Vec             | Single Node        |        |           |    ✅    |        |           |              |      |           |             |         |
| Qdrant                 | Hosted/Single Node |        |           |    ✅    |        |           |              |      |           |             |         |
| Weaviate               | Hosted             |        |           |    ✅    |        |           |              |      |           |             |         |
| Milvus                 | Hosted/Single Node |        |           |    ✅    |        |           |              |      |           |             |         |
| Prompt Guard           | Single Node        |        |           |          |   ✅   |           |              |      |           |             |         |
| Llama Guard            | Single Node        |        |           |          |   ✅   |           |              |      |           |             |         |
| Code Scanner           | Single Node        |        |           |          |   ✅   |           |              |      |           |             |         |
| Brave Search           | Hosted             |        |           |          |        |           |              |      |           |      ✅     |         |
| Bing Search            | Hosted             |        |           |          |        |           |              |      |           |      ✅     |         |
| RAG Runtime            | Single Node        |        |           |          |        |           |              |      |           |      ✅     |         |
| Model Context Protocol | Hosted             |        |           |          |        |           |              |      |           |      ✅     |         |
| Sentence Transformers  | Single Node        |        |    ✅     |          |        |           |              |      |           |             |         |
| Braintrust             | Single Node        |        |           |          |        |           |              |      |           |             |    ✅   |
| Basic                  | Single Node        |        |           |          |        |           |              |      |           |             |    ✅   |
| LLM-as-Judge           | Single Node        |        |           |          |        |           |              |      |           |             |    ✅   |
| Databricks             | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
| RunPod                 | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
| Passthrough            | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
| PyTorch ExecuTorch     | On-device iOS, Android |   ✅   |    ✅     |          |        |           |              |      |           |             |         |

## External Providers

Llama Stack supports external providers that live outside of the main codebase. This allows you to create and maintain your own providers independently.

```{toctree}
:maxdepth: 1

external
```

## Agents
Runs multi-step agentic workflows with LLMs, including tool usage and memory (RAG).

```{toctree}
:maxdepth: 1

agents/index
```

## DatasetIO
Interfaces with datasets and data loaders.

```{toctree}
:maxdepth: 1

datasetio/index
```

## Eval
Generates outputs (via Inference or Agents) and performs scoring.

```{toctree}
:maxdepth: 1

eval/index
```

## Inference
Runs inference with an LLM.

```{toctree}
:maxdepth: 1

inference/index
```

## Post Training
Fine-tunes a model.

```{toctree}
:maxdepth: 1

post_training/index
```

## Safety
Applies safety policies to the output at a system (not only model) level.

```{toctree}
:maxdepth: 1

safety/index
```

## Scoring
Evaluates the outputs of the system.

```{toctree}
:maxdepth: 1

scoring/index
```

## Telemetry
Collects telemetry data from the system.

```{toctree}
:maxdepth: 1

telemetry/index
```

## Tool Runtime
Is associated with the ToolGroup resources.

```{toctree}
:maxdepth: 1

tool_runtime/index
```

## Vector IO

Vector IO refers to operations on vector databases, such as adding, searching, and deleting documents.
Vector IO plays a crucial role in [Retrieval-Augmented Generation (RAG)](../../building_applications/rag), where the vector
database stores documents and returns the most relevant ones at query time.
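
The three operations named above (add, search, delete) can be illustrated with a toy in-memory store. This is a sketch under stated assumptions: `ToyVectorStore` is a hypothetical class, not a Llama Stack provider or API, embeddings are supplied by hand rather than computed by a model, and ranking uses plain cosine similarity.

```python
import math


class ToyVectorStore:
    """Hypothetical in-memory store; illustrates add/search/delete only."""

    def __init__(self):
        self._docs = {}  # doc_id -> (embedding, text)

    def add(self, doc_id, embedding, text):
        self._docs[doc_id] = (list(embedding), text)

    def delete(self, doc_id):
        # Deleting an unknown id is a no-op.
        self._docs.pop(doc_id, None)

    def search(self, query, k=1):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Rank stored documents by similarity to the query, highest first.
        ranked = sorted(
            self._docs.items(),
            key=lambda item: cosine(query, item[1][0]),
            reverse=True,
        )
        return [(doc_id, text) for doc_id, (_, text) in ranked[:k]]


store = ToyVectorStore()
store.add("a", [1.0, 0.0], "about cats")
store.add("b", [0.0, 1.0], "about dogs")
first_hit = store.search([0.9, 0.1], k=1)
store.delete("a")
hit_after_delete = store.search([0.9, 0.1], k=1)
print(first_hit, hit_after_delete)
```

In a RAG setting, the text returned by `search` would be passed to the model as context; a real provider would replace the linear scan with an index.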

```{toctree}
:maxdepth: 1

vector_io/index
```