
# Providers Overview

The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations of the same API (see the sketch after this list). Examples include:

- LLM inference providers (e.g., Meta Reference, Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, OpenAI, Anthropic, Gemini, WatsonX, etc.),
- Vector databases (e.g., FAISS, SQLite-Vec, ChromaDB, Weaviate, Qdrant, Milvus, PGVector, etc.),
- Safety providers (e.g., Meta's Llama Guard, Prompt Guard, Code Scanner, AWS Bedrock Guardrails, etc.),
- Tool Runtime providers (e.g., RAG Runtime, Brave Search, etc.)
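
Because every provider implements the same API, client code does not change when you swap the backing implementation. As a rough sketch using the `llama-stack-client` Python SDK (the base URL and model identifier below are assumptions; use whatever your distribution actually serves):

```python
from llama_stack_client import LlamaStackClient

# The server's configuration decides which provider backs the Inference
# API (Ollama, Fireworks, vLLM, ...); this client code is identical
# regardless of that choice.
client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.completion_message.content)
```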

Providers come in two flavors:

- Remote: the provider runs as a separate service external to the Llama Stack codebase. Llama Stack contains a small amount of adapter code.
- Inline: the provider is fully specified and implemented within the Llama Stack codebase. It may be a simple wrapper around an existing library, or a full-fledged implementation within Llama Stack.

Importantly, Llama Stack always strives to provide at least one fully inline provider for each API so you can iterate on a fully featured environment locally.
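
One way to see which flavor you are running is to list the active providers; provider types are conventionally prefixed with `remote::` or `inline::`. A minimal sketch, assuming a running server and the `llama-stack-client` SDK:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Each provider reports the API it serves and its type, e.g.
# "remote::ollama" (external service) or "inline::faiss" (in-process).
for provider in client.providers.list():
    print(provider.api, provider.provider_id, provider.provider_type)
```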

## External Providers

Llama Stack supports external providers that live outside of the main codebase. This allows you to create and maintain your own providers independently.

```{toctree}
:maxdepth: 1

external
```

## Agents

Runs multi-step agentic workflows with LLMs, including tool usage and memory (RAG).

```{toctree}
:maxdepth: 1

agents/index
```
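
A rough sketch of agent usage (the `Agent` helper's exact constructor arguments vary across `llama-stack-client` versions, and the model id is a placeholder):

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent

client = LlamaStackClient(base_url="http://localhost:8321")

# Hypothetical configuration; tools and shields can also be attached.
agent = Agent(
    client,
    model="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model id
    instructions="You are a helpful assistant.",
)
session_id = agent.create_session("demo-session")

turn = agent.create_turn(
    messages=[{"role": "user", "content": "What can Llama Stack agents do?"}],
    session_id=session_id,
    stream=False,
)
print(turn.output_message.content)
```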

## DatasetIO

Interfaces with datasets and data loaders.

```{toctree}
:maxdepth: 1

datasetio/index
```

## Eval

Generates outputs (via Inference or Agents) and performs scoring.

```{toctree}
:maxdepth: 1

eval/index
```

## Inference

Runs inference with an LLM.

```{toctree}
:maxdepth: 1

inference/index
```
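
A minimal sketch of model discovery against a running stack (assuming the `llama-stack-client` SDK; which models appear depends entirely on the configured inference providers):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# List the models exposed by whichever inference providers are configured.
for model in client.models.list():
    print(model.identifier, getattr(model, "provider_id", None))
```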

## Post Training

Fine-tunes a model.

```{toctree}
:maxdepth: 1

post_training/index
```

## Safety

Applies safety policies to outputs at the system level, not only at the model level.

```{toctree}
:maxdepth: 1

safety/index
```
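
A rough sketch of invoking a shield directly (the shield id below is a placeholder; which shields exist depends on your distribution):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

result = client.safety.run_shield(
    shield_id="llama-guard",  # hypothetical id; check your registered shields
    messages=[{"role": "user", "content": "Some content to check."}],
    params={},
)
# A populated `violation` field indicates the shield flagged the content.
if result.violation:
    print(result.violation.user_message)
```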

## Scoring

Evaluates the outputs of the system.

```{toctree}
:maxdepth: 1

scoring/index
```
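
A hypothetical sketch of scoring a row against a built-in scoring function (the function id and argument shapes are assumptions; consult the scoring provider docs for the exact schema):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Score a generated answer against an expected answer. The scoring
# function id is a placeholder; discover available functions via
# client.scoring_functions.list().
response = client.scoring.score(
    input_rows=[
        {"generated_answer": "Paris", "expected_answer": "Paris"},
    ],
    scoring_functions={"basic::equality": None},
)
print(response.results)
```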

## Telemetry

Collects telemetry data from the system.

```{toctree}
:maxdepth: 1

telemetry/index
```

## Tool Runtime

Executes tools and is associated with ToolGroup resources.

```{toctree}
:maxdepth: 1

tool_runtime/index
```
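
A rough sketch of invoking a tool directly through the tool runtime (the tool name and `kwargs` schema below are assumptions; they depend on which tool groups your distribution registers):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Invoke a registered tool by name; the kwargs schema is tool-specific.
result = client.tool_runtime.invoke_tool(
    tool_name="web_search",  # hypothetical; depends on registered tool groups
    kwargs={"query": "latest Llama Stack release"},
)
print(result.content)
```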

## Vector IO

Vector IO refers to operations on vector databases, such as adding, searching, and deleting documents. Vector IO plays a crucial role in Retrieval-Augmented Generation (RAG), where a vector database stores document embeddings and retrieves the most relevant ones at query time.

```{toctree}
:maxdepth: 1

vector_io/index
```
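
A rough sketch of the insert/query flow (the vector db id, embedding model, and chunk schema below are assumptions; see the RAG documentation for the authoritative shapes):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a vector database backed by whichever Vector IO provider
# the server is configured with (FAISS, Chroma, pgvector, ...).
client.vector_dbs.register(
    vector_db_id="my-docs",  # hypothetical id
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)

# Insert document chunks, then retrieve the most relevant ones.
client.vector_io.insert(
    vector_db_id="my-docs",
    chunks=[
        {
            "content": "Llama Stack providers come in remote and inline flavors.",
            "mime_type": "text/plain",
            "metadata": {"document_id": "providers-overview"},
        }
    ],
)
result = client.vector_io.query(vector_db_id="my-docs", query="What are provider flavors?")
for chunk in result.chunks:
    print(chunk.content)
```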