Docs: update full list of providers with matched APIs

- add model_type in example
- change "Memory" to "VectorIO" as column name
- update providers table
- update images from dockerhub
- update index.md

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit bb6c96c11c204fb195d81fa15431728c409847fc)
Wen Zhou 2025-06-16 19:08:48 +02:00
parent d165000bbc
commit c9b0cc6439
4 changed files with 131 additions and 32 deletions


@@ -35,6 +35,8 @@ pip install llama-stack-client
### CLI
```bash
# Run a chat completion
MODEL="Llama-4-Scout-17B-16E-Instruct"
llama-stack-client --endpoint http://localhost:8321 \
  inference chat-completion \
  --model-id meta-llama/$MODEL \
@@ -107,30 +109,51 @@ By reducing friction and complexity, Llama Stack empowers developers to focus on
### API Providers
Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.
| API Provider Builder | Environments | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO | Tool Runtime | Scoring |
|:----------------------:|:------------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:---------:|:------------:|:-------:|
| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| SambaNova | Hosted | | ✅ | | ✅ | | | | | | |
| Cerebras | Hosted | | ✅ | | | | | | | | |
| Fireworks | Hosted | ✅ | ✅ | ✅ | | | | | | | |
| AWS Bedrock | Hosted | | ✅ | | ✅ | | | | | | |
| Together | Hosted | ✅ | ✅ | | ✅ | | | | | | |
| Groq | Hosted | | ✅ | | | | | | | | |
| Ollama | Single Node | | ✅ | | | | | | | | |
| TGI | Hosted/Single Node | | ✅ | | | | | | | | |
| NVIDIA NIM | Hosted/Single Node | | ✅ | | ✅ | | | | | | |
| ChromaDB | Hosted/Single Node | | | ✅ | | | | | | | |
| PG Vector | Single Node | | | ✅ | | | | | | | |
| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | | | | | |
| vLLM | Single Node | | ✅ | | | | | | | | |
| OpenAI | Hosted | | ✅ | | | | | | | | |
| Anthropic | Hosted | | ✅ | | | | | | | | |
| Gemini | Hosted | | ✅ | | | | | | | | |
| WatsonX | Hosted | | ✅ | | | | | | | | |
| HuggingFace | Single Node | | | | | | ✅ | | ✅ | | |
| TorchTune | Single Node | | | | | | ✅ | | | | |
| NVIDIA NEMO | Hosted | | ✅ | ✅ | | | ✅ | ✅ | ✅ | | |
| NVIDIA | Hosted | | | | | | ✅ | ✅ | ✅ | | |
| FAISS | Single Node | | | ✅ | | | | | | | |
| SQLite-Vec | Single Node | | | ✅ | | | | | | | |
| Qdrant | Hosted/Single Node | | | ✅ | | | | | | | |
| Weaviate | Hosted | | | ✅ | | | | | | | |
| Milvus | Hosted/Single Node | | | ✅ | | | | | | | |
| Prompt Guard | Single Node | | | | ✅ | | | | | | |
| Llama Guard | Single Node | | | | ✅ | | | | | | |
| Code Scanner | Single Node | | | | ✅ | | | | | | |
| Brave Search | Hosted | | | | | | | | | ✅ | |
| Bing Search | Hosted | | | | | | | | | ✅ | |
| RAG Runtime | Single Node | | | | | | | | | ✅ | |
| Model Context Protocol | Hosted | | | | | | | | | ✅ | |
| Sentence Transformers | Single Node | | ✅ | | | | | | | | |
| Braintrust | Single Node | | | | | | | | | | ✅ |
| Basic | Single Node | | | | | | | | | | ✅ |
| LLM-as-Judge | Single Node | | | | | | | | | | ✅ |
| Databricks | Hosted | | ✅ | | | | | | | | |
| RunPod | Hosted | | ✅ | | | | | | | | |
| Passthrough | Hosted | | ✅ | | | | | | | | |
> **Note**: Additional providers are available through external packages. See [External Providers](https://llama-stack.readthedocs.io/en/latest/providers/external.html) documentation for providers like KubeFlow Training, KubeFlow Pipelines, RamaLama, and TrustyAI LM-Eval.
### Distributions
@@ -145,7 +168,10 @@ A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider
| TGI | [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/tgi.html) |
| Together | [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/together.html) |
| Fireworks | [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/fireworks.html) |
| AWS Bedrock | [llamastack/distribution-bedrock](https://hub.docker.com/repository/docker/llamastack/distribution-bedrock/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/bedrock.html) |
| vLLM | [llamastack/distribution-remote-vllm](https://hub.docker.com/repository/docker/llamastack/distribution-remote-vllm/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/remote-vllm.html) |
| Starter | [llamastack/distribution-starter](https://hub.docker.com/repository/docker/llamastack/distribution-starter/general) | |
| PostgreSQL | [llamastack/distribution-postgres-demo](https://hub.docker.com/repository/docker/llamastack/distribution-postgres-demo/general) | |
### Documentation


@@ -77,10 +77,10 @@ Next up is the most critical part: the set of providers that the stack will use
```yaml
providers:
  inference:
    # provider_id is a string you can choose freely
    - provider_id: ollama
      # provider_type is a string that specifies the type of provider.
      # in this case, the provider for inference is ollama and it runs remotely (outside of the distribution)
      provider_type: remote::ollama
      # config is a dictionary that contains the configuration for the provider.
      # in this case, the configuration is the url of the ollama server
@@ -88,7 +88,7 @@ providers:
        url: ${env.OLLAMA_URL:=http://localhost:11434}
```
A few things to note:
- A _provider instance_ is identified with an (id, type, config) triplet.
- The id is a string you can choose freely.
- You can instantiate any number of provider instances of the same type.
- The configuration dictionary is provider-specific.
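The `${env.VAR:=default}` placeholders seen in the config above are resolved against the server's environment at startup. As a rough illustration of how that syntax behaves (a simplified sketch, not Llama Stack's actual resolver):

```python
import os
import re

# Matches ${env.VAR} or ${env.VAR:=default} -- a simplified sketch of the
# substitution syntax used in run.yaml, not Llama Stack's real implementation.
_ENV_PATTERN = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::=([^}]*))?\}")

def substitute_env(value: str) -> str:
    """Replace ${env.VAR:=default} placeholders with environment values."""
    def _replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        resolved = os.environ.get(name, default)
        if resolved is None:
            raise KeyError(f"environment variable {name} is not set and has no default")
        return resolved
    return _ENV_PATTERN.sub(_replace, value)

# If OLLAMA_URL is unset, the default after ':=' is used.
print(substitute_env("url: ${env.OLLAMA_URL:=http://localhost:11434}"))
```

Setting `OLLAMA_URL` in the environment would override the default after `:=`.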
@@ -187,7 +187,7 @@ The environment variable substitution system is type-safe:
## Resources
Let's look at the `models` section:
```yaml
models:
@@ -195,8 +195,9 @@ models:
  model_id: ${env.INFERENCE_MODEL}
  provider_id: ollama
  provider_model_id: null
  model_type: llm
```
A Model is an instance of a "Resource" (see [Concepts](../concepts/index)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage clients to register models before using them, some Stack servers may come up with a list of "already known and available" models.
What's with the `provider_model_id` field? This is an identifier for the model inside the provider's model catalog. Contrast it with `model_id` which is the identifier for the same model for Llama Stack's purposes. For example, you may want to name "llama3.2:vision-11b" as "image_captioning_model" when you use it in your Stack interactions. When omitted, the server will set `provider_model_id` to be the same as `model_id`.
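Following the aliasing example above, such a registration might look like this (a sketch; the surrounding fields depend on your provider setup):

```yaml
models:
- metadata: {}
  # Llama Stack-facing name, chosen freely
  model_id: image_captioning_model
  provider_id: ollama
  # name of the same model inside the provider's catalog
  provider_model_id: llama3.2:vision-11b
  model_type: llm
```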


@@ -73,17 +73,26 @@ A number of "adapters" are available for some popular Inference and Vector Store
| OpenAI | Hosted |
| Anthropic | Hosted |
| Gemini | Hosted |
| WatsonX | Hosted |
**Agents API**
| **Provider** | **Environments** |
| :----: | :----: |
| Meta Reference | Single Node |
| Fireworks | Hosted |
| Together | Hosted |
| PyTorch ExecuTorch | On-device iOS |
**Vector IO API**
| **Provider** | **Environments** |
| :----: | :----: |
| FAISS | Single Node |
| SQLite-Vec | Single Node |
| Chroma | Hosted and Single Node |
| Milvus | Hosted and Single Node |
| Postgres (PGVector) | Hosted and Single Node |
| Weaviate | Hosted |
| Qdrant | Hosted and Single Node |
**Safety API**
| **Provider** | **Environments** |
@@ -93,6 +102,30 @@ A number of "adapters" are available for some popular Inference and Vector Store
| Code Scanner | Single Node |
| AWS Bedrock | Hosted |
**Post Training API**
| **Provider** | **Environments** |
| :----: | :----: |
| Meta Reference | Single Node |
| HuggingFace | Single Node |
| TorchTune | Single Node |
| NVIDIA NEMO | Hosted |
**Eval API**
| **Provider** | **Environments** |
| :----: | :----: |
| Meta Reference | Single Node |
| NVIDIA NEMO | Hosted |
**Telemetry API**
| **Provider** | **Environments** |
| :----: | :----: |
| Meta Reference | Single Node |
**Tool Runtime API**
| **Provider** | **Environments** |
| :----: | :----: |
| Brave Search | Hosted |
| RAG Runtime | Single Node |
```{toctree}
:hidden:


@@ -1,9 +1,10 @@
# Providers Overview
The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations for the same API. Examples for these include:
- LLM inference providers (e.g., Meta Reference, Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, OpenAI, Anthropic, Gemini, WatsonX, etc.),
- Vector databases (e.g., FAISS, SQLite-Vec, ChromaDB, Weaviate, Qdrant, Milvus, PGVector, etc.),
- Safety providers (e.g., Meta's Llama Guard, Prompt Guard, Code Scanner, AWS Bedrock Guardrails, etc.),
- Tool Runtime providers (e.g., RAG Runtime, Brave Search, etc.)
Providers come in two flavors:
- **Remote**: the provider runs as a separate service external to the Llama Stack codebase. Llama Stack contains a small amount of adapter code.
@@ -11,6 +12,44 @@ Providers come in two flavors:
Importantly, Llama Stack always strives to provide at least one fully inline provider for each API so you can iterate on a fully featured environment locally.
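The two flavors show up in configuration through the `provider_type` prefix. As a rough sketch (the `inline::faiss` entry and its empty config are illustrative assumptions, not a verified configuration):

```yaml
providers:
  inference:
    # remote: runs as an external service; Llama Stack holds only adapter code
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        url: ${env.OLLAMA_URL:=http://localhost:11434}
  vector_io:
    # inline: runs inside the Llama Stack process itself
    - provider_id: faiss
      provider_type: inline::faiss
      config: {}  # provider-specific; shown empty for illustration
```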
## Available Providers
Here is a comprehensive list of all available API providers in Llama Stack:
| API Provider Builder | Environments | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO | Tool Runtime |
|:----------------------:|:------------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:---------:|:------------:|
| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SambaNova | Hosted | | ✅ | | ✅ | | | | | |
| Cerebras | Hosted | | ✅ | | | | | | | |
| Fireworks | Hosted | ✅ | ✅ | ✅ | | | | | | |
| AWS Bedrock | Hosted | | ✅ | | ✅ | | | | | |
| Together | Hosted | ✅ | ✅ | | ✅ | | | | | |
| Groq | Hosted | | ✅ | | | | | | | |
| Ollama | Single Node | | ✅ | | | | | | | |
| TGI | Hosted/Single Node | | ✅ | | | | | | | |
| NVIDIA NIM | Hosted/Single Node | | ✅ | | | | | | | |
| Chroma | Single Node | | | ✅ | | | | | | |
| PG Vector | Single Node | | | ✅ | | | | | | |
| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | | | | |
| vLLM | Single Node | | ✅ | | | | | | | |
| OpenAI | Hosted | | ✅ | | | | | | | |
| Anthropic | Hosted | | ✅ | | | | | | | |
| Gemini | Hosted | | ✅ | | | | | | | |
| WatsonX | Hosted | | ✅ | | | | | | | |
| HuggingFace | Single Node | | | | | | ✅ | | ✅ | |
| TorchTune | Single Node | | | | | | ✅ | | | |
| NVIDIA NEMO | Hosted | | ✅ | ✅ | | | ✅ | ✅ | ✅ | |
| FAISS | Single Node | | | ✅ | | | | | | |
| SQLite-Vec | Single Node | | | ✅ | | | | | | |
| Qdrant | Hosted/Single Node | | | ✅ | | | | | | |
| Weaviate | Hosted | | | ✅ | | | | | | |
| Milvus | Hosted/Single Node | | | ✅ | | | | | | |
| Prompt Guard | Single Node | | | | ✅ | | | | | |
| Llama Guard | Single Node | | | | ✅ | | | | | |
| Code Scanner | Single Node | | | | ✅ | | | | | |
| Brave Search | Hosted | | | | | | | | | ✅ |
| RAG Runtime | Single Node | | | | | | | | | ✅ |
## External Providers
Llama Stack supports external providers that live outside of the main codebase. This allows you to create and maintain your own providers independently.