Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-06-28 19:04:19 +00:00)

Docs: update full list of providers with matched APIs

- add model_type in example
- change "Memory" to "VectorIO" as column name
- update providers table
- update images from dockerhub
- update index.md

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

parent cfee63bd0d
commit bb6c96c11c
4 changed files with 131 additions and 33 deletions
README.md (72 lines changed)

@@ -35,6 +35,8 @@ pip install llama-stack-client
 ### CLI
 ```bash
 # Run a chat completion
+MODEL="Llama-4-Scout-17B-16E-Instruct"
+
 llama-stack-client --endpoint http://localhost:8321 \
   inference chat-completion \
   --model-id meta-llama/$MODEL \
@@ -107,30 +109,51 @@ By reducing friction and complexity, Llama Stack empowers developers to focus on
 ### API Providers
 Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.

-| **API Provider Builder** | **Environments** | **Agents** | **Inference** | **Memory** | **Safety** | **Telemetry** | **Post Training** |
-|:------------------------:|:----------------------:|:----------:|:-------------:|:----------:|:----------:|:-------------:|:-----------------:|
-| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | |
-| SambaNova | Hosted | | ✅ | | ✅ | | |
-| Cerebras | Hosted | | ✅ | | | | |
-| Fireworks | Hosted | ✅ | ✅ | ✅ | | | |
-| AWS Bedrock | Hosted | | ✅ | | ✅ | | |
-| Together | Hosted | ✅ | ✅ | | ✅ | | |
-| Groq | Hosted | | ✅ | | | | |
-| Ollama | Single Node | | ✅ | | | | |
-| TGI | Hosted and Single Node | | ✅ | | | | |
-| NVIDIA NIM | Hosted and Single Node | | ✅ | | | | |
-| Chroma | Single Node | | | ✅ | | | |
-| PG Vector | Single Node | | | ✅ | | | |
-| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | |
-| vLLM | Hosted and Single Node | | ✅ | | | | |
-| OpenAI | Hosted | | ✅ | | | | |
-| Anthropic | Hosted | | ✅ | | | | |
-| Gemini | Hosted | | ✅ | | | | |
-| watsonx | Hosted | | ✅ | | | | |
-| HuggingFace | Single Node | | | | | | ✅ |
-| TorchTune | Single Node | | | | | | ✅ |
-| NVIDIA NEMO | Hosted | | | | | | ✅ |
+| API Provider Builder | Environments | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO | Tool Runtime | Scoring |
+|:----------------------:|:------------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:---------:|:----------:|:-------:|
+| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
+| SambaNova | Hosted | | ✅ | | ✅ | | | | | | |
+| Cerebras | Hosted | | ✅ | | | | | | | | |
+| Fireworks | Hosted | ✅ | ✅ | ✅ | | | | | | | |
+| AWS Bedrock | Hosted | | ✅ | | ✅ | | | | | | |
+| Together | Hosted | ✅ | ✅ | | ✅ | | | | | | |
+| Groq | Hosted | | ✅ | | | | | | | | |
+| Ollama | Single Node | | ✅ | | | | | | | | |
+| TGI | Hosted/Single Node | | ✅ | | | | | | | | |
+| NVIDIA NIM | Hosted/Single Node | | ✅ | | ✅ | | | | | | |
+| ChromaDB | Hosted/Single Node | | | ✅ | | | | | | | |
+| PG Vector | Single Node | | | ✅ | | | | | | | |
+| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | | | | | |
+| vLLM | Single Node | | ✅ | | | | | | | | |
+| OpenAI | Hosted | | ✅ | | | | | | | | |
+| Anthropic | Hosted | | ✅ | | | | | | | | |
+| Gemini | Hosted | | ✅ | | | | | | | | |
+| WatsonX | Hosted | | ✅ | | | | | | | | |
+| HuggingFace | Single Node | | | | | | ✅ | | ✅ | | |
+| TorchTune | Single Node | | | | | | ✅ | | | | |
+| NVIDIA NEMO | Hosted | | ✅ | ✅ | | | ✅ | ✅ | ✅ | | |
+| NVIDIA | Hosted | | | | | | ✅ | ✅ | ✅ | | |
+| FAISS | Single Node | | | ✅ | | | | | | | |
+| SQLite-Vec | Single Node | | | ✅ | | | | | | | |
+| Qdrant | Hosted/Single Node | | | ✅ | | | | | | | |
+| Weaviate | Hosted | | | ✅ | | | | | | | |
+| Milvus | Hosted/Single Node | | | ✅ | | | | | | | |
+| Prompt Guard | Single Node | | | | ✅ | | | | | | |
+| Llama Guard | Single Node | | | | ✅ | | | | | | |
+| Code Scanner | Single Node | | | | ✅ | | | | | | |
+| Brave Search | Hosted | | | | | | | | | ✅ | |
+| Bing Search | Hosted | | | | | | | | | ✅ | |
+| RAG Runtime | Single Node | | | | | | | | | ✅ | |
+| Model Context Protocol | Hosted | | | | | | | | | ✅ | |
+| Sentence Transformers | Single Node | | ✅ | | | | | | | | |
+| Braintrust | Single Node | | | | | | | | | | ✅ |
+| Basic | Single Node | | | | | | | | | | ✅ |
+| LLM-as-Judge | Single Node | | | | | | | | | | ✅ |
+| Databricks | Hosted | | ✅ | | | | | | | | |
+| RunPod | Hosted | | ✅ | | | | | | | | |
+| Passthrough | Hosted | | ✅ | | | | | | | | |
+
+> **Note**: Additional providers are available through external packages. See [External Providers](https://llama-stack.readthedocs.io/en/latest/providers/external.html) documentation for providers like KubeFlow Training, KubeFlow Pipelines, RamaLama, and TrustyAI LM-Eval.

 ### Distributions
@@ -145,7 +168,10 @@ A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider
 | TGI | [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/tgi.html) |
 | Together | [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/together.html) |
 | Fireworks | [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/fireworks.html) |
+| AWS Bedrock | [llamastack/distribution-bedrock](https://hub.docker.com/repository/docker/llamastack/distribution-bedrock/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/bedrock.html) |
 | vLLM | [llamastack/distribution-remote-vllm](https://hub.docker.com/repository/docker/llamastack/distribution-remote-vllm/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/remote-vllm.html) |
+| Starter | [llamastack/distribution-starter](https://hub.docker.com/repository/docker/llamastack/distribution-starter/general) | |
+| PostgreSQL | [llamastack/distribution-postgres-demo](https://hub.docker.com/repository/docker/llamastack/distribution-postgres-demo/general) | |

 ### Documentation
@@ -67,7 +67,6 @@ Let's break this down into the different sections. The first section specifies t
 apis:
 - agents
 - inference
-- memory
 - safety
 - telemetry
 ```
@@ -80,7 +79,7 @@ providers:
   # provider_id is a string you can choose freely
   - provider_id: ollama
     # provider_type is a string that specifies the type of provider.
-    # in this case, the provider for inference is ollama and it is run remotely (outside of the distribution)
+    # in this case, the provider for inference is ollama and it runs remotely (outside of the distribution)
     provider_type: remote::ollama
     # config is a dictionary that contains the configuration for the provider.
     # in this case, the configuration is the url of the ollama server
@@ -88,7 +87,7 @@ providers:
       url: ${env.OLLAMA_URL:http://localhost:11434}
 ```
 A few things to note:
-- A _provider instance_ is identified with an (id, type, configuration) triplet.
+- A _provider instance_ is identified with an (id, type, config) triplet.
 - The id is a string you can choose freely.
 - You can instantiate any number of provider instances of the same type.
 - The configuration dictionary is provider-specific.
@@ -96,7 +95,7 @@ A few things to note:

 ## Resources

-Finally, let's look at the `models` section:
+Let's look at the `models` section:

 ```yaml
 models:
@@ -104,8 +103,9 @@ models:
   model_id: ${env.INFERENCE_MODEL}
   provider_id: ollama
   provider_model_id: null
+  model_type: llm
 ```
-A Model is an instance of a "Resource" (see [Concepts](../concepts/index)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage the clients to always register models before using them, some Stack servers may come up a list of "already known and available" models.
+A Model is an instance of a "Resource" (see [Concepts](../concepts/index)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage clients to register models before using them, some Stack servers may come up with a list of "already known and available" models.

 What's with the `provider_model_id` field? This is an identifier for the model inside the provider's model catalog. Contrast it with `model_id` which is the identifier for the same model for Llama Stack's purposes. For example, you may want to name "llama3.2:vision-11b" as "image_captioning_model" when you use it in your Stack interactions. When omitted, the server will set `provider_model_id` to be the same as `model_id`.
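Putting that together, the aliasing example from the paragraph above would look roughly like this in run.yaml (a sketch built from the names in the prose, not a line from the diff):

```yaml
models:
# the Stack-facing alias you choose for your own interactions...
- model_id: image_captioning_model
  provider_id: ollama
  # ...mapped to the model's name in the provider's own catalog
  provider_model_id: llama3.2:vision-11b
  model_type: llm
```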
@@ -73,7 +73,15 @@ A number of "adapters" are available for some popular Inference and Vector Store
 | OpenAI | Hosted |
 | Anthropic | Hosted |
 | Gemini | Hosted |
+| WatsonX | Hosted |

+**Agents API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Meta Reference | Single Node |
+| Fireworks | Hosted |
+| Together | Hosted |
+| PyTorch ExecuTorch | On-device iOS |
+
 **Vector IO API**
 | **Provider** | **Environments** |

@@ -84,6 +92,7 @@ A number of "adapters" are available for some popular Inference and Vector Store
 | Milvus | Hosted and Single Node |
 | Postgres (PGVector) | Hosted and Single Node |
 | Weaviate | Hosted |
+| Qdrant | Hosted and Single Node |

 **Safety API**
 | **Provider** | **Environments** |

@@ -93,6 +102,30 @@ A number of "adapters" are available for some popular Inference and Vector Store
 | Code Scanner | Single Node |
 | AWS Bedrock | Hosted |
+
+**Post Training API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Meta Reference | Single Node |
+| HuggingFace | Single Node |
+| TorchTune | Single Node |
+| NVIDIA NEMO | Hosted |
+
+**Eval API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Meta Reference | Single Node |
+| NVIDIA NEMO | Hosted |
+
+**Telemetry API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Meta Reference | Single Node |
+
+**Tool Runtime API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Brave Search | Hosted |
+| RAG Runtime | Single Node |
+
 ```{toctree}
 :hidden:
@@ -1,9 +1,10 @@
 # Providers Overview

 The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations for the same API. Examples for these include:
-- LLM inference providers (e.g., Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, etc.),
-- Vector databases (e.g., ChromaDB, Weaviate, Qdrant, Milvus, FAISS, PGVector, SQLite-Vec, etc.),
-- Safety providers (e.g., Meta's Llama Guard, AWS Bedrock Guardrails, etc.)
+- LLM inference providers (e.g., Meta Reference, Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, OpenAI, Anthropic, Gemini, WatsonX, etc.),
+- Vector databases (e.g., FAISS, SQLite-Vec, ChromaDB, Weaviate, Qdrant, Milvus, PGVector, etc.),
+- Safety providers (e.g., Meta's Llama Guard, Prompt Guard, Code Scanner, AWS Bedrock Guardrails, etc.),
+- Tool Runtime providers (e.g., RAG Runtime, Brave Search, etc.)

 Providers come in two flavors:
 - **Remote**: the provider runs as a separate service external to the Llama Stack codebase. Llama Stack contains a small amount of adapter code.
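In run.yaml, the flavor shows up as the prefix of the provider_type string. A rough sketch of the contrast (assuming the `remote::ollama` and `inline::faiss` provider types; check the provider registry for the exact names available in your build):

```yaml
providers:
  inference:
  # remote flavor: a thin adapter pointing at a service running elsewhere
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: ${env.OLLAMA_URL:http://localhost:11434}
  vector_io:
  # inline flavor: the implementation runs inside the Llama Stack process
  - provider_id: faiss
    provider_type: inline::faiss
    config: {}
```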
@@ -11,6 +12,44 @@ Providers come in two flavors:
 Importantly, Llama Stack always strives to provide at least one fully inline provider for each API so you can iterate on a fully featured environment locally.

+## Available Providers
+
+Here is a comprehensive list of all available API providers in Llama Stack:
+
+| API Provider Builder | Environments | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO | Tool Runtime |
+|:----------------------:|:------------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:---------:|:------------:|
+| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| SambaNova | Hosted | | ✅ | | ✅ | | | | | |
+| Cerebras | Hosted | | ✅ | | | | | | | |
+| Fireworks | Hosted | ✅ | ✅ | ✅ | | | | | | |
+| AWS Bedrock | Hosted | | ✅ | | ✅ | | | | | |
+| Together | Hosted | ✅ | ✅ | | ✅ | | | | | |
+| Groq | Hosted | | ✅ | | | | | | | |
+| Ollama | Single Node | | ✅ | | | | | | | |
+| TGI | Hosted/Single Node | | ✅ | | | | | | | |
+| NVIDIA NIM | Hosted/Single Node | | ✅ | | | | | | | |
+| Chroma | Single Node | | | ✅ | | | | | | |
+| PG Vector | Single Node | | | ✅ | | | | | | |
+| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | | | | |
+| vLLM | Single Node | | ✅ | | | | | | | |
+| OpenAI | Hosted | | ✅ | | | | | | | |
+| Anthropic | Hosted | | ✅ | | | | | | | |
+| Gemini | Hosted | | ✅ | | | | | | | |
+| WatsonX | Hosted | | ✅ | | | | | | | |
+| HuggingFace | Single Node | | | | | | ✅ | | ✅ | |
+| TorchTune | Single Node | | | | | | ✅ | | | |
+| NVIDIA NEMO | Hosted | | ✅ | ✅ | | | ✅ | ✅ | ✅ | |
+| FAISS | Single Node | | | ✅ | | | | | | |
+| SQLite-Vec | Single Node | | | ✅ | | | | | | |
+| Qdrant | Hosted/Single Node | | | ✅ | | | | | | |
+| Weaviate | Hosted | | | ✅ | | | | | | |
+| Milvus | Hosted/Single Node | | | ✅ | | | | | | |
+| Prompt Guard | Single Node | | | | ✅ | | | | | |
+| Llama Guard | Single Node | | | | ✅ | | | | | |
+| Code Scanner | Single Node | | | | ✅ | | | | | |
+| Brave Search | Hosted | | | | | | | | | ✅ |
+| RAG Runtime | Single Node | | | | | | | | | ✅ |
+
 ## External Providers

 Llama Stack supports external providers that live outside of the main codebase. This allows you to create and maintain your own providers independently. See the [External Providers Guide](external) for details.