Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-07-07 06:20:45 +00:00
docs: update full list of providers with matched APIs and dockerhub images (#2452)
# What does this PR do?

- add model_type in example
- change "Memory" to "VectorIO" as column name
- update index.md and README.md

## Test Plan

Run pre-commit to catch changes.

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
This commit is contained in:
parent 5b07755556
commit 040424acf5
4 changed files with 87 additions and 37 deletions
@@ -77,10 +77,10 @@ Next up is the most critical part: the set of providers that the stack will use
```yaml
providers:
  inference:
    # provider_id is a string you can choose freely
    - provider_id: ollama
      # provider_type is a string that specifies the type of provider.
-     # in this case, the provider for inference is ollama and it is run remotely (outside of the distribution)
+     # in this case, the provider for inference is ollama and it runs remotely (outside of the distribution)
      provider_type: remote::ollama
      # config is a dictionary that contains the configuration for the provider.
      # in this case, the configuration is the url of the ollama server
@@ -88,7 +88,7 @@ providers:
        url: ${env.OLLAMA_URL:=http://localhost:11434}
```

A few things to note:
-- A _provider instance_ is identified with an (id, type, configuration) triplet.
+- A _provider instance_ is identified with an (id, type, config) triplet.
- The id is a string you can choose freely.
- You can instantiate any number of provider instances of the same type.
- The configuration dictionary is provider-specific.
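To make the notes above concrete, here is a minimal sketch of two provider instances of the same type living side by side, each identified by its own (id, type, config) triplet. It only reuses the `remote::ollama` provider and the `${env.VAR:=default}` substitution shown above; the second instance's id (`ollama-gpu`) and the `OLLAMA_GPU_URL` variable are made up for illustration.

```yaml
providers:
  inference:
    # first instance: id "ollama", type remote::ollama, config carrying the server url
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        url: ${env.OLLAMA_URL:=http://localhost:11434}
    # second instance of the same provider type, distinguished only by its id and config
    # ("ollama-gpu" and OLLAMA_GPU_URL are hypothetical names used for illustration)
    - provider_id: ollama-gpu
      provider_type: remote::ollama
      config:
        url: ${env.OLLAMA_GPU_URL:=http://localhost:11435}
```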
@@ -187,7 +187,7 @@ The environment variable substitution system is type-safe:

## Resources

-Finally, let's look at the `models` section:
+Let's look at the `models` section:

```yaml
models:
@@ -195,8 +195,9 @@ models:
  model_id: ${env.INFERENCE_MODEL}
  provider_id: ollama
  provider_model_id: null
+  model_type: llm
```

-A Model is an instance of a "Resource" (see [Concepts](../concepts/index)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage the clients to always register models before using them, some Stack servers may come up a list of "already known and available" models.
+A Model is an instance of a "Resource" (see [Concepts](../concepts/index)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage the clients to register models before using them, some Stack servers may come up a list of "already known and available" models.

What's with the `provider_model_id` field? This is an identifier for the model inside the provider's model catalog. Contrast it with `model_id` which is the identifier for the same model for Llama Stack's purposes. For example, you may want to name "llama3.2:vision-11b" as "image_captioning_model" when you use it in your Stack interactions. When omitted, the server will set `provider_model_id` to be the same as `model_id`.
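To illustrate the aliasing described above with the names from that paragraph (placeholders, not a prescribed setup), a `models` entry could look roughly like this:

```yaml
models:
  - model_id: image_captioning_model        # the name used in Llama Stack API calls
    provider_id: ollama
    provider_model_id: llama3.2:vision-11b  # the name in the provider's own model catalog
    model_type: llm
```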
@@ -73,17 +73,26 @@ A number of "adapters" are available for some popular Inference and Vector Store
| OpenAI | Hosted |
| Anthropic | Hosted |
| Gemini | Hosted |
| WatsonX | Hosted |

**Agents API**
| **Provider** | **Environments** |
| :----: | :----: |
| Meta Reference | Single Node |
| Fireworks | Hosted |
| Together | Hosted |
| PyTorch ExecuTorch | On-device iOS |

**Vector IO API**
| **Provider** | **Environments** |
| :----: | :----: |
| FAISS | Single Node |
-| SQLite-Vec| Single Node |
+| SQLite-Vec | Single Node |
| Chroma | Hosted and Single Node |
| Milvus | Hosted and Single Node |
| Postgres (PGVector) | Hosted and Single Node |
| Weaviate | Hosted |
| Qdrant | Hosted and Single Node |

**Safety API**
| **Provider** | **Environments** |
@@ -93,6 +102,30 @@ A number of "adapters" are available for some popular Inference and Vector Store
| Code Scanner | Single Node |
| AWS Bedrock | Hosted |

**Post Training API**
| **Provider** | **Environments** |
| :----: | :----: |
| Meta Reference | Single Node |
| HuggingFace | Single Node |
| TorchTune | Single Node |
| NVIDIA NEMO | Hosted |

**Eval API**
| **Provider** | **Environments** |
| :----: | :----: |
| Meta Reference | Single Node |
| NVIDIA NEMO | Hosted |

**Telemetry API**
| **Provider** | **Environments** |
| :----: | :----: |
| Meta Reference | Single Node |

**Tool Runtime API**
| **Provider** | **Environments** |
| :----: | :----: |
| Brave Search | Hosted |
| RAG Runtime | Single Node |

```{toctree}
:hidden:
@@ -1,9 +1,10 @@
# Providers Overview

The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations for the same API. Examples for these include:
-- LLM inference providers (e.g., Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, etc.),
-- Vector databases (e.g., ChromaDB, Weaviate, Qdrant, Milvus, FAISS, PGVector, SQLite-Vec, etc.),
-- Safety providers (e.g., Meta's Llama Guard, AWS Bedrock Guardrails, etc.)
+- LLM inference providers (e.g., Meta Reference, Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, OpenAI, Anthropic, Gemini, WatsonX, etc.),
+- Vector databases (e.g., FAISS, SQLite-Vec, ChromaDB, Weaviate, Qdrant, Milvus, PGVector, etc.),
+- Safety providers (e.g., Meta's Llama Guard, Prompt Guard, Code Scanner, AWS Bedrock Guardrails, etc.),
+- Tool Runtime providers (e.g., RAG Runtime, Brave Search, etc.)

Providers come in two flavors:
- **Remote**: the provider runs as a separate service external to the Llama Stack codebase. Llama Stack contains a small amount of adapter code.
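To tie this back to the "swap out different implementations for the same API" point above: in a run configuration, switching the inference backend is essentially a matter of replacing one provider entry with another while the rest of the stack stays unchanged. A rough sketch follows; option A reuses the `remote::ollama` entry from the configuration example earlier, while option B assumes a `remote::together` provider type whose config takes an API key, so the exact type string and config keys there are illustrative.

```yaml
providers:
  inference:
    # Option A: a local Ollama server, as in the configuration example above
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        url: ${env.OLLAMA_URL:=http://localhost:11434}

    # Option B (an alternative to A, shown commented out): a hosted backend such as Together.
    # The provider_type and config keys below are assumptions for illustration only.
    # - provider_id: together
    #   provider_type: remote::together
    #   config:
    #     api_key: ${env.TOGETHER_API_KEY}
```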