From 040424acf58094b4227170cef853d768d32c62df Mon Sep 17 00:00:00 2001
From: Wen Zhou
Date: Thu, 3 Jul 2025 10:12:56 +0200
Subject: [PATCH] docs: update full list of providers with matched APIs and dockerhub images (#2452)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

# What does this PR do?
- add model_type in example
- change "Memory" to "VectorIO" as column name
- update index.md and README.md

## Test Plan
run pre-commit to catch changes.

---------

Signed-off-by: Wen Zhou
Co-authored-by: Sébastien Han
---
 README.md                                  | 71 +++++++++++++---------
 docs/source/distributions/configuration.md | 11 ++--
 docs/source/index.md                       | 35 ++++++++++-
 docs/source/providers/index.md             |  7 ++-
 4 files changed, 87 insertions(+), 37 deletions(-)

diff --git a/README.md b/README.md
index 7f34c3340..3b5358ec2 100644
--- a/README.md
+++ b/README.md
@@ -35,6 +35,8 @@ pip install llama-stack-client
 ### CLI
 ```bash
 # Run a chat completion
+MODEL="Llama-4-Scout-17B-16E-Instruct"
+
 llama-stack-client --endpoint http://localhost:8321 \
 inference chat-completion \
 --model-id meta-llama/$MODEL \
@@ -106,46 +108,59 @@ By reducing friction and complexity, Llama Stack empowers developers to focus on

 ### API Providers
 Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.
+Please check out the [full list](https://llama-stack.readthedocs.io/en/latest/providers/index.html) of providers.

-| **API Provider Builder** | **Environments** | **Agents** | **Inference** | **Memory** | **Safety** | **Telemetry** | **Post Training** |
-|:------------------------:|:----------------------:|:----------:|:-------------:|:----------:|:----------:|:-------------:|:-----------------:|
-| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | |
-| SambaNova | Hosted | | ✅ | | ✅ | | |
-| Cerebras | Hosted | | ✅ | | | | |
-| Fireworks | Hosted | ✅ | ✅ | ✅ | | | |
-| AWS Bedrock | Hosted | | ✅ | | ✅ | | |
-| Together | Hosted | ✅ | ✅ | | ✅ | | |
-| Groq | Hosted | | ✅ | | | | |
-| Ollama | Single Node | | ✅ | | | | |
-| TGI | Hosted and Single Node | | ✅ | | | | |
-| NVIDIA NIM | Hosted and Single Node | | ✅ | | | | |
-| Chroma | Single Node | | | ✅ | | | |
-| PG Vector | Single Node | | | ✅ | | | |
-| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | |
-| vLLM | Hosted and Single Node | | ✅ | | | | |
-| OpenAI | Hosted | | ✅ | | | | |
-| Anthropic | Hosted | | ✅ | | | | |
-| Gemini | Hosted | | ✅ | | | | |
-| watsonx | Hosted | | ✅ | | | | |
-| HuggingFace | Single Node | | | | | | ✅ |
-| TorchTune | Single Node | | | | | | ✅ |
-| NVIDIA NEMO | Hosted | | | | | | ✅ |
+| API Provider Builder | Environments | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO |
+|:-------------------:|:------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:--------:|
+| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| SambaNova | Hosted | | ✅ | | ✅ | | | | |
+| Cerebras | Hosted | | ✅ | | | | | | |
+| Fireworks | Hosted | ✅ | ✅ | ✅ | | | | | |
+| AWS Bedrock | Hosted | | ✅ | | ✅ | | | | |
+| Together | Hosted | ✅ | ✅ | | ✅ | | | | |
+| Groq | Hosted | | ✅ | | | | | | |
+| Ollama | Single Node | | ✅ | | | | | | |
+| TGI | Hosted/Single Node | | ✅ | | | | | | |
+| NVIDIA NIM | Hosted/Single Node | | ✅ | | ✅ | | | | |
+| ChromaDB | Hosted/Single Node | | | ✅ | | | | | |
+| PG Vector | Single Node | | | ✅ | | | | | |
+| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | | | |
+| vLLM | Single Node | | ✅ | | | | | | |
+| OpenAI | Hosted | | ✅ | | | | | | |
+| Anthropic | Hosted | | ✅ | | | | | | |
+| Gemini | Hosted | | ✅ | | | | | | |
+| WatsonX | Hosted | | ✅ | | | | | | |
+| HuggingFace | Single Node | | | | | | ✅ | | ✅ |
+| TorchTune | Single Node | | | | | | ✅ | | |
+| NVIDIA NEMO | Hosted | | ✅ | ✅ | | | ✅ | ✅ | ✅ |
+| NVIDIA | Hosted | | | | | | ✅ | ✅ | ✅ |
+> **Note**: Additional providers are available through external packages. See the [External Providers](https://llama-stack.readthedocs.io/en/latest/providers/external.html) documentation.

 ### Distributions

-A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario - you can begin with a local development setup (eg. ollama) and seamlessly transition to production (eg. Fireworks) without changing your application code. Here are some of the distributions we support:
+A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario: you can begin with a local development setup (e.g., Ollama) and seamlessly transition to production (e.g., Fireworks) without changing your application code.
+Here are some of the distributions we support:

 | **Distribution** | **Llama Stack Docker** | Start This Distribution |
 |:---------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|
 | Meta Reference | [llamastack/distribution-meta-reference-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-gpu/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/meta-reference-gpu.html) |
-| SambaNova | [llamastack/distribution-sambanova](https://hub.docker.com/repository/docker/llamastack/distribution-sambanova/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/sambanova.html) |
-| Cerebras | [llamastack/distribution-cerebras](https://hub.docker.com/repository/docker/llamastack/distribution-cerebras/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/cerebras.html) |
+| TGI | [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/tgi.html) |
+| vLLM | [llamastack/distribution-remote-vllm](https://hub.docker.com/repository/docker/llamastack/distribution-remote-vllm/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/remote-vllm.html) |
+| Starter | [llamastack/distribution-starter](https://hub.docker.com/repository/docker/llamastack/distribution-starter/general) | |
+| PostgreSQL | [llamastack/distribution-postgres-demo](https://hub.docker.com/repository/docker/llamastack/distribution-postgres-demo/general) | |
+
+
+The following distributions are outside our support scope but are still available from Docker Hub:
+
+| **Distribution** | **Llama Stack Docker** | Start This Distribution |
+|:---------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|
 | Ollama | [llamastack/distribution-ollama](https://hub.docker.com/repository/docker/llamastack/distribution-ollama/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/ollama.html) |
-| TGI | [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/tgi.html) |
 | Together | [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/together.html) |
 | Fireworks | [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/fireworks.html) |
-| vLLM | [llamastack/distribution-remote-vllm](https://hub.docker.com/repository/docker/llamastack/distribution-remote-vllm/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/remote-vllm.html) |
+| AWS Bedrock | [llamastack/distribution-bedrock](https://hub.docker.com/repository/docker/llamastack/distribution-bedrock/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/bedrock.html) |
+| SambaNova | [llamastack/distribution-sambanova](https://hub.docker.com/repository/docker/llamastack/distribution-sambanova/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/sambanova.html) |
+| Cerebras | [llamastack/distribution-cerebras](https://hub.docker.com/repository/docker/llamastack/distribution-cerebras/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/cerebras.html) |
 | | |

 ### Documentation
diff --git a/docs/source/distributions/configuration.md b/docs/source/distributions/configuration.md
index 0a0ce994f..1bba6677e 100644
--- a/docs/source/distributions/configuration.md
+++ b/docs/source/distributions/configuration.md
@@ -77,10 +77,10 @@ Next up is the most critical part: the set of providers that the stack will use
 ```yaml
 providers:
   inference:
-  # provider_id is a string you can choose freely
+  # provider_id is a string you can choose freely
   - provider_id: ollama
     # provider_type is a string that specifies the type of provider.
-    # in this case, the provider for inference is ollama and it is run remotely (outside of the distribution)
+    # in this case, the provider for inference is ollama and it runs remotely (outside of the distribution)
     provider_type: remote::ollama
     # config is a dictionary that contains the configuration for the provider.
     # in this case, the configuration is the url of the ollama server
@@ -88,7 +88,7 @@ providers:
       url: ${env.OLLAMA_URL:=http://localhost:11434}
 ```
 A few things to note:
-- A _provider instance_ is identified with an (id, type, configuration) triplet.
+- A _provider instance_ is identified with an (id, type, config) triplet.
 - The id is a string you can choose freely.
 - You can instantiate any number of provider instances of the same type.
 - The configuration dictionary is provider-specific.
@@ -187,7 +187,7 @@ The environment variable substitution system is type-safe:

 ## Resources

-Finally, let's look at the `models` section:
+Let's look at the `models` section:

 ```yaml
 models:
@@ -195,8 +195,9 @@ models:
   model_id: ${env.INFERENCE_MODEL}
   provider_id: ollama
   provider_model_id: null
+  model_type: llm
 ```
-A Model is an instance of a "Resource" (see [Concepts](../concepts/index)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage the clients to always register models before using them, some Stack servers may come up a list of "already known and available" models.
+A Model is an instance of a "Resource" (see [Concepts](../concepts/index)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage clients to register models before using them, some Stack servers may come up with a list of "already known and available" models.

 What's with the `provider_model_id` field? This is an identifier for the model inside the provider's model catalog. Contrast it with `model_id` which is the identifier for the same model for Llama Stack's purposes. For example, you may want to name "llama3.2:vision-11b" as "image_captioning_model" when you use it in your Stack interactions. When omitted, the server will set `provider_model_id` to be the same as `model_id`.
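The configuration hunks above rely on two rules: a `${env.NAME:=default}` placeholder (as in `url: ${env.OLLAMA_URL:=http://localhost:11434}`) falls back to its inline default, and an omitted `provider_model_id` defaults to `model_id`. The Python sketch below illustrates both under stated assumptions. `resolve_env_placeholders` and `effective_provider_model_id` are hypothetical helpers, not Llama Stack's API, and treating an empty variable the same as an unset one is an assumption borrowed from the shell's `:=` operator; check the Llama Stack source for the exact rule.

```python
import re
from typing import Mapping, Optional

# Hypothetical pattern for ${env.NAME} and ${env.NAME:=default};
# an assumption for illustration, not Llama Stack's actual parser.
_PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::=([^}]*))?\}")


def resolve_env_placeholders(value: str, env: Mapping[str, str]) -> str:
    """Replace each ${env.NAME:=default} placeholder using `env`."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        resolved = env.get(name)
        if resolved:  # set and non-empty: the environment value wins
            return resolved
        if default is not None:  # unset or empty (assumed): use the inline default
            return default
        raise KeyError(f"environment variable {name} is unset or empty and has no default")

    return _PLACEHOLDER.sub(substitute, value)


def effective_provider_model_id(model_id: str, provider_model_id: Optional[str]) -> str:
    """When provider_model_id is omitted (null), it falls back to model_id."""
    return provider_model_id if provider_model_id is not None else model_id


if __name__ == "__main__":
    env = {"INFERENCE_MODEL": "llama3.2:3b"}  # OLLAMA_URL deliberately unset
    url = resolve_env_placeholders("${env.OLLAMA_URL:=http://localhost:11434}", env)
    model_id = resolve_env_placeholders("${env.INFERENCE_MODEL}", env)
    print(url)  # http://localhost:11434
    print(effective_provider_model_id(model_id, None))  # llama3.2:3b
```

Keeping the two rules in separate functions mirrors the docs: substitution happens on raw config strings, while the `provider_model_id` fallback is a property of the registered model resource.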
diff --git a/docs/source/index.md b/docs/source/index.md
index 1df5e8507..755b228e3 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -73,17 +73,26 @@ A number of "adapters" are available for some popular Inference and Vector Store
 | OpenAI | Hosted |
 | Anthropic | Hosted |
 | Gemini | Hosted |
+| WatsonX | Hosted |

+**Agents API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Meta Reference | Single Node |
+| Fireworks | Hosted |
+| Together | Hosted |
+| PyTorch ExecuTorch | On-device iOS |
 **Vector IO API**
 | **Provider** | **Environments** |
 | :----: | :----: |
 | FAISS | Single Node |
-| SQLite-Vec| Single Node |
+| SQLite-Vec | Single Node |
 | Chroma | Hosted and Single Node |
 | Milvus | Hosted and Single Node |
 | Postgres (PGVector) | Hosted and Single Node |
 | Weaviate | Hosted |
+| Qdrant | Hosted and Single Node |

 **Safety API**
 | **Provider** | **Environments** |
@@ -93,6 +102,30 @@ A number of "adapters" are available for some popular Inference and Vector Store
 | Code Scanner | Single Node |
 | AWS Bedrock | Hosted |

+**Post Training API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Meta Reference | Single Node |
+| HuggingFace | Single Node |
+| TorchTune | Single Node |
+| NVIDIA NEMO | Hosted |
+
+**Eval API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Meta Reference | Single Node |
+| NVIDIA NEMO | Hosted |
+
+**Telemetry API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Meta Reference | Single Node |
+
+**Tool Runtime API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Brave Search | Hosted |
+| RAG Runtime | Single Node |

 ```{toctree}
 :hidden:
diff --git a/docs/source/providers/index.md b/docs/source/providers/index.md
index 3ea253685..f804582d7 100644
--- a/docs/source/providers/index.md
+++ b/docs/source/providers/index.md
@@ -1,9 +1,10 @@
 # Providers Overview

 The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations for the same API. Examples for these include:
-- LLM inference providers (e.g., Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, etc.),
-- Vector databases (e.g., ChromaDB, Weaviate, Qdrant, Milvus, FAISS, PGVector, SQLite-Vec, etc.),
-- Safety providers (e.g., Meta's Llama Guard, AWS Bedrock Guardrails, etc.)
+- LLM inference providers (e.g., Meta Reference, Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, OpenAI, Anthropic, Gemini, WatsonX, etc.),
+- Vector databases (e.g., FAISS, SQLite-Vec, ChromaDB, Weaviate, Qdrant, Milvus, PGVector, etc.),
+- Safety providers (e.g., Meta's Llama Guard, Prompt Guard, Code Scanner, AWS Bedrock Guardrails, etc.),
+- Tool Runtime providers (e.g., RAG Runtime, Brave Search, etc.)

 Providers come in two flavors:
 - **Remote**: the provider runs as a separate service external to the Llama Stack codebase. Llama Stack contains a small amount of adapter code.
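The providers overview describes swapping different implementations behind one API, with remote providers reduced to a small amount of adapter code. A minimal Python sketch of that idea follows. The `InferenceProvider` protocol, the two stub classes, and the registry are all hypothetical and make no network calls. The `remote::ollama` key mirrors the `provider_type` string from the configuration example, while `inline::meta-reference` assumes Llama Stack's `inline::` prefix for providers that run in-process.

```python
from typing import Callable, Dict, Protocol


class InferenceProvider(Protocol):
    """Hypothetical minimal interface standing in for an inference API."""

    def complete(self, model: str, prompt: str) -> str: ...


class RemoteAdapterStub:
    """Stands in for a remote provider: a thin adapter that would forward
    requests to an external service at `url` (no network call is made here)."""

    def __init__(self, url: str) -> None:
        self.url = url

    def complete(self, model: str, prompt: str) -> str:
        return f"[{self.url}] {model}: {prompt}"


class InlineProviderStub:
    """Stands in for a provider that runs inside the stack process itself."""

    def complete(self, model: str, prompt: str) -> str:
        return f"[inline] {model}: {prompt}"


# Hypothetical registry mapping provider_type strings to factories that
# receive the provider-specific config dictionary.
REGISTRY: Dict[str, Callable[[dict], InferenceProvider]] = {
    "remote::ollama": lambda config: RemoteAdapterStub(config["url"]),
    "inline::meta-reference": lambda config: InlineProviderStub(),
}


def build_provider(provider_type: str, config: dict) -> InferenceProvider:
    """Instantiate a provider from its (type, config) pair."""
    return REGISTRY[provider_type](config)


def application_code(provider: InferenceProvider) -> str:
    # Depends only on the interface, so swapping providers is a config change.
    return provider.complete("llama3.2:3b", "hello")


if __name__ == "__main__":
    for provider_type, config in [
        ("remote::ollama", {"url": "http://localhost:11434"}),
        ("inline::meta-reference", {}),
    ]:
        print(application_code(build_provider(provider_type, config)))
```

Only the `(provider_type, config)` pair changes between the two runs; `application_code` is untouched, which is the property the overview attributes to swappable providers.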