Merge bb6c96c11c into 40fdce79b3

2025-06-27 18:50:41 +00:00 · 2025-06-27 11:39:51 +02:00 · 2025-06-27 11:39:51 +02:00 · 726ce45161
commit 726ce45161
parent 40fdce79b3 bb6c96c11c
4 changed files with 131 additions and 33 deletions
--- a/README.md
+++ b/README.md
@@ -35,6 +35,8 @@ pip install llama-stack-client
 ### CLI
 ```bash
 # Run a chat completion
+MODEL="Llama-4-Scout-17B-16E-Instruct"
+
 llama-stack-client --endpoint http://localhost:8321 \
 inference chat-completion \
 --model-id meta-llama/$MODEL \
@@ -107,30 +109,51 @@ By reducing friction and complexity, Llama Stack empowers developers to focus on
 ### API Providers
 Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.

-| **API Provider Builder** |    **Environments**    | **Agents** | **Inference** | **Memory** | **Safety** | **Telemetry** | **Post Training** |
-|:------------------------:|:----------------------:|:----------:|:-------------:|:----------:|:----------:|:-------------:|:-----------------:|
-|      Meta Reference      |      Single Node       |     ✅      |       ✅       |     ✅      |     ✅      |       ✅       |               |
-|        SambaNova         |         Hosted         |            |       ✅       |            |     ✅      |               |                  |
-|         Cerebras         |         Hosted         |            |       ✅       |            |            |               |                  |
-|        Fireworks         |         Hosted         |     ✅      |       ✅       |     ✅      |            |               |                |
-|       AWS Bedrock        |         Hosted         |            |       ✅       |            |     ✅      |               |                |
-|         Together         |         Hosted         |     ✅      |       ✅       |            |     ✅      |               |                |
-|           Groq           |         Hosted         |            |       ✅       |            |            |               |                 |
-|          Ollama          |      Single Node       |            |       ✅       |            |            |               |                 |
-|           TGI            | Hosted and Single Node |            |       ✅       |            |            |               |                 |
-|        NVIDIA NIM        | Hosted and Single Node |            |       ✅       |            |            |               |                 |
-|          Chroma          |      Single Node       |            |               |     ✅      |            |               |                 |
-|        PG Vector         |      Single Node       |            |               |     ✅      |            |               |                 |
-|    PyTorch ExecuTorch    |     On-device iOS      |     ✅      |       ✅       |            |            |               |                |
-|           vLLM           | Hosted and Single Node |            |       ✅       |            |            |               |                 |
-|          OpenAI          |         Hosted         |            |       ✅       |            |            |               |                 |
-|        Anthropic         |         Hosted         |            |       ✅       |            |            |               |                 |
-|          Gemini          |         Hosted         |            |       ✅       |            |            |               |                 |
-|          watsonx         |         Hosted         |            |       ✅       |            |            |               |                 |
-|        HuggingFace       |       Single Node      |            |                |            |            |               |       ✅        |
-|         TorchTune        |       Single Node      |            |                |            |            |               |       ✅        |
-|       NVIDIA NEMO        |         Hosted         |            |                |            |            |               |       ✅        |
+| API Provider Builder    | Environments      | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO | Tool Runtime | Scoring |
+|:----------------------:|:------------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:---------:|:----------:|:-------:|
+| Meta Reference         | Single Node        |   ✅   |    ✅     |    ✅    |   ✅   |    ✅     |      ✅      |  ✅  |    ✅     |      ✅    |         |
+| SambaNova              | Hosted             |        |    ✅     |          |   ✅   |           |              |      |           |             |         |
+| Cerebras               | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
+| Fireworks              | Hosted             |   ✅   |    ✅     |    ✅    |        |           |              |      |           |             |         |
+| AWS Bedrock            | Hosted             |        |    ✅     |          |   ✅   |           |              |      |           |             |         |
+| Together               | Hosted             |   ✅   |    ✅     |          |   ✅   |           |              |      |           |             |         |
+| Groq                   | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
+| Ollama                 | Single Node        |        |    ✅     |          |        |           |              |      |           |             |         |
+| TGI                    | Hosted/Single Node |        |    ✅     |          |        |           |              |      |           |             |         |
+| NVIDIA NIM             | Hosted/Single Node |        |    ✅     |          |   ✅   |           |              |      |           |             |         |
+| ChromaDB               | Hosted/Single Node |        |           |    ✅    |        |           |              |      |           |             |         |
+| PG Vector              | Single Node        |        |           |    ✅    |        |           |              |      |           |             |         |
+| PyTorch ExecuTorch     | On-device iOS      |   ✅   |    ✅     |          |        |           |              |      |           |             |         |
+| vLLM                   | Single Node        |        |    ✅     |          |        |           |              |      |           |             |         |
+| OpenAI                 | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
+| Anthropic              | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
+| Gemini                 | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
+| WatsonX                | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
+| HuggingFace            | Single Node        |        |           |          |        |           |      ✅      |      |    ✅     |             |         |
+| TorchTune              | Single Node        |        |           |          |        |           |      ✅      |      |           |             |         |
+| NVIDIA NEMO            | Hosted             |        |    ✅     |    ✅    |        |           |      ✅      |  ✅  |    ✅     |             |         |
+| NVIDIA                 | Hosted             |        |           |          |        |           |      ✅      |  ✅  |    ✅     |             |         |
+| FAISS                  | Single Node        |        |           |    ✅    |        |           |              |      |           |             |         |
+| SQLite-Vec             | Single Node        |        |           |    ✅    |        |           |              |      |           |             |         |
+| Qdrant                 | Hosted/Single Node |        |           |    ✅    |        |           |              |      |           |             |         |
+| Weaviate               | Hosted             |        |           |    ✅    |        |           |              |      |           |             |         |
+| Milvus                 | Hosted/Single Node |        |           |    ✅    |        |           |              |      |           |             |         |
+| Prompt Guard           | Single Node        |        |           |          |   ✅   |           |              |      |           |             |         |
+| Llama Guard            | Single Node        |        |           |          |   ✅   |           |              |      |           |             |         |
+| Code Scanner           | Single Node        |        |           |          |   ✅   |           |              |      |           |             |         |
+| Brave Search           | Hosted             |        |           |          |        |           |              |      |           |      ✅     |         |
+| Bing Search            | Hosted             |        |           |          |        |           |              |      |           |      ✅     |         |
+| RAG Runtime            | Single Node        |        |           |          |        |           |              |      |           |      ✅     |         |
+| Model Context Protocol | Hosted             |        |           |          |        |           |              |      |           |      ✅     |         |
+| Sentence Transformers  | Single Node        |        |    ✅     |          |        |           |              |      |           |             |         |
+| Braintrust             | Single Node        |        |           |          |        |           |              |      |           |             |    ✅   |
+| Basic                  | Single Node        |        |           |          |        |           |              |      |           |             |    ✅   |
+| LLM-as-Judge           | Single Node        |        |           |          |        |           |              |      |           |             |    ✅   |
+| Databricks             | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
+| RunPod                 | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |
+| Passthrough            | Hosted             |        |    ✅     |          |        |           |              |      |           |             |         |

+> **Note**: Additional providers are available through external packages. See the [External Providers](https://llama-stack.readthedocs.io/en/latest/providers/external.html) documentation for providers such as KubeFlow Training, KubeFlow Pipelines, RamaLama, and TrustyAI LM-Eval.

 ### Distributions

@@ -145,7 +168,10 @@ A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider
 |                      TGI                      |                          [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general)                          |             [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/tgi.html)              |
 |                   Together                    |                     [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general)                     |           [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/together.html)           |
 |                   Fireworks                   |                    [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general)                    |          [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/fireworks.html)           |
+|                   AWS Bedrock                 |                    [llamastack/distribution-bedrock](https://hub.docker.com/repository/docker/llamastack/distribution-bedrock/general)                    |          [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/bedrock.html)           |
 | vLLM |                  [llamastack/distribution-remote-vllm](https://hub.docker.com/repository/docker/llamastack/distribution-remote-vllm/general)                  |         [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/remote-vllm.html)          |
+|                   Starter                     |                    [llamastack/distribution-starter](https://hub.docker.com/repository/docker/llamastack/distribution-starter/general)                    |                   |
+|                   PostgreSQL                  |                [llamastack/distribution-postgres-demo](https://hub.docker.com/repository/docker/llamastack/distribution-postgres-demo/general)                |                  |


 ### Documentation
--- a/docs/source/distributions/configuration.md
+++ b/docs/source/distributions/configuration.md
@@ -67,7 +67,6 @@ Let's break this down into the different sections. The first section specifies t
 apis:
 - agents
 - inference
-- memory
 - safety
 - telemetry
 ```
@@ -77,10 +76,10 @@ Next up is the most critical part: the set of providers that the stack will use
 ```yaml
 providers:
  inference:
-  # provider_id is a string you can choose freely
+    # provider_id is a string you can choose freely
  - provider_id: ollama
    # provider_type is a string that specifies the type of provider.
-    # in this case, the provider for inference is ollama and it is run remotely (outside of the distribution)
+    # in this case, the provider for inference is ollama and it runs remotely (outside of the distribution)
    provider_type: remote::ollama
    # config is a dictionary that contains the configuration for the provider.
    # in this case, the configuration is the url of the ollama server
@@ -88,7 +87,7 @@ providers:
      url: ${env.OLLAMA_URL:=http://localhost:11434}
 ```
 A few things to note:
-- A _provider instance_ is identified with an (id, type, configuration) triplet.
+- A _provider instance_ is identified with an (id, type, config) triplet.
 - The id is a string you can choose freely.
 - You can instantiate any number of provider instances of the same type.
 - The configuration dictionary is provider-specific.
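Review note: the (id, type, config) triplet and the `${env.VAR:=default}` form used in the snippet above can be illustrated with a small Python sketch. This is not Llama Stack's implementation. `ProviderInstance`, `substitute_env`, and `load_provider` are hypothetical names, and the regular expression only covers the two forms that appear in this diff (`${env.NAME}` and `${env.NAME:=default}`).

```python
import re
from dataclasses import dataclass, field

# Assumption: only the two forms seen in the docs are handled,
# ${env.NAME} and ${env.NAME:=default}.
_ENV_REF = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::=([^}]*))?\}")


def substitute_env(value: str, env: dict) -> str:
    """Replace each ${env.NAME[:=default]} reference in value (hypothetical helper)."""

    def _replace(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _ENV_REF.sub(_replace, value)


@dataclass(frozen=True)
class ProviderInstance:
    """Hypothetical container for the (id, type, config) triplet."""

    provider_id: str
    provider_type: str
    config: dict = field(default_factory=dict)


def load_provider(raw: dict, env: dict) -> ProviderInstance:
    """Build a ProviderInstance from one parsed YAML entry, resolving env references."""
    config = {
        key: substitute_env(val, env) if isinstance(val, str) else val
        for key, val in raw.get("config", {}).items()
    }
    return ProviderInstance(raw["provider_id"], raw["provider_type"], config)


raw_entry = {
    "provider_id": "ollama",
    "provider_type": "remote::ollama",
    "config": {"url": "${env.OLLAMA_URL:=http://localhost:11434}"},
}
provider = load_provider(raw_entry, env={})
print(provider.config["url"])
```

Passing the environment as a plain dictionary (rather than reading `os.environ` directly) keeps the sketch easy to test; a caller could pass `dict(os.environ)` instead.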
@@ -185,7 +184,7 @@ The environment variable substitution system is type-safe:

 ## Resources

-Finally, let's look at the `models` section:
+Let's look at the `models` section:

 ```yaml
 models:
@@ -193,8 +192,9 @@ models:
  model_id: ${env.INFERENCE_MODEL}
  provider_id: ollama
  provider_model_id: null
+  model_type: llm
 ```
-A Model is an instance of a "Resource" (see [Concepts](../concepts/index)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage the clients to always register models before using them, some Stack servers may come up a list of "already known and available" models.
+A Model is an instance of a "Resource" (see [Concepts](../concepts/index)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage clients to register models before using them, some Stack servers may come up with a list of "already known and available" models.

 What's with the `provider_model_id` field? This is an identifier for the model inside the provider's model catalog. Contrast it with `model_id` which is the identifier for the same model for Llama Stack's purposes. For example, you may want to name "llama3.2:vision-11b" as "image_captioning_model" when you use it in your Stack interactions. When omitted, the server will set `provider_model_id` to be the same as `model_id`.
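
Review note: the fallback described above (an omitted `provider_model_id` defaults to `model_id`) can be sketched as follows. `ModelEntry` and `resolved_provider_model_id` are hypothetical names used only for illustration; the field names mirror the YAML keys in this hunk, and the actual server-side logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelEntry:
    """Hypothetical mirror of one entry in the `models` section."""

    model_id: str
    provider_id: str
    provider_model_id: Optional[str] = None
    model_type: str = "llm"

    def resolved_provider_model_id(self) -> str:
        # When provider_model_id is omitted (None), fall back to model_id.
        if self.provider_model_id is not None:
            return self.provider_model_id
        return self.model_id


# Stack-facing alias for a model that the provider knows under another name.
aliased = ModelEntry(
    model_id="image_captioning_model",
    provider_id="ollama",
    provider_model_id="llama3.2:vision-11b",
)
print(aliased.resolved_provider_model_id())
```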

--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -73,17 +73,26 @@ A number of "adapters" are available for some popular Inference and Vector Store
 |  OpenAI  |  Hosted  |
 |  Anthropic  |  Hosted  |
 |  Gemini  |  Hosted  |
+|  WatsonX  |  Hosted  |

+**Agents API**
+|  **Provider** |  **Environments** |
+| :----: | :----: |
+|  Meta Reference  |  Single Node |
+|  Fireworks  |  Hosted  |
+|  Together  |  Hosted  |
+|  PyTorch ExecuTorch | On-device iOS |

 **Vector IO API**
 |  **Provider** |  **Environments** |
 | :----: | :----: |
 |  FAISS | Single Node |
-|  SQLite-Vec| Single Node |
+|  SQLite-Vec | Single Node |
 |  Chroma | Hosted and Single Node |
 |  Milvus | Hosted and Single Node |
 |  Postgres (PGVector) | Hosted and Single Node |
 |  Weaviate | Hosted |
+|  Qdrant  | Hosted and Single Node |

 **Safety API**
 |  **Provider** |  **Environments** |
@@ -93,6 +102,30 @@ A number of "adapters" are available for some popular Inference and Vector Store
 |  Code Scanner | Single Node |
 |  AWS Bedrock | Hosted |

+**Post Training API**
+|  **Provider** |  **Environments** |
+| :----: | :----: |
+|  Meta Reference  |  Single Node |
+|  HuggingFace  |  Single Node |
+|  TorchTune  |  Single Node |
+|  NVIDIA NEMO  |  Hosted |
+
+**Eval API**
+|  **Provider** |  **Environments** |
+| :----: | :----: |
+|  Meta Reference  |  Single Node |
+|  NVIDIA NEMO  |  Hosted |
+
+**Telemetry API**
+|  **Provider** |  **Environments** |
+| :----: | :----: |
+|  Meta Reference  |  Single Node |
+
+**Tool Runtime API**
+|  **Provider** |  **Environments** |
+| :----: | :----: |
+|  Brave Search | Hosted |
+|  RAG Runtime | Single Node |

 ```{toctree}
 :hidden:
--- a/docs/source/providers/index.md
+++ b/docs/source/providers/index.md
@@ -1,9 +1,10 @@
 # Providers Overview

 The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations for the same API. Examples for these include:
-- LLM inference providers (e.g., Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, etc.),
-- Vector databases (e.g., ChromaDB, Weaviate, Qdrant, Milvus, FAISS, PGVector, SQLite-Vec, etc.),
-- Safety providers (e.g., Meta's Llama Guard, AWS Bedrock Guardrails, etc.)
+- LLM inference providers (e.g., Meta Reference, Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, OpenAI, Anthropic, Gemini, WatsonX, etc.),
+- Vector databases (e.g., FAISS, SQLite-Vec, ChromaDB, Weaviate, Qdrant, Milvus, PGVector, etc.),
+- Safety providers (e.g., Meta's Llama Guard, Prompt Guard, Code Scanner, AWS Bedrock Guardrails, etc.),
+- Tool Runtime providers (e.g., RAG Runtime, Brave Search, etc.)

 Providers come in two flavors:
 - **Remote**: the provider runs as a separate service external to the Llama Stack codebase. Llama Stack contains a small amount of adapter code.
@@ -11,6 +12,44 @@ Providers come in two flavors:

 Importantly, Llama Stack always strives to provide at least one fully inline provider for each API so you can iterate on a fully featured environment locally.

+## Available Providers
+
+Here is a comprehensive list of all available API providers in Llama Stack:
+
+| API Provider Builder    | Environments      | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO | Tool Runtime |
+|:----------------------:|:------------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:---------:|:------------:|
+| Meta Reference         | Single Node        |   ✅   |    ✅     |    ✅    |   ✅   |    ✅     |      ✅      |  ✅  |    ✅     |      ✅     |
+| SambaNova              | Hosted             |        |    ✅     |          |   ✅   |           |              |      |           |             |
+| Cerebras               | Hosted             |        |    ✅     |          |        |           |              |      |           |             |
+| Fireworks              | Hosted             |   ✅   |    ✅     |    ✅    |        |           |              |      |           |             |
+| AWS Bedrock            | Hosted             |        |    ✅     |          |   ✅   |           |              |      |           |             |
+| Together               | Hosted             |   ✅   |    ✅     |          |   ✅   |           |              |      |           |             |
+| Groq                   | Hosted             |        |    ✅     |          |        |           |              |      |           |             |
+| Ollama                 | Single Node        |        |    ✅     |          |        |           |              |      |           |             |
+| TGI                    | Hosted/Single Node |        |    ✅     |          |        |           |              |      |           |             |
+| NVIDIA NIM             | Hosted/Single Node |        |    ✅     |          |        |           |              |      |           |             |
+| Chroma                 | Single Node        |        |           |    ✅    |        |           |              |      |           |             |
+| PG Vector              | Single Node        |        |           |    ✅    |        |           |              |      |           |             |
+| PyTorch ExecuTorch     | On-device iOS      |   ✅   |    ✅     |          |        |           |              |      |           |             |
+| vLLM                   | Single Node        |        |    ✅     |          |        |           |              |      |           |             |
+| OpenAI                 | Hosted             |        |    ✅     |          |        |           |              |      |           |             |
+| Anthropic              | Hosted             |        |    ✅     |          |        |           |              |      |           |             |
+| Gemini                 | Hosted             |        |    ✅     |          |        |           |              |      |           |             |
+| WatsonX                | Hosted             |        |    ✅     |          |        |           |              |      |           |             |
+| HuggingFace            | Single Node        |        |           |          |        |           |      ✅      |      |    ✅     |             |
+| TorchTune              | Single Node        |        |           |          |        |           |      ✅      |      |           |             |
+| NVIDIA NEMO            | Hosted             |        |    ✅     |    ✅    |        |           |      ✅      |  ✅  |    ✅     |             |
+| FAISS                  | Single Node        |        |           |    ✅    |        |           |              |      |           |             |
+| SQLite-Vec             | Single Node        |        |           |    ✅    |        |           |              |      |           |             |
+| Qdrant                 | Hosted/Single Node |        |           |    ✅    |        |           |              |      |           |             |
+| Weaviate               | Hosted             |        |           |    ✅    |        |           |              |      |           |             |
+| Milvus                 | Hosted/Single Node |        |           |    ✅    |        |           |              |      |           |             |
+| Prompt Guard           | Single Node        |        |           |          |   ✅   |           |              |      |           |             |
+| Llama Guard            | Single Node        |        |           |          |   ✅   |           |              |      |           |             |
+| Code Scanner           | Single Node        |        |           |          |   ✅   |           |              |      |           |             |
+| Brave Search           | Hosted             |        |           |          |        |           |              |      |           |      ✅     |
+| RAG Runtime            | Single Node        |        |           |          |        |           |              |      |           |      ✅     |
+
 ## External Providers

 Llama Stack supports external providers that live outside of the main codebase. This allows you to create and maintain your own providers independently. See the [External Providers Guide](external) for details.