diff --git a/README.md b/README.md
index 7f34c3340..2772795f7 100644
--- a/README.md
+++ b/README.md
@@ -35,6 +35,8 @@ pip install llama-stack-client
 ### CLI
 ```bash
 # Run a chat completion
+MODEL="Llama-4-Scout-17B-16E-Instruct"
+
 llama-stack-client --endpoint http://localhost:8321 \
 inference chat-completion \
 --model-id meta-llama/$MODEL \
@@ -107,30 +109,51 @@ By reducing friction and complexity, Llama Stack empowers developers to focus on
 ### API Providers
 Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.
-| **API Provider Builder** | **Environments** | **Agents** | **Inference** | **Memory** | **Safety** | **Telemetry** | **Post Training** |
-|:------------------------:|:----------------------:|:----------:|:-------------:|:----------:|:----------:|:-------------:|:-----------------:|
-| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | |
-| SambaNova | Hosted | | ✅ | | ✅ | | |
-| Cerebras | Hosted | | ✅ | | | | |
-| Fireworks | Hosted | ✅ | ✅ | ✅ | | | |
-| AWS Bedrock | Hosted | | ✅ | | ✅ | | |
-| Together | Hosted | ✅ | ✅ | | ✅ | | |
-| Groq | Hosted | | ✅ | | | | |
-| Ollama | Single Node | | ✅ | | | | |
-| TGI | Hosted and Single Node | | ✅ | | | | |
-| NVIDIA NIM | Hosted and Single Node | | ✅ | | | | |
-| Chroma | Single Node | | | ✅ | | | |
-| PG Vector | Single Node | | | ✅ | | | |
-| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | |
-| vLLM | Hosted and Single Node | | ✅ | | | | |
-| OpenAI | Hosted | | ✅ | | | | |
-| Anthropic | Hosted | | ✅ | | | | |
-| Gemini | Hosted | | ✅ | | | | |
-| watsonx | Hosted | | ✅ | | | | |
-| HuggingFace | Single Node | | | | | | ✅ |
-| TorchTune | Single Node | | | | | | ✅ |
-| NVIDIA NEMO | Hosted | | | | | | ✅ |
+| API Provider Builder | Environments | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO | Tool Runtime | Scoring |
+|:----------------------:|:------------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:---------:|:------------:|:-------:|
+| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
+| SambaNova | Hosted | | ✅ | | ✅ | | | | | | |
+| Cerebras | Hosted | | ✅ | | | | | | | | |
+| Fireworks | Hosted | ✅ | ✅ | ✅ | | | | | | | |
+| AWS Bedrock | Hosted | | ✅ | | ✅ | | | | | | |
+| Together | Hosted | ✅ | ✅ | | ✅ | | | | | | |
+| Groq | Hosted | | ✅ | | | | | | | | |
+| Ollama | Single Node | | ✅ | | | | | | | | |
+| TGI | Hosted/Single Node | | ✅ | | | | | | | | |
+| NVIDIA NIM | Hosted/Single Node | | ✅ | | ✅ | | | | | | |
+| ChromaDB | Hosted/Single Node | | | ✅ | | | | | | | |
+| PG Vector | Single Node | | | ✅ | | | | | | | |
+| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | | | | | |
+| vLLM | Single Node | | ✅ | | | | | | | | |
+| OpenAI | Hosted | | ✅ | | | | | | | | |
+| Anthropic | Hosted | | ✅ | | | | | | | | |
+| Gemini | Hosted | | ✅ | | | | | | | | |
+| WatsonX | Hosted | | ✅ | | | | | | | | |
+| HuggingFace | Single Node | | | | | | ✅ | | ✅ | | |
+| TorchTune | Single Node | | | | | | ✅ | | | | |
+| NVIDIA NEMO | Hosted | | ✅ | ✅ | | | ✅ | ✅ | ✅ | | |
+| NVIDIA | Hosted | | | | | | ✅ | ✅ | ✅ | | |
+| FAISS | Single Node | | | ✅ | | | | | | | |
+| SQLite-Vec | Single Node | | | ✅ | | | | | | | |
+| Qdrant | Hosted/Single Node | | | ✅ | | | | | | | |
+| Weaviate | Hosted | | | ✅ | | | | | | | |
+| Milvus | Hosted/Single Node | | | ✅ | | | | | | | |
+| Prompt Guard | Single Node | | | | ✅ | | | | | | |
+| Llama Guard | Single Node | | | | ✅ | | | | | | |
+| Code Scanner | Single Node | | | | ✅ | | | | | | |
+| Brave Search | Hosted | | | | | | | | | ✅ | |
+| Bing Search | Hosted | | | | | | | | | ✅ | |
+| RAG Runtime | Single Node | | | | | | | | | ✅ | |
+| Model Context Protocol | Hosted | | | | | | | | | ✅ | |
+| Sentence Transformers | Single Node | | ✅ | | | | | | | | |
+| Braintrust | Single Node | | | | | | | | | | ✅ |
+| Basic | Single Node | | | | | | | | | | ✅ |
+| LLM-as-Judge | Single Node | | | | | | | | | | ✅ |
+| Databricks | Hosted | | ✅ | | | | | | | | |
+| RunPod | Hosted | | ✅ | | | | | | | | |
+| Passthrough | Hosted | | ✅ | | | | | | | | |
+> **Note**: Additional providers are available through external packages. See the [External Providers](https://llama-stack.readthedocs.io/en/latest/providers/external.html) documentation for providers like KubeFlow Training, KubeFlow Pipelines, RamaLama, and TrustyAI LM-Eval.

 ### Distributions

@@ -145,7 +168,10 @@ A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider
 | TGI | [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/tgi.html) |
 | Together | [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/together.html) |
 | Fireworks | [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/fireworks.html) |
+| AWS Bedrock | [llamastack/distribution-bedrock](https://hub.docker.com/repository/docker/llamastack/distribution-bedrock/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/bedrock.html) |
 | vLLM | [llamastack/distribution-remote-vllm](https://hub.docker.com/repository/docker/llamastack/distribution-remote-vllm/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/remote-vllm.html) |
+| Starter | [llamastack/distribution-starter](https://hub.docker.com/repository/docker/llamastack/distribution-starter/general) | |
+| PostgreSQL | [llamastack/distribution-postgres-demo](https://hub.docker.com/repository/docker/llamastack/distribution-postgres-demo/general) | |

 ### Documentation

diff --git a/docs/source/distributions/configuration.md b/docs/source/distributions/configuration.md
index dd73d93ea..0f90d24f7 100644
--- a/docs/source/distributions/configuration.md
+++ b/docs/source/distributions/configuration.md
@@ -67,7 +67,6 @@ Let's break this down into the different sections. The first section specifies t
 apis:
 - agents
 - inference
-- memory
 - safety
 - telemetry
 ```
@@ -77,10 +76,10 @@ Next up is the most critical part: the set of providers that the stack will use
 ```yaml
 providers:
   inference:
-  # provider_id is a string you can choose freely 
+  # provider_id is a string you can choose freely
   - provider_id: ollama
     # provider_type is a string that specifies the type of provider.
-    # in this case, the provider for inference is ollama and it is run remotely (outside of the distribution)
+    # in this case, the provider for inference is ollama and it runs remotely (outside of the distribution)
     provider_type: remote::ollama
     # config is a dictionary that contains the configuration for the provider.
     # in this case, the configuration is the url of the ollama server
@@ -88,7 +87,7 @@ providers:
       url: ${env.OLLAMA_URL:http://localhost:11434}
 ```
 A few things to note:
-- A _provider instance_ is identified with an (id, type, configuration) triplet.
+- A _provider instance_ is identified with an (id, type, config) triplet.
 - The id is a string you can choose freely.
 - You can instantiate any number of provider instances of the same type.
 - The configuration dictionary is provider-specific.
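The (id, type, config) triplet described in the notes above can be sketched in plain Python. This is an illustrative toy registry under assumed names (`ProviderInstance`, `register` are hypothetical, not Llama Stack's actual classes), showing why ids must be unique while types may repeat:

```python
from dataclasses import dataclass, field

# Hypothetical sketch, not Llama Stack's real implementation: a provider
# instance is identified by its (id, type, config) triplet.
@dataclass
class ProviderInstance:
    provider_id: str    # freely chosen string; must be unique
    provider_type: str  # e.g. "remote::ollama"
    config: dict = field(default_factory=dict)  # provider-specific settings

registry: dict[str, ProviderInstance] = {}

def register(instance: ProviderInstance) -> None:
    """Register a provider instance; ids must be unique, types may repeat."""
    if instance.provider_id in registry:
        raise ValueError(f"duplicate provider_id: {instance.provider_id}")
    registry[instance.provider_id] = instance

# Any number of instances of the same type is allowed, as long as ids differ:
register(ProviderInstance("ollama", "remote::ollama",
                          {"url": "http://localhost:11434"}))
register(ProviderInstance("ollama-gpu", "remote::ollama",
                          {"url": "http://gpu-host:11434"}))
```

Registering a second instance with an already-used id raises, which is the property the uniqueness note is getting at.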
@@ -96,7 +95,7 @@ A few things to note:

 ## Resources

-Finally, let's look at the `models` section:
+Let's look at the `models` section:

 ```yaml
 models:
@@ -104,8 +103,9 @@ models:
   model_id: ${env.INFERENCE_MODEL}
   provider_id: ollama
   provider_model_id: null
+  model_type: llm
 ```
-A Model is an instance of a "Resource" (see [Concepts](../concepts/index)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage the clients to always register models before using them, some Stack servers may come up a list of "already known and available" models.
+A Model is an instance of a "Resource" (see [Concepts](../concepts/index)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage clients to register models before using them, some Stack servers may come up with a list of "already known and available" models.

 What's with the `provider_model_id` field? This is an identifier for the model inside the provider's model catalog. Contrast it with `model_id` which is the identifier for the same model for Llama Stack's purposes. For example, you may want to name "llama3.2:vision-11b" as "image_captioning_model" when you use it in your Stack interactions. When omitted, the server will set `provider_model_id` to be the same as `model_id`.
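The `${env.OLLAMA_URL:http://localhost:11434}` references in the configuration above resolve to an environment variable, with everything after the first colon serving as a default. As a minimal sketch of how such substitution can work (illustrative only; the Stack's own resolver may differ, and `expand_env` is a hypothetical name):

```python
import os
import re

# Matches ${env.VAR} or ${env.VAR:default}; the default may itself
# contain colons, as in a URL like http://localhost:11434.
_ENV_REF = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")

def expand_env(value: str) -> str:
    """Expand env-var references, falling back to the inline default if unset."""
    def _sub(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        resolved = os.environ.get(name, default)
        if resolved is None:
            raise KeyError(f"{name} is not set and no default was given")
        return resolved
    return _ENV_REF.sub(_sub, value)
```

With `OLLAMA_URL` unset, `expand_env("${env.OLLAMA_URL:http://localhost:11434}")` falls back to the inline default; with it set, the environment value wins.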
diff --git a/docs/source/index.md b/docs/source/index.md
index 1df5e8507..755b228e3 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -73,17 +73,26 @@ A number of "adapters" are available for some popular Inference and Vector Store
 | OpenAI | Hosted |
 | Anthropic | Hosted |
 | Gemini | Hosted |
+| WatsonX | Hosted |
+**Agents API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Meta Reference | Single Node |
+| Fireworks | Hosted |
+| Together | Hosted |
+| PyTorch ExecuTorch | On-device iOS |

 **Vector IO API**
 | **Provider** | **Environments** |
 | :----: | :----: |
 | FAISS | Single Node |
-| SQLite-Vec| Single Node |
+| SQLite-Vec | Single Node |
 | Chroma | Hosted and Single Node |
 | Milvus | Hosted and Single Node |
 | Postgres (PGVector) | Hosted and Single Node |
 | Weaviate | Hosted |
+| Qdrant | Hosted and Single Node |

 **Safety API**
 | **Provider** | **Environments** |
 | :----: | :----: |
@@ -93,6 +102,30 @@ A number of "adapters" are available for some popular Inference and Vector Store
 | Code Scanner | Single Node |
 | AWS Bedrock | Hosted |
+**Post Training API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Meta Reference | Single Node |
+| HuggingFace | Single Node |
+| TorchTune | Single Node |
+| NVIDIA NEMO | Hosted |
+
+**Eval API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Meta Reference | Single Node |
+| NVIDIA NEMO | Hosted |
+
+**Telemetry API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Meta Reference | Single Node |
+
+**Tool Runtime API**
+| **Provider** | **Environments** |
+| :----: | :----: |
+| Brave Search | Hosted |
+| RAG Runtime | Single Node |

 ```{toctree}
 :hidden:
diff --git a/docs/source/providers/index.md b/docs/source/providers/index.md
index 1f5026479..b91a76913 100644
--- a/docs/source/providers/index.md
+++ b/docs/source/providers/index.md
@@ -1,9 +1,10 @@
 # Providers Overview

 The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations for the same API. Examples for these include:
-- LLM inference providers (e.g., Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, etc.),
-- Vector databases (e.g., ChromaDB, Weaviate, Qdrant, Milvus, FAISS, PGVector, SQLite-Vec, etc.),
-- Safety providers (e.g., Meta's Llama Guard, AWS Bedrock Guardrails, etc.)
+- LLM inference providers (e.g., Meta Reference, Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, OpenAI, Anthropic, Gemini, WatsonX, etc.),
+- Vector databases (e.g., FAISS, SQLite-Vec, ChromaDB, Weaviate, Qdrant, Milvus, PGVector, etc.),
+- Safety providers (e.g., Meta's Llama Guard, Prompt Guard, Code Scanner, AWS Bedrock Guardrails, etc.),
+- Tool Runtime providers (e.g., RAG Runtime, Brave Search, etc.)

 Providers come in two flavors:
 - **Remote**: the provider runs as a separate service external to the Llama Stack codebase. Llama Stack contains a small amount of adapter code.
@@ -11,6 +12,44 @@ Providers come in two flavors:
 Importantly, Llama Stack always strives to provide at least one fully inline provider for each API so you can iterate on a fully featured environment locally.
+## Available Providers
+
+Here is a comprehensive list of all available API providers in Llama Stack:
+
+| API Provider Builder | Environments | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO | Tool Runtime |
+|:----------------------:|:------------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:---------:|:------------:|
+| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| SambaNova | Hosted | | ✅ | | ✅ | | | | | |
+| Cerebras | Hosted | | ✅ | | | | | | | |
+| Fireworks | Hosted | ✅ | ✅ | ✅ | | | | | | |
+| AWS Bedrock | Hosted | | ✅ | | ✅ | | | | | |
+| Together | Hosted | ✅ | ✅ | | ✅ | | | | | |
+| Groq | Hosted | | ✅ | | | | | | | |
+| Ollama | Single Node | | ✅ | | | | | | | |
+| TGI | Hosted/Single Node | | ✅ | | | | | | | |
+| NVIDIA NIM | Hosted/Single Node | | ✅ | | | | | | | |
+| Chroma | Single Node | | | ✅ | | | | | | |
+| PG Vector | Single Node | | | ✅ | | | | | | |
+| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | | | | |
+| vLLM | Single Node | | ✅ | | | | | | | |
+| OpenAI | Hosted | | ✅ | | | | | | | |
+| Anthropic | Hosted | | ✅ | | | | | | | |
+| Gemini | Hosted | | ✅ | | | | | | | |
+| WatsonX | Hosted | | ✅ | | | | | | | |
+| HuggingFace | Single Node | | | | | | ✅ | | ✅ | |
+| TorchTune | Single Node | | | | | | ✅ | | | |
+| NVIDIA NEMO | Hosted | | ✅ | ✅ | | | ✅ | ✅ | ✅ | |
+| FAISS | Single Node | | | ✅ | | | | | | |
+| SQLite-Vec | Single Node | | | ✅ | | | | | | |
+| Qdrant | Hosted/Single Node | | | ✅ | | | | | | |
+| Weaviate | Hosted | | | ✅ | | | | | | |
+| Milvus | Hosted/Single Node | | | ✅ | | | | | | |
+| Prompt Guard | Single Node | | | | ✅ | | | | | |
+| Llama Guard | Single Node | | | | ✅ | | | | | |
+| Code Scanner | Single Node | | | | ✅ | | | | | |
+| Brave Search | Hosted | | | | | | | | | ✅ |
+| RAG Runtime | Single Node | | | | | | | | | ✅ |
+
 ## External Providers

 Llama Stack supports external providers that live outside of the main codebase. This allows you to create and maintain your own providers independently. See the [External Providers Guide](external) for details.
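The provider types throughout this patch follow a `flavor::name` convention (`remote::ollama` for remote providers, inline providers for the in-process flavor). A hypothetical helper illustrating that convention (the real codebase's parsing may differ, and treating a bare name as inline is an assumption of this sketch):

```python
def split_provider_type(provider_type: str) -> tuple[str, str]:
    """Split a provider_type like 'remote::ollama' into (flavor, name).

    Assumption of this sketch: a bare name with no '::' prefix is treated
    as inline, mirroring the remote/inline split described above.
    """
    if "::" in provider_type:
        flavor, _, name = provider_type.partition("::")
        return flavor, name
    return "inline", provider_type
```

For example, `split_provider_type("remote::ollama")` yields `("remote", "ollama")`, which a stack could use to decide between spinning up adapter code for an external service and loading an in-process implementation.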