
---
orphan: true
---
# Starter Distribution

```{toctree}
:maxdepth: 2
:hidden:

self
```

The `llamastack/distribution-starter` distribution is a comprehensive, multi-provider distribution that includes most of the available inference providers in Llama Stack. It's designed to be a one-stop solution for developers who want to experiment with different AI providers without having to configure each one individually.

## Provider Composition

The starter distribution consists of the following configurations:

| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| files | `inline::localfs` |
| inference | `remote::openai`, `remote::fireworks`, `remote::together`, `remote::ollama`, `remote::anthropic`, `remote::gemini`, `remote::groq`, `remote::sambanova`, `remote::vllm`, `remote::tgi`, `remote::cerebras`, `remote::llama-openai-compat`, `remote::nvidia`, `remote::hf::serverless`, `remote::hf::endpoint`, `inline::sentence-transformers`, `remote::passthrough` |
| safety | `inline::llama-guard` |
| post_training | `inline::huggingface` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `inline::sqlite-vec`, `remote::chromadb`, `remote::pgvector` |

## Inference Providers

The starter distribution includes a comprehensive set of inference providers:

- **OpenAI**: GPT-4, GPT-3.5, o1, o3, o4 models and text embeddings; see the OpenAI provider configuration documentation for details
- **Fireworks**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings
- **Together**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings
- **Anthropic**: Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude 3.5 Haiku, and Voyage embeddings
- **Gemini**: Gemini 1.5, 2.0, 2.5 models and text embeddings
- **Groq**: Fast Llama models (3.1, 3.2, 3.3, 4 Scout, 4 Maverick)
- **SambaNova**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models
- **Cerebras**: Cerebras AI models
- **NVIDIA**: NVIDIA NIM models
- **HuggingFace**: Serverless and endpoint models
- **Bedrock**: AWS Bedrock models
- **Passthrough**: Connects to any other inference provider that is not natively supported by Llama Stack
- **Ollama**: Local Ollama models
- **vLLM**: Remote vLLM server
- **TGI**: Text Generation Inference server, including Dell Enterprise Hub's custom TGI container (configured via `DEH_URL`)
- **Sentence Transformers**: Local embedding models

All of these providers are disabled by default, so you need to enable them via environment variables; see Enabling Providers below for details.

## Vector Providers

The starter distribution includes a comprehensive set of vector providers:

- **FAISS**: Local FAISS vector store - enabled by default
- **SQLite**: Local SQLite vector store - disabled by default
- **ChromaDB**: Remote ChromaDB server - disabled by default
- **PGVector**: Remote PGVector server - disabled by default
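
As a hedged sketch, assuming the non-default vector providers follow the same `ENABLE_<PROVIDER>` convention described under Enabling Providers below (the `CHROMADB_URL` variable here is only illustrative), enabling them might look like:

```bash
# Sketch only: these variable names follow the ENABLE_<PROVIDER> pattern and
# are assumptions; check the vector provider configuration docs for exact names.
export ENABLE_SQLITE_VEC=sqlite-vec          # enable the local SQLite vector store
export ENABLE_CHROMADB=chromadb              # enable the remote ChromaDB provider
export CHROMADB_URL=http://localhost:8000    # illustrative endpoint variable
```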

## Enabling Providers

You can enable a specific provider by setting its environment variable to the provider ID you want to use.

For instance, to enable the Ollama provider, set the `ENABLE_OLLAMA` environment variable to `ollama`:

```bash
export ENABLE_OLLAMA=ollama
```

To disable a provider, set its environment variable to `__disabled__`, for example `ENABLE_OLLAMA=__disabled__`.
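
Putting the two together, a typical setup might enable one provider and explicitly disable another. A minimal sketch, assuming `ENABLE_VLLM` follows the same `ENABLE_<PROVIDER>` pattern:

```bash
# Enable local inference via Ollama and explicitly disable the remote vLLM
# provider. ENABLE_VLLM is assumed to follow the ENABLE_<PROVIDER> pattern.
export ENABLE_OLLAMA=ollama
export ENABLE_VLLM=__disabled__
```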

## Running the Distribution

You can run the starter distribution via Docker or directly using the Llama Stack CLI.

### Via Docker

This method allows you to get started quickly without having to build the distribution code.

```bash
LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -e ENABLE_OLLAMA=ollama \
  -e OLLAMA_INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  llamastack/distribution-starter \
  --port $LLAMA_STACK_PORT
```
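
Once the container is up, a quick sanity check is to hit the server over HTTP. The route names below are assumptions for illustration; consult the Llama Stack API reference for the authoritative paths:

```bash
# Assumed routes for illustration; verify against the Llama Stack API reference.
curl -s http://localhost:$LLAMA_STACK_PORT/v1/health
curl -s http://localhost:$LLAMA_STACK_PORT/v1/models
```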

You can also use the `llama stack run` command to run the distribution.

```bash
llama stack run distributions/starter/run.yaml \
  --port 8321 \
  --env ENABLE_OLLAMA=ollama \
  --env OLLAMA_INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```
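
The same `--env` flags can enable several providers at once. The sketch below assumes hosted providers follow the `ENABLE_<PROVIDER>` pattern and read their API keys from the variables shown; check each provider's configuration docs for the exact names:

```bash
# Sketch: variable names other than ENABLE_OLLAMA/OLLAMA_INFERENCE_MODEL are
# assumptions; substitute the names documented for each provider.
llama stack run distributions/starter/run.yaml \
  --port 8321 \
  --env ENABLE_OLLAMA=ollama \
  --env OLLAMA_INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  --env ENABLE_OPENAI=openai \
  --env OPENAI_API_KEY=sk-your-key-here
```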

## Storage

The starter distribution uses SQLite for local storage of various components:

- **Metadata store**: `~/.llama/distributions/starter/registry.db`
- **Inference store**: `~/.llama/distributions/starter/inference_store.db`
- **FAISS store**: `~/.llama/distributions/starter/faiss_store.db`
- **SQLite vector store**: `~/.llama/distributions/starter/sqlite_vec.db`
- **Files metadata**: `~/.llama/distributions/starter/files_metadata.db`
- **Agents store**: `~/.llama/distributions/starter/agents_store.db`
- **Responses store**: `~/.llama/distributions/starter/responses_store.db`
- **Trace store**: `~/.llama/distributions/starter/trace_store.db`
- **Evaluation store**: `~/.llama/distributions/starter/meta_reference_eval.db`
- **Dataset I/O stores**: Various HuggingFace and local filesystem stores
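
Since these SQLite files live under a single directory, inspecting or backing up local state is straightforward (the backup destination below is just an example path):

```bash
# List the local SQLite stores and copy them to an example backup location.
ls ~/.llama/distributions/starter/
cp -r ~/.llama/distributions/starter/ ~/llama-starter-backup/
```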

## Benefits of the Starter Distribution

  1. Comprehensive Coverage: Includes most popular AI providers in one distribution
  2. Flexible Configuration: Easy to enable/disable providers based on your needs
  3. No Local GPU Required: Most providers are cloud-based, making it accessible to developers without high-end hardware
  4. Easy Migration: Start with hosted providers and gradually move to local ones as needed
  5. Production Ready: Includes safety, evaluation, and telemetry components
  6. Tool Integration: Comes with web search, RAG, and model context protocol tools

The starter distribution is ideal for developers who want to experiment with different AI providers, build prototypes quickly, or create applications that can work with multiple AI backends.