---
orphan: true
---
# Starter Distribution

```{toctree}
:maxdepth: 2
:hidden:

self
```
The `llamastack/distribution-starter` distribution is a comprehensive, multi-provider distribution that includes most of the available inference providers in Llama Stack. It's designed to be a one-stop solution for developers who want to experiment with different AI providers without having to configure each one individually.
## Provider Composition
The starter distribution consists of the following provider configurations:
| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| files | `inline::localfs` |
| inference | `remote::openai`, `remote::fireworks`, `remote::together`, `remote::ollama`, `remote::anthropic`, `remote::gemini`, `remote::groq`, `remote::sambanova`, `remote::vllm`, `remote::tgi`, `remote::cerebras`, `remote::llama-openai-compat`, `remote::nvidia`, `remote::hf::serverless`, `remote::hf::endpoint`, `inline::sentence-transformers` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `inline::sqlite-vec`, `remote::chromadb`, `remote::pgvector` |
## Inference Providers
The starter distribution includes a comprehensive set of inference providers:
### Hosted Providers
- OpenAI: GPT-4, GPT-3.5, O1, O3, O4 models and text embeddings
- Fireworks: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings
- Together: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings
- Anthropic: Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude 3.5 Haiku, and Voyage embeddings
- Gemini: Gemini 1.5, 2.0, 2.5 models and text embeddings
- Groq: Fast Llama models (3.1, 3.2, 3.3, 4 Scout, 4 Maverick)
- SambaNova: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models
- Cerebras: Cerebras AI models
- NVIDIA: NVIDIA NIM models
- HuggingFace: Serverless and endpoint models
- Bedrock: AWS Bedrock models
### Local/Remote Providers
- Ollama: Local Ollama models
- vLLM: Local or remote vLLM server
- TGI: Text Generation Inference server, including Dell Enterprise Hub's custom TGI container (use `DEH_URL`)
- Sentence Transformers: Local embedding models
All providers are disabled by default, so you need to enable the ones you want by setting the corresponding environment variables.
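For example, to run only a local Ollama backend you would enable just that provider before starting the server (the model tag below is only an example; the variables are described in the sections that follow):

```bash
# Enable only the providers you plan to use (illustrative values)
export ENABLE_OLLAMA=ollama                 # turn on the Ollama inference provider
export OLLAMA_INFERENCE_MODEL=llama3.2:3b   # example model tag; use any model you have pulled
```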
## Environment Variables
The following environment variables can be configured:
### Server Configuration

- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)
### API Keys for Hosted Providers

- `OPENAI_API_KEY`: OpenAI API key
- `FIREWORKS_API_KEY`: Fireworks API key
- `TOGETHER_API_KEY`: Together API key
- `ANTHROPIC_API_KEY`: Anthropic API key
- `GEMINI_API_KEY`: Google Gemini API key
- `GROQ_API_KEY`: Groq API key
- `SAMBANOVA_API_KEY`: SambaNova API key
- `CEREBRAS_API_KEY`: Cerebras API key
- `LLAMA_API_KEY`: Llama API key
- `NVIDIA_API_KEY`: NVIDIA API key
- `HF_API_TOKEN`: HuggingFace API token
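For instance, if you only plan to use the OpenAI and Groq backends, exporting just those two keys before starting the server is enough (the values below are placeholders, not real keys):

```bash
# Export keys only for the hosted providers you intend to use (placeholder values)
export OPENAI_API_KEY=sk-your-openai-key
export GROQ_API_KEY=gsk-your-groq-key
```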
### Local Provider Configuration

- `OLLAMA_URL`: Ollama server URL (default: `http://localhost:11434`)
- `VLLM_URL`: vLLM server URL (default: `http://localhost:8000/v1`)
- `VLLM_MAX_TOKENS`: vLLM max tokens (default: `4096`)
- `VLLM_API_TOKEN`: vLLM API token (default: `fake`)
- `VLLM_TLS_VERIFY`: vLLM TLS verification (default: `true`)
- `TGI_URL`: TGI server URL
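As an illustration, the snippet below points the stack at a local Ollama instance and a vLLM server running on another host; the hostname and token are placeholders, and disabling TLS verification is only appropriate for servers with self-signed certificates:

```bash
# Local Ollama on its default port
export OLLAMA_URL=http://localhost:11434

# vLLM on another machine (placeholder hostname and token)
export VLLM_URL=http://my-vllm-host:8000/v1
export VLLM_API_TOKEN=my-vllm-token
export VLLM_TLS_VERIFY=false   # only if the server uses a self-signed certificate
```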
### Model Configuration

- `INFERENCE_MODEL`: HuggingFace model for serverless inference
- `INFERENCE_ENDPOINT_NAME`: HuggingFace endpoint name
- `OLLAMA_INFERENCE_MODEL`: Ollama model name
- `OLLAMA_EMBEDDING_MODEL`: Ollama embedding model name
- `OLLAMA_EMBEDDING_DIMENSION`: Ollama embedding dimension (default: `384`)
- `VLLM_INFERENCE_MODEL`: vLLM model name
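For example, to serve a chat model and an embedding model through Ollama you might set the following (the model tags are examples only; `all-minilm` happens to produce 384-dimensional embeddings, matching the default dimension):

```bash
# Chat and embedding models served by Ollama (example model tags)
export OLLAMA_INFERENCE_MODEL=llama3.2:3b
export OLLAMA_EMBEDDING_MODEL=all-minilm
export OLLAMA_EMBEDDING_DIMENSION=384
```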
### Vector Database Configuration

- `SQLITE_STORE_DIR`: SQLite store directory (default: `~/.llama/distributions/starter`)
- `ENABLE_SQLITE_VEC`: Enable SQLite vector provider
- `ENABLE_CHROMADB`: Enable ChromaDB provider
- `ENABLE_PGVECTOR`: Enable PGVector provider
- `CHROMADB_URL`: ChromaDB server URL
- `PGVECTOR_HOST`: PGVector host (default: `localhost`)
- `PGVECTOR_PORT`: PGVector port (default: `5432`)
- `PGVECTOR_DB`: PGVector database name
- `PGVECTOR_USER`: PGVector username
- `PGVECTOR_PASSWORD`: PGVector password
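Here is a sketch of enabling the PGVector backend against a local Postgres instance, following the same `ENABLE_*` pattern shown in the Enabling Providers section below; the database name and credentials are placeholders:

```bash
# Enable PGVector and point it at a local Postgres (placeholder credentials)
export ENABLE_PGVECTOR=pgvector
export PGVECTOR_HOST=localhost
export PGVECTOR_PORT=5432
export PGVECTOR_DB=llamastack
export PGVECTOR_USER=llamastack
export PGVECTOR_PASSWORD=change-me
```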
### Tool Configuration

- `BRAVE_SEARCH_API_KEY`: Brave Search API key
- `TAVILY_SEARCH_API_KEY`: Tavily Search API key
### Telemetry Configuration

- `OTEL_SERVICE_NAME`: OpenTelemetry service name
- `TELEMETRY_SINKS`: Telemetry sinks (default: `console,sqlite`)
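For instance, to tag traces with your own service name while keeping the default sinks (the service name is illustrative):

```bash
# Name the service for OpenTelemetry and keep the default sinks (illustrative values)
export OTEL_SERVICE_NAME=my-llama-stack
export TELEMETRY_SINKS=console,sqlite
```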
## Enabling Providers
You can enable specific providers by setting their provider ID to a valid value using environment variables. This is useful when you only want to run certain providers, or when you don't have API keys for the others.
### Examples of Enabling Providers

#### Enable FAISS Vector Provider

```bash
export ENABLE_FAISS=faiss
```

#### Enable Ollama Models

```bash
export ENABLE_OLLAMA=ollama
```

#### Disable vLLM Models

```bash
export VLLM_INFERENCE_MODEL=__disabled__
```

#### Disable Optional Vector Providers

```bash
export ENABLE_SQLITE_VEC=__disabled__
export ENABLE_CHROMADB=__disabled__
export ENABLE_PGVECTOR=__disabled__
```
## Provider ID Patterns
The starter distribution uses several patterns for provider IDs:
- Direct provider IDs: `faiss`, `ollama`, `vllm`
- Environment-based provider IDs: `${env.ENABLE_SQLITE_VEC+sqlite-vec}`
- Model-based provider IDs: `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`
When using the `+` pattern (like `${env.ENABLE_SQLITE_VEC+sqlite-vec}`), the provider is enabled by default and can be disabled by setting the environment variable to `__disabled__`.

When using the `:` pattern (like `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`), the provider is disabled by default and can be enabled by setting the environment variable to a valid value.
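To make the difference concrete, here is how the two patterns behave from the shell (the model tag is only an example):

```bash
# '+' pattern: enabled by default, disable explicitly
export ENABLE_SQLITE_VEC=__disabled__        # turns the sqlite-vec provider off

# ':' pattern: disabled by default, enable by supplying a value
export OLLAMA_INFERENCE_MODEL=llama3.2:3b    # turns the Ollama model on (example tag)
```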
## Running the Distribution
You can run the starter distribution via Docker or Conda.
### Via Docker
This method allows you to get started quickly without having to build the distribution code.
```bash
LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -e OPENAI_API_KEY=your_openai_key \
  -e FIREWORKS_API_KEY=your_fireworks_key \
  -e TOGETHER_API_KEY=your_together_key \
  llamastack/distribution-starter \
  --port $LLAMA_STACK_PORT
```
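If you want the SQLite stores described in the Storage section below to survive container restarts, you can mount your local `~/.llama` directory into the container. The mount target below assumes the server writes under `/root/.llama` inside the image; adjust it if your setup differs:

```bash
# Optional: persist ~/.llama across container restarts (mount target is an assumption)
docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  -e OPENAI_API_KEY=your_openai_key \
  llamastack/distribution-starter \
  --port $LLAMA_STACK_PORT
```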
### Via Conda

Make sure you have run `uv pip install llama-stack` and have the Llama Stack CLI available.
```bash
llama stack build --template starter --image-type conda
llama stack run distributions/starter/run.yaml \
  --port 8321 \
  --env OPENAI_API_KEY=your_openai_key \
  --env FIREWORKS_API_KEY=your_fireworks_key \
  --env TOGETHER_API_KEY=your_together_key
```
## Example Usage
Once the distribution is running, you can use any of the available models. Here are some examples:
### Using OpenAI Models

```bash
llama-stack-client --endpoint http://localhost:8321 \
  inference chat-completion \
  --model-id openai/gpt-4o \
  --message "Hello, how are you?"
```
### Using Fireworks Models

```bash
llama-stack-client --endpoint http://localhost:8321 \
  inference chat-completion \
  --model-id fireworks/meta-llama/Llama-3.2-3B-Instruct \
  --message "Write a short story about a robot."
```
### Using Local Ollama Models

```bash
# First, make sure Ollama is running and you have a model
ollama run llama3.2:3b

# Then use it through Llama Stack
export OLLAMA_INFERENCE_MODEL=llama3.2:3b
llama-stack-client --endpoint http://localhost:8321 \
  inference chat-completion \
  --model-id ollama/llama3.2:3b \
  --message "Explain quantum computing in simple terms."
```
## Storage
The starter distribution uses SQLite for local storage of various components:
- Metadata store: `~/.llama/distributions/starter/registry.db`
- Inference store: `~/.llama/distributions/starter/inference_store.db`
- FAISS store: `~/.llama/distributions/starter/faiss_store.db`
- SQLite vector store: `~/.llama/distributions/starter/sqlite_vec.db`
- Files metadata: `~/.llama/distributions/starter/files_metadata.db`
- Agents store: `~/.llama/distributions/starter/agents_store.db`
- Responses store: `~/.llama/distributions/starter/responses_store.db`
- Trace store: `~/.llama/distributions/starter/trace_store.db`
- Evaluation store: `~/.llama/distributions/starter/meta_reference_eval.db`
- Dataset I/O stores: Various HuggingFace and local filesystem stores
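Since these are plain SQLite files, you can inspect them with the standard `sqlite3` CLI, for example to list the tables in the registry (the exact table names depend on your Llama Stack version):

```bash
# List the tables in the registry database
sqlite3 ~/.llama/distributions/starter/registry.db ".tables"
```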
## Benefits of the Starter Distribution
- Comprehensive Coverage: Includes most popular AI providers in one distribution
- Flexible Configuration: Easy to enable/disable providers based on your needs
- No Local GPU Required: Most providers are cloud-based, making it accessible to developers without high-end hardware
- Easy Migration: Start with hosted providers and gradually move to local ones as needed
- Production Ready: Includes safety, evaluation, and telemetry components
- Tool Integration: Comes with web search, RAG, and model context protocol tools
The starter distribution is ideal for developers who want to experiment with different AI providers, build prototypes quickly, or create applications that can work with multiple AI backends.