diff --git a/docs/source/distributions/self_hosted_distro/starter.md b/docs/source/distributions/self_hosted_distro/starter.md
index e9fdb6f8d..730ccf165 100644
--- a/docs/source/distributions/self_hosted_distro/starter.md
+++ b/docs/source/distributions/self_hosted_distro/starter.md
@@ -15,7 +15,7 @@ The `llamastack/distribution-starter` distribution is a comprehensive, multi-pro
 
 ## Provider Composition
 
-The starter distribution consists of the following provider configurations:
+The starter distribution consists of the following configurations:
 
 | API | Provider(s) |
 |-----|-------------|
@@ -23,8 +23,9 @@ The starter distribution consists of the following provider configurations:
 | datasetio | `remote::huggingface`, `inline::localfs` |
 | eval | `inline::meta-reference` |
 | files | `inline::localfs` |
-| inference | `remote::openai`, `remote::fireworks`, `remote::together`, `remote::ollama`, `remote::anthropic`, `remote::gemini`, `remote::groq`, `remote::sambanova`, `remote::vllm`, `remote::tgi`, `remote::cerebras`, `remote::llama-openai-compat`, `remote::nvidia`, `remote::hf::serverless`, `remote::hf::endpoint`, `inline::sentence-transformers` |
+| inference | `remote::openai`, `remote::fireworks`, `remote::together`, `remote::ollama`, `remote::anthropic`, `remote::gemini`, `remote::groq`, `remote::sambanova`, `remote::vllm`, `remote::tgi`, `remote::cerebras`, `remote::llama-openai-compat`, `remote::nvidia`, `remote::hf::serverless`, `remote::hf::endpoint`, `inline::sentence-transformers`, `remote::passthrough` |
 | safety | `inline::llama-guard` |
+| post_training | `inline::huggingface` |
 | scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
 | telemetry | `inline::meta-reference` |
 | tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
@@ -34,8 +35,8 @@ The starter distribution consists of the following provider configurations:
 
 The starter distribution includes a comprehensive set of inference providers:
 
-### Hosted Providers
-- **OpenAI**: GPT-4, GPT-3.5, O1, O3, O4 models and text embeddings
+- **OpenAI**: GPT-4, GPT-3.5, O1, O3, O4 models and text embeddings - see the relevant provider
+  configuration documentation for more details
 - **Fireworks**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings
 - **Together**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings
 - **Anthropic**: Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude 3.5 Haiku, and Voyage embeddings
@@ -46,114 +47,40 @@ The starter distribution includes a comprehensive set of inference providers:
 - **NVIDIA**: NVIDIA NIM models
 - **HuggingFace**: Serverless and endpoint models
 - **Bedrock**: AWS Bedrock models
-
-### Local/Remote Providers
+- **Passthrough**: Connects to any other inference provider that is not natively supported by Llama Stack
 - **Ollama**: Local Ollama models
-- **vLLM**: Local or remote vLLM server
+- **vLLM**: Remote vLLM server
 - **TGI**: Text Generation Inference server - Dell Enterprise Hub's custom TGI container too (use `DEH_URL`)
 - **Sentence Transformers**: Local embedding models
 
-All providers are disabled by default. So you need to enable them by setting the environment variables.
+All providers are **disabled** by default, so you need to enable them by setting the corresponding
+environment variables. See [Enabling Providers](#enabling-providers) for more details.
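+
+For example, to enable the hosted OpenAI provider along with its API key, you might export the
+following before starting the server (this assumes the OpenAI provider follows the same `ENABLE_*`
+naming pattern used for the other providers in `run.yaml`):
+
+```bash
+# Sketch: enable the hosted OpenAI provider and supply its key
+export ENABLE_OPENAI=openai
+export OPENAI_API_KEY=your_openai_key
+```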
 
-## Environment Variables
+## Vector Providers
 
-The following environment variables can be configured:
+The starter distribution includes a comprehensive set of vector providers:
 
-### Server Configuration
-- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)
-
-### API Keys for Hosted Providers
-- `OPENAI_API_KEY`: OpenAI API key
-- `FIREWORKS_API_KEY`: Fireworks API key
-- `TOGETHER_API_KEY`: Together API key
-- `ANTHROPIC_API_KEY`: Anthropic API key
-- `GEMINI_API_KEY`: Google Gemini API key
-- `GROQ_API_KEY`: Groq API key
-- `SAMBANOVA_API_KEY`: SambaNova API key
-- `CEREBRAS_API_KEY`: Cerebras API key
-- `LLAMA_API_KEY`: Llama API key
-- `NVIDIA_API_KEY`: NVIDIA API key
-- `HF_API_TOKEN`: HuggingFace API token
-
-### Local Provider Configuration
-- `OLLAMA_URL`: Ollama server URL (default: `http://localhost:11434`)
-- `VLLM_URL`: vLLM server URL (default: `http://localhost:8000/v1`)
-- `VLLM_MAX_TOKENS`: vLLM max tokens (default: `4096`)
-- `VLLM_API_TOKEN`: vLLM API token (default: `fake`)
-- `VLLM_TLS_VERIFY`: vLLM TLS verification (default: `true`)
-- `TGI_URL`: TGI server URL
-
-### Model Configuration
-- `INFERENCE_MODEL`: HuggingFace model for serverless inference
-- `INFERENCE_ENDPOINT_NAME`: HuggingFace endpoint name
-- `OLLAMA_INFERENCE_MODEL`: Ollama model name
-- `OLLAMA_EMBEDDING_MODEL`: Ollama embedding model name
-- `OLLAMA_EMBEDDING_DIMENSION`: Ollama embedding dimension (default: `384`)
-- `VLLM_INFERENCE_MODEL`: vLLM model name
-
-### Vector Database Configuration
-- `SQLITE_STORE_DIR`: SQLite store directory (default: `~/.llama/distributions/starter`)
-- `ENABLE_SQLITE_VEC`: Enable SQLite vector provider
-- `ENABLE_CHROMADB`: Enable ChromaDB provider
-- `ENABLE_PGVECTOR`: Enable PGVector provider
-- `CHROMADB_URL`: ChromaDB server URL
-- `PGVECTOR_HOST`: PGVector host (default: `localhost`)
-- `PGVECTOR_PORT`: PGVector port (default: `5432`)
-- `PGVECTOR_DB`: PGVector database name
-- `PGVECTOR_USER`: PGVector username
-- `PGVECTOR_PASSWORD`: PGVector password
-
-### Tool Configuration
-- `BRAVE_SEARCH_API_KEY`: Brave Search API key
-- `TAVILY_SEARCH_API_KEY`: Tavily Search API key
-
-### Telemetry Configuration
-- `OTEL_SERVICE_NAME`: OpenTelemetry service name
-- `TELEMETRY_SINKS`: Telemetry sinks (default: `console,sqlite`)
+- **FAISS**: Local FAISS vector store - enabled by default
+- **SQLite**: Local SQLite vector store - disabled by default
+- **ChromaDB**: Remote ChromaDB server - disabled by default
+- **PGVector**: Remote PGVector server - disabled by default
 
 ## Enabling Providers
 
-You can enable specific providers by setting their provider ID to a valid value using environment variables. This is useful when you want to use certain providers or don't have the required API keys.
+You can enable specific providers by setting the corresponding environment variable to the provider
+ID you want to use.
 
-### Examples of Enabling Providers
+For instance, to enable the Ollama provider, set the `ENABLE_OLLAMA` environment variable to `ollama`:
 
-#### Enable FAISS Vector Provider
-```bash
-export ENABLE_FAISS=faiss
-```
-
-#### Enable Ollama Models
 ```bash
 export ENABLE_OLLAMA=ollama
 ```
 
-#### Disable vLLM Models
-```bash
-export VLLM_INFERENCE_MODEL=__disabled__
-```
-
-#### Disable Optional Vector Providers
-```bash
-export ENABLE_SQLITE_VEC=__disabled__
-export ENABLE_CHROMADB=__disabled__
-export ENABLE_PGVECTOR=__disabled__
-```
-
-### Provider ID Patterns
-
-The starter distribution uses several patterns for provider IDs:
-
-1. **Direct provider IDs**: `faiss`, `ollama`, `vllm`
-2. **Environment-based provider IDs**: `${env.ENABLE_SQLITE_VEC+sqlite-vec}`
-3. **Model-based provider IDs**: `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`
-
-When using the `+` pattern (like `${env.ENABLE_SQLITE_VEC+sqlite-vec}`), the provider is enabled by default and can be disabled by setting the environment variable to `__disabled__`.
-
-When using the `:` pattern (like `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`), the provider is disabled by default and can be enabled by setting the environment variable to a valid value.
+To disable a provider, set its environment variable to `__disabled__`, for example `ENABLE_OLLAMA=__disabled__`.
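+
+The same pattern applies to the optional vector providers. As a sketch, enabling the remote
+ChromaDB provider might look like this (the provider ID `chromadb` and the URL are assumptions -
+point them at your own server):
+
+```bash
+# Enable the remote ChromaDB vector provider and tell it where the server lives
+export ENABLE_CHROMADB=chromadb
+export CHROMADB_URL=http://localhost:8000
+```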
 
 ## Running the Distribution
 
-You can run the starter distribution via Docker or Conda.
+You can run the starter distribution via Docker or directly using the Llama Stack CLI.
 
 ### Via Docker
@@ -165,57 +92,19 @@ docker run \
   -it \
   --pull always \
   -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-  -e OPENAI_API_KEY=your_openai_key \
-  -e FIREWORKS_API_KEY=your_fireworks_key \
-  -e TOGETHER_API_KEY=your_together_key \
+  -e ENABLE_OLLAMA=ollama \
+  -e OLLAMA_INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
   llamastack/distribution-starter \
   --port $LLAMA_STACK_PORT
 ```
 
-### Via Conda
-
-Make sure you have done `uv pip install llama-stack` and have the Llama Stack CLI available.
+Alternatively, you can run the distribution directly with the `llama stack run` command:
 
 ```bash
-llama stack build --template starter --image-type conda
 llama stack run distributions/starter/run.yaml \
   --port 8321 \
-  --env OPENAI_API_KEY=your_openai_key \
-  --env FIREWORKS_API_KEY=your_fireworks_key \
-  --env TOGETHER_API_KEY=your_together_key
-```
-
-## Example Usage
-
-Once the distribution is running, you can use any of the available models. Here are some examples:
-
-### Using OpenAI Models
-```bash
-llama-stack-client --endpoint http://localhost:8321 \
-inference chat-completion \
---model-id openai/gpt-4o \
---message "Hello, how are you?"
-```
-
-### Using Fireworks Models
-```bash
-llama-stack-client --endpoint http://localhost:8321 \
-inference chat-completion \
---model-id fireworks/meta-llama/Llama-3.2-3B-Instruct \
---message "Write a short story about a robot."
-```
-
-### Using Local Ollama Models
-```bash
-# First, make sure Ollama is running and you have a model
-ollama run llama3.2:3b
-
-# Then use it through Llama Stack
-export OLLAMA_INFERENCE_MODEL=llama3.2:3b
-llama-stack-client --endpoint http://localhost:8321 \
-inference chat-completion \
---model-id ollama/llama3.2:3b \
---message "Explain quantum computing in simple terms."
+  --env ENABLE_OLLAMA=ollama \
+  --env OLLAMA_INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
 ```
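+
+Once the server is up, you can sanity-check the enabled Ollama model with the client CLI. This is a
+minimal sketch; it assumes the model is registered as `ollama/meta-llama/Llama-3.2-3B-Instruct`,
+following the `ollama/<model>` naming pattern used in `run.yaml`:
+
+```bash
+llama-stack-client --endpoint http://localhost:8321 \
+  inference chat-completion \
+  --model-id ollama/meta-llama/Llama-3.2-3B-Instruct \
+  --message "Hello, how are you?"
+```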
 
 ## Storage
diff --git a/llama_stack/templates/starter/run.yaml b/llama_stack/templates/starter/run.yaml
index c6aac5eae..fbc2c829a 100644
--- a/llama_stack/templates/starter/run.yaml
+++ b/llama_stack/templates/starter/run.yaml
@@ -106,11 +106,7 @@ providers:
         type: sqlite
         namespace: null
         db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/faiss_store.db
-<<<<<<< HEAD
-  - provider_id: ${env.ENABLE_SQLITE_VEC:+sqlite-vec}
-=======
   - provider_id: ${env.ENABLE_SQLITE_VEC:=__disabled__}
->>>>>>> fbcc565e (feat: consolidate most distros into "starter")
     provider_type: inline::sqlite-vec
     config:
       db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/sqlite_vec.db
@@ -597,6 +593,11 @@ models:
   provider_id: ${env.ENABLE_OLLAMA:=__disabled__}
   provider_model_id: ${env.OLLAMA_EMBEDDING_MODEL:=__disabled__}
   model_type: embedding
+- metadata: {}
+  model_id: ${env.ENABLE_OLLAMA:=__disabled__}/${env.OLLAMA_SAFETY_MODEL:=__disabled__}
+  provider_id: ${env.ENABLE_OLLAMA:=__disabled__}
+  provider_model_id: ${env.OLLAMA_SAFETY_MODEL:=__disabled__}
+  model_type: llm
 - metadata: {}
   model_id: ${env.ENABLE_ANTHROPIC:=__disabled__}/anthropic/claude-3-5-sonnet-latest
   provider_id: ${env.ENABLE_ANTHROPIC:=__disabled__}
diff --git a/llama_stack/templates/starter/starter.py b/llama_stack/templates/starter/starter.py
index 4f9bc55cf..bbeef6b72 100644
--- a/llama_stack/templates/starter/starter.py
+++ b/llama_stack/templates/starter/starter.py
@@ -110,6 +110,10 @@ def get_inference_providers() -> tuple[list[Provider], dict[str, list[ProviderMo
                 "embedding_dimension": "${env.OLLAMA_EMBEDDING_DIMENSION:=384}",
             },
         ),
+        ProviderModelEntry(
+            provider_model_id="${env.OLLAMA_SAFETY_MODEL:=__disabled__}",
+            model_type=ModelType.llm,
+        ),
     ],
     OllamaImplConfig.sample_run_config(
         url="${env.OLLAMA_URL:=http://localhost:11434}", raise_on_connect_error=False
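
The `run.yaml` and `starter.py` hunks above add an optional Ollama safety model entry gated on the new `OLLAMA_SAFETY_MODEL` variable. A minimal sketch of how it would be exercised, assuming a guard model such as `llama-guard3:1b` is available to the local Ollama install (the tag is illustrative, not prescribed by this diff):

```bash
# Enable Ollama and register a safety model alongside the inference model
llama stack run distributions/starter/run.yaml \
  --port 8321 \
  --env ENABLE_OLLAMA=ollama \
  --env OLLAMA_INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  --env OLLAMA_SAFETY_MODEL=llama-guard3:1b
```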