Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-08-12 13:00:39 +00:00
fix: use OLLAMA_URL to activate Ollama provider in starter (#2963)
We tried to always keep Ollama enabled. However, doing so makes the provider implementation half-assed -- should it error when it cannot connect to Ollama or not? What happens during periodic model refresh? Etc. Instead, do the same thing we do for vLLM -- use the `OLLAMA_URL` environment variable to conditionally enable the provider.

## Test Plan

Run `uv run llama stack build --template starter --image-type venv --run` with and without `OLLAMA_URL` set. Verify using `llama-stack-client provider list` that ollama is correctly enabled.
parent b69bafba30
commit fd2aaf4978
6 changed files with 23 additions and 41 deletions
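A minimal sketch of the test plan above, assuming Ollama is running locally on its default port (11434):

```bash
# With OLLAMA_URL set, the ollama provider should show up as enabled.
OLLAMA_URL=http://localhost:11434 \
  uv run llama stack build --template starter --image-type venv --run

# From another shell, list the active providers (command as given in the test plan).
llama-stack-client provider list

# Without OLLAMA_URL, the same command should start the server with ollama disabled.
uv run llama stack build --template starter --image-type venv --run
llama-stack-client provider list
```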
@@ -150,7 +150,7 @@
     "def run_llama_stack_server_background():\n",
     "    log_file = open(\"llama_stack_server.log\", \"w\")\n",
     "    process = subprocess.Popen(\n",
-    "        f\"uv run --with llama-stack llama stack run starter --image-type venv --env INFERENCE_MODEL=llama3.2:3b\",\n",
+    "        f\"OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack run starter --image-type venv\",\n",
     "        shell=True,\n",
     "        stdout=log_file,\n",
     "        stderr=log_file,\n",
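Outside the notebook, the changed cell now amounts to roughly the following shell invocation; the `--env INFERENCE_MODEL=...` flag is gone, and the `OLLAMA_URL` prefix is what activates the Ollama provider:

```bash
# Launch the starter distribution in the background, logging to the same file the notebook uses.
OLLAMA_URL=http://localhost:11434 \
  uv run --with llama-stack llama stack run starter --image-type venv \
  > llama_stack_server.log 2>&1 &
```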
@@ -100,10 +100,6 @@ The following environment variables can be configured:
 ### Model Configuration
 - `INFERENCE_MODEL`: HuggingFace model for serverless inference
 - `INFERENCE_ENDPOINT_NAME`: HuggingFace endpoint name
-- `OLLAMA_INFERENCE_MODEL`: Ollama model name
-- `OLLAMA_EMBEDDING_MODEL`: Ollama embedding model name
-- `OLLAMA_EMBEDDING_DIMENSION`: Ollama embedding dimension (default: `384`)
-- `VLLM_INFERENCE_MODEL`: vLLM model name
 
 ### Vector Database Configuration
 - `SQLITE_STORE_DIR`: SQLite store directory (default: `~/.llama/distributions/starter`)
@@ -127,43 +123,25 @@ The following environment variables can be configured:
 
 ## Enabling Providers
 
-You can enable specific providers by setting their provider ID to a valid value using environment variables. This is useful when you want to use certain providers or don't have the required API keys.
+You can enable specific providers by setting appropriate environment variables. For example,
 
-### Examples of Enabling Providers
-
-#### Enable FAISS Vector Provider
 ```bash
-export ENABLE_FAISS=faiss
+# self-hosted
+export OLLAMA_URL=http://localhost:11434 # enables the Ollama inference provider
+export VLLM_URL=http://localhost:8000/v1 # enables the vLLM inference provider
+export TGI_URL=http://localhost:8000/v1 # enables the TGI inference provider
+
+# cloud-hosted requiring API key configuration on the server
+export CEREBRAS_API_KEY=your_cerebras_api_key # enables the Cerebras inference provider
+export NVIDIA_API_KEY=your_nvidia_api_key # enables the NVIDIA inference provider
+
+# vector providers
+export MILVUS_URL=http://localhost:19530 # enables the Milvus vector provider
+export CHROMADB_URL=http://localhost:8000/v1 # enables the ChromaDB vector provider
+export PGVECTOR_DB=llama_stack_db # enables the PGVector vector provider
 ```
 
-#### Enable Ollama Models
-```bash
-export ENABLE_OLLAMA=ollama
-```
-
-#### Disable vLLM Models
-```bash
-export VLLM_INFERENCE_MODEL=__disabled__
-```
-
-#### Disable Optional Vector Providers
-```bash
-export ENABLE_SQLITE_VEC=__disabled__
-export ENABLE_CHROMADB=__disabled__
-export ENABLE_PGVECTOR=__disabled__
-```
-
-### Provider ID Patterns
-
-The starter distribution uses several patterns for provider IDs:
-
-1. **Direct provider IDs**: `faiss`, `ollama`, `vllm`
-2. **Environment-based provider IDs**: `${env.ENABLE_SQLITE_VEC:+sqlite-vec}`
-3. **Model-based provider IDs**: `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`
-
-When using the `+` pattern (like `${env.ENABLE_SQLITE_VEC+sqlite-vec}`), the provider is enabled by default and can be disabled by setting the environment variable to `__disabled__`.
-
-When using the `:` pattern (like `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`), the provider is disabled by default and can be enabled by setting the environment variable to a valid value.
+This distribution comes with a default "llama-guard" shield that can be enabled by setting the `SAFETY_MODEL` environment variable to point to an appropriate Llama Guard model id. Use `llama-stack-client models list` to see the list of available models.
 
 ## Running the Distribution
 
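The run configurations touched later in this commit gate each provider on an environment variable using `${env.VAR:+provider_id}`. That notation behaves like standard shell `${VAR:+word}` expansion, which is easy to sanity-check in bash (a toy illustration of the notation, not how llama-stack itself resolves the config):

```bash
# ${VAR:+word} expands to "word" only when VAR is set and non-empty.
unset OLLAMA_URL
echo "provider_id: '${OLLAMA_URL:+ollama}'"   # provider_id: ''       -> provider stays disabled

export OLLAMA_URL=http://localhost:11434
echo "provider_id: '${OLLAMA_URL:+ollama}'"   # provider_id: 'ollama' -> provider enabled
```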
@@ -16,9 +16,12 @@ as the inference [provider](../providers/inference/index) for a Llama Model.
 ```bash
 ollama run llama3.2:3b --keepalive 60m
 ```
+
 #### Step 2: Run the Llama Stack server
+
 We will use `uv` to run the Llama Stack server.
 ```bash
+OLLAMA_URL=http://localhost:11434 \
 uv run --with llama-stack llama stack build --template starter --image-type venv --run
 ```
 #### Step 3: Run the demo
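If the server from Step 2 starts but the Ollama provider does not come up, a quick sanity check is to confirm that the URL you exported is actually reachable (assuming the default port and Ollama's standard `/api/tags` model-listing endpoint):

```bash
# Should return a JSON list of locally available models, e.g. the llama3.2:3b pulled in Step 1.
curl -s http://localhost:11434/api/tags
```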
@@ -19,7 +19,7 @@ providers:
     config:
       base_url: https://api.cerebras.ai
       api_key: ${env.CEREBRAS_API_KEY:=}
-  - provider_id: ollama
+  - provider_id: ${env.OLLAMA_URL:+ollama}
     provider_type: remote::ollama
     config:
       url: ${env.OLLAMA_URL:=http://localhost:11434}
@@ -19,7 +19,7 @@ providers:
     config:
       base_url: https://api.cerebras.ai
       api_key: ${env.CEREBRAS_API_KEY:=}
-  - provider_id: ollama
+  - provider_id: ${env.OLLAMA_URL:+ollama}
     provider_type: remote::ollama
     config:
       url: ${env.OLLAMA_URL:=http://localhost:11434}
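Note the two different substitutions in the provider entry above: `${env.OLLAMA_URL:+ollama}` decides whether the provider is enabled at all, while `${env.OLLAMA_URL:=http://localhost:11434}` only supplies a default URL. The second form mirrors shell `${VAR:=default}` expansion (again just a bash illustration of the notation):

```bash
# ${VAR:=default} substitutes the default (and assigns it) when VAR is unset or empty.
unset OLLAMA_URL
echo "url: ${OLLAMA_URL:=http://localhost:11434}"   # url: http://localhost:11434
echo "OLLAMA_URL is now: $OLLAMA_URL"               # the default was also assigned to the variable
```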
@@ -66,6 +66,7 @@ ENABLED_INFERENCE_PROVIDERS = [
 ]
 
 INFERENCE_PROVIDER_IDS = {
+    "ollama": "${env.OLLAMA_URL:+ollama}",
     "vllm": "${env.VLLM_URL:+vllm}",
     "tgi": "${env.TGI_URL:+tgi}",
     "cerebras": "${env.CEREBRAS_API_KEY:+cerebras}",