Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-08-12 13:00:39 +00:00)
fix: use OLLAMA_URL to activate Ollama provider in starter (#2963)
We tried to always keep Ollama enabled. However, doing so makes the provider implementation half-baked: should it error when it cannot connect to Ollama or not? What happens during periodic model refresh? Instead, do the same thing we do for vLLM -- use the `OLLAMA_URL` environment variable to conditionally enable the provider.

## Test Plan

Run `uv run llama stack build --template starter --image-type venv --run` with and without `OLLAMA_URL` set. Verify using `llama-stack-client provider list` that ollama is correctly enabled.
Commit: fd2aaf4978 · Parent: b69bafba30
6 changed files with 23 additions and 41 deletions
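
The test plan above amounts to running the same build command twice, once without and once with `OLLAMA_URL`, and then checking the provider list. A minimal sketch, assuming a local Ollama server on its default port (the same URL used in the examples below):

```bash
# Without OLLAMA_URL: the ollama inference provider should stay disabled.
uv run llama stack build --template starter --image-type venv --run

# With OLLAMA_URL pointing at a local Ollama server: the provider should be enabled.
OLLAMA_URL=http://localhost:11434 \
  uv run llama stack build --template starter --image-type venv --run

# In a separate terminal, verify which providers are active (command as given in the test plan).
llama-stack-client provider list
```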
@@ -100,10 +100,6 @@ The following environment variables can be configured:
### Model Configuration
- `INFERENCE_MODEL`: HuggingFace model for serverless inference
- `INFERENCE_ENDPOINT_NAME`: HuggingFace endpoint name
- `OLLAMA_INFERENCE_MODEL`: Ollama model name
- `OLLAMA_EMBEDDING_MODEL`: Ollama embedding model name
- `OLLAMA_EMBEDDING_DIMENSION`: Ollama embedding dimension (default: `384`)
- `VLLM_INFERENCE_MODEL`: vLLM model name
### Vector Database Configuration
- `SQLITE_STORE_DIR`: SQLite store directory (default: `~/.llama/distributions/starter`)
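
For reference, a minimal sketch of exporting a few of these variables before starting the distribution; the model ids below are illustrative placeholders, and the defaults match the values documented above:

```bash
# Illustrative values -- substitute the models and paths you actually use.
export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct   # example HuggingFace model id
export OLLAMA_INFERENCE_MODEL=llama3.2:3b                 # example Ollama model id
export OLLAMA_EMBEDDING_DIMENSION=384                     # documented default
export SQLITE_STORE_DIR=~/.llama/distributions/starter    # documented default
```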
@@ -127,43 +123,25 @@ The following environment variables can be configured:
## Enabling Providers
You can enable specific providers by setting their provider ID to a valid value using environment variables. This is useful when you want to use certain providers or don't have the required API keys.
You can enable specific providers by setting appropriate environment variables. For example,
### Examples of Enabling Providers
#### Enable FAISS Vector Provider
```bash
export ENABLE_FAISS=faiss
# self-hosted
export OLLAMA_URL=http://localhost:11434 # enables the Ollama inference provider
export VLLM_URL=http://localhost:8000/v1 # enables the vLLM inference provider
export TGI_URL=http://localhost:8000/v1 # enables the TGI inference provider
# cloud-hosted requiring API key configuration on the server
export CEREBRAS_API_KEY=your_cerebras_api_key # enables the Cerebras inference provider
export NVIDIA_API_KEY=your_nvidia_api_key # enables the NVIDIA inference provider
# vector providers
export MILVUS_URL=http://localhost:19530 # enables the Milvus vector provider
export CHROMADB_URL=http://localhost:8000/v1 # enables the ChromaDB vector provider
export PGVECTOR_DB=llama_stack_db # enables the PGVector vector provider
```
#### Enable Ollama Models
```bash
export ENABLE_OLLAMA=ollama
```
#### Disable vLLM Models
```bash
export VLLM_INFERENCE_MODEL=__disabled__
```
#### Disable Optional Vector Providers
```bash
export ENABLE_SQLITE_VEC=__disabled__
export ENABLE_CHROMADB=__disabled__
export ENABLE_PGVECTOR=__disabled__
```
### Provider ID Patterns
The starter distribution uses several patterns for provider IDs:
1. **Direct provider IDs**: `faiss`, `ollama`, `vllm`
2. **Environment-based provider IDs**: `${env.ENABLE_SQLITE_VEC:+sqlite-vec}`
3. **Model-based provider IDs**: `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`
When using the `+` pattern (like `${env.ENABLE_SQLITE_VEC:+sqlite-vec}`), the provider is enabled by default and can be disabled by setting the environment variable to `__disabled__`.
When using the `:` pattern (like `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`), the provider is disabled by default and can be enabled by setting the environment variable to a valid value.
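
To make the two patterns concrete, a short sketch using variables that already appear on this page (the Ollama model id is only an example):

```bash
# `+` pattern: sqlite-vec is enabled by default; set the variable to __disabled__ to turn it off.
export ENABLE_SQLITE_VEC=__disabled__

# `:` pattern: the Ollama model entry is disabled by default; give the variable a value to turn it on.
export OLLAMA_INFERENCE_MODEL=llama3.2:3b   # example model id
```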
This distribution comes with a default "llama-guard" shield that can be enabled by setting the `SAFETY_MODEL` environment variable to point to an appropriate Llama Guard model id. Use `llama-stack-client models list` to see the list of available models.
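
A sketch of wiring up the shield, assuming one of the listed models is a Llama Guard variant (the model id below is only an example; pick one from the actual list):

```bash
# Inspect the models registered with the stack.
llama-stack-client models list

# Point SAFETY_MODEL at a Llama Guard model id from that list (example id shown).
export SAFETY_MODEL=meta-llama/Llama-Guard-3-8B
```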
## Running the Distribution
@@ -16,10 +16,13 @@ as the inference [provider](../providers/inference/index) for a Llama Model.
```bash
ollama run llama3.2:3b --keepalive 60m
```
#### Step 2: Run the Llama Stack server
We will use `uv` to run the Llama Stack server.
```bash
uv run --with llama-stack llama stack build --template starter --image-type venv --run
OLLAMA_URL=http://localhost:11434 \
uv run --with llama-stack llama stack build --template starter --image-type venv --run
```
#### Step 3: Run the demo
Now open up a new terminal and copy the following script into a file named `demo_script.py`.