We tried to always keep Ollama enabled. However doing so makes the
provider implementation half-assed -- should it error when it cannot
connect to Ollama or not? What happens during periodic model refresh?
Etc. Instead do the same thing we do for vLLM -- use the `OLLAMA_URL` to
conditionally enable the provider.
## Test Plan
Run `uv run llama stack build --template starter --image-type venv
--run` with and without `OLLAMA_URL` set. Verify using
`llama-stack-client provider list` that ollama is correctly enabled.
# What does this PR do?
* Removes a bunch of distros
* Removed distros were added into the "starter" distribution
* Doc for "starter" has been added
* Partially reverts https://github.com/meta-llama/llama-stack/pull/2482
since inference providers are disabled by default and can be turned on
manually via env variable.
* Disables safety in starter distro
Closes: https://github.com/meta-llama/llama-stack/issues/2502.
~Needs: https://github.com/meta-llama/llama-stack/pull/2482 for Ollama
to work properly in the CI.~
TODO:
- [ ] We can only update `install.sh` when we get a new release.
- [x] Update providers documentation
- [ ] Update notebooks to reference starter instead of ollama
Signed-off-by: Sébastien Han <seb@redhat.com>