feat: consolidate most distros into "starter" (#2516)

# What does this PR do?

* Removes a bunch of distros
* Removed distros were added into the "starter" distribution
* Doc for "starter" has been added
* Partially reverts https://github.com/meta-llama/llama-stack/pull/2482 since inference providers are disabled by default and can be turned on manually via env variable.
* Disables safety in starter distro

Closes: https://github.com/meta-llama/llama-stack/issues/2502.

~Needs: https://github.com/meta-llama/llama-stack/pull/2482 for Ollama to work properly in the CI.~

TODO:

- [ ] We can only update `install.sh` when we get a new release.
- [x] Update providers documentation
- [ ] Update notebooks to reference starter instead of ollama

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-09 21:18:38 +00:00 · 2025-07-04 15:58:03 +02:00 · 2025-07-04 15:58:03 +02:00 · c4349f532b
commit c4349f532b
parent f77d4d91f5
132 changed files with 1009 additions and 10845 deletions
--- a/docs/source/distributions/list_of_distributions.md
+++ b/docs/source/distributions/list_of_distributions.md
@@ -1,51 +1,94 @@
-# Available List of Distributions
+# Available Distributions

-Here are a list of distributions you can use to start a Llama Stack server that are provided out of the box.
+Llama Stack provides several pre-configured distributions to help you get started quickly. Choose the distribution that best fits your hardware and use case.

-## Selection of a Distribution / Template
+## Quick Reference

-Which templates / distributions to choose depends on the hardware you have for running LLM inference.
+| Distribution | Use Case | Hardware Requirements | Provider |
+|--------------|----------|----------------------|----------|
+| `distribution-starter` | General purpose, prototyping | Any (CPU/GPU) | Ollama, Remote APIs |
+| `distribution-meta-reference-gpu` | High-performance inference | GPU required | Local GPU inference |
+| Remote-hosted | Production, managed service | None | Partner providers |
+| iOS/Android SDK | Mobile applications | Mobile device | On-device inference |

- **Do you want a hosted Llama Stack endpoint?** If so, we suggest leveraging our partners who host Llama Stack endpoints. Namely, _fireworks.ai_ and _together.xyz_.
-  - Read more about it here - [Remote-Hosted Endpoints](remote_hosted_distro/index).
+## Choose Your Distribution

+### 🚀 Getting Started (Recommended for Beginners)

- **Do you have access to machines with GPUs?** If you wish to run Llama Stack locally or on a cloud instance and host your own Llama Stack endpoint, we suggest:
-  - {dockerhub}`distribution-remote-vllm` ([Guide](self_hosted_distro/remote-vllm))
-  - {dockerhub}`distribution-meta-reference-gpu` ([Guide](self_hosted_distro/meta-reference-gpu))
-  - {dockerhub}`distribution-tgi` ([Guide](self_hosted_distro/tgi))
-  - {dockerhub}`distribution-nvidia` ([Guide](self_hosted_distro/nvidia))
+**Use `distribution-starter` if you want to:**
+- Prototype quickly without GPU requirements
+- Use remote inference providers (Fireworks, Together, vLLM etc.)
+- Run locally with Ollama for development

- **Are you running on a "regular" desktop or laptop ?** We suggest using the ollama template for quick prototyping and get started without having to worry about needing GPUs.
-  - {dockerhub}`distribution-ollama` ([Guide](self_hosted_distro/ollama))
+```bash
+docker pull llama-stack/distribution-starter
+```

- **Do you have an API key for a remote inference provider like Fireworks, Together, etc.?**  If so, we suggest:
-  - {dockerhub}`distribution-together` ([Guide](self_hosted_distro/together))
-  - {dockerhub}`distribution-fireworks` ([Guide](self_hosted_distro/fireworks))
+**Guides:** [Starter Distribution Guide](self_hosted_distro/starter)

- **Do you want to run Llama Stack inference on your iOS / Android device?**  Lastly, we also provide templates for running Llama Stack inference on your iOS / Android device:
-  - [iOS SDK](ondevice_distro/ios_sdk)
-  - [Android](ondevice_distro/android_sdk)
+### 🖥️ Self-Hosted with GPU

+**Use `distribution-meta-reference-gpu` if you:**
+- Have access to GPU hardware
+- Want maximum performance and control
+- Need to run inference locally

- **If none of the above fit your needs, you can also build your own [custom distribution](building_distro.md).**
+```bash
+docker pull llama-stack/distribution-meta-reference-gpu
+```

-### Distribution Details
+**Guides:** [Meta Reference GPU Guide](self_hosted_distro/meta-reference-gpu)
+
+### ☁️ Managed Hosting
+
+**Use remote-hosted endpoints if you:**
+- Don't want to manage infrastructure
+- Need production-ready reliability
+- Prefer managed services
+
+**Partners:** [Fireworks.ai](https://fireworks.ai) and [Together.xyz](https://together.xyz)
+
+**Guides:** [Remote-Hosted Endpoints](remote_hosted_distro/index)
+
+### 📱 Mobile Development
+
+**Use mobile SDKs if you:**
+- Are building iOS or Android applications
+- Need on-device inference capabilities
+- Want offline functionality
+
+- [iOS SDK](ondevice_distro/ios_sdk)
+- [Android SDK](ondevice_distro/android_sdk)
+
+### 🔧 Custom Solutions
+
+**Build your own distribution if:**
+- None of the above fit your specific needs
+- You need custom configurations
+- You want to optimize for your specific use case
+
+**Guides:** [Building Custom Distributions](building_distro.md)
+
+## Detailed Documentation
+
+### Self-Hosted Distributions
+
+```{toctree}
+:maxdepth: 1
+
+self_hosted_distro/starter
+self_hosted_distro/meta-reference-gpu
+```
+
+### Remote-Hosted Solutions

 ```{toctree}
 :maxdepth: 1

 remote_hosted_distro/index
-self_hosted_distro/remote-vllm
-self_hosted_distro/meta-reference-gpu
-self_hosted_distro/tgi
-self_hosted_distro/nvidia
-self_hosted_distro/ollama
-self_hosted_distro/together
-self_hosted_distro/fireworks
 ```

-### On-Device Distributions
+### Mobile SDKs

 ```{toctree}
 :maxdepth: 1
@@ -53,3 +96,25 @@ self_hosted_distro/fireworks
 ondevice_distro/ios_sdk
 ondevice_distro/android_sdk
 ```
+
+## Decision Flow
+
+```mermaid
+graph TD
+    A[What's your use case?] --> B{Need mobile app?}
+    B -->|Yes| C[Use Mobile SDKs]
+    B -->|No| D{Have GPU hardware?}
+    D -->|Yes| E[Use Meta Reference GPU]
+    D -->|No| F{Want managed hosting?}
+    F -->|Yes| G[Use Remote-Hosted]
+    F -->|No| H[Use Starter Distribution]
+```
+
+## Next Steps
+
+1. **Choose your distribution** from the options above
+2. **Follow the setup guide** for your selected distribution
+3. **Configure your providers** with API keys or local models
+4. **Start building** with Llama Stack!
+
+For help choosing or troubleshooting, check our [Getting Started Guide](../getting_started/index.md) or [Community Support](https://github.com/llama-stack/llama-stack/discussions).
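
The "Decision Flow" diagram added in this diff can also be read as a short chain of checks. The sketch below is illustrative only and is not part of the commit: the function name `choose_distribution` and its three boolean parameters are hypothetical, and the returned strings simply reuse the labels from the Quick Reference table.

```python
def choose_distribution(needs_mobile: bool, has_gpu: bool, wants_managed_hosting: bool) -> str:
    """Walk the Decision Flow diagram top to bottom (hypothetical helper).

    The checks run in the same order as the diagram:
    mobile app -> GPU hardware -> managed hosting -> starter.
    """
    if needs_mobile:
        # B -->|Yes| C[Use Mobile SDKs]
        return "iOS/Android SDK"
    if has_gpu:
        # D -->|Yes| E[Use Meta Reference GPU]
        return "distribution-meta-reference-gpu"
    if wants_managed_hosting:
        # F -->|Yes| G[Use Remote-Hosted]
        return "Remote-hosted"
    # F -->|No| H[Use Starter Distribution]
    return "distribution-starter"


if __name__ == "__main__":
    # A laptop without a GPU and no interest in managed hosting
    # falls through every check to the starter distribution.
    print(choose_distribution(needs_mobile=False, has_gpu=False, wants_managed_hosting=False))
```

Because the checks are ordered, a machine with a GPU is routed to the GPU distribution before managed hosting is considered, which matches the diagram as written.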