feat: consolidate most distros into "starter" (#2516)

# What does this PR do?

* Removes a bunch of distros
* Removed distros were added into the "starter" distribution
* Doc for "starter" has been added
* Partially reverts https://github.com/meta-llama/llama-stack/pull/2482
  since inference providers are disabled by default and can be turned on
  manually via env variable.
* Disables safety in starter distro

Closes: https://github.com/meta-llama/llama-stack/issues/2502.

~Needs: https://github.com/meta-llama/llama-stack/pull/2482 for Ollama
to work properly in the CI.~

TODO:

- [ ] We can only update `install.sh` when we get a new release.
- [x] Update providers documentation
- [ ] Update notebooks to reference starter instead of ollama

Signed-off-by: Sébastien Han <seb@redhat.com>
This commit is contained in:
Sébastien Han 2025-07-04 15:58:03 +02:00 committed by GitHub
parent f77d4d91f5
commit c4349f532b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
132 changed files with 1009 additions and 10845 deletions

View file

@ -1,51 +1,94 @@
# Available List of Distributions
# Available Distributions
Here are a list of distributions you can use to start a Llama Stack server that are provided out of the box.
Llama Stack provides several pre-configured distributions to help you get started quickly. Choose the distribution that best fits your hardware and use case.
## Selection of a Distribution / Template
## Quick Reference
Which templates / distributions to choose depends on the hardware you have for running LLM inference.
| Distribution | Use Case | Hardware Requirements | Provider |
|--------------|----------|----------------------|----------|
| `distribution-starter` | General purpose, prototyping | Any (CPU/GPU) | Ollama, Remote APIs |
| `distribution-meta-reference-gpu` | High-performance inference | GPU required | Local GPU inference |
| Remote-hosted | Production, managed service | None | Partner providers |
| iOS/Android SDK | Mobile applications | Mobile device | On-device inference |
- **Do you want a hosted Llama Stack endpoint?** If so, we suggest leveraging our partners who host Llama Stack endpoints. Namely, _fireworks.ai_ and _together.xyz_.
- Read more about it here - [Remote-Hosted Endpoints](remote_hosted_distro/index).
## Choose Your Distribution
### 🚀 Getting Started (Recommended for Beginners)
- **Do you have access to machines with GPUs?** If you wish to run Llama Stack locally or on a cloud instance and host your own Llama Stack endpoint, we suggest:
- {dockerhub}`distribution-remote-vllm` ([Guide](self_hosted_distro/remote-vllm))
- {dockerhub}`distribution-meta-reference-gpu` ([Guide](self_hosted_distro/meta-reference-gpu))
- {dockerhub}`distribution-tgi` ([Guide](self_hosted_distro/tgi))
- {dockerhub}`distribution-nvidia` ([Guide](self_hosted_distro/nvidia))
**Use `distribution-starter` if you want to:**
- Prototype quickly without GPU requirements
- Use remote inference providers (Fireworks, Together, vLLM etc.)
- Run locally with Ollama for development
- **Are you running on a "regular" desktop or laptop ?** We suggest using the ollama template for quick prototyping and get started without having to worry about needing GPUs.
- {dockerhub}`distribution-ollama` ([Guide](self_hosted_distro/ollama))
```bash
docker pull llama-stack/distribution-starter
```
- **Do you have an API key for a remote inference provider like Fireworks, Together, etc.?** If so, we suggest:
- {dockerhub}`distribution-together` ([Guide](self_hosted_distro/together))
- {dockerhub}`distribution-fireworks` ([Guide](self_hosted_distro/fireworks))
**Guides:** [Starter Distribution Guide](self_hosted_distro/starter)
- **Do you want to run Llama Stack inference on your iOS / Android device?** Lastly, we also provide templates for running Llama Stack inference on your iOS / Android device:
- [iOS SDK](ondevice_distro/ios_sdk)
- [Android](ondevice_distro/android_sdk)
### 🖥️ Self-Hosted with GPU
**Use `distribution-meta-reference-gpu` if you:**
- Have access to GPU hardware
- Want maximum performance and control
- Need to run inference locally
- **If none of the above fit your needs, you can also build your own [custom distribution](building_distro.md).**
```bash
docker pull llama-stack/distribution-meta-reference-gpu
```
### Distribution Details
**Guides:** [Meta Reference GPU Guide](self_hosted_distro/meta-reference-gpu)
### ☁️ Managed Hosting
**Use remote-hosted endpoints if you:**
- Don't want to manage infrastructure
- Need production-ready reliability
- Prefer managed services
**Partners:** [Fireworks.ai](https://fireworks.ai) and [Together.xyz](https://together.xyz)
**Guides:** [Remote-Hosted Endpoints](remote_hosted_distro/index)
### 📱 Mobile Development
**Use mobile SDKs if you:**
- Are building iOS or Android applications
- Need on-device inference capabilities
- Want offline functionality
- [iOS SDK](ondevice_distro/ios_sdk)
- [Android SDK](ondevice_distro/android_sdk)
### 🔧 Custom Solutions
**Build your own distribution if:**
- None of the above fit your specific needs
- You need custom configurations
- You want to optimize for your specific use case
**Guides:** [Building Custom Distributions](building_distro.md)
## Detailed Documentation
### Self-Hosted Distributions
```{toctree}
:maxdepth: 1
self_hosted_distro/starter
self_hosted_distro/meta-reference-gpu
```
### Remote-Hosted Solutions
```{toctree}
:maxdepth: 1
remote_hosted_distro/index
self_hosted_distro/remote-vllm
self_hosted_distro/meta-reference-gpu
self_hosted_distro/tgi
self_hosted_distro/nvidia
self_hosted_distro/ollama
self_hosted_distro/together
self_hosted_distro/fireworks
```
### On-Device Distributions
### Mobile SDKs
```{toctree}
:maxdepth: 1
@ -53,3 +96,25 @@ self_hosted_distro/fireworks
ondevice_distro/ios_sdk
ondevice_distro/android_sdk
```
## Decision Flow
```mermaid
graph TD
A[What's your use case?] --> B{Need mobile app?}
B -->|Yes| C[Use Mobile SDKs]
B -->|No| D{Have GPU hardware?}
D -->|Yes| E[Use Meta Reference GPU]
D -->|No| F{Want managed hosting?}
F -->|Yes| G[Use Remote-Hosted]
F -->|No| H[Use Starter Distribution]
```
## Next Steps
1. **Choose your distribution** from the options above
2. **Follow the setup guide** for your selected distribution
3. **Configure your providers** with API keys or local models
4. **Start building** with Llama Stack!
For help choosing or troubleshooting, check our [Getting Started Guide](../getting_started/index.md) or [Community Support](https://github.com/llama-stack/llama-stack/discussions).