
# Available Distributions

Llama Stack provides several pre-configured distributions to help you get started quickly. Choose the distribution that best fits your hardware and use case.

## Quick Reference

| Distribution | Use Case | Hardware Requirements | Provider |
|---|---|---|---|
| `distribution-starter` | General purpose, prototyping | Any (CPU/GPU) | Ollama, Remote APIs |
| `distribution-meta-reference-gpu` | High-performance inference | GPU required | Local GPU inference |
| Remote-hosted | Production, managed service | None | Partner providers |
| iOS/Android SDK | Mobile applications | Mobile device | On-device inference |

## Choose Your Distribution

### 🚀 Starter Distribution

Use `distribution-starter` if you want to:

- Prototype quickly without GPU requirements
- Use remote inference providers (Fireworks, Together, vLLM, etc.)
- Run locally with Ollama for development

```bash
docker pull llama-stack/distribution-starter
```
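To try it right away, here is a minimal sketch of running the pulled image; the port mapping and the `OLLAMA_URL` variable are assumptions for a local Ollama setup, so check the Starter Distribution Guide for the exact invocation:

```bash
# Sketch: run the starter distribution and point it at a local Ollama server.
# Port 8321 and the OLLAMA_URL variable are assumptions; inference providers
# are disabled by default and enabled via environment variables (see the guide).
docker run -it \
  -p 8321:8321 \
  -e OLLAMA_URL=http://host.docker.internal:11434 \
  llama-stack/distribution-starter
```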

Guides: Starter Distribution Guide

### 🖥️ Self-Hosted with GPU

Use `distribution-meta-reference-gpu` if you:

- Have access to GPU hardware
- Want maximum performance and control
- Need to run inference locally

```bash
docker pull llama-stack/distribution-meta-reference-gpu
```
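A hedged sketch of running the image with GPU access follows; it assumes the NVIDIA Container Toolkit is installed, and the checkpoint mount and `INFERENCE_MODEL` value are illustrative, so consult the Meta Reference GPU Guide for the exact settings:

```bash
# Sketch: run the meta-reference distribution with all GPUs visible.
# The ~/.llama mount and the INFERENCE_MODEL value are illustrative assumptions.
docker run -it \
  --gpus all \
  -p 8321:8321 \
  -v ~/.llama:/root/.llama \
  -e INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct \
  llama-stack/distribution-meta-reference-gpu
```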

Guides: Meta Reference GPU Guide

### ☁️ Managed Hosting

Use remote-hosted endpoints if you:

- Don't want to manage infrastructure
- Need production-ready reliability
- Prefer managed services

Partners: Fireworks.ai and Together.xyz

Guides: Remote-Hosted Endpoints
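Once a partner issues you an endpoint URL and API key, any Llama Stack client can point at it. As a rough connectivity check (the hostname and credential below are placeholders, not real partner values):

```bash
# Sketch: list the models served by a hosted endpoint.
# Replace the URL and the API key with the values your provider gives you.
curl -H "Authorization: Bearer $LLAMA_STACK_API_KEY" \
  https://llamastack.example.com/v1/models
```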

### 📱 Mobile Development

Use mobile SDKs if you:

- Are building iOS or Android applications
- Need on-device inference capabilities
- Want offline functionality

SDKs:

- iOS SDK
- Android SDK

### 🔧 Custom Solutions

Build your own distribution if:

- None of the above fit your specific needs
- You need custom configurations
- You want to optimize for your specific use case

Guides: Building Custom Distributions
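As a rough starting point, the `llama stack build` CLI can scaffold a distribution from an existing template, which you can then edit to add or remove providers; the flags below are indicative only, so run `llama stack build --help` to see what your version supports:

```bash
# Sketch: scaffold a custom distribution from the starter template, then edit
# the generated build/run configs to swap providers for your use case.
# Flag names and available image types may differ between releases.
llama stack build --template starter --image-type venv
```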

## Detailed Documentation

### Self-Hosted Distributions

```{toctree}
:maxdepth: 1

self_hosted_distro/starter
self_hosted_distro/meta-reference-gpu
```

### Remote-Hosted Solutions

```{toctree}
:maxdepth: 1

remote_hosted_distro/index
```

### Mobile SDKs

```{toctree}
:maxdepth: 1

ondevice_distro/ios_sdk
ondevice_distro/android_sdk
```

## Decision Flow

```{mermaid}
graph TD
    A[What's your use case?] --> B{Need mobile app?}
    B -->|Yes| C[Use Mobile SDKs]
    B -->|No| D{Have GPU hardware?}
    D -->|Yes| E[Use Meta Reference GPU]
    D -->|No| F{Want managed hosting?}
    F -->|Yes| G[Use Remote-Hosted]
    F -->|No| H[Use Starter Distribution]
```

## Next Steps

1. Choose your distribution from the options above
2. Follow the setup guide for your selected distribution
3. Configure your providers with API keys or local models
4. Start building with Llama Stack!

For help choosing or troubleshooting, check our Getting Started Guide or Community Support.