mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-07-20 11:47:00 +00:00
feat: consolidate most distros into "starter" (#2516)
# What does this PR do?

* Removes a bunch of distros
* Removed distros were added into the "starter" distribution
* Doc for "starter" has been added
* Partially reverts https://github.com/meta-llama/llama-stack/pull/2482 since inference providers are disabled by default and can be turned on manually via env variable.
* Disables safety in starter distro

Closes: https://github.com/meta-llama/llama-stack/issues/2502.

~Needs: https://github.com/meta-llama/llama-stack/pull/2482 for Ollama to work properly in the CI.~

TODO:
- [ ] We can only update `install.sh` when we get a new release.
- [x] Update providers documentation
- [ ] Update notebooks to reference starter instead of ollama

Signed-off-by: Sébastien Han <seb@redhat.com>
This commit is contained in:
parent
f77d4d91f5
commit
c4349f532b
132 changed files with 1009 additions and 10845 deletions
# Available Distributions
Llama Stack provides several pre-configured distributions to help you get started quickly. Choose the distribution that best fits your hardware and use case.
## Quick Reference
| Distribution | Use Case | Hardware Requirements | Provider |
|--------------|----------|----------------------|----------|
| `distribution-starter` | General purpose, prototyping | Any (CPU/GPU) | Ollama, Remote APIs |
| `distribution-meta-reference-gpu` | High-performance inference | GPU required | Local GPU inference |
| Remote-hosted | Production, managed service | None | Partner providers |
| iOS/Android SDK | Mobile applications | Mobile device | On-device inference |
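
For scripting or docs tooling, the table above can be mirrored as a small lookup. This is a hypothetical helper built only from the table's contents, not part of Llama Stack itself:

```python
# Hypothetical lookup mirroring the quick-reference table above;
# not part of Llama Stack itself.
DISTRIBUTIONS = {
    "distribution-starter": {
        "use_case": "General purpose, prototyping",
        "hardware": "Any (CPU/GPU)",
        "provider": "Ollama, Remote APIs",
    },
    "distribution-meta-reference-gpu": {
        "use_case": "High-performance inference",
        "hardware": "GPU required",
        "provider": "Local GPU inference",
    },
    "remote-hosted": {
        "use_case": "Production, managed service",
        "hardware": "None",
        "provider": "Partner providers",
    },
    "mobile-sdk": {
        "use_case": "Mobile applications",
        "hardware": "Mobile device",
        "provider": "On-device inference",
    },
}


def requires_gpu(name: str) -> bool:
    """Return True when the distribution lists a hard GPU requirement."""
    return "GPU required" in DISTRIBUTIONS[name]["hardware"]


print(requires_gpu("distribution-starter"))             # False
print(requires_gpu("distribution-meta-reference-gpu"))  # True
```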
## Choose Your Distribution
### 🚀 Getting Started (Recommended for Beginners)
**Use `distribution-starter` if you want to:**
- Prototype quickly without GPU requirements
- Use remote inference providers (Fireworks, Together, vLLM, etc.)
- Run locally with Ollama for development
```bash
docker pull llama-stack/distribution-starter
```
**Guides:** [Starter Distribution Guide](self_hosted_distro/starter)
### 🖥️ Self-Hosted with GPU
**Use `distribution-meta-reference-gpu` if you:**
- Have access to GPU hardware
- Want maximum performance and control
- Need to run inference locally
```bash
docker pull llama-stack/distribution-meta-reference-gpu
```
**Guides:** [Meta Reference GPU Guide](self_hosted_distro/meta-reference-gpu)
### ☁️ Managed Hosting
**Use remote-hosted endpoints if you:**
- Don't want to manage infrastructure
- Need production-ready reliability
- Prefer managed services
**Partners:** [Fireworks.ai](https://fireworks.ai) and [Together.xyz](https://together.xyz)
**Guides:** [Remote-Hosted Endpoints](remote_hosted_distro/index)
### 📱 Mobile Development
**Use mobile SDKs if you:**
- Are building iOS or Android applications
- Need on-device inference capabilities
- Want offline functionality
- [iOS SDK](ondevice_distro/ios_sdk)
- [Android SDK](ondevice_distro/android_sdk)
### 🔧 Custom Solutions
**Build your own distribution if:**
- None of the above fit your specific needs
- You need custom configurations
- You want to optimize for your specific use case
**Guides:** [Building Custom Distributions](building_distro.md)
## Detailed Documentation
### Self-Hosted Distributions
```{toctree}
:maxdepth: 1

self_hosted_distro/starter
self_hosted_distro/meta-reference-gpu
```
### Remote-Hosted Solutions
```{toctree}
:maxdepth: 1

remote_hosted_distro/index
```
### Mobile SDKs
```{toctree}
:maxdepth: 1

ondevice_distro/ios_sdk
ondevice_distro/android_sdk
```
## Decision Flow
```mermaid
graph TD
    A[What's your use case?] --> B{Need mobile app?}
    B -->|Yes| C[Use Mobile SDKs]
    B -->|No| D{Have GPU hardware?}
    D -->|Yes| E[Use Meta Reference GPU]
    D -->|No| F{Want managed hosting?}
    F -->|Yes| G[Use Remote-Hosted]
    F -->|No| H[Use Starter Distribution]
```
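
The same flow can also be written down as a tiny function, e.g. inside a setup script. This is only a sketch of the diagram above; nothing here is an official Llama Stack API:

```python
# Sketch of the decision flow above in plain Python; the return values
# match the diagram's labels and are not an official Llama Stack API.
def choose_distribution(mobile_app: bool, has_gpu: bool, managed: bool) -> str:
    """Walk the decision graph: mobile -> GPU -> managed hosting -> starter."""
    if mobile_app:
        return "Mobile SDKs"
    if has_gpu:
        return "distribution-meta-reference-gpu"
    if managed:
        return "Remote-Hosted"
    return "distribution-starter"


# The starter distribution is the fallback when nothing else applies.
print(choose_distribution(mobile_app=False, has_gpu=False, managed=False))
# -> distribution-starter
```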
## Next Steps
1. **Choose your distribution** from the options above
2. **Follow the setup guide** for your selected distribution
3. **Configure your providers** with API keys or local models
4. **Start building** with Llama Stack!
For help choosing or troubleshooting, check our [Getting Started Guide](../getting_started/index.md) or [Community Support](https://github.com/llama-stack/llama-stack/discussions).