# Available Distributions
Llama Stack provides several pre-configured distributions to help you get started quickly. Choose the distribution that best fits your hardware and use case.
## Quick Reference
| Distribution | Use Case | Hardware Requirements | Provider |
|---|---|---|---|
| `distribution-starter` | General purpose, prototyping | Any (CPU/GPU) | Ollama, Remote APIs |
| `distribution-meta-reference-gpu` | High-performance inference | GPU required | Local GPU inference |
| Remote-hosted | Production, managed service | None | Partner providers |
| iOS/Android SDK | Mobile applications | Mobile device | On-device inference |
## Choose Your Distribution
### 🚀 Getting Started (Recommended for Beginners)

Use `distribution-starter` if you want to:
- Prototype quickly without GPU requirements
- Use remote inference providers (Fireworks, Together, vLLM, etc.)
- Run locally with Ollama for development
```bash
docker pull llama-stack/distribution-starter
```
Guides: Starter Distribution Guide
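To sanity-check the image after pulling it, a minimal launch looks roughly like the sketch below; the port (8321), the `--port` flag, and the API-key environment variable are assumptions here, so check the Starter Distribution Guide for the exact invocation and required keys.

```bash
# Rough sketch -- port, flags, and env var names are assumptions; see the
# Starter Distribution Guide for the authoritative invocation.
docker run -it \
  -p 8321:8321 \
  -v ~/.llama:/root/.llama \
  -e FIREWORKS_API_KEY=$FIREWORKS_API_KEY \
  llama-stack/distribution-starter \
  --port 8321
```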
### 🖥️ Self-Hosted with GPU

Use `distribution-meta-reference-gpu` if you:
- Have access to GPU hardware
- Want maximum performance and control
- Need to run inference locally
```bash
docker pull llama-stack/distribution-meta-reference-gpu
```
Guides: Meta Reference GPU Guide
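A GPU launch follows the same shape but needs the GPU exposed to the container; the sketch below assumes the NVIDIA Container Toolkit is installed, and as above the port and flags are assumptions rather than a guaranteed interface.

```bash
# Rough sketch -- assumes the NVIDIA Container Toolkit is installed; port
# and flags are assumptions, see the Meta Reference GPU Guide for details.
docker run -it \
  --gpus all \
  -p 8321:8321 \
  -v ~/.llama:/root/.llama \
  llama-stack/distribution-meta-reference-gpu \
  --port 8321
```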
### 🖥️ Self-Hosted with NVIDIA NeMo Microservices

Use `nvidia` if you:
- Want to use Llama Stack with NVIDIA NeMo Microservices
Guides: NVIDIA Distribution Guide
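If you are working from a source checkout rather than a container image, one way to build and launch the `nvidia` distribution is sketched below; the template name and run config path reflect the repository layout at the time of writing, so adjust them to your checkout.

```bash
# Build the nvidia distribution into a virtualenv, then start it.
uv run llama stack build --template nvidia --image-type venv
uv run llama stack run llama_stack/templates/nvidia/run.yaml
```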
### ☁️ Managed Hosting
Use remote-hosted endpoints if you:
- Don't want to manage infrastructure
- Need production-ready reliability
- Prefer managed services
Partners: Fireworks.ai and Together.xyz
Guides: Remote-Hosted Endpoints
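As a quick connectivity check against a hosted endpoint, a request like the one below should list the models the endpoint serves; the hostname is a placeholder, and the `/v1/models` route and bearer-token header are assumptions, so use whatever URL and authentication your hosting partner documents.

```bash
# Placeholder host; the route and auth header are assumptions -- follow
# your hosting partner's documentation for the real values.
curl -s -H "Authorization: Bearer $LLAMA_STACK_API_KEY" \
  https://your-hosted-endpoint.example.com/v1/models
```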
### 📱 Mobile Development

Use mobile SDKs if you:
- Are building iOS or Android applications
- Need on-device inference capabilities
- Want offline functionality
### 🔧 Custom Solutions
Build your own distribution if:
- None of the above fit your specific needs
- You need custom configurations
- You want to optimize for your specific use case
Guides: Building Custom Distributions
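A common starting point is the interactive build flow, which prompts for a distribution name and a provider for each API and then writes out a run config; the sketch below assumes that flow and uses a placeholder for the generated config path.

```bash
# Interactive build: answer the prompts to pick a provider per API, then
# launch the stack from the run config the build step reports.
uv run llama stack build
uv run llama stack run <path-to-your-run.yaml>
```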
## Detailed Documentation

### Self-Hosted Distributions
```{toctree}
:maxdepth: 1

self_hosted_distro/starter
self_hosted_distro/meta-reference-gpu
```
### Remote-Hosted Solutions
```{toctree}
:maxdepth: 1

remote_hosted_distro/index
```
### Mobile SDKs
```{toctree}
:maxdepth: 1

ondevice_distro/ios_sdk
ondevice_distro/android_sdk
```
## Decision Flow
```{mermaid}
graph TD
    A[What's your use case?] --> B{Need mobile app?}
    B -->|Yes| C[Use Mobile SDKs]
    B -->|No| D{Have GPU hardware?}
    D -->|Yes| E[Use Meta Reference GPU]
    D -->|No| F{Want managed hosting?}
    F -->|Yes| G[Use Remote-Hosted]
    F -->|No| H[Use Starter Distribution]
```
## Next Steps
- Choose your distribution from the options above
- Follow the setup guide for your selected distribution
- Configure your providers with API keys or local models
- Start building with Llama Stack!
For help choosing or troubleshooting, check our Getting Started Guide or Community Support.