# Available Distributions

Llama Stack provides several pre-configured distributions to help you get started quickly. Choose the distribution that best fits your hardware and use case.

## Quick Reference

| Distribution | Use Case | Hardware Requirements | Provider |
|--------------|----------|-----------------------|----------|
| `distribution-starter` | General purpose, prototyping | Any (CPU/GPU) | Ollama, Remote APIs |
| `distribution-meta-reference-gpu` | High-performance inference | GPU required | Local GPU inference |
| Remote-hosted | Production, managed service | None | Partner providers |
| iOS/Android SDK | Mobile applications | Mobile device | On-device inference |

## Choose Your Distribution

### 🚀 Getting Started (Recommended for Beginners)

**Use `distribution-starter` if you want to:**
- Prototype quickly without GPU requirements
- Use remote inference providers (Fireworks, Together, vLLM, etc.)
- Run locally with Ollama for development

```bash
docker pull llama-stack/distribution-starter
```

**Guides:** [Starter Distribution Guide](self_hosted_distro/starter)
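
To try it right away after pulling, a run command along these lines should work. This is a sketch: the port (8321) and the `OLLAMA_URL` value are assumed Llama Stack defaults, not taken from this page.

```bash
# Sketch: run the starter image, exposing the assumed default port (8321).
# OLLAMA_URL is only needed if you route inference to a local Ollama server;
# both the port and the URL below are assumptions for illustration.
docker run -it -p 8321:8321 \
  -e OLLAMA_URL=http://host.docker.internal:11434 \
  llama-stack/distribution-starter
```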

### 🖥️ Self-Hosted with GPU

**Use `distribution-meta-reference-gpu` if you:**
- Have access to GPU hardware
- Want maximum performance and control
- Need to run inference locally

```bash
docker pull llama-stack/distribution-meta-reference-gpu
```

**Guides:** [Meta Reference GPU Guide](self_hosted_distro/meta-reference-gpu)
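
A sketch of running it with GPU passthrough, assuming the NVIDIA Container Toolkit is installed on the host; the port and the `INFERENCE_MODEL` value are illustrative assumptions.

```bash
# Sketch: run the meta-reference image with all host GPUs visible.
# INFERENCE_MODEL is illustrative; substitute the model you intend to serve.
docker run -it --gpus all -p 8321:8321 \
  -e INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct \
  llama-stack/distribution-meta-reference-gpu
```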

### 🖥️ Self-Hosted with NVIDIA NeMo Microservices

**Use `nvidia` if you:**
- Want to use Llama Stack with NVIDIA NeMo Microservices

**Guides:** [NVIDIA Distribution Guide](self_hosted_distro/nvidia)
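
To build and run this distro from source with the standard `llama stack` CLI (you will still need to supply the relevant NVIDIA API keys):

```bash
# Build the nvidia template into a local venv, then run it.
uv run llama stack build --template nvidia --image-type venv
uv run llama stack run llama_stack/templates/nvidia/run.yaml
```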

### ☁️ Managed Hosting

**Use remote-hosted endpoints if you:**
- Don't want to manage infrastructure
- Need production-ready reliability
- Prefer managed services

**Partners:** [Fireworks.ai](https://fireworks.ai) and [Together.xyz](https://together.xyz)

**Guides:** [Remote-Hosted Endpoints](remote_hosted_distro/index)
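
Once you have an endpoint from a provider, pointing the `llama-stack-client` CLI at it is typically all that's needed. The URL below is a placeholder, not a real service address.

```bash
# Sketch: configure the client against a hosted endpoint, then list models.
# Replace the placeholder URL with the endpoint your provider gives you.
llama-stack-client configure --endpoint https://your-hosted-endpoint.example.com
llama-stack-client models list
```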

### 📱 Mobile Development

**Use mobile SDKs if you:**
- Are building iOS or Android applications
- Need on-device inference capabilities
- Want offline functionality

**SDKs:**
- [iOS SDK](ondevice_distro/ios_sdk)
- [Android SDK](ondevice_distro/android_sdk)

### 🔧 Custom Solutions

**Build your own distribution if:**
- None of the above fit your specific needs
- You need custom configurations
- You want to optimize for your specific use case

**Guides:** [Building Custom Distributions](building_distro)
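
As a sketch of the workflow, the `llama stack build` CLI can scaffold a distribution interactively, or you can start from an existing template and edit the generated config:

```bash
# Interactive mode prompts you to pick providers for each API.
llama stack build

# Or start from an existing template and customize its run.yaml afterwards.
llama stack build --template starter --image-type venv
```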

## Detailed Documentation

### Self-Hosted Distributions

```{toctree}
:maxdepth: 1

self_hosted_distro/starter
self_hosted_distro/meta-reference-gpu
self_hosted_distro/nvidia
```

### Remote-Hosted Solutions

```{toctree}
:maxdepth: 1

remote_hosted_distro/index
```

### Mobile SDKs

```{toctree}
:maxdepth: 1

ondevice_distro/ios_sdk
ondevice_distro/android_sdk
```

## Decision Flow

```{mermaid}
graph TD
    A[What's your use case?] --> B{Need mobile app?}
    B -->|Yes| C[Use Mobile SDKs]
    B -->|No| D{Have GPU hardware?}
    D -->|Yes| E[Use Meta Reference GPU]
    D -->|No| F{Want managed hosting?}
    F -->|Yes| G[Use Remote-Hosted]
    F -->|No| H[Use Starter Distribution]
```

## Next Steps

1. Choose your distribution from the options above
2. Follow the setup guide for your selected distribution
3. Configure your providers with API keys or local models
4. Start building with Llama Stack!
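
Once a self-hosted stack is running, a quick health check confirms the server is reachable; this sketch assumes the default port, 8321.

```bash
# Sanity check: hit the server's health endpoint (port is an assumed default).
curl http://localhost:8321/v1/health
```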

For help choosing or troubleshooting, check our Getting Started Guide or Community Support.