diff --git a/docs/source/getting_started/distributions/dell-tgi.md b/docs/source/getting_started/distributions/deployable_distro/dell-tgi.md
similarity index 100%
rename from docs/source/getting_started/distributions/dell-tgi.md
rename to docs/source/getting_started/distributions/deployable_distro/dell-tgi.md
diff --git a/docs/source/getting_started/distributions/deployable_distro/index.md b/docs/source/getting_started/distributions/deployable_distro/index.md
new file mode 100644
index 000000000..52708140d
--- /dev/null
+++ b/docs/source/getting_started/distributions/deployable_distro/index.md
@@ -0,0 +1,20 @@
+# Deployable Distribution
+
+Deployable distributions let you host your own Llama Stack server and run inference locally.
+
+| **Distribution** | **Llama Stack Docker** | Start This Distribution | **Inference** | **Agents** | **Memory** | **Safety** | **Telemetry** |
+|:----------------: |:------------------------------------------: |:-----------------------: |:------------------: |:------------------: |:------------------: |:------------------: |:------------------: |
+| Meta Reference | [llamastack/distribution-meta-reference-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-gpu/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/deployable_distro/meta-reference-gpu.html) | meta-reference | meta-reference | meta-reference; remote::pgvector; remote::chromadb | meta-reference | meta-reference |
+| Meta Reference Quantized | [llamastack/distribution-meta-reference-quantized-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-quantized-gpu/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/deployable_distro/meta-reference-quantized-gpu.html) | meta-reference-quantized | meta-reference | meta-reference; remote::pgvector; remote::chromadb | meta-reference | meta-reference |
+| Ollama | [llamastack/distribution-ollama](https://hub.docker.com/repository/docker/llamastack/distribution-ollama/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/deployable_distro/ollama.html) | remote::ollama | meta-reference | remote::pgvector; remote::chromadb | meta-reference | meta-reference |
+| TGI | [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/deployable_distro/tgi.html) | remote::tgi | meta-reference | meta-reference; remote::pgvector; remote::chromadb | meta-reference | meta-reference |
+
+```{toctree}
+:maxdepth: 1
+
+meta-reference-gpu
+meta-reference-quantized-gpu
+ollama
+tgi
+dell-tgi
+```
diff --git a/docs/source/getting_started/distributions/meta-reference-gpu.md b/docs/source/getting_started/distributions/deployable_distro/meta-reference-gpu.md
similarity index 100%
rename from docs/source/getting_started/distributions/meta-reference-gpu.md
rename to docs/source/getting_started/distributions/deployable_distro/meta-reference-gpu.md
diff --git a/docs/source/getting_started/distributions/meta-reference-quantized-gpu.md b/docs/source/getting_started/distributions/deployable_distro/meta-reference-quantized-gpu.md
similarity index 100%
rename from docs/source/getting_started/distributions/meta-reference-quantized-gpu.md
rename to docs/source/getting_started/distributions/deployable_distro/meta-reference-quantized-gpu.md
diff --git a/docs/source/getting_started/distributions/ollama.md b/docs/source/getting_started/distributions/deployable_distro/ollama.md
similarity index 100%
rename from docs/source/getting_started/distributions/ollama.md
rename to docs/source/getting_started/distributions/deployable_distro/ollama.md
diff --git a/docs/source/getting_started/distributions/tgi.md b/docs/source/getting_started/distributions/deployable_distro/tgi.md
similarity index 100%
rename from docs/source/getting_started/distributions/tgi.md
rename to docs/source/getting_started/distributions/deployable_distro/tgi.md
diff --git a/docs/source/getting_started/distributions/fireworks.md b/docs/source/getting_started/distributions/hosted_distro/fireworks.md
similarity index 100%
rename from docs/source/getting_started/distributions/fireworks.md
rename to docs/source/getting_started/distributions/hosted_distro/fireworks.md
diff --git a/docs/source/getting_started/distributions/hosted_distro/index.md b/docs/source/getting_started/distributions/hosted_distro/index.md
new file mode 100644
index 000000000..d61dfcec3
--- /dev/null
+++ b/docs/source/getting_started/distributions/hosted_distro/index.md
@@ -0,0 +1,15 @@
+# Hosted Distribution
+
+Hosted distributions connect to remote, hosted services through the Llama Stack server; inference runs on remote providers. They are useful if you have an API key for a remote inference provider such as Fireworks or Together.
+
+| **Distribution** | **Llama Stack Docker** | Start This Distribution | **Inference** | **Agents** | **Memory** | **Safety** | **Telemetry** |
+|:----------------: |:------------------------------------------: |:-----------------------: |:------------------: |:------------------: |:------------------: |:------------------: |:------------------: |
+| Together | [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/hosted_distro/together.html) | remote::together | meta-reference | remote::weaviate | meta-reference | meta-reference |
+| Fireworks | [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/hosted_distro/fireworks.html) | remote::fireworks | meta-reference | remote::weaviate | meta-reference | meta-reference |
+
+```{toctree}
+:maxdepth: 1
+
+fireworks
+together
+```
diff --git a/docs/source/getting_started/distributions/together.md b/docs/source/getting_started/distributions/hosted_distro/together.md
similarity index 100%
rename from docs/source/getting_started/distributions/together.md
rename to docs/source/getting_started/distributions/hosted_distro/together.md
diff --git a/docs/source/getting_started/distributions/index.md b/docs/source/getting_started/distributions/index.md
index 354d2521c..87c91933a 100644
--- a/docs/source/getting_started/distributions/index.md
+++ b/docs/source/getting_started/distributions/index.md
@@ -2,23 +2,16 @@
 A Distribution is where APIs and Providers are assembled together to provide a consistent whole to the end application developer. You can mix-and-match providers -- some could be backed by local code and some could be remote. As a hobbyist, you can serve a small model locally, but can choose a cloud provider for a large model.
 Regardless, the higher level APIs your app needs to work with don't need to change at all. You can even imagine moving across the server / mobile-device boundary as well always using the same uniform set of APIs for developing Generative AI applications.
 
-| **Distribution** | **Llama Stack Docker** | Start This Distribution | **Inference** | **Agents** | **Memory** | **Safety** | **Telemetry** |
-|:----------------: |:------------------------------------------: |:-----------------------: |:------------------: |:------------------: |:------------------: |:------------------: |:------------------: |
-| Meta Reference | [llamastack/distribution-meta-reference-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-gpu/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/meta-reference-gpu.html) | meta-reference | meta-reference | meta-reference; remote::pgvector; remote::chromadb | meta-reference | meta-reference |
-| Meta Reference Quantized | [llamastack/distribution-meta-reference-quantized-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-quantized-gpu/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/meta-reference-quantized-gpu.html) | meta-reference-quantized | meta-reference | meta-reference; remote::pgvector; remote::chromadb | meta-reference | meta-reference |
-| Ollama | [llamastack/distribution-ollama](https://hub.docker.com/repository/docker/llamastack/distribution-ollama/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/ollama.html) | remote::ollama | meta-reference | remote::pgvector; remote::chromadb | meta-reference | meta-reference |
-| TGI | [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/tgi.html) | remote::tgi | meta-reference | meta-reference; remote::pgvector; remote::chromadb | meta-reference | meta-reference |
-| Together | [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/together.html) | remote::together | meta-reference | remote::weaviate | meta-reference | meta-reference |
-| Fireworks | [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/fireworks.html) | remote::fireworks | meta-reference | remote::weaviate | meta-reference | meta-reference |
+We offer three types of distributions:
+
+1. [Deployable Distribution](./deployable_distro/index.md): If you want to run Llama Stack inference on your local machine.
+2. [Hosted Distribution](./hosted_distro/index.md): If you want to connect to a remote hosted inference provider.
+3. [On-device Distribution](./ondevice_distro/index.md): If you want to run Llama Stack inference on your iOS / Android device.
 
 ```{toctree}
 :maxdepth: 1
 
-meta-reference-gpu
-meta-reference-quantized-gpu
-ollama
-tgi
-together
-fireworks
-dell-tgi
+deployable_distro/index
+hosted_distro/index
+ondevice_distro/index
 ```
diff --git a/docs/source/getting_started/distributions/ondevice_distro/index.md b/docs/source/getting_started/distributions/ondevice_distro/index.md
new file mode 100644
index 000000000..cf31719ac
--- /dev/null
+++ b/docs/source/getting_started/distributions/ondevice_distro/index.md
@@ -0,0 +1,9 @@
+# On-Device Distribution
+
+On-device distributions run Llama Stack inference locally on your iOS / Android device.
+
+```{toctree}
+:maxdepth: 1
+
+ios_setup
+```
diff --git a/docs/source/getting_started/ios_setup.md b/docs/source/getting_started/distributions/ondevice_distro/ios_setup.md
similarity index 100%
rename from docs/source/getting_started/ios_setup.md
rename to docs/source/getting_started/distributions/ondevice_distro/ios_setup.md