Much more documentation work, things are getting a bit more consumable now

This commit is contained in:
Ashwin Bharambe 2024-11-22 14:04:49 -08:00
parent 98e213e96c
commit 900b0556e7
17 changed files with 143 additions and 162 deletions


@ -1,57 +1,58 @@
# Building Llama Stacks
# Starting a Llama Stack
```{toctree}
:maxdepth: 2
:hidden:
As mentioned in the [Concepts](../concepts/index) section, Llama Stack Distributions are pre-packaged versions of the Llama Stack. These templates make it easy to get started quickly.
self_hosted_distro/index
remote_hosted_distro/index
ondevice_distro/index
```
## Introduction
Llama Stack Distributions are pre-built Docker containers/Conda environments that assemble APIs and Providers into a consistent whole for the end application developer.
These distributions allow you to mix and match providers: some can be backed by local code and some can be remote. This flexibility lets you choose the optimal setup for your use case, such as serving a small model locally while using a cloud provider for larger models, all while maintaining a consistent API interface for your application.
## Decide Your Build Type
There are two ways to start a Llama Stack:
- **Docker**: we provide a number of pre-built Docker containers allowing you to get started instantly. If you are focused on application development, we recommend this option.
A Llama Stack Distribution can be consumed in two ways:
- **Docker**: we provide a number of pre-built Docker containers allowing you to get started instantly. If you are focused on application development, we recommend this option. You can also build your own custom Docker container.
- **Conda**: the `llama` CLI provides a simple set of commands to build, configure and run a Llama Stack server containing the exact combination of providers you wish. We have provided various templates to make getting started easier.
Both of these provide options to run model inference using our reference implementations, Ollama, TGI, vLLM, or even remote providers like Fireworks, Together, and Bedrock.
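For example, the Docker route can be as simple as starting one of the pre-built images. The image name and port below are illustrative rather than a definitive invocation; each distribution's page lists the exact command along with any required volumes or environment variables:

```bash
# Start a pre-built distribution container (image name and port are examples)
docker run -it -p 5000:5000 llamastack/distribution-ollama
```

The Conda route uses the `llama stack build` flow described under "Building Your Own Distribution" below.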
### Decide Your Inference Provider
Running inference on the underlying Llama model is one of the most critical requirements. Depending on the hardware you have available, there are various options. Note that each option has different prerequisites.
Which distribution to choose depends on the hardware you have for running LLM inference.
- **Do you have access to a machine with powerful GPUs?**
If so, we suggest:
- [distribution-meta-reference-gpu](./self_hosted_distro/meta-reference-gpu.md)
- [distribution-tgi](./self_hosted_distro/tgi.md)
- [distribution-remote-vllm](self_hosted_distro/remote-vllm)
- [distribution-meta-reference-gpu](self_hosted_distro/meta-reference-gpu)
- [distribution-tgi](self_hosted_distro/tgi)
- **Are you running on a "regular" desktop machine?**
If so, we suggest:
- [distribution-ollama](./self_hosted_distro/ollama.md)
- [distribution-ollama](self_hosted_distro/ollama)
- **Do you have an API key for a remote inference provider like Fireworks, Together, etc.?** If so, we suggest:
- [distribution-together](./remote_hosted_distro/together.md)
- [distribution-fireworks](./remote_hosted_distro/fireworks.md)
- [distribution-together](#remote-hosted-distributions)
- [distribution-fireworks](#remote-hosted-distributions)
- **Do you want to run Llama Stack inference on your iOS / Android device?** If so, we suggest:
- [iOS](./ondevice_distro/ios_sdk.md)
- [Android](https://github.com/meta-llama/llama-stack-client-kotlin) (coming soon)
- [iOS](ondevice_distro/ios_sdk)
- [Android](ondevice_distro/android_sdk) (coming soon)
Please see the detailed pages for each type of distribution we offer:
1. [Self-Hosted Distributions](./self_hosted_distro/index.md): If you want to run Llama Stack inference on your local machine.
2. [Remote-Hosted Distributions](./remote_hosted_distro/index.md): If you want to connect to a remote hosted inference provider.
3. [On-device Distributions](./ondevice_distro/index.md): If you want to run Llama Stack inference on your iOS / Android device.
## Remote-Hosted Distributions
Remote-Hosted distributions are ready-to-use endpoints serving the Llama Stack API that you can connect to directly.
| Distribution | Endpoint | Inference | Agents | Memory | Safety | Telemetry |
|-------------|----------|-----------|---------|---------|---------|------------|
| Together | [https://llama-stack.together.ai](https://llama-stack.together.ai) | remote::together | meta-reference | remote::weaviate | meta-reference | meta-reference |
| Fireworks | [https://llamastack-preview.fireworks.ai](https://llamastack-preview.fireworks.ai) | remote::fireworks | meta-reference | remote::weaviate | meta-reference | meta-reference |
You can use `llama-stack-client` to interact with these endpoints. For example, to list the available models served by the Fireworks endpoint:
```bash
$ pip install llama-stack-client
$ llama-stack-client configure --endpoint https://llamastack-preview.fireworks.ai
$ llama-stack-client models list
```
## On-Device Distributions
On-device distributions are Llama Stack distributions that run locally on your iOS / Android device.
## Building Your Own Distribution
<TODO> talk about llama stack build --image-type conda, etc.
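A hedged sketch of what that flow might look like (the template name is an example and the flag spellings can differ between versions; `llama stack build --help` is authoritative):

```bash
# Build a Conda environment from a template, then start the resulting stack
# (template name "ollama" and these flags are assumptions for illustration)
llama stack build --template ollama --image-type conda
llama stack run ollama
```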
### Prerequisites
```bash
@ -59,81 +60,15 @@ $ git clone git@github.com:meta-llama/llama-stack.git
```
### Starting the Distribution
::::{tab-set}
:::{tab-item} meta-reference-gpu
##### System Requirements
Access to Single-Node GPU to start a local server.
##### Downloading Models
Please make sure you have Llama model checkpoints downloaded in `~/.llama` before proceeding. See the [installation guide](../cli_reference/download_models.md) to download the models.
```
$ ls ~/.llama/checkpoints
Llama3.1-8B Llama3.2-11B-Vision-Instruct Llama3.2-1B-Instruct Llama3.2-90B-Vision-Instruct Llama-Guard-3-8B
Llama3.1-8B-Instruct Llama3.2-1B Llama3.2-3B-Instruct Llama-Guard-3-1B Prompt-Guard-86M
```
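If the checkpoints are not there yet, the `llama` CLI can download them. A minimal sketch follows; the model ID is an example, the subcommand may be `llama download` in some versions, and Meta-sourced downloads require the signed URL from the license-acceptance email:

```bash
# Download an example checkpoint into ~/.llama (model ID is illustrative)
llama model download --source meta --model-id Llama3.2-3B-Instruct
```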
:::
:::{tab-item} vLLM
##### System Requirements
Access to Single-Node GPU to start a vLLM server.
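A common way to bring up an OpenAI-compatible vLLM server for the stack to point at; the model ID and port are illustrative:

```bash
# Serve a Llama model via vLLM's OpenAI-compatible API (model and port are examples)
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-3.2-3B-Instruct \
    --port 8000
```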
:::
:::{tab-item} tgi
##### System Requirements
Access to Single-Node GPU to start a TGI server.
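TGI is usually launched through its official container; a hedged example, where the image tag, model ID, cache mount, and port mapping are all illustrative:

```bash
# Launch a TGI inference server in Docker (tag, model, mount, and ports are examples)
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $HOME/.cache/huggingface:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id meta-llama/Llama-3.2-3B-Instruct
```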
:::
:::{tab-item} ollama
##### System Requirements
Access to Single-Node CPU/GPU able to run ollama.
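In practice this means having the Ollama server running with the target model pulled; for example (the model tag is illustrative, and `ollama serve` may already be running as a system service):

```bash
# Start the Ollama server and keep an example model loaded in memory
ollama serve &
ollama run llama3.2:3b-instruct-fp16 --keepalive 60m
```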
:::
:::{tab-item} together
##### System Requirements
Access to Single-Node CPU with a Together-hosted endpoint and an API key from [together.ai](https://api.together.xyz/signin).
:::
:::{tab-item} fireworks
##### System Requirements
Access to Single-Node CPU with a Fireworks-hosted endpoint and an API key from [fireworks.ai](https://fireworks.ai/).
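Hosted-provider distributions generally only need the provider API key available before the server starts; for example (the exact environment variable name is an assumption, check the distribution's run configuration):

```bash
# Make the provider API key available to the stack (variable name may differ)
export FIREWORKS_API_KEY=<your-api-key>
```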
:::
::::
::::{tab-set}
:::{tab-item} meta-reference-gpu
- [Start Meta Reference GPU Distribution](./self_hosted_distro/meta-reference-gpu.md)
:::
:::{tab-item} vLLM
- [Start vLLM Distribution](./self_hosted_distro/remote-vllm.md)
:::
:::{tab-item} tgi
- [Start TGI Distribution](./self_hosted_distro/tgi.md)
:::
:::{tab-item} ollama
- [Start Ollama Distribution](./self_hosted_distro/ollama.md)
:::
:::{tab-item} together
- [Start Together Distribution](./self_hosted_distro/together.md)
:::
:::{tab-item} fireworks
- [Start Fireworks Distribution](./self_hosted_distro/fireworks.md)
:::
::::
### Troubleshooting
- If you encounter any issues, search through our [GitHub Issues](https://github.com/meta-llama/llama-stack/issues), or file a new issue.
- Use the `--port <PORT>` flag to run the server on a different port. For `docker run`, also update the `-p <PORT>:<PORT>` mapping; see the sketch below.
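A hedged sketch of the port change; the distribution name, image name, and port values are illustrative:

```bash
# Run the server on a non-default port (values are examples)
llama stack run my-local-stack --port 5001

# For Docker, keep the published mapping in sync with the server port
# (image name is an example; how the port is passed to the entrypoint may vary by image)
docker run -it -p 5001:5001 llamastack/distribution-ollama --port 5001
```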
```{toctree}
:maxdepth: 3
remote_hosted_distro/index
ondevice_distro/index
```


@ -1,6 +1,3 @@
# On-Device Distributions
On-device distributions are Llama Stack distributions that run locally on your iOS / Android device.
```{toctree}
:maxdepth: 1


@ -1,12 +1,5 @@
# Remote-Hosted Distributions
```{toctree}
:maxdepth: 2
:hidden:
remote
```
Remote-Hosted distributions are ready-to-use endpoints serving the Llama Stack API that you can connect to directly.
| Distribution | Endpoint | Inference | Agents | Memory | Safety | Telemetry |


@ -1,20 +1,5 @@
# Self-Hosted Distributions
```{toctree}
:maxdepth: 2
:hidden:
meta-reference-gpu
meta-reference-quantized-gpu
ollama
tgi
dell-tgi
together
fireworks
remote-vllm
bedrock
```
We offer deployable distributions where you can host your own Llama Stack server using local inference.
| **Distribution** | **Llama Stack Docker** | Start This Distribution |