More doc cleanup

Ashwin Bharambe 2024-11-22 14:37:22 -08:00
parent 900b0556e7
commit c2c53d0272
6 changed files with 34 additions and 121 deletions

View file

@@ -1,20 +0,0 @@
# Developer Guide
```{toctree}
:hidden:
:maxdepth: 1
building_distro
```
## Key Concepts
### API Provider
A Provider is what makes the API real -- it provides the actual implementation backing the API.
As an example, the Inference API could be backed by open source libraries such as `[ torch | vLLM | TensorRT ]`.
A provider can also be just a pointer to a remote REST service -- for example, cloud providers or dedicated inference providers could serve these APIs.
### Distribution
A Distribution is where APIs and Providers are assembled together to provide a consistent whole to the end application developer. You can mix and match providers -- some could be backed by local code and some could be remote. As a hobbyist, you can serve a small model locally but choose a cloud provider for a large model. Regardless, the higher-level APIs your app needs to work with don't need to change at all. You can even imagine moving across the server / mobile-device boundary as well, always using the same uniform set of APIs for developing Generative AI applications.

View file

@@ -1,15 +1,22 @@
# Developer Guide: Assemble a Llama Stack Distribution
# Build your own Distribution
This guide will walk you through the steps to get started with building a Llama Stack distribution from scratch with your choice of API providers. Please see the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) if you just want the basic steps to start a Llama Stack distribution.
This guide will walk you through the steps to get started with building a Llama Stack distribution from scratch with your choice of API providers.
## Step 1. Build
### Llama Stack Build Options
## Llama Stack Build
To build your own distribution, we recommend cloning the `llama-stack` repository.
```
git clone git@github.com:meta-llama/llama-stack.git
cd llama-stack
pip install -e .
llama stack build -h
```
We will start building our distribution (in the form of a Conda environment or a Docker image). In this step, we will specify the following (see the example sketch after this list):
- `name`: the name for our distribution (e.g. `my-stack`)
- `image_type`: our build image type (`conda | docker`)
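For example, here is a minimal sketch of a build invocation using these two options (the flag names are assumptions based on the options above; check `llama stack build -h` for the exact interface):
```bash
# Hypothetical invocation: flag names assumed from the options listed above;
# verify the exact flags with `llama stack build -h`.
llama stack build --name my-stack --image-type conda
```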
@@ -240,7 +247,7 @@ After this step is successful, you should be able to find the built docker image
::::
## Step 2. Run
## Running your Stack server
Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file that was written out at the end of the `llama stack build` step.
```
@@ -250,11 +257,6 @@ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-
```
$ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
Loaded model...
Serving API datasets
GET /datasets/get
GET /datasets/list
POST /datasets/register
Serving API inspect
GET /health
GET /providers/list
@@ -263,41 +265,7 @@ Serving API inference
POST /inference/chat_completion
POST /inference/completion
POST /inference/embeddings
Serving API scoring_functions
GET /scoring_functions/get
GET /scoring_functions/list
POST /scoring_functions/register
Serving API scoring
POST /scoring/score
POST /scoring/score_batch
Serving API memory_banks
GET /memory_banks/get
GET /memory_banks/list
POST /memory_banks/register
Serving API memory
POST /memory/insert
POST /memory/query
Serving API safety
POST /safety/run_shield
Serving API eval
POST /eval/evaluate
POST /eval/evaluate_batch
POST /eval/job/cancel
GET /eval/job/result
GET /eval/job/status
Serving API shields
GET /shields/get
GET /shields/list
POST /shields/register
Serving API datasetio
GET /datasetio/get_rows_paginated
Serving API telemetry
GET /telemetry/get_trace
POST /telemetry/log_event
Serving API models
GET /models/get
GET /models/list
POST /models/register
...
Serving API agents
POST /agents/create
POST /agents/session/create
@@ -316,8 +284,6 @@ INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit
INFO: 2401:db00:35c:2d2b:face:0:c9:0:54678 - "GET /models/list HTTP/1.1" 200 OK
```
> [!IMPORTANT]
> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.
### Troubleshooting
> [!TIP]
> You might need to use the `--disable-ipv6` flag to disable IPv6 support.
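For example, a sketch of passing the flag when starting the server (the flag placement here is an assumption; verify with `llama stack run --help`):
```bash
# Assumed flag placement; confirm with `llama stack run --help`.
llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml --disable-ipv6
```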
If you encounter any issues, search through our [GitHub Issues](https://github.com/meta-llama/llama-stack/issues), or file a new issue.

View file

@@ -1,4 +1,13 @@
# Starting a Llama Stack
```{toctree}
:maxdepth: 3
:hidden:
self_hosted_distro/index
remote_hosted_distro/index
building_distro
ondevice_distro/index
```
As mentioned in the [Concepts](../concepts/index), Llama Stack Distributions are specific pre-packaged versions of the Llama Stack. These templates make it easy to get started quickly.
@@ -19,56 +28,9 @@ If so, we suggest:
- [distribution-ollama](self_hosted_distro/ollama)
- **Do you have an API key for a remote inference provider like Fireworks, Together, etc.?** If so, we suggest:
- [distribution-together](#remote-hosted-distributions)
- [distribution-fireworks](#remote-hosted-distributions)
- [distribution-together](remote_hosted_distro/index)
- [distribution-fireworks](remote_hosted_distro/index)
- **Do you want to run Llama Stack inference on your iOS / Android device?** If so, we suggest:
- [iOS](ondevice_distro/ios_sdk)
- [Android](ondevice_distro/android_sdk) (coming soon)
## Remote-Hosted Distributions
Remote-hosted distributions are readily available endpoints serving the Llama Stack API that you can connect to directly.
| Distribution | Endpoint | Inference | Agents | Memory | Safety | Telemetry |
|-------------|----------|-----------|---------|---------|---------|------------|
| Together | [https://llama-stack.together.ai](https://llama-stack.together.ai) | remote::together | meta-reference | remote::weaviate | meta-reference | meta-reference |
| Fireworks | [https://llamastack-preview.fireworks.ai](https://llamastack-preview.fireworks.ai) | remote::fireworks | meta-reference | remote::weaviate | meta-reference | meta-reference |
You can use `llama-stack-client` to interact with these endpoints. For example, to list the available models served by the Fireworks endpoint:
```bash
$ pip install llama-stack-client
$ llama-stack-client configure --endpoint https://llamastack-preview.fireworks.ai
$ llama-stack-client models list
```
## On-Device Distributions
On-device distributions are Llama Stack distributions that run locally on your iOS / Android device.
## Building Your Own Distribution
<TODO> talk about llama stack build --image-type conda, etc.
### Prerequisites
```bash
$ git clone git@github.com:meta-llama/llama-stack.git
```
### Troubleshooting
- If you encounter any issues, search through our [GitHub Issues](https://github.com/meta-llama/llama-stack/issues), or file a new issue.
- Use the `--port <PORT>` flag to use a different port number. For `docker run`, also update the `-p <PORT>:<PORT>` mapping (see the sketch below).
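For example, a sketch of overriding the port in both flows (the flag placement and image name are placeholders, not tested commands):
```bash
# Assumed placement of --port; verify with `llama stack run --help`.
llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml --port 8080

# For the docker flow, map the same port on the host (placeholder image name).
docker run -p 8080:8080 <your-distribution-image>
```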
```{toctree}
:maxdepth: 3
remote_hosted_distro/index
ondevice_distro/index
```
- Android (coming soon)

View file

@@ -1,6 +1,12 @@
# On-Device Distributions
```{toctree}
:maxdepth: 1
:hidden:
ios_sdk
```
On-device distributions are Llama Stack distributions that run locally on your iOS / Android device.
Currently, we only support the [iOS SDK](ios_sdk); support for Android is coming soon.

View file

@@ -96,5 +96,4 @@ getting_started/index
concepts/index
distributions/index
contributing/index
distribution_dev/index
```

View file

@@ -29,7 +29,7 @@ You have two ways to install Llama Stack:
## `llama` subcommands
1. `download`: the `llama` CLI supports downloading models from Meta or Hugging Face (a short usage sketch follows this list).
2. `model`: Lists available models and their properties.
3. `stack`: Allows you to build and run a Llama Stack server. You can read more about this [here](../distribution_dev/building_distro.md).
3. `stack`: Allows you to build and run a Llama Stack server. You can read more about this [here](../distributions/building_distro).
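As a quick orientation, a hedged sketch of each subcommand; the flags and model identifier below are assumptions, so prefer the documented examples in the Sample Usage section that follows:
```bash
# Assumed invocations; verify flags with `llama --help` and the Sample Usage section below.
llama download --source meta --model-id Llama3.2-3B-Instruct   # download a model from Meta
llama model list                                               # list available models and their properties
llama stack build                                              # build a Llama Stack server distribution
```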
### Sample Usage