From e0548fe5b9178ad3292f6add5110700fad50257a Mon Sep 17 00:00:00 2001
From: Xi Yan
Date: Tue, 5 Nov 2024 11:54:44 -0800
Subject: [PATCH] fix docs, kill configure

---
 .../distribution_dev/building_distro.md | 303 ++++++++++--------
 1 file changed, 162 insertions(+), 141 deletions(-)

diff --git a/docs/source/distribution_dev/building_distro.md b/docs/source/distribution_dev/building_distro.md
index f3a9bd7bb..976474ebe 100644
--- a/docs/source/distribution_dev/building_distro.md
+++ b/docs/source/distribution_dev/building_distro.md
@@ -1,53 +1,72 @@
# Developer Guide: Assemble a Llama Stack Distribution

-> NOTE: This doc may be out-of-date.
-This guide will walk you through the steps to get started with building a Llama Stack distributiom from scratch with your choice of API providers. Please see the [Getting Started Guide](./getting_started.md) if you just want the basic steps to start a Llama Stack distribution.
+This guide will walk you through the steps to build a Llama Stack distribution from scratch with your choice of API providers. Please see the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) if you just want the basic steps to start a Llama Stack distribution.

## Step 1. Build
-In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instruct` model. We will name our build `8b-instruct` to help us remember the config. We will start build our distribution (in the form of a Conda environment, or Docker image). In this step, we will specify:
-- `name`: the name for our distribution (e.g. `8b-instruct`)
+
+```
+llama stack build -h
+
+usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates | --no-list-templates] [--image-type {conda,docker}]
+
+Build a Llama stack container
+
+options:
+  -h, --help            show this help message and exit
+  --config CONFIG       Path to a config file to use for the build. You can find example configs in llama_stack/distribution/example_configs. If this argument is not provided, you will be prompted to enter information interactively
+  --template TEMPLATE   Name of the example template config to use for build. You may use `llama stack build --list-templates` to check out the available templates
+  --list-templates, --no-list-templates
+                        Show the available templates for building a Llama Stack distribution
+  --image-type {conda,docker}
+                        Image Type to use for the build. This can be either conda or docker. If not specified, will use the image type from the template config.
+
+```
+We will start building our distribution (in the form of a Conda environment or Docker image). In this step, we will specify:
+- `name`: the name for our distribution (e.g. `my-stack`)
- `image_type`: our build image type (`conda | docker`)
- `distribution_spec`: our distribution specs for specifying API providers
  - `description`: a short description of the configurations for the distribution
  - `providers`: specifies the underlying implementation for serving each API endpoint
  - `image_type`: `conda` | `docker` to specify whether to build the distribution in the form of Docker image or Conda environment.
+After this step is complete, a file named `<name>-build.yaml` and template file `<name>-run.yaml` will be generated and saved at the output file path specified at the end of the command.

-At the end of build command, we will generate `<name>-build.yaml` file storing the build configurations.
-After this step is complete, a file named `<name>-build.yaml` will be generated and saved at the output file path specified at the end of the command.
+You have 3 options for building your distribution:
+1.1 Building from scratch
+1.2 Building from a template
+1.3 Building from a pre-existing build config file

-#### Building from scratch
+
+### 1.1 Building from scratch
- For a new user, we could start off with running `llama stack build`, which will launch an interactive wizard where you will be prompted to enter build configurations.

```
llama stack build
+
+> Enter a name for your Llama Stack (e.g. my-local-stack): my-stack
+> Enter the image type you want your Llama Stack to be built as (docker or conda): conda
+
+Llama Stack is composed of several APIs working together. Let's select
+the provider types (implementations) you want to use for these APIs.
+
+Tip: use <TAB> to see options for the providers.
+
+> Enter provider for API inference: meta-reference
+> Enter provider for API safety: meta-reference
+> Enter provider for API agents: meta-reference
+> Enter provider for API memory: meta-reference
+> Enter provider for API datasetio: meta-reference
+> Enter provider for API scoring: meta-reference
+> Enter provider for API eval: meta-reference
+> Enter provider for API telemetry: meta-reference
+
+ > (Optional) Enter a short description for your Llama Stack:
+
+You can now edit ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml`
```
-Running the command above will allow you to fill in the configuration to build your Llama Stack distribution, you will see the following outputs.
-
-```
-> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): 8b-instruct
-> Enter the image type you want your distribution to be built with (docker or conda): conda
-
- Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
-> Enter the API provider for the inference API: (default=meta-reference): meta-reference
-> Enter the API provider for the safety API: (default=meta-reference): meta-reference
-> Enter the API provider for the agents API: (default=meta-reference): meta-reference
-> Enter the API provider for the memory API: (default=meta-reference): meta-reference
-> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference
-
- > (Optional) Enter a short description for your Llama Stack distribution:
-
-Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/8b-instruct-build.yaml
-```
-
-**Ollama (optional)**
-
-If you plan to use Ollama for inference, you'll need to install the server [via these instructions](https://ollama.com/download).
-
-
-#### Building from templates
+### 1.2 Building from a template

- To build from alternative API providers, we provide distribution templates for users to get started building a distribution backed by different providers. The following command will allow you to see the available templates and their corresponding providers.
@@ -59,18 +78,21 @@ llama stack build --list-templates +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ | Template Name | Providers | Description | +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| bedrock | { | Use Amazon Bedrock APIs. | -| | "inference": "remote::bedrock", | | -| | "memory": "meta-reference", | | +| hf-serverless | { | Like local, but use Hugging Face Inference API (serverless) for running LLM | +| | "inference": "remote::hf::serverless", | inference. | +| | "memory": "meta-reference", | See https://hf.co/docs/api-inference. | | | "safety": "meta-reference", | | | | "agents": "meta-reference", | | | | "telemetry": "meta-reference" | | | | } | | +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| databricks | { | Use Databricks for running LLM inference | -| | "inference": "remote::databricks", | | -| | "memory": "meta-reference", | | -| | "safety": "meta-reference", | | +| together | { | Use Together.ai for running LLM inference | +| | "inference": "remote::together", | | +| | "memory": [ | | +| | "meta-reference", | | +| | "remote::weaviate" | | +| | ], | | +| | "safety": "remote::together", | | | | "agents": "meta-reference", | | | | "telemetry": "meta-reference" | | | | } | | @@ -88,17 +110,37 @@ llama stack build --list-templates | | "telemetry": "meta-reference" | | | | } | | +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| hf-endpoint | { | Like local, but use Hugging Face Inference Endpoints for running LLM inference. | -| | "inference": "remote::hf::endpoint", | See https://hf.co/docs/api-endpoints. | +| databricks | { | Use Databricks for running LLM inference | +| | "inference": "remote::databricks", | | | | "memory": "meta-reference", | | | | "safety": "meta-reference", | | | | "agents": "meta-reference", | | | | "telemetry": "meta-reference" | | | | } | | +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| hf-serverless | { | Like local, but use Hugging Face Inference API (serverless) for running LLM | -| | "inference": "remote::hf::serverless", | inference. | -| | "memory": "meta-reference", | See https://hf.co/docs/api-inference. 
| +| vllm | { | Like local, but use vLLM for running LLM inference | +| | "inference": "vllm", | | +| | "memory": "meta-reference", | | +| | "safety": "meta-reference", | | +| | "agents": "meta-reference", | | +| | "telemetry": "meta-reference" | | +| | } | | ++------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ +| tgi | { | Use TGI for running LLM inference | +| | "inference": "remote::tgi", | | +| | "memory": [ | | +| | "meta-reference", | | +| | "remote::chromadb", | | +| | "remote::pgvector" | | +| | ], | | +| | "safety": "meta-reference", | | +| | "agents": "meta-reference", | | +| | "telemetry": "meta-reference" | | +| | } | | ++------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ +| bedrock | { | Use Amazon Bedrock APIs. | +| | "inference": "remote::bedrock", | | +| | "memory": "meta-reference", | | | | "safety": "meta-reference", | | | | "agents": "meta-reference", | | | | "telemetry": "meta-reference" | | @@ -140,31 +182,8 @@ llama stack build --list-templates | | "telemetry": "meta-reference" | | | | } | | +------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| tgi | { | Use TGI for running LLM inference | -| | "inference": "remote::tgi", | | -| | "memory": [ | | -| | "meta-reference", | | -| | "remote::chromadb", | | -| | "remote::pgvector" | | -| | ], | | -| | "safety": "meta-reference", | | -| | "agents": "meta-reference", | | -| | "telemetry": "meta-reference" | | -| | } | | -+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| together | { | Use Together.ai for running LLM inference | -| | "inference": "remote::together", | | -| | "memory": [ | | -| | "meta-reference", | | -| | "remote::weaviate" | | -| | ], | | -| | "safety": "remote::together", | | -| | "agents": "meta-reference", | | -| | "telemetry": "meta-reference" | | -| | } | | -+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+ -| vllm | { | Like local, but use vLLM for running LLM inference | -| | "inference": "vllm", | | +| hf-endpoint | { | Like local, but use Hugging Face Inference Endpoints for running LLM inference. | +| | "inference": "remote::hf::endpoint", | See https://hf.co/docs/api-endpoints. | | | "memory": "meta-reference", | | | | "safety": "meta-reference", | | | | "agents": "meta-reference", | | @@ -175,6 +194,7 @@ llama stack build --list-templates You may then pick a template to build your distribution with providers fitted to your liking. +For example, to build a distribution with TGI as the inference provider, you can run: ``` llama stack build --template tgi ``` @@ -182,14 +202,13 @@ llama stack build --template tgi ``` $ llama stack build --template tgi ... -... 
-You can now edit ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml`
+You can now edit ~/.llama/distributions/llamastack-tgi/tgi-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-tgi/tgi-run.yaml`
```

-#### Building from config file
+### 1.3 Building from a pre-existing build config file

- In addition to templates, you may customize the build to your liking by editing a config file and building from it with the following command.

-- The config file will be of contents like the ones in `llama_stack/distributions/templates/`.
+- The config file has contents like the ones in `llama_stack/templates/*build.yaml`.

```
$ cat llama_stack/templates/ollama/build.yaml
@@ -210,104 +229,106 @@ image_type: conda
llama stack build --config llama_stack/templates/ollama/build.yaml
```

-#### How to build distribution with Docker image
-
+### How to build a distribution with a Docker image
> [!TIP]
> Podman is supported as an alternative to Docker. Set `DOCKER_BINARY` to `podman` in your environment to use Podman.

To build a docker image, you may start off from a template and use the `--image-type docker` flag to specify `docker` as the build image type.

```
-llama stack build --template local --image-type docker
+llama stack build --template ollama --image-type docker
```

-Alternatively, you may use a config file and set `image_type` to `docker` in our `<name>-build.yaml` file, and run `llama stack build <name>-build.yaml`. The `<name>-build.yaml` will be of contents like:
-
```
-name: local-docker-example
-distribution_spec:
-  description: Use code from `llama_stack` itself to serve all llama stack APIs
-  docker_image: null
-  providers:
-    inference: meta-reference
-    memory: meta-reference-faiss
-    safety: meta-reference
-    agentic_system: meta-reference
-    telemetry: console
-image_type: docker
-```
-
-The following command allows you to build a Docker image with the name `<name>`
-```
-llama stack build --config <name>-build.yaml
-
-Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/DockerfileFROM python:3.10-slim
-WORKDIR /app
+$ llama stack build --template ollama --image-type docker
...
+Dockerfile created successfully in /tmp/tmp.viA3a3Rdsg/Dockerfile
+FROM python:3.10-slim
...
-You can run it with: podman run -p 8000:8000 llamastack-docker-local
-Build spec configuration saved at ~/.llama/distributions/docker/docker-local-build.yaml
+
+You can now edit ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml`
```

-After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/tgi-run.yaml` with the following contents. You may edit this file to change the settings.
-
-As you can see, we did basic configuration above and configured:
-- inference to run on model `Meta-Llama3.1-8B-Instruct` (obtained from `llama model list`)
-- Llama Guard safety shield with model `Llama-Guard-3-1B`
-- Prompt Guard safety shield with model `Prompt-Guard-86M`
-
-For how these configurations are stored as yaml, checkout the file printed at the end of the configuration.
-
-Note that all configurations as well as models are stored in `~/.llama`
+After this step is successful, you should be able to find the built docker image and test it with `llama stack run <path/to/run.yaml>`.

## Step 2. Run

Now, let's start the Llama Stack Distribution Server.
You will need the YAML configuration file which was written out at the end of the `llama stack build` step.

```
-llama stack run 8b-instruct
+llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
```

-You should see the Llama Stack server start and print the APIs that it is supporting
-
```
-$ llama stack run 8b-instruct
+$ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml

-> initializing model parallel with size 1
-> initializing ddp with size 1
-> initializing pipeline with size 1
-Loaded in 19.28 seconds
-NCCL version 2.20.5+cuda12.4
-Finished model load YES READY
-Serving POST /inference/batch_chat_completion
-Serving POST /inference/batch_completion
-Serving POST /inference/chat_completion
-Serving POST /inference/completion
-Serving POST /safety/run_shield
-Serving POST /agentic_system/memory_bank/attach
-Serving POST /agentic_system/create
-Serving POST /agentic_system/session/create
-Serving POST /agentic_system/turn/create
-Serving POST /agentic_system/delete
-Serving POST /agentic_system/session/delete
-Serving POST /agentic_system/memory_bank/detach
-Serving POST /agentic_system/session/get
-Serving POST /agentic_system/step/get
-Serving POST /agentic_system/turn/get
-Listening on :::5000
-INFO: Started server process [453333]
+Loaded model...
+Serving API datasets
+ GET /datasets/get
+ GET /datasets/list
+ POST /datasets/register
+Serving API inspect
+ GET /health
+ GET /providers/list
+ GET /routes/list
+Serving API inference
+ POST /inference/chat_completion
+ POST /inference/completion
+ POST /inference/embeddings
+Serving API scoring_functions
+ GET /scoring_functions/get
+ GET /scoring_functions/list
+ POST /scoring_functions/register
+Serving API scoring
+ POST /scoring/score
+ POST /scoring/score_batch
+Serving API memory_banks
+ GET /memory_banks/get
+ GET /memory_banks/list
+ POST /memory_banks/register
+Serving API memory
+ POST /memory/insert
+ POST /memory/query
+Serving API safety
+ POST /safety/run_shield
+Serving API eval
+ POST /eval/evaluate
+ POST /eval/evaluate_batch
+ POST /eval/job/cancel
+ GET /eval/job/result
+ GET /eval/job/status
+Serving API shields
+ GET /shields/get
+ GET /shields/list
+ POST /shields/register
+Serving API datasetio
+ GET /datasetio/get_rows_paginated
+Serving API telemetry
+ GET /telemetry/get_trace
+ POST /telemetry/log_event
+Serving API models
+ GET /models/get
+ GET /models/list
+ POST /models/register
+Serving API agents
+ POST /agents/create
+ POST /agents/session/create
+ POST /agents/turn/create
+ POST /agents/delete
+ POST /agents/session/delete
+ POST /agents/session/get
+ POST /agents/step/get
+ POST /agents/turn/get
+
+Listening on ['::', '0.0.0.0']:5000
+INFO: Started server process [2935911]
INFO: Waiting for application startup.
INFO: Application startup complete.
-INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
+INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit)
+INFO: 2401:db00:35c:2d2b:face:0:c9:0:54678 - "GET /models/list HTTP/1.1" 200 OK
```

-> [!NOTE]
-> Configuration is in `~/.llama/builds/local/conda/tgi-run.yaml`. Feel free to increase `max_seq_len`.
-
> [!IMPORTANT]
> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.

> [!TIP]
> You might need to use the flag `--disable-ipv6` to disable IPv6 support.
-
-This server is running a Llama model locally.
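+
+Once the server is running, you can sanity-check it from another terminal with `curl`. This is a minimal sketch, assuming the server is reachable on localhost at the default port 5000 shown in the output above; the `/health` and `/models/list` routes are taken from the list of served APIs:
+
+```
+# Smoke test against a locally running stack (localhost and port 5000 assumed)
+curl http://localhost:5000/health
+curl http://localhost:5000/models/list
+```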