fix docs, kill configure

Xi Yan 2024-11-05 11:54:44 -08:00
parent f5f2936bbb
commit e0548fe5b9


# Developer Guide: Assemble a Llama Stack Distribution
> NOTE: This doc may be out-of-date.
This guide will walk you through the steps to get started with building a Llama Stack distribution from scratch with your choice of API providers. Please see the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) if you just want the basic steps to start a Llama Stack distribution.
## Step 1. Build
```
llama stack build -h
usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates | --no-list-templates] [--image-type {conda,docker}]

Build a Llama stack container

options:
  -h, --help            show this help message and exit
  --config CONFIG       Path to a config file to use for the build. You can find example configs in llama_stack/distribution/example_configs. If this argument is not provided, you will be prompted to enter information interactively
  --template TEMPLATE   Name of the example template config to use for build. You may use `llama stack build --list-templates` to check out the available templates
  --list-templates, --no-list-templates
                        Show the available templates for building a Llama Stack distribution
  --image-type {conda,docker}
                        Image Type to use for the build. This can be either conda or docker. If not specified, will use the image type from the template config.
```
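Each flag corresponds to one of the three build options described below. For illustration (the `tgi` template is one of the shipped templates shown later; `my-build.yaml` is a hypothetical file name):

```
# interactive wizard (option 1.1)
llama stack build

# build from a named template (option 1.2)
llama stack build --template tgi --image-type conda

# build from a config file you already have (option 1.3)
llama stack build --config my-build.yaml
```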
We will start building our distribution (in the form of a Conda environment, or Docker image). In this step, we will specify:
- `name`: the name for our distribution (e.g. `my-stack`)
- `image_type`: our build image type (`conda | docker`)
- `distribution_spec`: our distribution specs for specifying API providers
  - `description`: a short description of the configurations for the distribution
  - `providers`: specifies the underlying implementation for serving each API endpoint
- `image_type`: `conda` | `docker` to specify whether to build the distribution in the form of a Docker image or Conda environment.
After this step is complete, a file named `<name>-build.yaml` and a template file `<name>-run.yaml` will be generated and saved at the output file path specified at the end of the command.
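For reference, the generated `<name>-build.yaml` will look roughly like the sketch below; the provider values here are illustrative placeholders rather than a file to copy verbatim:

```
name: my-stack
distribution_spec:
  description: Use meta-reference to serve all llama stack APIs
  providers:
    inference: meta-reference
    memory: meta-reference
    safety: meta-reference
    agents: meta-reference
    telemetry: meta-reference
image_type: conda
```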
You have 3 options for building your distribution:
1.1 Building from scratch
1.2 Building from a template
1.3. Building from a pre-existing build config file
### 1.1 Building from scratch
- For a new user, we could start off by running `llama stack build`, which launches an interactive wizard where you will be prompted to enter build configurations.
```
llama stack build
> Enter a name for your Llama Stack (e.g. my-local-stack): my-stack
> Enter the image type you want your Llama Stack to be built as (docker or conda): conda
Llama Stack is composed of several APIs working together. Let's select
the provider types (implementations) you want to use for these APIs.
Tip: use <TAB> to see options for the providers.
> Enter provider for API inference: meta-reference
> Enter provider for API safety: meta-reference
> Enter provider for API agents: meta-reference
> Enter provider for API memory: meta-reference
> Enter provider for API datasetio: meta-reference
> Enter provider for API scoring: meta-reference
> Enter provider for API eval: meta-reference
> Enter provider for API telemetry: meta-reference
> (Optional) Enter a short description for your Llama Stack:
You can now edit ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml`
```
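The wizard saves its output under `~/.llama/distributions/<name>/`, as the last line above shows. You can verify the generated files with something like the following (directory name taken from the output above; exact contents may vary):

```
ls ~/.llama/distributions/llamastack-my-local-stack/
# expect my-local-stack-run.yaml (plus the corresponding build configuration)
```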
### 1.2 Building from a template
- To build from alternative API providers, we provide distribution templates for users to get started building a distribution backed by different providers.

The following command will allow you to see the available templates and their corresponding providers.
```
llama stack build --list-templates
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| Template Name                | Providers                                  | Description                                                                      |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| hf-serverless                | {                                          | Like local, but use Hugging Face Inference API (serverless) for running LLM     |
|                              |   "inference": "remote::hf::serverless",   | inference.                                                                       |
|                              |   "memory": "meta-reference",              | See https://hf.co/docs/api-inference.                                           |
|                              |   "safety": "meta-reference",              |                                                                                  |
|                              |   "agents": "meta-reference",              |                                                                                  |
|                              |   "telemetry": "meta-reference"            |                                                                                  |
|                              | }                                          |                                                                                  |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| together                     | {                                          | Use Together.ai for running LLM inference                                        |
|                              |   "inference": "remote::together",         |                                                                                  |
|                              |   "memory": [                              |                                                                                  |
|                              |     "meta-reference",                      |                                                                                  |
|                              |     "remote::weaviate"                     |                                                                                  |
|                              |   ],                                       |                                                                                  |
|                              |   "safety": "remote::together",            |                                                                                  |
|                              |   "agents": "meta-reference",              |                                                                                  |
|                              |   "telemetry": "meta-reference"            |                                                                                  |
|                              | }                                          |                                                                                  |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
...
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| databricks                   | {                                          | Use Databricks for running LLM inference                                         |
|                              |   "inference": "remote::databricks",       |                                                                                  |
|                              |   "memory": "meta-reference",              |                                                                                  |
|                              |   "safety": "meta-reference",              |                                                                                  |
|                              |   "agents": "meta-reference",              |                                                                                  |
|                              |   "telemetry": "meta-reference"            |                                                                                  |
|                              | }                                          |                                                                                  |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| vllm                         | {                                          | Like local, but use vLLM for running LLM inference                               |
|                              |   "inference": "vllm",                     |                                                                                  |
|                              |   "memory": "meta-reference",              |                                                                                  |
|                              |   "safety": "meta-reference",              |                                                                                  |
|                              |   "agents": "meta-reference",              |                                                                                  |
|                              |   "telemetry": "meta-reference"            |                                                                                  |
|                              | }                                          |                                                                                  |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| tgi                          | {                                          | Use TGI for running LLM inference                                                |
|                              |   "inference": "remote::tgi",              |                                                                                  |
|                              |   "memory": [                              |                                                                                  |
|                              |     "meta-reference",                      |                                                                                  |
|                              |     "remote::chromadb",                    |                                                                                  |
|                              |     "remote::pgvector"                     |                                                                                  |
|                              |   ],                                       |                                                                                  |
|                              |   "safety": "meta-reference",              |                                                                                  |
|                              |   "agents": "meta-reference",              |                                                                                  |
|                              |   "telemetry": "meta-reference"            |                                                                                  |
|                              | }                                          |                                                                                  |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| bedrock                      | {                                          | Use Amazon Bedrock APIs.                                                         |
|                              |   "inference": "remote::bedrock",          |                                                                                  |
|                              |   "memory": "meta-reference",              |                                                                                  |
|                              |   "safety": "meta-reference",              |                                                                                  |
|                              |   "agents": "meta-reference",              |                                                                                  |
|                              |   "telemetry": "meta-reference"            |                                                                                  |
|                              | }                                          |                                                                                  |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
...
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
| hf-endpoint                  | {                                          | Like local, but use Hugging Face Inference Endpoints for running LLM inference. |
|                              |   "inference": "remote::hf::endpoint",     | See https://hf.co/docs/api-endpoints.                                           |
|                              |   "memory": "meta-reference",              |                                                                                  |
|                              |   "safety": "meta-reference",              |                                                                                  |
|                              |   "agents": "meta-reference",              |                                                                                  |
|                              |   "telemetry": "meta-reference"            |                                                                                  |
|                              | }                                          |                                                                                  |
+------------------------------+--------------------------------------------+----------------------------------------------------------------------------------+
```
You may then pick a template to build your distribution with providers fitted to your liking.
For example, to build a distribution with TGI as the inference provider, you can run:
```
llama stack build --template tgi
```
```
$ llama stack build --template tgi
...
You can now edit ~/.llama/distributions/llamastack-tgi/tgi-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-tgi/tgi-run.yaml`
```
### 1.3 Building from a pre-existing build config file
- In addition to templates, you may customize the build to your liking by editing config files and building from a config file with the following command.
- The config file will have contents like the ones in `llama_stack/templates/*build.yaml`.
```
$ cat llama_stack/templates/ollama/build.yaml
...
image_type: conda
```

```
llama stack build --config llama_stack/templates/ollama/build.yaml
```
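For example, a reasonable workflow is to copy a template config, adjust the providers, and build from your copy (the `my-custom-build.yaml` file name below is hypothetical):

```
cp llama_stack/templates/ollama/build.yaml my-custom-build.yaml
# edit my-custom-build.yaml (e.g. change a provider under distribution_spec), then:
llama stack build --config my-custom-build.yaml
```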
### How to build a distribution with a Docker image

> [!TIP]
> Podman is supported as an alternative to Docker. Set `DOCKER_BINARY` to `podman` in your environment to use Podman.

To build a docker image, you may start off from a template and use the `--image-type docker` flag to specify `docker` as the build image type.
```
llama stack build --template ollama --image-type docker
```
```
$ llama stack build --template ollama --image-type docker
...
Dockerfile created successfully in /tmp/tmp.viA3a3Rdsg/Dockerfile
FROM python:3.10-slim
...
You can now edit ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml`
```

After this step is successful, you should be able to find the built docker image and test it with `llama stack run <path/to/run.yaml>`.
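If you want to double-check that the image was created before running it, you can list local images; the `llamastack` name prefix is an assumption based on the image names this tooling produces:

```
docker images | grep llamastack
```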
## Step 2. Run

Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack build` step.

```
llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
```
You should see the Llama Stack server start and print the APIs that it supports:

```
$ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
Loaded model...
Serving API datasets
GET /datasets/get
GET /datasets/list
POST /datasets/register
Serving API inspect
GET /health
GET /providers/list
GET /routes/list
Serving API inference
POST /inference/chat_completion
POST /inference/completion
POST /inference/embeddings
Serving API scoring_functions
GET /scoring_functions/get
GET /scoring_functions/list
POST /scoring_functions/register
Serving API scoring
POST /scoring/score
POST /scoring/score_batch
Serving API memory_banks
GET /memory_banks/get
GET /memory_banks/list
POST /memory_banks/register
Serving API memory
POST /memory/insert
POST /memory/query
Serving API safety
POST /safety/run_shield
Serving API eval
POST /eval/evaluate
POST /eval/evaluate_batch
POST /eval/job/cancel
GET /eval/job/result
GET /eval/job/status
Serving API shields
GET /shields/get
GET /shields/list
POST /shields/register
Serving API datasetio
GET /datasetio/get_rows_paginated
Serving API telemetry
GET /telemetry/get_trace
POST /telemetry/log_event
Serving API models
GET /models/get
GET /models/list
POST /models/register
Serving API agents
POST /agents/create
POST /agents/session/create
POST /agents/turn/create
POST /agents/delete
POST /agents/session/delete
POST /agents/session/get
POST /agents/step/get
POST /agents/turn/get
Listening on ['::', '0.0.0.0']:5000
INFO: Started server process [2935911]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit)
INFO: 2401:db00:35c:2d2b:face:0:c9:0:54678 - "GET /models/list HTTP/1.1" 200 OK
```
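With the server up, you can smoke-test it from another terminal using routes from the list above (port 5000 as shown in the log; adjust if you changed it):

```
curl http://localhost:5000/health
curl http://localhost:5000/models/list
```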
> [!IMPORTANT]
> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.

> [!TIP]
> You might need to use the flag `--disable-ipv6` to disable IPv6 support
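> For example (attaching the flag to the run command is assumed from context): `llama stack run <path/to/run.yaml> --disable-ipv6`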
This server is running a Llama model locally.