docs: update document to match usage

- llama stack run has changed args/flags
- llama stack build has more new templates
- add llama stack list

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
2025-06-27 18:50:41 +00:00 · 2025-06-26 16:42:14 +02:00 · 2025-06-26 16:42:14 +02:00 · 23484d6159
commit 23484d6159
parent dbdc811d16
1 changed file with 62 additions and 28 deletions
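
For reviewers checking the updated `llama stack run` usage in this change, here is a minimal sketch of how a script might assemble the command line under the new flags. The helper name `build_run_argv` and its parameters are hypothetical and not part of llama-stack. It only mirrors the options listed in the help text in this diff (`--port`, `--image-name`, `--env`, `--image-type {conda,venv}`, `--enable-ui`) and rejects `container`, which this change drops from `llama stack run`.

```python
from typing import Dict, List, Optional

# Image types accepted by `llama stack run` according to the help text in this diff.
RUN_IMAGE_TYPES = ("conda", "venv")


def build_run_argv(
    target: str,
    port: Optional[int] = None,
    image_type: Optional[str] = None,
    image_name: Optional[str] = None,
    env: Optional[Dict[str, str]] = None,
    enable_ui: bool = False,
) -> List[str]:
    """Hypothetical helper: build the argv list for `llama stack run`.

    `target` is either a path to a run config or the name of a known template.
    """
    if image_type is not None and image_type not in RUN_IMAGE_TYPES:
        # Container images are run with Docker or Podman, not `llama stack run`.
        raise ValueError(f"unsupported image type for run: {image_type!r}")

    argv = ["llama", "stack", "run", target]
    if port is not None:
        argv += ["--port", str(port)]
    if image_type is not None:
        argv += ["--image-type", image_type]
    if image_name is not None:
        argv += ["--image-name", image_name]
    # --env may be given multiple times, once per KEY=VALUE pair.
    for key, value in (env or {}).items():
        argv += ["--env", f"{key}={value}"]
    if enable_ui:
        argv.append("--enable-ui")
    return argv
```

The resulting list could be passed to `subprocess.run(argv, check=True)`. Building a list rather than a shell string avoids quoting problems in `KEY=VALUE` pairs.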
--- a/docs/source/distributions/building_distro.md
+++ b/docs/source/distributions/building_distro.md
@@ -64,10 +64,9 @@ options:
  --template TEMPLATE   Name of the example template config to use for build. You may use `llama stack build --list-templates` to check out the available templates (default: None)
  --list-templates      Show the available templates for building a Llama Stack distribution (default: False)
  --image-type {conda,container,venv}
-                        Image Type to use for the build. This can be either conda or container or venv. If not specified, will use the image type from the template config. (default:
-                        conda)
+                        Image Type to use for the build. If not specified, will use the image type from the template config. (default: None)
  --image-name IMAGE_NAME
-                        [for image-type=conda|container|venv] Name of the conda or virtual environment to use for the build. If not specified, currently active Conda environment will be used if
+                        [for image-type=conda|container|venv] Name of the conda or virtual environment to use for the build. If not specified, currently active environment will be used if
                        found. (default: None)
  --print-deps-only     Print the dependencies for the stack only, without building the stack (default: False)
  --run                 Run the stack after building using the same image type, name, and other applicable arguments (default: False)
@@ -89,32 +88,53 @@ llama stack build --list-templates
 ------------------------------+-----------------------------------------------------------------------------+
 | Template Name                | Description                                                                 |
 +------------------------------+-----------------------------------------------------------------------------+
-| hf-serverless                | Use (an external) Hugging Face Inference Endpoint for running LLM inference |
-+------------------------------+-----------------------------------------------------------------------------+
-| together                     | Use Together.AI for running LLM inference                                   |
+| watsonx                      | Use watsonx for running LLM inference                                       |
 +------------------------------+-----------------------------------------------------------------------------+
 | vllm-gpu                     | Use a built-in vLLM engine for running LLM inference                        |
 +------------------------------+-----------------------------------------------------------------------------+
-| experimental-post-training   | Experimental template for post training                                     |
-+------------------------------+-----------------------------------------------------------------------------+
-| remote-vllm                  | Use (an external) vLLM server for running LLM inference                     |
-+------------------------------+-----------------------------------------------------------------------------+
-| fireworks                    | Use Fireworks.AI for running LLM inference                                  |
+| together                     | Use Together.AI for running LLM inference                                   |
 +------------------------------+-----------------------------------------------------------------------------+
 | tgi                          | Use (an external) TGI server for running LLM inference                      |
 +------------------------------+-----------------------------------------------------------------------------+
-| bedrock                      | Use AWS Bedrock for running LLM inference and safety                        |
+| starter                      | Quick start template for running Llama Stack with several popular providers |
 +------------------------------+-----------------------------------------------------------------------------+
-| meta-reference-gpu           | Use Meta Reference for running LLM inference                                |
+| sambanova                    | Use SambaNova for running LLM inference and safety                          |
 +------------------------------+-----------------------------------------------------------------------------+
-| nvidia                       | Use NVIDIA NIM for running LLM inference                                    |
+| remote-vllm                  | Use (an external) vLLM server for running LLM inference                     |
 +------------------------------+-----------------------------------------------------------------------------+
-| cerebras                     | Use Cerebras for running LLM inference                                      |
+| postgres-demo                | Quick start template for running Llama Stack with several popular providers |
+------------------------------+-----------------------------------------------------------------------------+
+| passthrough                  | Use Passthrough hosted llama-stack endpoint for LLM inference               |
+------------------------------+-----------------------------------------------------------------------------+
+| open-benchmark               | Distribution for running open benchmarks                                    |
 +------------------------------+-----------------------------------------------------------------------------+
 | ollama                       | Use (an external) Ollama server for running LLM inference                   |
 +------------------------------+-----------------------------------------------------------------------------+
+| nvidia                       | Use NVIDIA NIM for running LLM inference, evaluation and safety             |
+------------------------------+-----------------------------------------------------------------------------+
+| meta-reference-gpu           | Use Meta Reference for running LLM inference                                |
+------------------------------+-----------------------------------------------------------------------------+
+| llama_api                    | Distribution for running e2e tests in CI                                    |
+------------------------------+-----------------------------------------------------------------------------+
+| hf-serverless                | Use (an external) Hugging Face Inference Endpoint for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
 | hf-endpoint                  | Use (an external) Hugging Face Inference Endpoint for running LLM inference |
 +------------------------------+-----------------------------------------------------------------------------+
+| groq                         | Use Groq for running LLM inference                                          |
+------------------------------+-----------------------------------------------------------------------------+
+| fireworks                    | Use Fireworks.AI for running LLM inference                                  |
+------------------------------+-----------------------------------------------------------------------------+
+| experimental-post-training   | Experimental template for post training                                     |
+------------------------------+-----------------------------------------------------------------------------+
+| dell                         | Dell's distribution of Llama Stack. TGI inference via Dell's custom         |
+|                              | container                                                                   |
+------------------------------+-----------------------------------------------------------------------------+
+| ci-tests                     | Distribution for running e2e tests in CI                                    |
+------------------------------+-----------------------------------------------------------------------------+
+| cerebras                     | Use Cerebras for running LLM inference                                      |
+------------------------------+-----------------------------------------------------------------------------+
+| bedrock                      | Use AWS Bedrock for running LLM inference and safety                        |
+------------------------------+-----------------------------------------------------------------------------+
 ```

 You may then pick a template to build your distribution with providers fitted to your liking.
@@ -256,6 +276,7 @@ $ llama stack build --template ollama --image-type container
 ...
 Containerfile created successfully in /tmp/tmp.viA3a3Rdsg/ContainerfileFROM python:3.10-slim
 ...
+```

 You can now edit ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml`
 ```
@@ -305,30 +326,28 @@ Now, let's start the Llama Stack Distribution Server. You will need the YAML con

 ```
 llama stack run -h
-usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE]
-                       [--image-type {conda,container,venv}]
-                       config
+usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
+                       [--image-type {conda,venv}] [--enable-ui]
+                       [config | template]

 Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.

 positional arguments:
-  config                Path to config file to use for the run
+  config | template     Path to config file to use for the run or name of known template (`llama stack list` for a list). (default: None)

 options:
  -h, --help            show this help message and exit
  --port PORT           Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. (default: 8321)
  --image-name IMAGE_NAME
                        Name of the image to run. Defaults to the current environment (default: None)
-  --env KEY=VALUE       Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: [])
-  --tls-keyfile TLS_KEYFILE
-                        Path to TLS key file for HTTPS (default: None)
-  --tls-certfile TLS_CERTFILE
-                        Path to TLS certificate file for HTTPS (default: None)
-  --image-type {conda,container,venv}
-                        Image Type used during the build. This can be either conda or container or venv. (default: conda)
-
+  --env KEY=VALUE       Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: None)
+  --image-type {conda,venv}
+                        Image Type used during the build. This can be either conda or venv. (default: None)
+  --enable-ui           Start the UI server (default: False)
 ```

+**Note:** Container images built with `llama stack build --image-type container` cannot be run using `llama stack run`. Instead, they must be run directly using Docker or Podman commands as shown in the container building section above.
+
 ```
 # Start using template name
 llama stack run tgi
@@ -372,6 +391,7 @@ INFO:     Application startup complete.
 INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
 INFO:     2401:db00:35c:2d2b:face:0:c9:0:54678 - "GET /models/list HTTP/1.1" 200 OK
 ```
+
 ### Listing Distributions
 Using the list command, you can view all existing Llama Stack distributions, including stacks built from templates, from scratch, or using custom configuration files.

@@ -391,6 +411,20 @@ Example Usage
 llama stack list
 ```

+```
+------------------------------+-----------------------------------------------------------------------------+--------------+------------+
+| Stack Name                  | Path                                                                        | Build Config | Run Config |
+------------------------------+-----------------------------------------------------------------------------+--------------+------------+
+| together                    | /home/wenzhou/.llama/distributions/together                                 | Yes          | No         |
+------------------------------+-----------------------------------------------------------------------------+--------------+------------+
+| bedrock                     | /home/wenzhou/.llama/distributions/bedrock                                  | Yes          | No         |
+------------------------------+-----------------------------------------------------------------------------+--------------+------------+
+| starter                     | /home/wenzhou/.llama/distributions/starter                                  | No           | No         |
+------------------------------+-----------------------------------------------------------------------------+--------------+------------+
+| remote-vllm                 | /home/wenzhou/.llama/distributions/remote-vllm                              | Yes          | Yes        |
+------------------------------+-----------------------------------------------------------------------------+--------------+------------+
+```
+
 ### Removing a Distribution
 Use the remove command to delete a distribution you've previously built.

@@ -413,7 +447,7 @@ Example
 llama stack rm llamastack-test
 ```

-To keep your environment organized and avoid clutter, consider using `llama stack list` to review old or unused distributions and `llama stack rm <name>` to delete them when they’re no longer needed.
+To keep your environment organized and avoid clutter, consider using `llama stack list` to review old or unused distributions and `llama stack rm <name>` to delete them when they're no longer needed.

 ### Troubleshooting