commit 4f6ce6f81c
Author: Wen Zhou
Date:   2025-06-27 11:07:59 +02:00 (committed by GitHub)

@@ -64,10 +64,9 @@ options:
  --template TEMPLATE   Name of the example template config to use for build. You may use `llama stack build --list-templates` to check out the available templates (default: None)
  --list-templates      Show the available templates for building a Llama Stack distribution (default: False)
  --image-type {conda,container,venv}
                        Image Type to use for the build. If not specified, will use the image type from the template config. (default: None)
  --image-name IMAGE_NAME
                        [for image-type=conda|container|venv] Name of the conda or virtual environment to use for the build. If not specified, currently active environment will be used if
                        found. (default: None)
  --print-deps-only     Print the dependencies for the stack only, without building the stack (default: False)
  --run                 Run the stack after building using the same image type, name, and other applicable arguments (default: False)
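To make these flags concrete, here is a minimal sketch of how they combine; the template name is only an illustrative choice:

```
# Build from a template into a venv and launch the stack immediately (--run).
llama stack build --template ollama --image-type venv --run

# Or just inspect the dependencies a build would pull in, without building.
llama stack build --template ollama --print-deps-only
```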
@@ -89,32 +88,53 @@ llama stack build --list-templates
+------------------------------+-----------------------------------------------------------------------------+
| Template Name                | Description                                                                 |
+------------------------------+-----------------------------------------------------------------------------+
| watsonx                      | Use watsonx for running LLM inference                                       |
+------------------------------+-----------------------------------------------------------------------------+
| vllm-gpu                     | Use a built-in vLLM engine for running LLM inference                        |
+------------------------------+-----------------------------------------------------------------------------+
| together                     | Use Together.AI for running LLM inference                                   |
+------------------------------+-----------------------------------------------------------------------------+
| tgi                          | Use (an external) TGI server for running LLM inference                      |
+------------------------------+-----------------------------------------------------------------------------+
| starter                      | Quick start template for running Llama Stack with several popular providers |
+------------------------------+-----------------------------------------------------------------------------+
| sambanova                    | Use SambaNova for running LLM inference and safety                          |
+------------------------------+-----------------------------------------------------------------------------+
| remote-vllm                  | Use (an external) vLLM server for running LLM inference                     |
+------------------------------+-----------------------------------------------------------------------------+
| postgres-demo                | Quick start template for running Llama Stack with several popular providers |
+------------------------------+-----------------------------------------------------------------------------+
| ollama                       | Use (an external) Ollama server for running LLM inference                   |
+------------------------------+-----------------------------------------------------------------------------+
| nvidia                       | Use NVIDIA NIM for running LLM inference, evaluation and safety             |
+------------------------------+-----------------------------------------------------------------------------+
| meta-reference-gpu           | Use Meta Reference for running LLM inference                                |
+------------------------------+-----------------------------------------------------------------------------+
| llama_api                    | Distribution for running e2e tests in CI                                    |
+------------------------------+-----------------------------------------------------------------------------+
| hf-serverless                | Use (an external) Hugging Face Inference Endpoint for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| hf-endpoint                  | Use (an external) Hugging Face Inference Endpoint for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| groq                         | Use Groq for running LLM inference                                          |
+------------------------------+-----------------------------------------------------------------------------+
| fireworks                    | Use Fireworks.AI for running LLM inference                                  |
+------------------------------+-----------------------------------------------------------------------------+
| experimental-post-training   | Experimental template for post training                                     |
+------------------------------+-----------------------------------------------------------------------------+
| dell                         | Dell's distribution of Llama Stack. TGI inference via Dell's custom         |
|                              | container                                                                   |
+------------------------------+-----------------------------------------------------------------------------+
| ci-tests                     | Distribution for running e2e tests in CI                                    |
+------------------------------+-----------------------------------------------------------------------------+
| cerebras                     | Use Cerebras for running LLM inference                                      |
+------------------------------+-----------------------------------------------------------------------------+
| bedrock                      | Use AWS Bedrock for running LLM inference and safety                        |
+------------------------------+-----------------------------------------------------------------------------+
```
You may then pick a template to build your distribution with providers fitted to your liking.
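For example, to build from one of these templates, pass its name via `--template`; the template and image type below are only placeholder choices:

```
# Build the 'starter' template as a container image (illustrative choice).
llama stack build --template starter --image-type container

# Or build it into a named virtual environment instead ('my-stack' is a placeholder name).
llama stack build --template starter --image-type venv --image-name my-stack
```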
@@ -256,6 +276,7 @@ $ llama stack build --template ollama --image-type container
...
Containerfile created successfully in /tmp/tmp.viA3a3Rdsg/Containerfile
FROM python:3.10-slim
...
```
You can now edit ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml`
```
@@ -305,30 +326,28 @@ Now, let's start the Llama Stack Distribution Server. You will need the YAML con
```
llama stack run -h
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
                       [--image-type {conda,venv}] [--enable-ui]
                       [config | template]

Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.

positional arguments:
  config | template     Path to config file to use for the run or name of known template (`llama stack list` for a list). (default: None)

options:
  -h, --help            show this help message and exit
  --port PORT           Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. (default: 8321)
  --image-name IMAGE_NAME
                        Name of the image to run. Defaults to the current environment (default: None)
  --env KEY=VALUE       Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: None)
  --image-type {conda,venv}
                        Image Type used during the build. This can be either conda or venv. (default: None)
  --enable-ui           Start the UI server (default: False)
```
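As a quick illustration of these options, a run invocation might look like the sketch below; the template name, port, and environment variable are examples rather than required values:

```
# Start the tgi distribution on a non-default port and pass one env var through
# to the server (TGI_URL is assumed to be a variable your run config expects).
llama stack run tgi --port 8322 --env TGI_URL=http://127.0.0.1:8080
```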
**Note:** Container images built with `llama stack build --image-type container` cannot be run using `llama stack run`. Instead, they must be run directly using Docker or Podman commands as shown in the container building section above.
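As a rough sketch of such a direct invocation (the image tag, port, and mounted path are assumptions to adapt to your own build):

```
# Run a container-type build directly (substitute podman for docker if preferred).
# 'distribution-ollama' is an assumed local image tag from a prior container build.
docker run -it \
  -p 8321:8321 \
  -v ~/.llama:/root/.llama \
  distribution-ollama \
  --port 8321
```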
```
# Start using template name
llama stack run tgi
@@ -372,6 +391,7 @@ INFO: Application startup complete.
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: 2401:db00:35c:2d2b:face:0:c9:0:54678 - "GET /models/list HTTP/1.1" 200 OK
```
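Once the server is up, you can sanity-check it from another terminal; the endpoint path below assumes a recent Llama Stack version that serves its API under `/v1`:

```
# List the models registered with the running stack (path is an assumption).
curl http://localhost:8321/v1/models
```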
### Listing Distributions
Using the list command, you can view all existing Llama Stack distributions, including stacks built from templates, from scratch, or using custom configuration files.
@@ -391,6 +411,20 @@ Example Usage
llama stack list
```
```
+------------------------------+-----------------------------------------------------------------------------+--------------+------------+
| Stack Name                   | Path                                                                        | Build Config | Run Config |
+------------------------------+-----------------------------------------------------------------------------+--------------+------------+
| together                     | /home/wenzhou/.llama/distributions/together                                 | Yes          | No         |
+------------------------------+-----------------------------------------------------------------------------+--------------+------------+
| bedrock                      | /home/wenzhou/.llama/distributions/bedrock                                  | Yes          | No         |
+------------------------------+-----------------------------------------------------------------------------+--------------+------------+
| starter                      | /home/wenzhou/.llama/distributions/starter                                  | No           | No         |
+------------------------------+-----------------------------------------------------------------------------+--------------+------------+
| remote-vllm                  | /home/wenzhou/.llama/distributions/remote-vllm                              | Yes          | Yes        |
+------------------------------+-----------------------------------------------------------------------------+--------------+------------+
```
### Removing a Distribution
Use the remove command to delete a distribution you've previously built.
@@ -413,7 +447,7 @@ Example
llama stack rm llamastack-test
```
To keep your environment organized and avoid clutter, consider using `llama stack list` to review old or unused distributions and `llama stack rm <name>` to delete them when they're no longer needed.
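A periodic cleanup is then just those two commands run back to back; the stack name below is a placeholder:

```
# Review what is on disk, then remove a stack you no longer need.
llama stack list
llama stack rm old-stack
```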
### Troubleshooting