CLI Update: build -> configure -> run (#69)

* remove configure from build
* remove config from build
* configure to regenerate file
* update memory providers
* remove comments
* udpate build script
* add reedme
* update doc
* rename getting started
* update build cli
* update docker build script
* configure update
* clean up configure
* [tmp fix] hardware requirement tmp fix
* clean up build
* fix configure
* add example build files for conda & docker
* remove resolve_distribution_spec
* remove available_distribution_specs
* example build files
* update example build files
* more clean up on build
* add name args to override name
* move distribution to yaml files
* generate distribution specs
* getting started guide
* getting started
* add build yaml to Dockerfile
* cleanup distribution_dependencies
* configure from docker image name
* build relative paths
* minor comment
* getting started
* Update getting_started.md
* Update getting_started.md
* address comments, configure within docker file
* remove distribution types!
* update getting started
* update documentation
* remove listing distribution
* minor heading
* address nits, remove docker_image=null
* gitignore
2024-09-16 11:02:26 -07:00 · 2024-09-16 11:02:26 -07:00 · d9147f3184
commit d9147f3184
parent 73b71d9689
27 changed files with 759 additions and 512 deletions
--- a/docs/cli_reference.md
+++ b/docs/cli_reference.md
@@ -236,151 +236,156 @@ These commands can help understand the model interface and how prompts / message
 **NOTE**: Outputs in terminal are color printed to show special tokens.


-## Step 3: Listing, Building, and Configuring Llama Stack Distributions
+## Step 3: Building and Configuring Llama Stack Distributions

+- Please see our [Getting Started](getting_started.md) guide for details.

-### Step 3.1: List available distributions
-
-Let’s start with listing available distributions:
+### Step 3.1. Build
+In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instruct` model. We will name our build `8b-instruct` to help us remember the config. We will start by building our distribution (in the form of a Conda environment or Docker image). In this step, we will specify:
+- `name`: the name for our distribution (e.g. `8b-instruct`)
+- `image_type`: our build image type (`conda | docker`), which determines whether the distribution is built as a Conda environment or a Docker image
+- `distribution_spec`: our distribution specs for specifying API providers
+  - `description`: a short description of the configurations for the distribution
+  - `providers`: specifies the underlying implementation for serving each API endpoint

+#### Build a local distribution with conda
+The following command and specifications allow you to get started with building.
 ```
-llama stack list-distributions
+llama stack build <path/to/config>
 ```
+- You will be required to pass in a file path to the build config file (e.g. `./llama_toolchain/configs/distributions/conda/local-conda-example-build.yaml`). We provide some example build config files for configuring different types of distributions in the `./llama_toolchain/configs/distributions/` folder.

-<pre style="font-family: monospace;">
-i+-------------------------------+---------------------------------------+----------------------------------------------------------------------+
-| Distribution Type              | Providers                             | Description                                                          |
-+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
-| local                          | {                                     | Use code from `llama_toolchain` itself to serve all llama stack APIs |
-|                                |   "inference": "meta-reference",      |                                                                      |
-|                                |   "memory": "meta-reference-faiss",   |                                                                      |
-|                                |   "safety": "meta-reference",         |                                                                      |
-|                                |   "agentic_system": "meta-reference"  |                                                                      |
-|                                | }                                     |                                                                      |
-+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
-| remote                         | {                                     | Point to remote services for all llama stack APIs                    |
-|                                |   "inference": "remote",              |                                                                      |
-|                                |   "safety": "remote",                 |                                                                      |
-|                                |   "agentic_system": "remote",         |                                                                      |
-|                                |   "memory": "remote"                  |                                                                      |
-|                                | }                                     |                                                                      |
-+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
-| local-ollama                   | {                                     | Like local, but use ollama for running LLM inference                 |
-|                                |   "inference": "remote::ollama",      |                                                                      |
-|                                |   "safety": "meta-reference",         |                                                                      |
-|                                |   "agentic_system": "meta-reference", |                                                                      |
-|                                |   "memory": "meta-reference-faiss"    |                                                                      |
-|                                | }                                     |                                                                      |
-+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
-| local-plus-fireworks-inference | {                                     | Use Fireworks.ai for running LLM inference                           |
-|                                |   "inference": "remote::fireworks",   |                                                                      |
-|                                |   "safety": "meta-reference",         |                                                                      |
-|                                |   "agentic_system": "meta-reference", |                                                                      |
-|                                |   "memory": "meta-reference-faiss"    |                                                                      |
-|                                | }                                     |                                                                      |
-+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
-| local-plus-together-inference  | {                                     | Use Together.ai for running LLM inference                            |
-|                                |   "inference": "remote::together",    |                                                                      |
-|                                |   "safety": "meta-reference",         |                                                                      |
-|                                |   "agentic_system": "meta-reference", |                                                                      |
-|                                |   "memory": "meta-reference-faiss"    |                                                                      |
-|                                | }                                     |                                                                      |
-+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
-| local-plus-tgi-inference       | {                                     | Use TGI (local or with [Hugging Face Inference Endpoints](https://   |
-|                                |   "inference": "remote::tgi",         | huggingface.co/inference-endpoints/dedicated)) for running LLM       |
-|                                |   "safety": "meta-reference",         | inference. When using HF Inference Endpoints, you must provide the   |
-|                                |   "agentic_system": "meta-reference", | name of the endpoint.                                                |
-|                                |   "memory": "meta-reference-faiss"    |                                                                      |
-|                                | }                                     |                                                                      |
-+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
-</pre>
-
-As you can see above, each “distribution” details the “providers” it is composed of. For example, `local` uses the “meta-reference” provider for inference while local-ollama relies on a different provider (Ollama) for inference. Similarly, you can use Fireworks or Together.AI for running inference as well.
-
-### Step 3.2: Build a distribution
-
-Let's imagine you are working with a 8B-Instruct model. The following command will build a package (in the form of a Conda environment) _and_ configure it. As part of the configuration, you will be asked for some inputs (model_id, max_seq_len, etc.) Since we are working with a 8B model, we will name our build `8b-instruct` to help us remember the config.
-
-```
-llama stack build
+The file has the following contents:
 ```
+$ cat ./llama_toolchain/configs/distributions/conda/local-conda-example-build.yaml

-Once it runs, you will be prompted to enter build name and optional arguments, and should see some outputs in the form:
-
-```
-$ llama stack build
-Enter value for name (required): 8b-instruct
-Enter value for distribution (default: local) (required): local
-Enter value for api_providers (optional):
-Enter value for image_type (default: conda) (required):
-
-....
-....
-Successfully installed cfgv-3.4.0 distlib-0.3.8 identify-2.6.0 libcst-1.4.0 llama_toolchain-0.0.2 moreorless-0.4.0 nodeenv-1.9.1 pre-commit-3.8.0 stdlibs-2024.5.15 toml-0.10.2 tomlkit-0.13.0 trailrunner-1.4.0 ufmt-2.7.0 usort-1.0.8 virtualenv-20.26.3
-
-Successfully setup conda environment. Configuring build...
-
-...
-...
-
-YAML configuration has been written to ~/.llama/builds/local/conda/8b-instruct.yaml
-Target `8b-test` built with configuration at /home/xiyan/.llama/builds/local/conda/8b-test.yaml
-Build spec configuration saved at /home/xiyan/.llama/distributions/local/conda/8b-test-build.yaml
-```
-
-You can re-build package based on build config
-```
-$ cat ~/.llama/distributions/local/conda/8b-instruct-build.yaml
 name: 8b-instruct
-distribution: local
-api_providers: null
+distribution_spec:
+  distribution_type: local
+  description: Use code from `llama_toolchain` itself to serve all llama stack APIs
+  docker_image: null
+  providers:
+    inference: meta-reference
+    memory: meta-reference-faiss
+    safety: meta-reference
+    agentic_system: meta-reference
+    telemetry: console
 image_type: conda
+```

-$ llama stack build --config ~/.llama/distributions/local/conda/8b-instruct-build.yaml
-
-Successfully setup conda environment. Configuring build...
-
+Run the `llama stack build` command to generate your distribution. Pass `--name` to override the name given in the config file.
+```
+$ llama stack build ~/.llama/distributions/conda/8b-instruct-build.yaml --name 8b-instruct
 ...
 ...
-
-YAML configuration has been written to ~/.llama/builds/local/conda/8b-instruct.yaml
-Target `8b-instruct` built with configuration at ~/.llama/builds/local/conda/8b-instruct.yaml
-Build spec configuration saved at ~/.llama/distributions/local/conda/8b-instruct-build.yaml
+Build spec configuration saved at ~/.llama/distributions/conda/8b-instruct-build.yaml
 ```

-### Step 3.3: Configure a distribution
+After this step is complete, a file named `8b-instruct-build.yaml` will be generated and saved at `~/.llama/distributions/conda/8b-instruct-build.yaml`.

-You can re-configure this distribution by running:
-```
-llama stack configure ~/.llama/builds/local/conda/8b-instruct.yaml
-```

-Here is an example run of how the CLI will guide you to fill the configuration
+#### How to build a distribution with different API providers using configs
+To specify a different API provider, we can change the `distribution_spec` in our `<name>-build.yaml` config. For example, the following build spec allows you to build a distribution using TGI as the inference API provider.

 ```
-$ llama stack configure ~/.llama/builds/local/conda/8b-instruct.yaml
+$ cat ./llama_toolchain/configs/distributions/conda/local-tgi-conda-example-build.yaml
+
+name: local-tgi-conda-example
+distribution_spec:
+  description: Use TGI (local or with Hugging Face Inference Endpoints for running LLM inference. When using HF Inference Endpoints, you must provide the name of the endpoint).
+  docker_image: null
+  providers:
+    inference: remote::tgi
+    memory: meta-reference-faiss
+    safety: meta-reference
+    agentic_system: meta-reference
+    telemetry: console
+image_type: conda
+```
+
+The following command allows you to build a distribution with TGI as the inference API provider, with the name `tgi`.
+```
+llama stack build --config ./llama_toolchain/configs/distributions/conda/local-tgi-conda-example-build.yaml --name tgi
+```
+
+We provide some example build configs to help you get started with building with different API providers.
+
+#### How to build a distribution with a Docker image
+To build a docker image, simply change the `image_type` to `docker` in our `<name>-build.yaml` file, and run `llama stack build --config <name>-build.yaml`.
+
+```
+$ cat ./llama_toolchain/configs/distributions/docker/local-docker-example-build.yaml
+
+name: local-docker-example
+distribution_spec:
+  description: Use code from `llama_toolchain` itself to serve all llama stack APIs
+  docker_image: null
+  providers:
+    inference: meta-reference
+    memory: meta-reference-faiss
+    safety: meta-reference
+    agentic_system: meta-reference
+    telemetry: console
+image_type: docker
+```
+
+The following command allows you to build a Docker image with the name `docker-local`:
+```
+llama stack build --config ./llama_toolchain/configs/distributions/docker/local-docker-example-build.yaml --name docker-local
+
+Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/DockerfileFROM python:3.10-slim
+WORKDIR /app
+...
+...
+You can run it with: podman run -p 8000:8000 llamastack-docker-local
+Build spec configuration saved at /home/xiyan/.llama/distributions/docker/docker-local-build.yaml
+```
+
+### Step 3.2. Configure
+After our distribution is built (either as a Docker image or a Conda environment), we will run the following command to configure it:
+```
+llama stack configure [<path/to/name.build.yaml> | <docker-image-name>]
+```
+- For `conda` environments: `<path/to/name.build.yaml>` would be the generated build spec saved from Step 3.1.
+- For `docker` images downloaded from Docker Hub, you could also use `<docker-image-name>` as the argument.
+   - Run `docker images` to list the images available on your machine.
+
+```
+$ llama stack configure ~/.llama/distributions/conda/8b-instruct-build.yaml

 Configuring API: inference (meta-reference)
-Enter value for model (required): Meta-Llama3.1-8B-Instruct
+Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required):
 Enter value for quantization (optional):
 Enter value for torch_seed (optional):
-Enter value for max_seq_len (required): 4096
-Enter value for max_batch_size (default: 1): 1
+Enter value for max_seq_len (existing: 4096) (required):
+Enter value for max_batch_size (existing: 1) (required):
+
+Configuring API: memory (meta-reference-faiss)
+
 Configuring API: safety (meta-reference)
 Do you want to configure llama_guard_shield? (y/n): y
 Entering sub-configuration for llama_guard_shield:
-Enter value for model (required): Llama-Guard-3-8B
-Enter value for excluded_categories (required): []
-Enter value for disable_input_check (default: False):
-Enter value for disable_output_check (default: False):
+Enter value for model (default: Llama-Guard-3-8B) (required):
+Enter value for excluded_categories (default: []) (required):
+Enter value for disable_input_check (default: False) (required):
+Enter value for disable_output_check (default: False) (required):
 Do you want to configure prompt_guard_shield? (y/n): y
 Entering sub-configuration for prompt_guard_shield:
-Enter value for model (required): Prompt-Guard-86M
-...
-...
-YAML configuration has been written to ~/.llama/builds/local/conda/8b-instruct.yaml
+Enter value for model (default: Prompt-Guard-86M) (required):
+
+Configuring API: agentic_system (meta-reference)
+Enter value for brave_search_api_key (optional):
+Enter value for bing_search_api_key (optional):
+Enter value for wolfram_api_key (optional):
+
+Configuring API: telemetry (console)
+
+YAML configuration has been written to ~/.llama/builds/conda/8b-instruct-run.yaml
 ```

+After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/8b-instruct-run.yaml`. You may edit this file to change the settings.
+
 As you can see, we did basic configuration above and configured:
 - inference to run on model `Meta-Llama3.1-8B-Instruct` (obtained from `llama model list`)
 - Llama Guard safety shield with model `Llama-Guard-3-8B`
@@ -390,21 +395,18 @@ For how these configurations are stored as yaml, checkout the file printed at th

 Note that all configurations as well as models are stored in `~/.llama`

-## Step 4: Starting a Llama Stack Distribution and Testing it

-### Step 4.1: Starting a distribution
-
-Now let’s start Llama Stack Distribution Server.
-
-You need the YAML configuration file which was written out at the end by the `llama stack build` step.
+### Step 3.3. Run
+Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack configure` step.

 ```
-llama stack run ~/.llama/builds/local/conda/8b-instruct.yaml --port 5000
+llama stack run ~/.llama/builds/conda/8b-instruct-run.yaml
 ```
-You should see the Stack server start and print the APIs that it is supporting,
+
+You should see the Llama Stack server start and print the APIs that it supports:

 ```
-$ llama stack run ~/.llama/builds/local/conda/8b-instruct.yaml --port 5000
+$ llama stack run ~/.llama/builds/conda/8b-instruct-run.yaml

 > initializing model parallel with size 1
 > initializing ddp with size 1
@@ -434,7 +436,6 @@ INFO:     Application startup complete.
 INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
 ```

-
 > [!NOTE]
 > Configuration is in `~/.llama/builds/local/conda/8b-instruct.yaml`. Feel free to increase `max_seq_len`.

@@ -443,9 +444,8 @@ INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)

 This server is running a Llama model locally.

-### Step 4.2: Test the distribution
-
-Lets test with a client.
+### Step 3.4. Test with Client
+Once the server is set up, we can test it with a client to see the example outputs.
 ```
 cd /path/to/llama-stack
 conda activate <env>  # any environment containing the llama-toolchain pip package will work
@@ -456,17 +456,19 @@ python -m llama_toolchain.inference.client localhost 5000
 This will run the chat completion client and query the distribution’s /inference/chat_completion API.

 Here is an example output:
-<pre style="font-family: monospace;">
+```
 Initializing client for http://localhost:5000
 User>hello world, troll me in two-paragraphs about 42

 Assistant> You think you're so smart, don't you? You think you can just waltz in here and ask about 42, like it's some kind of trivial matter. Well, let me tell you, 42 is not just a number, it's a way of life. It's the answer to the ultimate question of life, the universe, and everything, according to Douglas Adams' magnum opus, "The Hitchhiker's Guide to the Galaxy". But do you know what's even more interesting about 42? It's that it's not actually the answer to anything, it's just a number that some guy made up to sound profound.

 You know what's even more hilarious? People like you who think they can just Google "42" and suddenly become experts on the subject. Newsflash: you're not a supercomputer, you're just a human being with a fragile ego and a penchant for thinking you're smarter than you actually are. 42 is just a number, a meaningless collection of digits that holds no significance whatsoever. So go ahead, keep thinking you're so clever, but deep down, you're just a pawn in the grand game of life, and 42 is just a silly little number that's been used to make you feel like you're part of something bigger than yourself. Ha!
-</pre>
+```

 Similarly you can test safety (if you configured llama-guard and/or prompt-guard shields) by:

 ```
 python -m llama_toolchain.safety.client localhost 5000
 ```
+
+You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/sdk_examples) repo.
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -0,0 +1,317 @@
+# Getting Started
+
+The `llama` CLI tool helps you set up and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-toolchain` package.
+
+This guide allows you to quickly get started with building and running a Llama Stack server in under 5 minutes!
+
+## Quick Cheatsheet
+- Three quick commands to build, configure, and start a Llama Stack server using our Meta Reference implementation for all API endpoints, with `conda` as the build type.
+
+**`llama stack build`**
+```
+llama stack build --config ./llama_toolchain/configs/distributions/conda/local-conda-example-build.yaml --name my-local-llama-stack
+...
+...
+Build spec configuration saved at ~/.llama/distributions/conda/my-local-llama-stack-build.yaml
+```
+
+**`llama stack configure`**
+```
+llama stack configure ~/.llama/distributions/conda/my-local-llama-stack-build.yaml
+
+Configuring API: inference (meta-reference)
+Enter value for model (default: Meta-Llama3.1-8B-Instruct) (required):
+Enter value for quantization (optional):
+Enter value for torch_seed (optional):
+Enter value for max_seq_len (required): 4096
+Enter value for max_batch_size (default: 1) (required):
+
+Configuring API: memory (meta-reference-faiss)
+
+Configuring API: safety (meta-reference)
+Do you want to configure llama_guard_shield? (y/n): n
+Do you want to configure prompt_guard_shield? (y/n): n
+
+Configuring API: agentic_system (meta-reference)
+Enter value for brave_search_api_key (optional):
+Enter value for bing_search_api_key (optional):
+Enter value for wolfram_api_key (optional):
+
+Configuring API: telemetry (console)
+
+YAML configuration has been written to ~/.llama/builds/conda/my-local-llama-stack-run.yaml
+```
+
+**`llama stack run`**
+```
+llama stack run ~/.llama/builds/conda/my-local-llama-stack-run.yaml
+
+...
+> initializing model parallel with size 1
+> initializing ddp with size 1
+> initializing pipeline with size 1
+...
+Finished model load YES READY
+Serving POST /inference/chat_completion
+Serving POST /inference/completion
+Serving POST /inference/embeddings
+Serving POST /memory_banks/create
+Serving DELETE /memory_bank/documents/delete
+Serving DELETE /memory_banks/drop
+Serving GET /memory_bank/documents/get
+Serving GET /memory_banks/get
+Serving POST /memory_bank/insert
+Serving GET /memory_banks/list
+Serving POST /memory_bank/query
+Serving POST /memory_bank/update
+Serving POST /safety/run_shields
+Serving POST /agentic_system/create
+Serving POST /agentic_system/session/create
+Serving POST /agentic_system/turn/create
+Serving POST /agentic_system/delete
+Serving POST /agentic_system/session/delete
+Serving POST /agentic_system/session/get
+Serving POST /agentic_system/step/get
+Serving POST /agentic_system/turn/get
+Serving GET /telemetry/get_trace
+Serving POST /telemetry/log_event
+Listening on :::5000
+INFO:     Started server process [587053]
+INFO:     Waiting for application startup.
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
+```
+
+
+## Step 1. Build
+In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instruct` model. We will name our build `8b-instruct` to help us remember the config. We will start by building our distribution (in the form of a Conda environment or Docker image). In this step, we will specify:
+- `name`: the name for our distribution (e.g. `8b-instruct`)
+- `image_type`: our build image type (`conda | docker`), which determines whether the distribution is built as a Conda environment or a Docker image
+- `distribution_spec`: our distribution specs for specifying API providers
+  - `description`: a short description of the configurations for the distribution
+  - `providers`: specifies the underlying implementation for serving each API endpoint
+
+#### Build a local distribution with conda
+The following command and specifications allow you to get started with building.
+```
+llama stack build <path/to/config>
+```
+- You will be required to pass in a file path to the build config file (e.g. `./llama_toolchain/configs/distributions/conda/local-conda-example-build.yaml`). We provide some example build config files for configuring different types of distributions in the `./llama_toolchain/configs/distributions/` folder.
+
+The file has the following contents:
+```
+$ cat ./llama_toolchain/configs/distributions/conda/local-conda-example-build.yaml
+
+name: 8b-instruct
+distribution_spec:
+  distribution_type: local
+  description: Use code from `llama_toolchain` itself to serve all llama stack APIs
+  docker_image: null
+  providers:
+    inference: meta-reference
+    memory: meta-reference-faiss
+    safety: meta-reference
+    agentic_system: meta-reference
+    telemetry: console
+image_type: conda
+```
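Before building, it can help to confirm that a config file has the three top-level keys listed above. The following is only a rough sketch: `check_build_config` is a hypothetical helper written for this guide, not part of the `llama` CLI, and the sample file is a trimmed copy of the example spec.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pre-flight check (not part of the llama CLI): verifies that the
# top-level keys described in this step are present in a build config file.
check_build_config() {
  local file="$1" key missing=0
  for key in name distribution_spec image_type; do
    if ! grep -q "^${key}:" "$file"; then
      echo "missing top-level key: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# A trimmed sample spec, written to a temporary file for illustration only.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
name: 8b-instruct
distribution_spec:
  description: example
  providers:
    inference: meta-reference
image_type: conda
EOF

check_build_config "$sample" && echo "config looks complete"
```

The check only looks at key names at the start of a line, so it will not catch misspelled provider names or bad indentation.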
+
+Run the `llama stack build` command to generate your distribution. Pass `--name` to override the name given in the config file.
+```
+$ llama stack build ~/.llama/distributions/conda/8b-instruct-build.yaml --name 8b-instruct
+...
+...
+Build spec configuration saved at ~/.llama/distributions/conda/8b-instruct-build.yaml
+```
+
+After this step is complete, a file named `8b-instruct-build.yaml` will be generated and saved at `~/.llama/distributions/conda/8b-instruct-build.yaml`.
+
+
+#### How to build a distribution with different API providers using configs
+To specify a different API provider, we can change the `distribution_spec` in our `<name>-build.yaml` config. For example, the following build spec allows you to build a distribution using TGI as the inference API provider.
+
+```
+$ cat ./llama_toolchain/configs/distributions/conda/local-tgi-conda-example-build.yaml
+
+name: local-tgi-conda-example
+distribution_spec:
+  description: Use TGI (local or with Hugging Face Inference Endpoints for running LLM inference. When using HF Inference Endpoints, you must provide the name of the endpoint).
+  docker_image: null
+  providers:
+    inference: remote::tgi
+    memory: meta-reference-faiss
+    safety: meta-reference
+    agentic_system: meta-reference
+    telemetry: console
+image_type: conda
+```
+
+The following command allows you to build a distribution with TGI as the inference API provider, with the name `tgi`.
+```
+llama stack build --config ./llama_toolchain/configs/distributions/conda/local-tgi-conda-example-build.yaml --name tgi
+```
+
+We provide some example build configs to help you get started with building with different API providers.
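As an illustration of changing a single provider, the sketch below writes a spec shaped like the local conda example and then swaps only the `inference` entry to `remote::tgi`. The scratch directory, the file name `my-tgi-build.yaml`, and the build name `my-tgi` are assumptions made for this example; the final line only prints the build command rather than running it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch location for this sketch; the CLI does not require it.
workdir="$(mktemp -d)"
cfg="$workdir/my-tgi-build.yaml"

# Start from a spec shaped like the local conda example shown earlier.
cat > "$cfg" <<'EOF'
name: my-tgi
distribution_spec:
  description: Local providers, with TGI for inference
  providers:
    inference: meta-reference
    memory: meta-reference-faiss
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
image_type: conda
EOF

# Swap only the inference provider; every other provider stays unchanged.
sed -i.bak 's|^\(    inference:\) .*$|\1 remote::tgi|' "$cfg"

# Show the edited line, then print the build command used in this guide.
grep 'inference:' "$cfg"
echo "llama stack build --config $cfg --name my-tgi"
```

Editing a copy keeps the shipped example files untouched, so they remain a known-good starting point.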
+
+#### How to build a distribution with a Docker image
+To build a docker image, simply change the `image_type` to `docker` in our `<name>-build.yaml` file, and run `llama stack build --config <name>-build.yaml`.
+
+```
+$ cat ./llama_toolchain/configs/distributions/docker/local-docker-example-build.yaml
+
+name: local-docker-example
+distribution_spec:
+  description: Use code from `llama_toolchain` itself to serve all llama stack APIs
+  docker_image: null
+  providers:
+    inference: meta-reference
+    memory: meta-reference-faiss
+    safety: meta-reference
+    agentic_system: meta-reference
+    telemetry: console
+image_type: docker
+```
+
+The following command allows you to build a Docker image with the name `docker-local`:
+```
+llama stack build --config ./llama_toolchain/configs/distributions/docker/local-docker-example-build.yaml --name docker-local
+
+Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/DockerfileFROM python:3.10-slim
+WORKDIR /app
+...
+...
+You can run it with: podman run -p 8000:8000 llamastack-docker-local
+Build spec configuration saved at /home/xiyan/.llama/distributions/docker/docker-local-build.yaml
+```
+
+## Step 2. Configure
+After our distribution is built (either as a Docker image or a Conda environment), we will run the following command to configure it:
+```
+llama stack configure [<path/to/name.build.yaml> | <docker-image-name>]
+```
+- For `conda` environments: `<path/to/name.build.yaml>` would be the generated build spec saved from Step 1.
+- For `docker` images downloaded from Docker Hub, you could also use `<docker-image-name>` as the argument.
+   - Run `docker images` to list the images available on your machine.
+
+```
+$ llama stack configure ~/.llama/distributions/conda/8b-instruct-build.yaml
+
+Configuring API: inference (meta-reference)
+Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required):
+Enter value for quantization (optional):
+Enter value for torch_seed (optional):
+Enter value for max_seq_len (existing: 4096) (required):
+Enter value for max_batch_size (existing: 1) (required):
+
+Configuring API: memory (meta-reference-faiss)
+
+Configuring API: safety (meta-reference)
+Do you want to configure llama_guard_shield? (y/n): y
+Entering sub-configuration for llama_guard_shield:
+Enter value for model (default: Llama-Guard-3-8B) (required):
+Enter value for excluded_categories (default: []) (required):
+Enter value for disable_input_check (default: False) (required):
+Enter value for disable_output_check (default: False) (required):
+Do you want to configure prompt_guard_shield? (y/n): y
+Entering sub-configuration for prompt_guard_shield:
+Enter value for model (default: Prompt-Guard-86M) (required):
+
+Configuring API: agentic_system (meta-reference)
+Enter value for brave_search_api_key (optional):
+Enter value for bing_search_api_key (optional):
+Enter value for wolfram_api_key (optional):
+
+Configuring API: telemetry (console)
+
+YAML configuration has been written to ~/.llama/builds/conda/8b-instruct-run.yaml
+```
+
+After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/8b-instruct-run.yaml`. You may edit this file to change the settings.
+
+As you can see, we did basic configuration above and configured:
+- inference to run on model `Meta-Llama3.1-8B-Instruct` (obtained from `llama model list`)
+- Llama Guard safety shield with model `Llama-Guard-3-8B`
+- Prompt Guard safety shield with model `Prompt-Guard-86M`
+
+To see how these configurations are stored as YAML, check out the file printed at the end of the configuration.
+
+Note that all configurations as well as models are stored in `~/.llama`.
+
+
+## Step 3. Run
+Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack configure` step.
+
+```
+llama stack run ~/.llama/builds/conda/8b-instruct-run.yaml
+```
+
+You should see the Llama Stack server start and print the APIs that it supports:
+
+```
+$ llama stack run ~/.llama/builds/conda/8b-instruct-run.yaml
+
+> initializing model parallel with size 1
+> initializing ddp with size 1
+> initializing pipeline with size 1
+Loaded in 19.28 seconds
+NCCL version 2.20.5+cuda12.4
+Finished model load YES READY
+Serving POST /inference/batch_chat_completion
+Serving POST /inference/batch_completion
+Serving POST /inference/chat_completion
+Serving POST /inference/completion
+Serving POST /safety/run_shields
+Serving POST /agentic_system/memory_bank/attach
+Serving POST /agentic_system/create
+Serving POST /agentic_system/session/create
+Serving POST /agentic_system/turn/create
+Serving POST /agentic_system/delete
+Serving POST /agentic_system/session/delete
+Serving POST /agentic_system/memory_bank/detach
+Serving POST /agentic_system/session/get
+Serving POST /agentic_system/step/get
+Serving POST /agentic_system/turn/get
+Listening on :::5000
+INFO:     Started server process [453333]
+INFO:     Waiting for application startup.
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
+```
+
+> [!NOTE]
+> Configuration is in `~/.llama/builds/conda/8b-instruct-run.yaml`. Feel free to increase `max_seq_len`.
+
+> [!IMPORTANT]
+> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.
+
+This server is running a Llama model locally.
+
+## Step 4. Test with Client
+Once the server is set up, we can test it with a client to see the example outputs.
+```
+cd /path/to/llama-stack
+conda activate <env>  # any environment containing the llama-toolchain pip package will work
+
+python -m llama_toolchain.inference.client localhost 5000
+```
+
+This will run the chat completion client and query the distribution’s /inference/chat_completion API.
+
+Here is an example output:
+```
+Initializing client for http://localhost:5000
+User>hello world, troll me in two-paragraphs about 42
+
+Assistant> You think you're so smart, don't you? You think you can just waltz in here and ask about 42, like it's some kind of trivial matter. Well, let me tell you, 42 is not just a number, it's a way of life. It's the answer to the ultimate question of life, the universe, and everything, according to Douglas Adams' magnum opus, "The Hitchhiker's Guide to the Galaxy". But do you know what's even more interesting about 42? It's that it's not actually the answer to anything, it's just a number that some guy made up to sound profound.
+
+You know what's even more hilarious? People like you who think they can just Google "42" and suddenly become experts on the subject. Newsflash: you're not a supercomputer, you're just a human being with a fragile ego and a penchant for thinking you're smarter than you actually are. 42 is just a number, a meaningless collection of digits that holds no significance whatsoever. So go ahead, keep thinking you're so clever, but deep down, you're just a pawn in the grand game of life, and 42 is just a silly little number that's been used to make you feel like you're part of something bigger than yourself. Ha!
+```
+
+Similarly, you can test safety (if you configured llama-guard and/or prompt-guard shields) by:
+
+```
+python -m llama_toolchain.safety.client localhost 5000
+```
+
+You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/sdk_examples) repo.
--- a/llama_toolchain/cli/stack/build.py
+++ b/llama_toolchain/cli/stack/build.py
@ -8,33 +8,11 @@ import argparse

 from llama_toolchain.cli.subcommand import Subcommand
 from llama_toolchain.core.datatypes import *  # noqa: F403
+from pathlib import Path
+
 import yaml


-def parse_api_provider_tuples(
-    tuples: str, parser: argparse.ArgumentParser
-) -> Dict[str, ProviderSpec]:
-    from llama_toolchain.core.distribution import api_providers
-
-    all_providers = api_providers()
-
-    deps = {}
-    for dep in tuples.split(","):
-        dep = dep.strip()
-        if not dep:
-            continue
-        api_str, provider = dep.split("=")
-        api = Api(api_str)
-
-        provider = provider.strip()
-        if provider not in all_providers[api]:
-            parser.error(f"Provider `{provider}` is not available for API `{api}`")
-            return
-        deps[api] = all_providers[api][provider]
-
-    return deps
-
-
 class StackBuild(Subcommand):
    def __init__(self, subparsers: argparse._SubParsersAction):
        super().__init__()
@ -48,16 +26,16 @@ class StackBuild(Subcommand):
        self.parser.set_defaults(func=self._run_stack_build_command)

    def _add_arguments(self):
-        from llama_toolchain.core.distribution_registry import (
-            available_distribution_specs,
-        )
-        from llama_toolchain.core.package import ImageType
-
-        allowed_ids = [d.distribution_type for d in available_distribution_specs()]
        self.parser.add_argument(
-            "--config",
+            "config",
            type=str,
-            help="Path to a config file to use for the build",
+            help="Path to a config file to use for the build. You may find example configs in llama_toolchain/configs/distributions",
+        )
+
+        self.parser.add_argument(
+            "--name",
+            type=str,
+            help="Name of the llama stack build to override from template config",
        )

    def _run_stack_build_command_from_build_config(
@ -68,69 +46,19 @@ class StackBuild(Subcommand):

        from llama_toolchain.common.config_dirs import DISTRIBS_BASE_DIR
        from llama_toolchain.common.serialize import EnumEncoder
-        from llama_toolchain.core.distribution_registry import resolve_distribution_spec
        from llama_toolchain.core.package import ApiInput, build_package, ImageType
        from termcolor import cprint

-        api_inputs = []
-        if build_config.distribution == "adhoc":
-            if not build_config.api_providers:
-                self.parser.error(
-                    "You must specify API providers with (api=provider,...) for building an adhoc distribution"
-                )
-                return
-
-            parsed = parse_api_provider_tuples(build_config.api_providers, self.parser)
-            for api, provider_spec in parsed.items():
-                for dep in provider_spec.api_dependencies:
-                    if dep not in parsed:
-                        self.parser.error(
-                            f"API {api} needs dependency {dep} provided also"
-                        )
-                        return
-
-                api_inputs.append(
-                    ApiInput(
-                        api=api,
-                        provider=provider_spec.provider_type,
-                    )
-                )
-            docker_image = None
-        else:
-            if build_config.api_providers:
-                self.parser.error(
-                    "You cannot specify API providers for pre-registered distributions"
-                )
-                return
-
-            dist = resolve_distribution_spec(build_config.distribution)
-            if dist is None:
-                self.parser.error(
-                    f"Could not find distribution {build_config.distribution}"
-                )
-                return
-
-            for api, provider_type in dist.providers.items():
-                api_inputs.append(
-                    ApiInput(
-                        api=api,
-                        provider=provider_type,
-                    )
-                )
-            docker_image = dist.docker_image
-
-        build_package(
-            api_inputs,
-            image_type=ImageType(build_config.image_type),
-            name=build_config.name,
-            distribution_type=build_config.distribution,
-            docker_image=docker_image,
-        )
-
        # save build.yaml spec for building same distribution again
+        if build_config.image_type == ImageType.docker.value:
+            # docker needs build file to be in the llama-stack repo dir to be able to copy over to the image
+            llama_toolchain_path = Path(os.path.relpath(__file__)).parent.parent.parent
            build_dir = (
-            DISTRIBS_BASE_DIR / build_config.distribution / build_config.image_type
+                llama_toolchain_path / "configs/distributions" / build_config.image_type
            )
+        else:
+            build_dir = DISTRIBS_BASE_DIR / build_config.image_type
+
        os.makedirs(build_dir, exist_ok=True)
        build_file_path = build_dir / f"{build_config.name}-build.yaml"

@ -138,6 +66,8 @@ class StackBuild(Subcommand):
            to_write = json.loads(json.dumps(build_config.dict(), cls=EnumEncoder))
            f.write(yaml.dump(to_write, sort_keys=False))

+        build_package(build_config, build_file_path)
+
        cprint(
            f"Build spec configuration saved at {str(build_file_path)}",
            color="green",
@ -147,15 +77,18 @@ class StackBuild(Subcommand):
        from llama_toolchain.common.prompt_for_config import prompt_for_config
        from llama_toolchain.core.dynamic import instantiate_class_type

-        if args.config:
+        if not args.config:
+            self.parser.error(
+                "No config file specified. Please use `llama stack build /path/to/*-build.yaml`. Example config files can be found in llama_toolchain/configs/distributions"
+            )
+            return
+
        with open(args.config, "r") as f:
            try:
                build_config = BuildConfig(**yaml.safe_load(f))
            except Exception as e:
                self.parser.error(f"Could not parse config file {args.config}: {e}")
                return
-                self._run_stack_build_command_from_build_config(build_config)
-            return
-
-        build_config = prompt_for_config(BuildConfig, None)
+            if args.name:
+                build_config.name = args.name
            self._run_stack_build_command_from_build_config(build_config)
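The save-location logic in this hunk can be sketched in isolation. This is a minimal illustration, not the project's code: `build_spec_path` and its parameters are hypothetical names, `distribs_base_dir` stands in for `DISTRIBS_BASE_DIR` (assumed here to be `~/.llama/distributions`), and `"docker"` is assumed to be the value of `ImageType.docker`.

```python
from pathlib import Path
from typing import Optional


def build_spec_path(
    name: str,
    image_type: str,
    package_dir: Path,
    distribs_base_dir: Path,
    name_override: Optional[str] = None,
) -> Path:
    """Return where `<name>-build.yaml` would be written (hypothetical helper)."""
    # `--name`, when given, replaces the name read from the template config.
    effective_name = name_override or name
    if image_type == "docker":
        # Docker builds keep the spec inside the package tree so the file
        # can be copied into the image during the build.
        build_dir = package_dir / "configs/distributions" / image_type
    else:
        build_dir = distribs_base_dir / image_type
    return build_dir / f"{effective_name}-build.yaml"


conda_path = build_spec_path(
    "local-conda-example",
    "conda",
    package_dir=Path("llama_toolchain"),
    distribs_base_dir=Path.home() / ".llama" / "distributions",  # assumed location
    name_override="8b-instruct",
)
print(conda_path)
```

Keeping the path computation separate from the `os.makedirs` and file-writing side effects makes the docker/conda split easy to check on its own.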
--- a/llama_toolchain/cli/stack/configure.py
+++ b/llama_toolchain/cli/stack/configure.py
@ -8,12 +8,18 @@ import argparse
 import json
 from pathlib import Path

-import yaml
+import pkg_resources

+import yaml
 from llama_toolchain.cli.subcommand import Subcommand
 from llama_toolchain.common.config_dirs import BUILDS_BASE_DIR
+
+from llama_toolchain.common.exec import run_with_pty
 from termcolor import cprint
 from llama_toolchain.core.datatypes import *  # noqa: F403
+import os
+
+from termcolor import cprint


 class StackConfigure(Subcommand):
@ -31,49 +37,107 @@ class StackConfigure(Subcommand):
        self.parser.set_defaults(func=self._run_stack_configure_cmd)

    def _add_arguments(self):
-        from llama_toolchain.core.distribution_registry import (
-            available_distribution_specs,
-        )
-        from llama_toolchain.core.package import ImageType
-
-        allowed_ids = [d.distribution_type for d in available_distribution_specs()]
        self.parser.add_argument(
            "config",
            type=str,
-            help="Path to the package config file (e.g. ~/.llama/builds/<distribution>/<image_type>/<name>.yaml)",
+            help="Path to the build config file (e.g. ~/.llama/builds/<image_type>/<name>-build.yaml). For docker, this could also be the name of the docker image. ",
+        )
+
+        self.parser.add_argument(
+            "--output-dir",
+            type=str,
+            help="Path to the output directory to store generated run.yaml config file. If not specified, will use ~/.llama/builds/<image_type>/<name>-run.yaml",
        )

    def _run_stack_configure_cmd(self, args: argparse.Namespace) -> None:
        from llama_toolchain.core.package import ImageType

-        config_file = Path(args.config)
-        if not config_file.exists():
+        docker_image = None
+        build_config_file = Path(args.config)
+        if not build_config_file.exists():
+            cprint(
+                f"Could not find {build_config_file}. Trying docker image name instead...",
+                color="green",
+            )
+            docker_image = args.config
+
+            builds_dir = BUILDS_BASE_DIR / ImageType.docker.value
+            if args.output_dir:
+                builds_dir = Path(args.output_dir)
+            os.makedirs(builds_dir, exist_ok=True)
+
+            script = pkg_resources.resource_filename(
+                "llama_toolchain", "core/configure_container.sh"
+            )
+            script_args = [script, docker_image, str(builds_dir)]
+
+            return_code = run_with_pty(script_args)
+
+            # we have regenerated the build config file with script, now check if it exists
+            if return_code != 0:
                self.parser.error(
-                f"Could not find {config_file}. Please run `llama stack build` first"
+                    f"Cannot find {build_config_file}. Please run `llama stack build` first or check if the docker image exists"
+                )
+
+            build_name = docker_image.removeprefix("llamastack-")
+            cprint(
+                f"YAML configuration has been written to {builds_dir / f'{build_name}-run.yaml'}",
+                color="green",
            )
            return

-        configure_llama_distribution(config_file)
+        with open(build_config_file, "r") as f:
+            build_config = BuildConfig(**yaml.safe_load(f))

+        self._configure_llama_distribution(build_config, args.output_dir)

-def configure_llama_distribution(config_file: Path) -> None:
+    def _configure_llama_distribution(
+        self,
+        build_config: BuildConfig,
+        output_dir: Optional[str] = None,
+    ):
        from llama_toolchain.common.serialize import EnumEncoder
        from llama_toolchain.core.configure import configure_api_providers

-    with open(config_file, "r") as f:
-        config = PackageConfig(**yaml.safe_load(f))
+        builds_dir = BUILDS_BASE_DIR / build_config.image_type
+        if output_dir:
+            builds_dir = Path(output_dir)
+        os.makedirs(builds_dir, exist_ok=True)
+        package_name = build_config.name.replace("::", "-")
+        package_file = builds_dir / f"{package_name}-run.yaml"

-    if config.providers:
+        api2providers = build_config.distribution_spec.providers
+
+        stub_config = {
+            api_str: {"provider_type": provider}
+            for api_str, provider in api2providers.items()
+        }
+
+        if package_file.exists():
            cprint(
-            f"Configuration already exists for {config.distribution_type}. Will overwrite...",
+                f"Configuration already exists for {build_config.name}. Will overwrite...",
                "yellow",
                attrs=["bold"],
            )
+            config = PackageConfig(**yaml.safe_load(package_file.read_text()))
+        else:
+            config = PackageConfig(
+                built_at=datetime.now(),
+                package_name=package_name,
+                providers=stub_config,
+            )

        config.providers = configure_api_providers(config.providers)
+        config.docker_image = (
+            package_name if build_config.image_type == "docker" else None
+        )
+        config.conda_env = package_name if build_config.image_type == "conda" else None

-    with open(config_file, "w") as fp:
+        with open(package_file, "w") as f:
            to_write = json.loads(json.dumps(config.dict(), cls=EnumEncoder))
-        fp.write(yaml.dump(to_write, sort_keys=False))
+            f.write(yaml.dump(to_write, sort_keys=False))

-    print(f"YAML configuration has been written to {config_file}")
+        cprint(
+            f"> YAML configuration has been written to {package_file}",
+            color="blue",
+        )
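For reference, the run-config naming and the provider stub built in `_configure_llama_distribution` can be reproduced with plain dictionaries and paths. This is a sketch under stated assumptions: `run_config_path` and `stub_providers` are hypothetical helper names, and `builds_base_dir` stands in for `BUILDS_BASE_DIR` (assumed to be `~/.llama/builds`, matching the documentation above).

```python
from pathlib import Path
from typing import Dict, Optional


def run_config_path(
    build_name: str,
    image_type: str,
    builds_base_dir: Path,
    output_dir: Optional[str] = None,
) -> Path:
    """Return the `<name>-run.yaml` path (hypothetical helper)."""
    # An explicit output directory wins over the per-image-type default.
    builds_dir = Path(output_dir) if output_dir else builds_base_dir / image_type
    # "::" in a build name is replaced so the name is safe as a file name.
    package_name = build_name.replace("::", "-")
    return builds_dir / f"{package_name}-run.yaml"


def stub_providers(api2providers: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Wrap each provider type in the stub shape used before configuration."""
    return {api: {"provider_type": provider} for api, provider in api2providers.items()}


print(run_config_path("8b-instruct", "conda", Path.home() / ".llama" / "builds"))
print(stub_providers({"inference": "meta-reference", "telemetry": "console"}))
```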
--- a/llama_toolchain/cli/stack/list_distributions.py
+++ b/llama_toolchain/cli/stack/list_distributions.py
@ -1,55 +0,0 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# All rights reserved.
-#
-# This source code is licensed under the terms described in the LICENSE file in
-# the root directory of this source tree.
-
-import argparse
-import json
-
-from llama_toolchain.cli.subcommand import Subcommand
-
-
-class StackListDistributions(Subcommand):
-    def __init__(self, subparsers: argparse._SubParsersAction):
-        super().__init__()
-        self.parser = subparsers.add_parser(
-            "list-distributions",
-            prog="llama stack list-distributions",
-            description="Show available Llama Stack Distributions",
-            formatter_class=argparse.RawTextHelpFormatter,
-        )
-        self._add_arguments()
-        self.parser.set_defaults(func=self._run_distribution_list_cmd)
-
-    def _add_arguments(self):
-        pass
-
-    def _run_distribution_list_cmd(self, args: argparse.Namespace) -> None:
-        from llama_toolchain.cli.table import print_table
-        from llama_toolchain.core.distribution_registry import (
-            available_distribution_specs,
-        )
-
-        # eventually, this should query a registry at llama.meta.com/llamastack/distributions
-        headers = [
-            "Distribution Type",
-            "Providers",
-            "Description",
-        ]
-
-        rows = []
-        for spec in available_distribution_specs():
-            providers = {k.value: v for k, v in spec.providers.items()}
-            rows.append(
-                [
-                    spec.distribution_type,
-                    json.dumps(providers, indent=2),
-                    spec.description,
-                ]
-            )
-        print_table(
-            rows,
-            headers,
-            separate_rows=True,
-        )
--- a/llama_toolchain/cli/stack/run.py
+++ b/llama_toolchain/cli/stack/run.py
@ -69,9 +69,6 @@ class StackRun(Subcommand):
        with open(config_file, "r") as f:
            config = PackageConfig(**yaml.safe_load(f))

-        if not config.distribution_type:
-            raise ValueError("Build config appears to be corrupt.")
-
        if config.docker_image:
            script = pkg_resources.resource_filename(
                "llama_toolchain",
--- a/llama_toolchain/cli/stack/stack.py
+++ b/llama_toolchain/cli/stack/stack.py
@ -11,7 +11,6 @@ from llama_toolchain.cli.subcommand import Subcommand
 from .build import StackBuild
 from .configure import StackConfigure
 from .list_apis import StackListApis
-from .list_distributions import StackListDistributions
 from .list_providers import StackListProviders
 from .run import StackRun

@ -31,6 +30,5 @@ class StackParser(Subcommand):
        StackBuild.create(subparsers)
        StackConfigure.create(subparsers)
        StackListApis.create(subparsers)
-        StackListDistributions.create(subparsers)
        StackListProviders.create(subparsers)
        StackRun.create(subparsers)
--- a/llama_toolchain/configs/distributions/conda/local-conda-example-build.yaml
+++ b/llama_toolchain/configs/distributions/conda/local-conda-example-build.yaml
@ -0,0 +1,10 @@
+name: local-conda-example
+distribution_spec:
+  description: Use code from `llama_toolchain` itself to serve all llama stack APIs
+  providers:
+    inference: meta-reference
+    memory: meta-reference-faiss
+    safety: meta-reference
+    agentic_system: meta-reference
+    telemetry: console
+image_type: conda
--- a/llama_toolchain/configs/distributions/conda/local-fireworks-conda-example-build.yaml
+++ b/llama_toolchain/configs/distributions/conda/local-fireworks-conda-example-build.yaml
@ -0,0 +1,10 @@
+name: local-fireworks-conda-example
+distribution_spec:
+  description: Use Fireworks.ai for running LLM inference
+  providers:
+    inference: remote::fireworks
+    memory: meta-reference-faiss
+    safety: meta-reference
+    agentic_system: meta-reference
+    telemetry: console
+image_type: conda
--- a/llama_toolchain/configs/distributions/conda/local-ollama-conda-example-build.yaml
+++ b/llama_toolchain/configs/distributions/conda/local-ollama-conda-example-build.yaml
@ -0,0 +1,10 @@
+name: local-ollama-conda-example
+distribution_spec:
+  description: Like local, but use ollama for running LLM inference
+  providers:
+    inference: remote::ollama
+    memory: meta-reference-faiss
+    safety: meta-reference
+    agentic_system: meta-reference
+    telemetry: console
+image_type: conda
--- a/llama_toolchain/configs/distributions/conda/local-tgi-conda-example-build.yaml
+++ b/llama_toolchain/configs/distributions/conda/local-tgi-conda-example-build.yaml
@ -0,0 +1,10 @@
+name: local-tgi-conda-example
+distribution_spec:
+  description: Use TGI (local or with Hugging Face Inference Endpoints for running LLM inference. When using HF Inference Endpoints, you must provide the name of the endpoint).
+  providers:
+    inference: remote::tgi
+    memory: meta-reference-faiss
+    safety: meta-reference
+    agentic_system: meta-reference
+    telemetry: console
+image_type: conda
--- a/llama_toolchain/configs/distributions/conda/local-together-conda-example-build.yaml
+++ b/llama_toolchain/configs/distributions/conda/local-together-conda-example-build.yaml
@ -0,0 +1,10 @@
+name: local-together-conda-example
+distribution_spec:
+  description: Use Together.ai for running LLM inference
+  providers:
+    inference: remote::together
+    memory: meta-reference-faiss
+    safety: meta-reference
+    agentic_system: meta-reference
+    telemetry: console
+image_type: conda
--- a/llama_toolchain/configs/distributions/distribution_registry/local-ollama.yaml
+++ b/llama_toolchain/configs/distributions/distribution_registry/local-ollama.yaml
@ -0,0 +1,7 @@
+description: Like local, but use ollama for running LLM inference
+providers:
+  inference: remote::ollama
+  safety: meta-reference
+  agentic_system: meta-reference
+  memory: meta-reference-faiss
+  telemetry: console
--- a/llama_toolchain/configs/distributions/distribution_registry/local-plus-fireworks-inference.yaml
+++ b/llama_toolchain/configs/distributions/distribution_registry/local-plus-fireworks-inference.yaml
@ -0,0 +1,7 @@
+description: Use Fireworks.ai for running LLM inference
+providers:
+  inference: remote::fireworks
+  safety: meta-reference
+  agentic_system: meta-reference
+  memory: meta-reference-faiss
+  telemetry: console
--- a/llama_toolchain/configs/distributions/distribution_registry/local-plus-tgi-inference.yaml
+++ b/llama_toolchain/configs/distributions/distribution_registry/local-plus-tgi-inference.yaml
@ -0,0 +1,6 @@
+description: Use TGI (local or with Hugging Face Inference Endpoints for running LLM inference. When using HF Inference Endpoints, you must provide the name of the endpoint).
+providers:
+  inference: remote::tgi
+  safety: meta-reference
+  agentic_system: meta-reference
+  memory: meta-reference-faiss
--- a/llama_toolchain/configs/distributions/distribution_registry/local-plus-together-inference.yaml
+++ b/llama_toolchain/configs/distributions/distribution_registry/local-plus-together-inference.yaml
@ -0,0 +1,7 @@
+description: Use Together.ai for running LLM inference
+providers:
+  inference: remote::together
+  safety: meta-reference
+  agentic_system: meta-reference
+  memory: meta-reference-faiss
+  telemetry: console
--- a/llama_toolchain/configs/distributions/distribution_registry/local.yaml
+++ b/llama_toolchain/configs/distributions/distribution_registry/local.yaml
@ -0,0 +1,7 @@
+description: Use code from `llama_toolchain` itself to serve all llama stack APIs
+providers:
+  inference: meta-reference
+  memory: meta-reference-faiss
+  safety: meta-reference
+  agentic_system: meta-reference
+  telemetry: console
--- a/llama_toolchain/configs/distributions/docker/local-docker-example-build.yaml
+++ b/llama_toolchain/configs/distributions/docker/local-docker-example-build.yaml
@ -0,0 +1,10 @@
+name: local-docker-example
+distribution_spec:
+  description: Use code from `llama_toolchain` itself to serve all llama stack APIs
+  providers:
+    inference: meta-reference
+    memory: meta-reference-faiss
+    safety: meta-reference
+    agentic_system: meta-reference
+    telemetry: console
+image_type: docker
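The example build files above all share one shape: a `name`, a nested `distribution_spec` with `description` and `providers`, and an `image_type`. A rough sketch of that shape using standard-library dataclasses follows; the class and function names are hypothetical stand-ins, not the pydantic models in `llama_toolchain.core.datatypes`, and the input is a plain dict as one would get after loading the YAML.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DistributionSpecSketch:
    # Maps an API name (e.g. "inference") to a provider type string.
    providers: Dict[str, str] = field(default_factory=dict)
    description: str = ""


@dataclass
class BuildConfigSketch:
    name: str
    distribution_spec: DistributionSpecSketch
    image_type: str = "conda"  # same default as in the diff


def parse_build_spec(raw: Dict[str, Any]) -> BuildConfigSketch:
    """Convert an already-loaded build spec dict into the sketch types."""
    spec = raw["distribution_spec"]
    return BuildConfigSketch(
        name=raw["name"],
        distribution_spec=DistributionSpecSketch(
            providers=dict(spec.get("providers", {})),
            description=spec.get("description", ""),
        ),
        image_type=raw.get("image_type", "conda"),
    )


example = {
    "name": "local-docker-example",
    "distribution_spec": {
        "description": "Use code from `llama_toolchain` itself to serve all llama stack APIs",
        "providers": {
            "inference": "meta-reference",
            "memory": "meta-reference-faiss",
            "safety": "meta-reference",
            "agentic_system": "meta-reference",
            "telemetry": "console",
        },
    },
    "image_type": "docker",
}
config = parse_build_spec(example)
print(config.name, config.image_type, sorted(config.distribution_spec.providers))
```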
--- a/llama_toolchain/core/build_conda_env.sh
+++ b/llama_toolchain/core/build_conda_env.sh
@ -19,17 +19,15 @@ fi

 set -euo pipefail

-if [ "$#" -ne 4 ]; then
+if [ "$#" -ne 2 ]; then
  echo "Usage: $0 <distribution_type> <build_name> <pip_dependencies>" >&2
  echo "Example: $0 <distribution_type> mybuild 'numpy pandas scipy'" >&2
  exit 1
 fi

-distribution_type="$1"
-build_name="$2"
+build_name="$1"
 env_name="llamastack-$build_name"
-config_file="$3"
-pip_dependencies="$4"
+pip_dependencies="$2"

 # Define color codes
 RED='\033[0;31m'
@ -115,7 +113,3 @@ ensure_conda_env_python310() {
 }

 ensure_conda_env_python310 "$env_name" "$pip_dependencies"
-
-printf "${GREEN}Successfully setup conda environment. Configuring build...${NC}\n"
-
-$CONDA_PREFIX/bin/python3 -m llama_toolchain.cli.llama stack configure $config_file
--- a/llama_toolchain/core/build_container.sh
+++ b/llama_toolchain/core/build_container.sh
@ -4,18 +4,17 @@ LLAMA_MODELS_DIR=${LLAMA_MODELS_DIR:-}
 LLAMA_TOOLCHAIN_DIR=${LLAMA_TOOLCHAIN_DIR:-}
 TEST_PYPI_VERSION=${TEST_PYPI_VERSION:-}

-if [ "$#" -ne 5 ]; then
-  echo "Usage: $0 <distribution_type> <build_name> <docker_base> <pip_dependencies>
-  echo "Example: $0 distribution_type my-fastapi-app python:3.9-slim 'fastapi uvicorn'
+if [ "$#" -ne 4 ]; then
+  echo "Usage: $0 <build_name> <docker_base> <build_file_path> <pip_dependencies>"
+  echo "Example: $0 my-fastapi-app python:3.9-slim <build_file_path> 'fastapi uvicorn'"
  exit 1
 fi

-distribution_type=$1
-build_name="$2"
+build_name="$1"
 image_name="llamastack-$build_name"
-docker_base=$3
-config_file=$4
-pip_dependencies=$5
+docker_base=$2
+build_file_path=$3
+pip_dependencies=$4

 # Define color codes
 RED='\033[0;31m'
@ -26,6 +25,8 @@ set -euo pipefail

 SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
 REPO_DIR=$(dirname $(dirname "$SCRIPT_DIR"))
+DOCKER_BINARY=${DOCKER_BINARY:-docker}
+DOCKER_OPTS=${DOCKER_OPTS:-}

 TEMP_DIR=$(mktemp -d)

@ -93,6 +94,8 @@ add_to_docker <<EOF

 EOF

+add_to_docker "ADD $build_file_path ./llamastack-build.yaml"
+
 printf "Dockerfile created successfully in $TEMP_DIR/Dockerfile"
 cat $TEMP_DIR/Dockerfile
 printf "\n"
@ -105,10 +108,10 @@ if [ -n "$LLAMA_MODELS_DIR" ]; then
  mounts="$mounts -v $(readlink -f $LLAMA_MODELS_DIR):$models_mount"
 fi
 set -x
-podman build -t $image_name -f "$TEMP_DIR/Dockerfile" "$REPO_DIR" $mounts
+$DOCKER_BINARY build $DOCKER_OPTS -t $image_name -f "$TEMP_DIR/Dockerfile" "$REPO_DIR" $mounts
 set +x

-printf "${GREEN}Succesfully setup Podman image. Configuring build...${NC}"
 echo "You can run it with: podman run -p 8000:8000 $image_name"

-$CONDA_PREFIX/bin/python3 -m llama_toolchain.cli.llama stack configure $config_file
+echo "Checking image builds..."
+podman run -it $image_name cat llamastack-build.yaml
--- a/llama_toolchain/core/configure_container.sh
+++ b/llama_toolchain/core/configure_container.sh
@ -0,0 +1,31 @@
+#!/bin/bash
+
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the terms described in the LICENSE file in
+# the root directory of this source tree.
+
+set -euo pipefail
+
+error_handler() {
+  echo "Error occurred in script at line: ${1}" >&2
+  exit 1
+}
+
+trap 'error_handler ${LINENO}' ERR
+
+if [ $# -lt 2 ]; then
+  echo "Usage: $0 <docker image name> <host build dir>"
+  exit 1
+fi
+
+docker_image="$1"
+host_build_dir="$2"
+container_build_dir="/app/builds"
+
+set -x
+podman run -it \
+  -v $host_build_dir:$container_build_dir \
+  $docker_image \
+  llama stack configure ./llamastack-build.yaml --output-dir $container_build_dir
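The command that `configure_container.sh` runs can also be assembled as an argument list, which is handy when checking what would execute before handing it to a process runner. This is only a sketch: `configure_container_argv` is a hypothetical helper, and the `binary` parameter is an assumption added for illustration (the script itself calls `podman` directly).

```python
import shlex
from typing import List


def configure_container_argv(
    docker_image: str,
    host_build_dir: str,
    binary: str = "podman",  # assumption: the script hardcodes podman
    container_build_dir: str = "/app/builds",
) -> List[str]:
    """Build the argv mirroring the `podman run ... llama stack configure` call."""
    return [
        binary, "run", "-it",
        # Mount the host directory so the generated run config lands on the host.
        "-v", f"{host_build_dir}:{container_build_dir}",
        docker_image,
        "llama", "stack", "configure", "./llamastack-build.yaml",
        "--output-dir", container_build_dir,
    ]


argv = configure_container_argv("llamastack-docker-local", "/tmp/builds")
print(shlex.join(argv))
```

Passing a list rather than a single string avoids quoting problems if the host directory contains spaces.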
--- a/llama_toolchain/core/datatypes.py
+++ b/llama_toolchain/core/datatypes.py
@ -151,11 +151,12 @@ def remote_provider_spec(

@json_schema_type
 class DistributionSpec(BaseModel):
-    distribution_type: str
-    description: str
-
+    description: Optional[str] = Field(
+        default="",
+        description="Description of the distribution",
+    )
    docker_image: Optional[str] = None
-    providers: Dict[Api, str] = Field(
+    providers: Dict[str, str] = Field(
        default_factory=dict,
        description="Provider Types for each of the APIs provided by this distribution",
    )
@ -172,8 +173,6 @@ Reference to the distribution this package refers to. For unregistered (adhoc) p
 this could be just a hash
 """,
    )
-    distribution_type: Optional[str] = None
-
    docker_image: Optional[str] = Field(
        default=None,
        description="Reference to the docker image if this package refers to a container",
@ -194,12 +193,8 @@ the dependencies of these providers as well.
@json_schema_type
 class BuildConfig(BaseModel):
    name: str
-    distribution: str = Field(
-        default="local", description="Type of distribution to build (adhoc | {})"
-    )
-    api_providers: Optional[str] = Field(
-        default_factory=list,
-        description="List of API provider names to build",
+    distribution_spec: DistributionSpec = Field(
+        description="The distribution spec to build including API providers. "
    )
    image_type: str = Field(
        default="conda",
--- a/llama_toolchain/core/distribution.py
+++ b/llama_toolchain/core/distribution.py
@ -31,16 +31,6 @@ SERVER_DEPENDENCIES = [
 ]


-def distribution_dependencies(distribution: DistributionSpec) -> List[str]:
-    # only consider InlineProviderSpecs when calculating dependencies
-    return [
-        dep
-        for provider_spec in distribution.provider_specs.values()
-        if isinstance(provider_spec, InlineProviderSpec)
-        for dep in provider_spec.pip_packages
-    ] + SERVER_DEPENDENCIES
-
-
 def stack_apis() -> List[Api]:
    return [v for v in Api]

--- a/llama_toolchain/core/distribution_registry.py
+++ b/llama_toolchain/core/distribution_registry.py
@ -5,84 +5,19 @@
 # the root directory of this source tree.

 from functools import lru_cache
+from pathlib import Path
 from typing import List, Optional
-
 from .datatypes import *  # noqa: F403
+import yaml


@lru_cache()
 def available_distribution_specs() -> List[DistributionSpec]:
-    return [
-        DistributionSpec(
-            distribution_type="local",
-            description="Use code from `llama_toolchain` itself to serve all llama stack APIs",
-            providers={
-                Api.inference: "meta-reference",
-                Api.memory: "meta-reference-faiss",
-                Api.safety: "meta-reference",
-                Api.agentic_system: "meta-reference",
-                Api.telemetry: "console",
-            },
-        ),
-        DistributionSpec(
-            distribution_type="remote",
-            description="Point to remote services for all llama stack APIs",
-            providers={
-                **{x: "remote" for x in Api},
-                Api.telemetry: "console",
-            },
-        ),
-        DistributionSpec(
-            distribution_type="local-ollama",
-            description="Like local, but use ollama for running LLM inference",
-            providers={
-                Api.inference: remote_provider_type("ollama"),
-                Api.safety: "meta-reference",
-                Api.agentic_system: "meta-reference",
-                Api.memory: "meta-reference-faiss",
-                Api.telemetry: "console",
-            },
-        ),
-        DistributionSpec(
-            distribution_type="local-plus-fireworks-inference",
-            description="Use Fireworks.ai for running LLM inference",
-            providers={
-                Api.inference: remote_provider_type("fireworks"),
-                Api.safety: "meta-reference",
-                Api.agentic_system: "meta-reference",
-                Api.memory: "meta-reference-faiss",
-                Api.telemetry: "console",
-            },
-        ),
-        DistributionSpec(
-            distribution_type="local-plus-together-inference",
-            description="Use Together.ai for running LLM inference",
-            providers={
-                Api.inference: remote_provider_type("together"),
-                Api.safety: "meta-reference",
-                Api.agentic_system: "meta-reference",
-                Api.memory: "meta-reference-faiss",
-                Api.telemetry: "console",
-            },
-        ),
-        DistributionSpec(
-            distribution_type="local-plus-tgi-inference",
-            description="Use TGI for running LLM inference",
-            providers={
-                Api.inference: remote_provider_type("tgi"),
-                Api.safety: "meta-reference",
-                Api.agentic_system: "meta-reference",
-                Api.memory: "meta-reference-faiss",
-            },
-        ),
-    ]
+    distribution_specs = []
+    for p in Path("llama_toolchain/configs/distributions/distribution_registry").rglob(
+        "*.yaml"
+    ):
+        with open(p, "r") as f:
+            distribution_specs.append(DistributionSpec(**yaml.safe_load(f)))

-
-@lru_cache()
-def resolve_distribution_spec(
-    distribution_type: str,
-) -> Optional[DistributionSpec]:
-    for spec in available_distribution_specs():
-        if spec.distribution_type == distribution_type:
-            return spec
-    return None
+    return distribution_specs
--- a/llama_toolchain/core/package.py
+++ b/llama_toolchain/core/package.py
@ -21,6 +21,8 @@ from pydantic import BaseModel
 from termcolor import cprint

 from llama_toolchain.core.datatypes import *  # noqa: F403
+from pathlib import Path
+
 from llama_toolchain.core.distribution import api_providers, SERVER_DEPENDENCIES


@ -39,87 +41,35 @@ class ApiInput(BaseModel):
    provider: str


-def build_package(
-    api_inputs: List[ApiInput],
-    image_type: ImageType,
-    name: str,
-    distribution_type: Optional[str] = None,
-    docker_image: Optional[str] = None,
-):
-    if not distribution_type:
-        distribution_type = "adhoc"
-
-    build_dir = BUILDS_BASE_DIR / distribution_type / image_type.value
-    os.makedirs(build_dir, exist_ok=True)
-
-    package_name = name.replace("::", "-")
-    package_file = build_dir / f"{package_name}.yaml"
-
-    all_providers = api_providers()
-
+def build_package(build_config: BuildConfig, build_file_path: Path):
    package_deps = Dependencies(
-        docker_image=docker_image or "python:3.10-slim",
+        docker_image=build_config.distribution_spec.docker_image or "python:3.10-slim",
        pip_packages=SERVER_DEPENDENCIES,
    )

-    stub_config = {}
-    for api_input in api_inputs:
-        api = api_input.api
-        providers_for_api = all_providers[api]
-        if api_input.provider not in providers_for_api:
+    # extend package dependencies based on providers spec
+    all_providers = api_providers()
+    for api_str, provider in build_config.distribution_spec.providers.items():
+        providers_for_api = all_providers[Api(api_str)]
+        if provider not in providers_for_api:
            raise ValueError(
-                f"Provider `{api_input.provider}` is not available for API `{api}`"
+                f"Provider `{provider}` is not available for API `{api_str}`"
            )

-        provider = providers_for_api[api_input.provider]
-        package_deps.pip_packages.extend(provider.pip_packages)
-        if provider.docker_image:
+        provider_spec = providers_for_api[provider]
+        package_deps.pip_packages.extend(provider_spec.pip_packages)
+        if provider_spec.docker_image:
            raise ValueError("A stack's dependencies cannot have a docker image")

-        stub_config[api.value] = {"provider_type": api_input.provider}
-
-    if package_file.exists():
-        cprint(
-            f"Build `{package_name}` exists; will reconfigure",
-            color="yellow",
-        )
-        c = PackageConfig(**yaml.safe_load(package_file.read_text()))
-        for api_str, new_config in stub_config.items():
-            if api_str not in c.providers:
-                c.providers[api_str] = new_config
-            else:
-                existing_config = c.providers[api_str]
-                if existing_config["provider_type"] != new_config["provider_type"]:
-                    cprint(
-                        f"Provider `{api_str}` has changed from `{existing_config}` to `{new_config}`",
-                        color="yellow",
-                    )
-                    c.providers[api_str] = new_config
-    else:
-        c = PackageConfig(
-            built_at=datetime.now(),
-            package_name=package_name,
-            providers=stub_config,
-        )
-
-    c.distribution_type = distribution_type
-    c.docker_image = package_name if image_type == ImageType.docker else None
-    c.conda_env = package_name if image_type == ImageType.conda else None
-
-    with open(package_file, "w") as f:
-        to_write = json.loads(json.dumps(c.dict(), cls=EnumEncoder))
-        f.write(yaml.dump(to_write, sort_keys=False))
-
-    if image_type == ImageType.docker:
+    if build_config.image_type == ImageType.docker.value:
        script = pkg_resources.resource_filename(
            "llama_toolchain", "core/build_container.sh"
        )
        args = [
            script,
-            distribution_type,
-            package_name,
+            build_config.name,
            package_deps.docker_image,
-            str(package_file),
+            str(build_file_path),
            " ".join(package_deps.pip_packages),
        ]
    else:
@ -128,21 +78,14 @@ def build_package(
        )
        args = [
            script,
-            distribution_type,
-            package_name,
-            str(package_file),
+            build_config.name,
            " ".join(package_deps.pip_packages),
        ]

    return_code = run_with_pty(args)
    if return_code != 0:
        cprint(
-            f"Failed to build target {package_name} with return code {return_code}",
+            f"Failed to build target {build_config.name} with return code {return_code}",
            color="red",
        )
        return
-
-    cprint(
-        f"Target `{package_name}` built with configuration at {str(package_file)}",
-        color="green",
-    )
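
Review note on the `build_package` change above: the new loop walks `build_config.distribution_spec.providers`, validates each provider against the registry, and extends the pip package list. Below is a minimal, self-contained sketch of that dependency-collection step. `ProviderSpec`, `collect_pip_packages`, the `registry` argument, and the `SERVER_DEPENDENCIES` values are hypothetical stand-ins for illustration, not the real `llama_toolchain` types or values. The sketch also copies the base list before extending it, which is a design choice of this example rather than something the commit does.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProviderSpec:
    # Hypothetical stand-in for the provider spec type used by build_package.
    pip_packages: List[str] = field(default_factory=list)
    docker_image: Optional[str] = None


# Illustrative values only; the real list lives in llama_toolchain/core/distribution.py.
SERVER_DEPENDENCIES = ["fastapi", "uvicorn"]


def collect_pip_packages(
    providers: Dict[str, str],
    registry: Dict[str, Dict[str, ProviderSpec]],
) -> List[str]:
    """Return server dependencies plus the pip packages of each chosen provider."""
    # Copy the base list so the module-level constant is never mutated.
    packages = list(SERVER_DEPENDENCIES)
    for api, provider in providers.items():
        providers_for_api = registry.get(api, {})
        if provider not in providers_for_api:
            raise ValueError(
                f"Provider `{provider}` is not available for API `{api}`"
            )
        spec = providers_for_api[provider]
        if spec.docker_image:
            raise ValueError("A stack's dependencies cannot have a docker image")
        packages.extend(spec.pip_packages)
    return packages


if __name__ == "__main__":
    demo_registry = {
        "inference": {"meta-reference": ProviderSpec(pip_packages=["torch"])},
    }
    print(collect_pip_packages({"inference": "meta-reference"}, demo_registry))
```

Under these assumptions, an unknown provider name fails early with the same `ValueError` message that the diff uses, before any build script is invoked.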
--- a/llama_toolchain/inference/meta_reference/generation.py
+++ b/llama_toolchain/inference/meta_reference/generation.py
@ -28,10 +28,10 @@ from llama_models.llama3.api.datatypes import Message, ToolPromptFormat
 from llama_models.llama3.api.tokenizer import Tokenizer
 from llama_models.llama3.reference_impl.model import Transformer
 from llama_models.sku_list import resolve_model
-from termcolor import cprint

 from llama_toolchain.common.model_utils import model_local_dir
 from llama_toolchain.inference.api import QuantizationType
+from termcolor import cprint

 from .config import MetaReferenceImplConfig

@ -80,6 +80,7 @@ class Llama:
            torch.distributed.init_process_group("nccl")

        model_parallel_size = config.model_parallel_size
+
        if not model_parallel_is_initialized():
            initialize_model_parallel(model_parallel_size)

--- a/llama_toolchain/memory/common/init.py
+++ b/llama_toolchain/memory/common/init.py
@ -0,0 +1,5 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the terms described in the LICENSE file in
+# the root directory of this source tree.