diff --git a/docs/cli_reference.md b/docs/cli_reference.md
deleted file mode 100644
index 39ac99615..000000000
--- a/docs/cli_reference.md
+++ /dev/null
@@ -1,485 +0,0 @@
-# Llama CLI Reference
-
-The `llama` CLI tool helps you set up and use the Llama Stack & agentic systems. It should be available on your path after installing the `llama-stack` package.
-
-### Subcommands
-1. `download`: Supports downloading models from Meta or Hugging Face.
-2. `model`: Lists available models and their properties.
-3. `stack`: Allows you to build and run a Llama Stack server. You can read more about this [here](cli_reference.md#step-3-building-and-configuring-llama-stack-distributions).
-
-### Sample Usage
-
-```
-llama --help
-```
-
-usage: llama [-h] {download,model,stack} ...
-
-Welcome to the Llama CLI
-
-options:
-  -h, --help            show this help message and exit
-
-subcommands:
-  {download,model,stack}
-
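If you prefer to drive these commands from a script rather than an interactive shell, the CLI can also be invoked as a subprocess. The snippet below is a minimal sketch, assuming `llama` is on your `PATH` after installing the `llama-stack` package; the filtering shown is purely illustrative.

```python
import subprocess

# Run `llama model list` (used in Step 1 below) and capture its output.
# Assumes the `llama` CLI is on PATH after `pip install llama-stack`.
result = subprocess.run(
    ["llama", "model", "list"],
    capture_output=True,
    text=True,
    check=True,
)

# Print only the table rows that mention instruct models, as a quick filter.
for line in result.stdout.splitlines():
    if "Instruct" in line:
        print(line)
```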
-
-## Step 1. Get the models
-
-You first need to have models downloaded locally.
-
-To download any model you need the **Model Descriptor**.
-This can be obtained by running the command
-```
-llama model list
-```
-
-You should see a table like this:
-
-+----------------------------------+------------------------------------------+----------------+
-| Model Descriptor                 | Hugging Face Repo                        | Context Length |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.1-8B                      | meta-llama/Llama-3.1-8B                  | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.1-70B                     | meta-llama/Llama-3.1-70B                 | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.1-405B:bf16-mp8           | meta-llama/Llama-3.1-405B                | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.1-405B                    | meta-llama/Llama-3.1-405B-FP8            | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.1-405B:bf16-mp16          | meta-llama/Llama-3.1-405B                | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.1-8B-Instruct             | meta-llama/Llama-3.1-8B-Instruct         | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.1-70B-Instruct            | meta-llama/Llama-3.1-70B-Instruct        | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.1-405B-Instruct:bf16-mp8  | meta-llama/Llama-3.1-405B-Instruct       | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.1-405B-Instruct           | meta-llama/Llama-3.1-405B-Instruct-FP8   | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.1-405B-Instruct:bf16-mp16 | meta-llama/Llama-3.1-405B-Instruct       | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.2-1B                      | meta-llama/Llama-3.2-1B                  | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.2-3B                      | meta-llama/Llama-3.2-3B                  | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.2-11B-Vision              | meta-llama/Llama-3.2-11B-Vision          | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.2-90B-Vision              | meta-llama/Llama-3.2-90B-Vision          | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.2-1B-Instruct             | meta-llama/Llama-3.2-1B-Instruct         | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.2-3B-Instruct             | meta-llama/Llama-3.2-3B-Instruct         | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.2-11B-Vision-Instruct     | meta-llama/Llama-3.2-11B-Vision-Instruct | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama3.2-90B-Vision-Instruct     | meta-llama/Llama-3.2-90B-Vision-Instruct | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama-Guard-3-11B-Vision         | meta-llama/Llama-Guard-3-11B-Vision      | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama-Guard-3-1B:int4-mp1        | meta-llama/Llama-Guard-3-1B-INT4         | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama-Guard-3-1B                 | meta-llama/Llama-Guard-3-1B              | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama-Guard-3-8B                 | meta-llama/Llama-Guard-3-8B              | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama-Guard-3-8B:int8-mp1        | meta-llama/Llama-Guard-3-8B-INT8         | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Prompt-Guard-86M                 | meta-llama/Prompt-Guard-86M              | 128K           |
-+----------------------------------+------------------------------------------+----------------+
-| Llama-Guard-2-8B                 | meta-llama/Llama-Guard-2-8B              | 4K             |
-+----------------------------------+------------------------------------------+----------------+
-
- -To download models, you can use the llama download command. - -#### Downloading from [Meta](https://llama.meta.com/llama-downloads/) - -Here is an example download command to get the 3B-Instruct/11B-Vision-Instruct model. You will need META_URL which can be obtained from [here](https://llama.meta.com/docs/getting_the_models/meta/) - -Download the required checkpoints using the following commands: -```bash -# download the 8B model, this can be run on a single GPU -llama download --source meta --model-id Llama3.2-3B-Instruct --meta-url META_URL - -# you can also get the 70B model, this will require 8 GPUs however -llama download --source meta --model-id Llama3.2-11B-Vision-Instruct --meta-url META_URL - -# llama-agents have safety enabled by default. For this, you will need -# safety models -- Llama-Guard and Prompt-Guard -llama download --source meta --model-id Prompt-Guard-86M --meta-url META_URL -llama download --source meta --model-id Llama-Guard-3-1B --meta-url META_URL -``` - -#### Downloading from [Hugging Face](https://huggingface.co/meta-llama) - -Essentially, the same commands above work, just replace `--source meta` with `--source huggingface`. - -```bash -llama download --source huggingface --model-id Llama3.1-8B-Instruct --hf-token - -llama download --source huggingface --model-id Llama3.1-70B-Instruct --hf-token - -llama download --source huggingface --model-id Llama-Guard-3-1B --ignore-patterns *original* -llama download --source huggingface --model-id Prompt-Guard-86M --ignore-patterns *original* -``` - -**Important:** Set your environment variable `HF_TOKEN` or pass in `--hf-token` to the command to validate your access. You can find your token at [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens). - -> **Tip:** Default for `llama download` is to run with `--ignore-patterns *.safetensors` since we use the `.pth` files in the `original` folder. For Llama Guard and Prompt Guard, however, we need safetensors. Hence, please run with `--ignore-patterns original` so that safetensors are downloaded and `.pth` files are ignored. - -#### Downloading via Ollama - -If you're already using ollama, we also have a supported Llama Stack distribution `local-ollama` and you can continue to use ollama for managing model downloads. - -``` -ollama pull llama3.1:8b-instruct-fp16 -ollama pull llama3.1:70b-instruct-fp16 -``` - -> [!NOTE] -> Only the above two models are currently supported by Ollama. - - -## Step 2: Understand the models -The `llama model` command helps you explore the model’s interface. - -### 2.1 Subcommands -1. `download`: Download the model from different sources. (meta, huggingface) -2. `list`: Lists all the models available for download with hardware requirements to deploy the models. -3. `prompt-format`: Show llama model message formats. -4. `describe`: Describes all the properties of the model. - -### 2.2 Sample Usage - -`llama model ` - -``` -llama model --help -``` -
-usage: llama model [-h] {download,list,prompt-format,describe} ...
-
-Work with llama models
-
-options:
-  -h, --help            show this help message and exit
-
-model_subcommands:
-  {download,list,prompt-format,describe}
-
-
-You can use the describe command to learn more about a model:
-```
-llama model describe -m Llama3.2-3B-Instruct
-```
-### 2.3 Describe
-
-+-----------------------------+----------------------------------+
-| Model                       | Llama3.2-3B-Instruct             |
-+-----------------------------+----------------------------------+
-| Hugging Face ID             | meta-llama/Llama-3.2-3B-Instruct |
-+-----------------------------+----------------------------------+
-| Description                 | Llama 3.2 3b instruct model      |
-+-----------------------------+----------------------------------+
-| Context Length              | 128K tokens                      |
-+-----------------------------+----------------------------------+
-| Weights format              | bf16                             |
-+-----------------------------+----------------------------------+
-| Model params.json           | {                                |
-|                             |     "dim": 3072,                 |
-|                             |     "n_layers": 28,              |
-|                             |     "n_heads": 24,               |
-|                             |     "n_kv_heads": 8,             |
-|                             |     "vocab_size": 128256,        |
-|                             |     "ffn_dim_multiplier": 1.0,   |
-|                             |     "multiple_of": 256,          |
-|                             |     "norm_eps": 1e-05,           |
-|                             |     "rope_theta": 500000.0,      |
-|                             |     "use_scaled_rope": true      |
-|                             | }                                |
-+-----------------------------+----------------------------------+
-| Recommended sampling params | {                                |
-|                             |     "strategy": "top_p",         |
-|                             |     "temperature": 1.0,          |
-|                             |     "top_p": 0.9,                |
-|                             |     "top_k": 0                   |
-|                             | }                                |
-+-----------------------------+----------------------------------+
-
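The "Recommended sampling params" shown by `llama model describe` can be carried over into requests against a running Llama Stack server. The snippet below is a minimal, hypothetical sketch: it assumes a server is already listening on `localhost:5000` (see Step 3), and that the `sampling_params` field of `/inference/chat_completion` accepts the keys shown by describe — the curl example later in this guide confirms at least `temperature`, `seed`, and `max_tokens`.

```python
import requests

# Values copied from the `llama model describe` output above; whether the
# server accepts every key as-is is an assumption of this sketch.
sampling_params = {
    "strategy": "top_p",
    "temperature": 1.0,
    "top_p": 0.9,
    "top_k": 0,
}

payload = {
    "model": "Llama3.2-3B-Instruct",
    "messages": [
        {"role": "user", "content": "Write me a 2 sentence poem about the moon"},
    ],
    "sampling_params": sampling_params,
}

# Assumes a Llama Stack server is running locally (see Step 3.3).
resp = requests.post("http://localhost:5000/inference/chat_completion", json=payload)
print(resp.json())
```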
-### 2.4 Prompt Format
-You can even run `llama model prompt-format` to see all of the templates and their tokens:
-
-```
-llama model prompt-format -m Llama3.2-3B-Instruct
-```
-![alt text](resources/prompt-format.png)
-
-
-
-You will be shown a Markdown formatted description of the model interface and how prompts / messages are formatted for various scenarios.
-
-**NOTE**: Outputs in terminal are color printed to show special tokens.
-
-
-## Step 3: Building and Configuring Llama Stack Distributions
-
-- Please see our [Getting Started](getting_started.md) guide for more details on how to build and start a Llama Stack distribution.
-
-### Step 3.1 Build
-In the following steps, imagine we'll be working with a `Llama3.1-8B-Instruct` model. We will name our build `tgi` to help us remember the config. We will start building our distribution (in the form of a Conda environment, or Docker image). In this step, we will specify:
-- `name`: the name for our distribution (e.g. `tgi`)
-- `image_type`: our build image type (`conda | docker`)
-- `distribution_spec`: our distribution specs for specifying API providers
-  - `description`: a short description of the configurations for the distribution
-  - `providers`: specifies the underlying implementation for serving each API endpoint
-  - `image_type`: `conda` | `docker` to specify whether to build the distribution in the form of a Docker image or Conda environment.
-
-
-At the end of the build command, we will generate a `<name>-build.yaml` file storing the build configurations.
-
-After this step is complete, a file named `<name>-build.yaml` will be generated and saved at the output file path specified at the end of the command.
-
-#### Building from scratch
-- For a new user, we could start off with running `llama stack build`, which will launch an interactive wizard where you will be prompted to enter build configurations.
-```
-llama stack build
-```
-
-Running the command above will allow you to fill in the configuration to build your Llama Stack distribution; you will see the following outputs.
-
-```
-> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): my-local-llama-stack
-> Enter the image type you want your distribution to be built with (docker or conda): conda
-
- Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
-> Enter the API provider for the inference API: (default=meta-reference): meta-reference
-> Enter the API provider for the safety API: (default=meta-reference): meta-reference
-> Enter the API provider for the agents API: (default=meta-reference): meta-reference
-> Enter the API provider for the memory API: (default=meta-reference): meta-reference
-> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference
-
- > (Optional) Enter a short description for your Llama Stack distribution:
-
-Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/my-local-llama-stack-build.yaml
-```
-
-#### Building from templates
-- To build from alternative API providers, we provide distribution templates for users to get started building a distribution backed by different providers.
-
-The following command will allow you to see the available templates and their corresponding providers.
-```
-llama stack build --list-templates
-```
-
-![alt text](resources/list-templates.png)
-
-You may then pick a template to build your distribution with providers fitted to your liking.
-
-```
-llama stack build --template tgi --image-type conda
-```
-
-```
-$ llama stack build --template tgi --image-type conda
-...
-...
-Build spec configuration saved at ~/.conda/envs/llamastack-tgi/tgi-build.yaml
-You may now run `llama stack configure tgi` or `llama stack configure ~/.conda/envs/llamastack-tgi/tgi-build.yaml`
-```
-
-#### Building from config file
-- In addition to templates, you may customize the build to your liking by editing config files and building from a config file with the following command.
-
-- The config file will have contents like the ones in `llama_stack/templates/`.
-
-```
-$ cat build.yaml
-
-name: ollama
-distribution_spec:
-  description: Like local, but use ollama for running LLM inference
-  providers:
-    inference: remote::ollama
-    memory: meta-reference
-    safety: meta-reference
-    agents: meta-reference
-    telemetry: meta-reference
-image_type: conda
-```
-
-```
-llama stack build --config build.yaml
-```
-
-#### How to build a distribution with a Docker image
-
-To build a docker image, you may start off from a template and use the `--image-type docker` flag to specify `docker` as the build image type.
-
-```
-llama stack build --template tgi --image-type docker
-```
-
-Alternatively, you may use a config file: set `image_type` to `docker` in your `<name>-build.yaml` file, and run `llama stack build --config <name>-build.yaml`. The `<name>-build.yaml` will have contents like:
-
-```
-name: local-docker-example
-distribution_spec:
-  description: Use code from `llama_stack` itself to serve all llama stack APIs
-  docker_image: null
-  providers:
-    inference: meta-reference
-    memory: meta-reference-faiss
-    safety: meta-reference
-    agentic_system: meta-reference
-    telemetry: console
-image_type: docker
-```
-
-The following command allows you to build a Docker image with the name `<name>`:
-```
-llama stack build --config <name>-build.yaml
-
-Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/Dockerfile
-FROM python:3.10-slim
-WORKDIR /app
-...
-...
-You can run it with: podman run -p 8000:8000 llamastack-docker-local
-Build spec configuration saved at ~/.llama/distributions/docker/docker-local-build.yaml
-```
-
-
-### Step 3.2 Configure
-After our distribution is built (either as a Docker image or a Conda environment), we will run the following command to configure it:
-```
-llama stack configure [ <name> | <docker-image-name> ]
-```
-- For `conda` environments: `<name>` would be the generated build spec saved from Step 1.
-- For `docker` images downloaded from Dockerhub, you could also use `<docker-image-name>` as the argument.
-   - Run `docker images` to check the list of available images on your machine.
-
-```
-$ llama stack configure ~/.llama/distributions/conda/tgi-build.yaml
-
-Configuring API: inference (meta-reference)
-Enter value for model (existing: Llama3.1-8B-Instruct) (required):
-Enter value for quantization (optional):
-Enter value for torch_seed (optional):
-Enter value for max_seq_len (existing: 4096) (required):
-Enter value for max_batch_size (existing: 1) (required):
-
-Configuring API: memory (meta-reference-faiss)
-
-Configuring API: safety (meta-reference)
-Do you want to configure llama_guard_shield? (y/n): y
-Entering sub-configuration for llama_guard_shield:
-Enter value for model (default: Llama-Guard-3-1B) (required):
-Enter value for excluded_categories (default: []) (required):
-Enter value for disable_input_check (default: False) (required):
-Enter value for disable_output_check (default: False) (required):
-Do you want to configure prompt_guard_shield? (y/n): y
-Entering sub-configuration for prompt_guard_shield:
-Enter value for model (default: Prompt-Guard-86M) (required):
-
-Configuring API: agentic_system (meta-reference)
-Enter value for brave_search_api_key (optional):
-Enter value for bing_search_api_key (optional):
-Enter value for wolfram_api_key (optional):
-
-Configuring API: telemetry (console)
-
-YAML configuration has been written to ~/.llama/builds/conda/8b-instruct-run.yaml
-```
-
-After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/8b-instruct-run.yaml` with the following contents. You may edit this file to change the settings.
-
-As you can see, we did basic configuration above and configured:
-- inference to run on model `Llama3.1-8B-Instruct` (obtained from `llama model list`)
-- Llama Guard safety shield with model `Llama-Guard-3-1B`
-- Prompt Guard safety shield with model `Prompt-Guard-86M`
-
-For how these configurations are stored as yaml, check out the file printed at the end of the configuration.
-
-Note that all configurations as well as models are stored in `~/.llama`.
-
-
-### Step 3.3 Run
-Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end of the `llama stack configure` step.
-
-```
-llama stack run ~/.llama/builds/conda/tgi-run.yaml
-```
-
-You should see the Llama Stack server start and print the APIs that it is supporting:
-
-```
-$ llama stack run ~/.llama/builds/local/conda/tgi-run.yaml
-
-> initializing model parallel with size 1
-> initializing ddp with size 1
-> initializing pipeline with size 1
-Loaded in 19.28 seconds
-NCCL version 2.20.5+cuda12.4
-Finished model load YES READY
-Serving POST /inference/batch_chat_completion
-Serving POST /inference/batch_completion
-Serving POST /inference/chat_completion
-Serving POST /inference/completion
-Serving POST /safety/run_shield
-Serving POST /agentic_system/memory_bank/attach
-Serving POST /agentic_system/create
-Serving POST /agentic_system/session/create
-Serving POST /agentic_system/turn/create
-Serving POST /agentic_system/delete
-Serving POST /agentic_system/session/delete
-Serving POST /agentic_system/memory_bank/detach
-Serving POST /agentic_system/session/get
-Serving POST /agentic_system/step/get
-Serving POST /agentic_system/turn/get
-Listening on :::5000
-INFO: Started server process [453333]
-INFO: Waiting for application startup.
-INFO: Application startup complete.
-INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
-```
-
-> [!NOTE]
-> Configuration is in `~/.llama/builds/local/conda/tgi-run.yaml`. Feel free to increase `max_seq_len`.
-
-> [!IMPORTANT]
-> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.
-
-> [!TIP]
-> You might need to use the flag `--disable-ipv6` to disable IPv6 support.
-
-This server is running a Llama model locally.
-
-### Step 3.4 Test with Client
-Once the server is set up, we can test it with a client to see the example outputs.
-```
-cd /path/to/llama-stack
-conda activate <env>  # any environment containing the llama-stack pip package will work
-
-python -m llama_stack.apis.inference.client localhost 5000
-```
-
-This will run the chat completion client and query the distribution’s /inference/chat_completion API.
- -Here is an example output: -``` -User>hello world, write me a 2 sentence poem about the moon -Assistant> Here's a 2-sentence poem about the moon: - -The moon glows softly in the midnight sky, -A beacon of wonder, as it passes by. -``` - -Similarly you can test safety (if you configured llama-guard and/or prompt-guard shields) by: - -``` -python -m llama_stack.apis.safety.client localhost 5000 -``` - -You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo. diff --git a/docs/getting_started.md b/docs/getting_started.md deleted file mode 100644 index 49c7cd5a0..000000000 --- a/docs/getting_started.md +++ /dev/null @@ -1,230 +0,0 @@ -# Getting Started with Llama Stack - -This guide will walk you though the steps to get started on end-to-end flow for LlamaStack. This guide mainly focuses on getting started with building a LlamaStack distribution, and starting up a LlamaStack server. Please see our [documentations](../README.md) on what you can do with Llama Stack, and [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) on examples apps built with Llama Stack. - -## Installation -The `llama` CLI tool helps you setup and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-stack` package. - -You have two ways to install this repository: - -1. **Install as a package**: - You can install the repository directly from [PyPI](https://pypi.org/project/llama-stack/) by running the following command: - ```bash - pip install llama-stack - ``` - -2. **Install from source**: - If you prefer to install from the source code, follow these steps: - ```bash - mkdir -p ~/local - cd ~/local - git clone git@github.com:meta-llama/llama-stack.git - - conda create -n stack python=3.10 - conda activate stack - - cd llama-stack - $CONDA_PREFIX/bin/pip install -e . - ``` - -For what you can do with the Llama CLI, please refer to [CLI Reference](./cli_reference.md). - -## Starting Up Llama Stack Server - -You have two ways to start up Llama stack server: - -1. **Starting up server via docker**: - -We provide pre-built Docker image of Llama Stack distribution, which can be found in the following links in the [distributions](../distributions/) folder. - -> [!NOTE] -> For GPU inference, you need to set these environment variables for specifying local directory containing your model checkpoints, and enable GPU inference to start running docker container. -``` -export LLAMA_CHECKPOINT_DIR=~/.llama -``` - -> [!NOTE] -> `~/.llama` should be the path containing downloaded weights of Llama models. - -To download llama models, use -``` -llama download --model-id Llama3.1-8B-Instruct -``` - -To download and start running a pre-built docker container, you may use the following commands: - -``` -cd llama-stack/distributions/meta-reference-gpu -docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.yaml --gpus=all distribution-meta-reference-gpu --yaml_config /root/my-run.yaml -``` - -> [!TIP] -> Pro Tip: We may use `docker compose up` for starting up a distribution with remote providers (e.g. TGI) using [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general). You can checkout [these scripts](../distributions/) to help you get started. - - -2. 
**Build->Configure->Run Llama Stack server via conda**: - - You may also build a LlamaStack distribution from scratch, configure it, and start running the distribution. This is useful for developing on LlamaStack. - - **`llama stack build`** - - You'll be prompted to enter build information interactively. - ``` - llama stack build - - > Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): my-local-stack - > Enter the image type you want your distribution to be built with (docker or conda): conda - - Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs. - > Enter the API provider for the inference API: (default=meta-reference): meta-reference - > Enter the API provider for the safety API: (default=meta-reference): meta-reference - > Enter the API provider for the agents API: (default=meta-reference): meta-reference - > Enter the API provider for the memory API: (default=meta-reference): meta-reference - > Enter the API provider for the telemetry API: (default=meta-reference): meta-reference - - > (Optional) Enter a short description for your Llama Stack distribution: - - Build spec configuration saved at ~/.conda/envs/llamastack-my-local-stack/my-local-stack-build.yaml - You can now run `llama stack configure my-local-stack` - ``` - - **`llama stack configure`** - - Run `llama stack configure ` with the name you have previously defined in `build` step. - ``` - llama stack configure - ``` - - You will be prompted to enter configurations for your Llama Stack - - ``` - $ llama stack configure my-local-stack - - Configuring API `inference`... - === Configuring provider `meta-reference` for API inference... - Enter value for model (default: Llama3.1-8B-Instruct) (required): - Do you want to configure quantization? (y/n): n - Enter value for torch_seed (optional): - Enter value for max_seq_len (default: 4096) (required): - Enter value for max_batch_size (default: 1) (required): - - Configuring API `safety`... - === Configuring provider `meta-reference` for API safety... - Do you want to configure llama_guard_shield? (y/n): n - Do you want to configure prompt_guard_shield? (y/n): n - - Configuring API `agents`... - === Configuring provider `meta-reference` for API agents... - Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite): - - Configuring SqliteKVStoreConfig: - Enter value for namespace (optional): - Enter value for db_path (default: /home/xiyan/.llama/runtime/kvstore.db) (required): - - Configuring API `memory`... - === Configuring provider `meta-reference` for API memory... - > Please enter the supported memory bank type your provider has for memory: vector - - Configuring API `telemetry`... - === Configuring provider `meta-reference` for API telemetry... - - > YAML configuration has been written to ~/.llama/builds/conda/my-local-stack-run.yaml. - You can now run `llama stack run my-local-stack --port PORT` - ``` - - **`llama stack run`** - - Run `llama stack run ` with the name you have previously defined. - ``` - llama stack run my-local-stack - - ... - > initializing model parallel with size 1 - > initializing ddp with size 1 - > initializing pipeline with size 1 - ... 
- Finished model load YES READY - Serving POST /inference/chat_completion - Serving POST /inference/completion - Serving POST /inference/embeddings - Serving POST /memory_banks/create - Serving DELETE /memory_bank/documents/delete - Serving DELETE /memory_banks/drop - Serving GET /memory_bank/documents/get - Serving GET /memory_banks/get - Serving POST /memory_bank/insert - Serving GET /memory_banks/list - Serving POST /memory_bank/query - Serving POST /memory_bank/update - Serving POST /safety/run_shield - Serving POST /agentic_system/create - Serving POST /agentic_system/session/create - Serving POST /agentic_system/turn/create - Serving POST /agentic_system/delete - Serving POST /agentic_system/session/delete - Serving POST /agentic_system/session/get - Serving POST /agentic_system/step/get - Serving POST /agentic_system/turn/get - Serving GET /telemetry/get_trace - Serving POST /telemetry/log_event - Listening on :::5000 - INFO: Started server process [587053] - INFO: Waiting for application startup. - INFO: Application startup complete. - INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit) - ``` - - -## Testing with client -Once the server is setup, we can test it with a client to see the example outputs. -``` -cd /path/to/llama-stack -conda activate # any environment containing the llama-stack pip package will work - -python -m llama_stack.apis.inference.client localhost 5000 -``` - -This will run the chat completion client and query the distribution’s `/inference/chat_completion` API. - -Here is an example output: -``` -User>hello world, write me a 2 sentence poem about the moon -Assistant> Here's a 2-sentence poem about the moon: - -The moon glows softly in the midnight sky, -A beacon of wonder, as it passes by. -``` - -You may also send a POST request to the server: -``` -curl http://localhost:5000/inference/chat_completion \ --H "Content-Type: application/json" \ --d '{ - "model": "Llama3.1-8B-Instruct", - "messages": [ - {"role": "system", "content": "You are a helpful assistant."}, - {"role": "user", "content": "Write me a 2 sentence poem about the moon"} - ], - "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512} -}' - -Output: -{'completion_message': {'role': 'assistant', - 'content': 'The moon glows softly in the midnight sky, \nA beacon of wonder, as it catches the eye.', - 'stop_reason': 'out_of_tokens', - 'tool_calls': []}, - 'logprobs': null} - -``` - - -Similarly you can test safety (if you configured llama-guard and/or prompt-guard shields) by: - -``` -python -m llama_stack.apis.safety.client localhost 5000 -``` - - -Check out our client SDKs for connecting to Llama Stack server in your preferred language, you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [node](https://github.com/meta-llama/llama-stack-client-node), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications. - -You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo. - - -## Advanced Guides -Please see our [Building a LLama Stack Distribution](./building_distro.md) guide for more details on how to assemble your own Llama Stack Distribution. 
diff --git a/docs/building_distro.md b/docs/source/building_distro.md similarity index 100% rename from docs/building_distro.md rename to docs/source/building_distro.md diff --git a/docs/source/cli_reference.md b/docs/source/cli_reference.md index 81da1a773..39ac99615 100644 --- a/docs/source/cli_reference.md +++ b/docs/source/cli_reference.md @@ -2,12 +2,12 @@ The `llama` CLI tool helps you setup and use the Llama Stack & agentic systems. It should be available on your path after installing the `llama-stack` package. -## Subcommands +### Subcommands 1. `download`: `llama` cli tools supports downloading the model from Meta or Hugging Face. 2. `model`: Lists available models and their properties. -3. `stack`: Allows you to build and run a Llama Stack server. You can read more about this in Step 3 below. +3. `stack`: Allows you to build and run a Llama Stack server. You can read more about this [here](cli_reference.md#step-3-building-and-configuring-llama-stack-distributions). -## Sample Usage +### Sample Usage ``` llama --help @@ -94,7 +94,7 @@ You should see a table like this: To download models, you can use the llama download command. -### Downloading from [Meta](https://llama.meta.com/llama-downloads/) +#### Downloading from [Meta](https://llama.meta.com/llama-downloads/) Here is an example download command to get the 3B-Instruct/11B-Vision-Instruct model. You will need META_URL which can be obtained from [here](https://llama.meta.com/docs/getting_the_models/meta/) @@ -112,7 +112,7 @@ llama download --source meta --model-id Prompt-Guard-86M --meta-url META_URL llama download --source meta --model-id Llama-Guard-3-1B --meta-url META_URL ``` -### Downloading from [Hugging Face](https://huggingface.co/meta-llama) +#### Downloading from [Hugging Face](https://huggingface.co/meta-llama) Essentially, the same commands above work, just replace `--source meta` with `--source huggingface`. @@ -129,7 +129,7 @@ llama download --source huggingface --model-id Prompt-Guard-86M --ignore-pattern > **Tip:** Default for `llama download` is to run with `--ignore-patterns *.safetensors` since we use the `.pth` files in the `original` folder. For Llama Guard and Prompt Guard, however, we need safetensors. Hence, please run with `--ignore-patterns original` so that safetensors are downloaded and `.pth` files are ignored. -### Downloading via Ollama +#### Downloading via Ollama If you're already using ollama, we also have a supported Llama Stack distribution `local-ollama` and you can continue to use ollama for managing model downloads. @@ -215,7 +215,7 @@ You can even run `llama model prompt-format` see all of the templates and their ``` llama model prompt-format -m Llama3.2-3B-Instruct ``` -![alt text](https://github.com/meta-llama/llama-stack/docs/resources/prompt-format.png) +![alt text](resources/prompt-format.png) @@ -229,8 +229,8 @@ You will be shown a Markdown formatted description of the model interface and ho - Please see our [Getting Started](getting_started.md) guide for more details on how to build and start a Llama Stack distribution. ### Step 3.1 Build -In the following steps, imagine we'll be working with a `Llama3.1-8B-Instruct` model. We will name our build `8b-instruct` to help us remember the config. We will start build our distribution (in the form of a Conda environment, or Docker image). In this step, we will specify: -- `name`: the name for our distribution (e.g. `8b-instruct`) +In the following steps, imagine we'll be working with a `Llama3.1-8B-Instruct` model. 
We will name our build `tgi` to help us remember the config. We will start build our distribution (in the form of a Conda environment, or Docker image). In this step, we will specify: +- `name`: the name for our distribution (e.g. `tgi`) - `image_type`: our build image type (`conda | docker`) - `distribution_spec`: our distribution specs for specifying API providers - `description`: a short description of the configurations for the distribution @@ -274,16 +274,16 @@ The following command will allow you to see the available templates and their co llama stack build --list-templates ``` -![alt text](https://github.com/meta-llama/llama-stack/docs/resources/list-templates.png) +![alt text](resources/list-templates.png) You may then pick a template to build your distribution with providers fitted to your liking. ``` -llama stack build --template tgi +llama stack build --template tgi --image-type conda ``` ``` -$ llama stack build --template tgi +$ llama stack build --template tgi --image-type conda ... ... Build spec configuration saved at ~/.conda/envs/llamastack-tgi/tgi-build.yaml @@ -293,10 +293,10 @@ You may now run `llama stack configure tgi` or `llama stack configure ~/.conda/e #### Building from config file - In addition to templates, you may customize the build to your liking through editing config files and build from config files with the following command. -- The config file will be of contents like the ones in `llama_stack/distributions/templates/`. +- The config file will be of contents like the ones in `llama_stack/templates/`. ``` -$ cat llama_stack/templates/ollama/build.yaml +$ cat build.yaml name: ollama distribution_spec: @@ -311,7 +311,7 @@ image_type: conda ``` ``` -llama stack build --config llama_stack/templates/ollama/build.yaml +llama stack build --config build.yaml ``` #### How to build distribution with Docker image @@ -319,7 +319,7 @@ llama stack build --config llama_stack/templates/ollama/build.yaml To build a docker image, you may start off from a template and use the `--image-type docker` flag to specify `docker` as the build image type. ``` -llama stack build --template local --image-type docker +llama stack build --template tgi --image-type docker ``` Alternatively, you may use a config file and set `image_type` to `docker` in our `-build.yaml` file, and run `llama stack build -build.yaml`. The `-build.yaml` will be of contents like: @@ -354,7 +354,7 @@ Build spec configuration saved at ~/.llama/distributions/docker/docker-local-bui ### Step 3.2 Configure After our distribution is built (either in form of docker or conda environment), we will run the following command to ``` -llama stack configure [ | ] +llama stack configure [ | ] ``` - For `conda` environments: would be the generated build spec saved from Step 1. - For `docker` images downloaded from Dockerhub, you could also use as the argument. @@ -390,10 +390,10 @@ Enter value for wolfram_api_key (optional): Configuring API: telemetry (console) -YAML configuration has been written to ~/.llama/builds/conda/tgi-run.yaml +YAML configuration has been written to ~/.llama/builds/conda/8b-instruct-run.yaml ``` -After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/tgi-run.yaml` with the following contents. You may edit this file to change the settings. +After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/8b-instruct-run.yaml` with the following contents. You may edit this file to change the settings. 
As you can see, we did basic configuration above and configured: - inference to run on model `Llama3.1-8B-Instruct` (obtained from `llama model list`) @@ -415,7 +415,7 @@ llama stack run ~/.llama/builds/conda/tgi-run.yaml You should see the Llama Stack server start and print the APIs that it is supporting ``` -$ llama stack run ~/.llama/builds/conda/tgi-run.yaml +$ llama stack run ~/.llama/builds/local/conda/tgi-run.yaml > initializing model parallel with size 1 > initializing ddp with size 1 diff --git a/docs/developer_cookbook.md b/docs/source/developer_cookbook.md similarity index 100% rename from docs/developer_cookbook.md rename to docs/source/developer_cookbook.md diff --git a/docs/source/getting_started.md b/docs/source/getting_started.md index b1450cd42..49c7cd5a0 100644 --- a/docs/source/getting_started.md +++ b/docs/source/getting_started.md @@ -1,37 +1,41 @@ -# Getting Started +# Getting Started with Llama Stack -This guide will walk you though the steps to get started on end-to-end flow for LlamaStack. This guide mainly focuses on getting started with building a LlamaStack distribution, and starting up a LlamaStack server. Please see our [documentations](https://github.com/meta-llama/llama-stack/README.md) on what you can do with Llama Stack, and [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) on examples apps built with Llama Stack. +This guide will walk you though the steps to get started on end-to-end flow for LlamaStack. This guide mainly focuses on getting started with building a LlamaStack distribution, and starting up a LlamaStack server. Please see our [documentations](../README.md) on what you can do with Llama Stack, and [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) on examples apps built with Llama Stack. ## Installation The `llama` CLI tool helps you setup and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-stack` package. -You can install this repository as a [package](https://pypi.org/project/llama-stack/) with `pip install llama-stack` +You have two ways to install this repository: -If you want to install from source: +1. **Install as a package**: + You can install the repository directly from [PyPI](https://pypi.org/project/llama-stack/) by running the following command: + ```bash + pip install llama-stack + ``` -```bash -mkdir -p ~/local -cd ~/local -git clone git@github.com:meta-llama/llama-stack.git +2. **Install from source**: + If you prefer to install from the source code, follow these steps: + ```bash + mkdir -p ~/local + cd ~/local + git clone git@github.com:meta-llama/llama-stack.git -conda create -n stack python=3.10 -conda activate stack + conda create -n stack python=3.10 + conda activate stack -cd llama-stack -$CONDA_PREFIX/bin/pip install -e . -``` + cd llama-stack + $CONDA_PREFIX/bin/pip install -e . + ``` For what you can do with the Llama CLI, please refer to [CLI Reference](./cli_reference.md). -## Quick Starting Llama Stack Server +## Starting Up Llama Stack Server -### Starting up server via docker +You have two ways to start up Llama stack server: -We provide 2 pre-built Docker image of Llama Stack distribution, which can be found in the following links. -- [llamastack-local-gpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-gpu/general) - - This is a packaged version with our local meta-reference implementations, where you will be running inference locally with downloaded Llama model checkpoints. 
-- [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general) - - This is a lite version with remote inference where you can hook up to your favourite remote inference framework (e.g. ollama, fireworks, together, tgi) for running inference without GPU. +1. **Starting up server via docker**: + +We provide pre-built Docker image of Llama Stack distribution, which can be found in the following links in the [distributions](../distributions/) folder. > [!NOTE] > For GPU inference, you need to set these environment variables for specifying local directory containing your model checkpoints, and enable GPU inference to start running docker container. @@ -42,362 +46,132 @@ export LLAMA_CHECKPOINT_DIR=~/.llama > [!NOTE] > `~/.llama` should be the path containing downloaded weights of Llama models. +To download llama models, use +``` +llama download --model-id Llama3.1-8B-Instruct +``` To download and start running a pre-built docker container, you may use the following commands: ``` -docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu +cd llama-stack/distributions/meta-reference-gpu +docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.yaml --gpus=all distribution-meta-reference-gpu --yaml_config /root/my-run.yaml ``` > [!TIP] -> Pro Tip: We may use `docker compose up` for starting up a distribution with remote providers (e.g. TGI) using [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general). You can checkout [these scripts](https://github.com/meta-llama/llama-stack/llama_stack/distribution/docker/README.md) to help you get started. - -### Build->Configure->Run Llama Stack server via conda -You may also build a LlamaStack distribution from scratch, configure it, and start running the distribution. This is useful for developing on LlamaStack. - -**`llama stack build`** -- You'll be prompted to enter build information interactively. -``` -llama stack build - -> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): my-local-stack -> Enter the image type you want your distribution to be built with (docker or conda): conda - - Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs. -> Enter the API provider for the inference API: (default=meta-reference): meta-reference -> Enter the API provider for the safety API: (default=meta-reference): meta-reference -> Enter the API provider for the agents API: (default=meta-reference): meta-reference -> Enter the API provider for the memory API: (default=meta-reference): meta-reference -> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference - - > (Optional) Enter a short description for your Llama Stack distribution: - -Build spec configuration saved at ~/.conda/envs/llamastack-my-local-stack/my-local-stack-build.yaml -You can now run `llama stack configure my-local-stack` -``` - -**`llama stack configure`** -- Run `llama stack configure ` with the name you have previously defined in `build` step. -``` -llama stack configure -``` -- You will be prompted to enter configurations for your Llama Stack - -``` -$ llama stack configure my-local-stack - -Configuring API `inference`... -=== Configuring provider `meta-reference` for API inference... -Enter value for model (default: Llama3.1-8B-Instruct) (required): -Do you want to configure quantization? 
(y/n): n -Enter value for torch_seed (optional): -Enter value for max_seq_len (default: 4096) (required): -Enter value for max_batch_size (default: 1) (required): - -Configuring API `safety`... -=== Configuring provider `meta-reference` for API safety... -Do you want to configure llama_guard_shield? (y/n): n -Do you want to configure prompt_guard_shield? (y/n): n - -Configuring API `agents`... -=== Configuring provider `meta-reference` for API agents... -Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite): - -Configuring SqliteKVStoreConfig: -Enter value for namespace (optional): -Enter value for db_path (default: /home/xiyan/.llama/runtime/kvstore.db) (required): - -Configuring API `memory`... -=== Configuring provider `meta-reference` for API memory... -> Please enter the supported memory bank type your provider has for memory: vector - -Configuring API `telemetry`... -=== Configuring provider `meta-reference` for API telemetry... - -> YAML configuration has been written to ~/.llama/builds/conda/my-local-stack-run.yaml. -You can now run `llama stack run my-local-stack --port PORT` -``` - -**`llama stack run`** -- Run `llama stack run ` with the name you have previously defined. -``` -llama stack run my-local-stack - -... -> initializing model parallel with size 1 -> initializing ddp with size 1 -> initializing pipeline with size 1 -... -Finished model load YES READY -Serving POST /inference/chat_completion -Serving POST /inference/completion -Serving POST /inference/embeddings -Serving POST /memory_banks/create -Serving DELETE /memory_bank/documents/delete -Serving DELETE /memory_banks/drop -Serving GET /memory_bank/documents/get -Serving GET /memory_banks/get -Serving POST /memory_bank/insert -Serving GET /memory_banks/list -Serving POST /memory_bank/query -Serving POST /memory_bank/update -Serving POST /safety/run_shield -Serving POST /agentic_system/create -Serving POST /agentic_system/session/create -Serving POST /agentic_system/turn/create -Serving POST /agentic_system/delete -Serving POST /agentic_system/session/delete -Serving POST /agentic_system/session/get -Serving POST /agentic_system/step/get -Serving POST /agentic_system/turn/get -Serving GET /telemetry/get_trace -Serving POST /telemetry/log_event -Listening on :::5000 -INFO: Started server process [587053] -INFO: Waiting for application startup. -INFO: Application startup complete. -INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit) -``` - -### End-to-end flow of building, configuring, running, and testing a Distribution - -#### Step 1. Build -In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instruct` model. We will name our build `8b-instruct` to help us remember the config. We will start build our distribution (in the form of a Conda environment, or Docker image). In this step, we will specify: -- `name`: the name for our distribution (e.g. `8b-instruct`) -- `image_type`: our build image type (`conda | docker`) -- `distribution_spec`: our distribution specs for specifying API providers - - `description`: a short description of the configurations for the distribution - - `providers`: specifies the underlying implementation for serving each API endpoint - - `image_type`: `conda` | `docker` to specify whether to build the distribution in the form of Docker image or Conda environment. - - -At the end of build command, we will generate `-build.yaml` file storing the build configurations. 
- -After this step is complete, a file named `-build.yaml` will be generated and saved at the output file path specified at the end of the command. - -#### Building from scratch -- For a new user, we could start off with running `llama stack build` which will allow you to a interactively enter wizard where you will be prompted to enter build configurations. -``` -llama stack build -``` - -Running the command above will allow you to fill in the configuration to build your Llama Stack distribution, you will see the following outputs. - -``` -> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): 8b-instruct -> Enter the image type you want your distribution to be built with (docker or conda): conda - - Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs. -> Enter the API provider for the inference API: (default=meta-reference): meta-reference -> Enter the API provider for the safety API: (default=meta-reference): meta-reference -> Enter the API provider for the agents API: (default=meta-reference): meta-reference -> Enter the API provider for the memory API: (default=meta-reference): meta-reference -> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference - - > (Optional) Enter a short description for your Llama Stack distribution: - -Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/8b-instruct-build.yaml -``` - -**Ollama (optional)** - -If you plan to use Ollama for inference, you'll need to install the server [via these instructions](https://ollama.com/download). - - -#### Building from templates -- To build from alternative API providers, we provide distribution templates for users to get started building a distribution backed by different providers. - -The following command will allow you to see the available templates and their corresponding providers. -``` -llama stack build --list-templates -``` - -![alt text](https://github.com/meta-llama/llama-stack/docs/resources/list-templates.png) - -You may then pick a template to build your distribution with providers fitted to your liking. - -``` -llama stack build --template tgi -``` - -``` -$ llama stack build --template tgi -... -... -Build spec configuration saved at ~/.conda/envs/llamastack-tgi/tgi-build.yaml -You may now run `llama stack configure tgi` or `llama stack configure ~/.conda/envs/llamastack-tgi/tgi-build.yaml` -``` - -#### Building from config file -- In addition to templates, you may customize the build to your liking through editing config files and build from config files with the following command. - -- The config file will be of contents like the ones in `llama_stack/distributions/templates/`. - -``` -$ cat llama_stack/templates/ollama/build.yaml - -name: ollama -distribution_spec: - description: Like local, but use ollama for running LLM inference - providers: - inference: remote::ollama - memory: meta-reference - safety: meta-reference - agents: meta-reference - telemetry: meta-reference -image_type: conda -``` - -``` -llama stack build --config llama_stack/templates/ollama/build.yaml -``` - -#### How to build distribution with Docker image - -> [!TIP] -> Podman is supported as an alternative to Docker. Set `DOCKER_BINARY` to `podman` in your environment to use Podman. - -To build a docker image, you may start off from a template and use the `--image-type docker` flag to specify `docker` as the build image type. 
- -``` -llama stack build --template tgi --image-type docker -``` - -Alternatively, you may use a config file and set `image_type` to `docker` in our `-build.yaml` file, and run `llama stack build -build.yaml`. The `-build.yaml` will be of contents like: - -``` -name: local-docker-example -distribution_spec: - description: Use code from `llama_stack` itself to serve all llama stack APIs - docker_image: null - providers: - inference: meta-reference - memory: meta-reference-faiss - safety: meta-reference - agentic_system: meta-reference - telemetry: console -image_type: docker -``` - -The following command allows you to build a Docker image with the name `` -``` -llama stack build --config -build.yaml - -Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/DockerfileFROM python:3.10-slim -WORKDIR /app -... -... -You can run it with: podman run -p 8000:8000 llamastack-docker-local -Build spec configuration saved at ~/.llama/distributions/docker/docker-local-build.yaml -``` - - -### Step 2. Configure -After our distribution is built (either in form of docker or conda environment), we will run the following command to -``` -llama stack configure [ | ] -``` -- For `conda` environments: would be the generated build spec saved from Step 1. -- For `docker` images downloaded from Dockerhub, you could also use as the argument. - - Run `docker images` to check list of available images on your machine. - -``` -$ llama stack configure tgi - -Configuring API: inference (meta-reference) -Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required): -Enter value for quantization (optional): -Enter value for torch_seed (optional): -Enter value for max_seq_len (existing: 4096) (required): -Enter value for max_batch_size (existing: 1) (required): - -Configuring API: memory (meta-reference-faiss) - -Configuring API: safety (meta-reference) -Do you want to configure llama_guard_shield? (y/n): y -Entering sub-configuration for llama_guard_shield: -Enter value for model (default: Llama-Guard-3-1B) (required): -Enter value for excluded_categories (default: []) (required): -Enter value for disable_input_check (default: False) (required): -Enter value for disable_output_check (default: False) (required): -Do you want to configure prompt_guard_shield? (y/n): y -Entering sub-configuration for prompt_guard_shield: -Enter value for model (default: Prompt-Guard-86M) (required): - -Configuring API: agentic_system (meta-reference) -Enter value for brave_search_api_key (optional): -Enter value for bing_search_api_key (optional): -Enter value for wolfram_api_key (optional): - -Configuring API: telemetry (console) - -YAML configuration has been written to ~/.llama/builds/conda/tgi-run.yaml -``` - -After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/tgi-run.yaml` with the following contents. You may edit this file to change the settings. - -As you can see, we did basic configuration above and configured: -- inference to run on model `Meta-Llama3.1-8B-Instruct` (obtained from `llama model list`) -- Llama Guard safety shield with model `Llama-Guard-3-1B` -- Prompt Guard safety shield with model `Prompt-Guard-86M` - -For how these configurations are stored as yaml, checkout the file printed at the end of the configuration. - -Note that all configurations as well as models are stored in `~/.llama` - - -### Step 3. Run -Now, let's start the Llama Stack Distribution Server. 
-
-
-### Step 3. Run
-Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end of the `llama stack configure` step.
-
-```
-llama stack run tgi
-```
-
-You should see the Llama Stack server start and print the APIs that it supports:
-
-```
-$ llama stack run tgi
-
-> initializing model parallel with size 1
-> initializing ddp with size 1
-> initializing pipeline with size 1
-Loaded in 19.28 seconds
-NCCL version 2.20.5+cuda12.4
-Finished model load YES READY
-Serving POST /inference/batch_chat_completion
-Serving POST /inference/batch_completion
-Serving POST /inference/chat_completion
-Serving POST /inference/completion
-Serving POST /safety/run_shield
-Serving POST /agentic_system/memory_bank/attach
-Serving POST /agentic_system/create
-Serving POST /agentic_system/session/create
-Serving POST /agentic_system/turn/create
-Serving POST /agentic_system/delete
-Serving POST /agentic_system/session/delete
-Serving POST /agentic_system/memory_bank/detach
-Serving POST /agentic_system/session/get
-Serving POST /agentic_system/step/get
-Serving POST /agentic_system/turn/get
-Listening on :::5000
-INFO:     Started server process [453333]
-INFO:     Waiting for application startup.
-INFO:     Application startup complete.
-INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
-```
-
-> [!NOTE]
-> Configuration is in `~/.llama/builds/local/conda/8b-instruct-run.yaml`. Feel free to increase `max_seq_len`.
-
-> [!IMPORTANT]
-> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.
-
-> [!TIP]
-> You might need to use the flag `--disable-ipv6` to disable IPv6 support.
-
-This server is running a Llama model locally.
-
-### Step 4. Test with Client
+> Pro Tip: You may use `docker compose up` to start a distribution with remote providers (e.g. TGI) using [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general). You can check out [these scripts](../distributions/) to help you get started.
+
+
+2. **Build->Configure->Run Llama Stack server via conda**:
+
+   You may also build a Llama Stack distribution from scratch, configure it, and start running it. This is useful for developing on Llama Stack.
+
+   **`llama stack build`**
+   - You'll be prompted to enter build information interactively.
+   ```
+   llama stack build
+
+   > Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): my-local-stack
+   > Enter the image type you want your distribution to be built with (docker or conda): conda
+
+    Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
+   > Enter the API provider for the inference API: (default=meta-reference): meta-reference
+   > Enter the API provider for the safety API: (default=meta-reference): meta-reference
+   > Enter the API provider for the agents API: (default=meta-reference): meta-reference
+   > Enter the API provider for the memory API: (default=meta-reference): meta-reference
+   > Enter the API provider for the telemetry API: (default=meta-reference): meta-reference
+
+    > (Optional) Enter a short description for your Llama Stack distribution:
+
+   Build spec configuration saved at ~/.conda/envs/llamastack-my-local-stack/my-local-stack-build.yaml
+   You can now run `llama stack configure my-local-stack`
+   ```
+
+   **`llama stack configure`**
+   - Run `llama stack configure <name>` with the name you have previously defined in the `build` step.
+   ```
+   llama stack configure <name>
+   ```
+   - You will be prompted to enter configurations for your Llama Stack distribution.
+
+   ```
+   $ llama stack configure my-local-stack
+
+   Configuring API `inference`...
+   === Configuring provider `meta-reference` for API inference...
+   Enter value for model (default: Llama3.1-8B-Instruct) (required):
+   Do you want to configure quantization? (y/n): n
+   Enter value for torch_seed (optional):
+   Enter value for max_seq_len (default: 4096) (required):
+   Enter value for max_batch_size (default: 1) (required):
+
+   Configuring API `safety`...
+   === Configuring provider `meta-reference` for API safety...
+   Do you want to configure llama_guard_shield? (y/n): n
+   Do you want to configure prompt_guard_shield? (y/n): n
+
+   Configuring API `agents`...
+   === Configuring provider `meta-reference` for API agents...
+   Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite):
+
+   Configuring SqliteKVStoreConfig:
+   Enter value for namespace (optional):
+   Enter value for db_path (default: /home/xiyan/.llama/runtime/kvstore.db) (required):
+
+   Configuring API `memory`...
+   === Configuring provider `meta-reference` for API memory...
+   > Please enter the supported memory bank type your provider has for memory: vector
+
+   Configuring API `telemetry`...
+   === Configuring provider `meta-reference` for API telemetry...
+
+   > YAML configuration has been written to ~/.llama/builds/conda/my-local-stack-run.yaml.
+   You can now run `llama stack run my-local-stack --port PORT`
+   ```
+
+   **`llama stack run`**
+   - Run `llama stack run <name>` with the name you have previously defined.
+   ```
+   llama stack run my-local-stack
+
+   ...
+   > initializing model parallel with size 1
+   > initializing ddp with size 1
+   > initializing pipeline with size 1
+   ...
+   Finished model load YES READY
+   Serving POST /inference/chat_completion
+   Serving POST /inference/completion
+   Serving POST /inference/embeddings
+   Serving POST /memory_banks/create
+   Serving DELETE /memory_bank/documents/delete
+   Serving DELETE /memory_banks/drop
+   Serving GET /memory_bank/documents/get
+   Serving GET /memory_banks/get
+   Serving POST /memory_bank/insert
+   Serving GET /memory_banks/list
+   Serving POST /memory_bank/query
+   Serving POST /memory_bank/update
+   Serving POST /safety/run_shield
+   Serving POST /agentic_system/create
+   Serving POST /agentic_system/session/create
+   Serving POST /agentic_system/turn/create
+   Serving POST /agentic_system/delete
+   Serving POST /agentic_system/session/delete
+   Serving POST /agentic_system/session/get
+   Serving POST /agentic_system/step/get
+   Serving POST /agentic_system/turn/get
+   Serving GET /telemetry/get_trace
+   Serving POST /telemetry/log_event
+   Listening on :::5000
+   INFO:     Started server process [587053]
+   INFO:     Waiting for application startup.
+   INFO:     Application startup complete.
+   INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
+   ```
+
+
+## Testing with client
 Once the server is set up, we can test it with a client to see the example outputs.
 ```
 cd /path/to/llama-stack
@@ -406,7 +180,7 @@
 conda activate <env>  # any environment containing the llama-stack pip package will work
 python -m llama_stack.apis.inference.client localhost 5000
 ```
-This will run the chat completion client and query the distribution’s /inference/chat_completion API.
+This will run the chat completion client and query the distribution’s `/inference/chat_completion` API.
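+
+If you would rather call the API from your own code than use the bundled client, the short Python sketch below sends the same request with the `requests` package (assumed to be installed), against a server on `localhost:5000` and the model name used in this walkthrough.
+
+```
+# Sketch: send a chat completion request to a running Llama Stack server.
+# Assumes `requests` is installed and the server from `llama stack run` is listening on localhost:5000.
+import requests
+
+response = requests.post(
+    "http://localhost:5000/inference/chat_completion",
+    json={
+        "model": "Llama3.1-8B-Instruct",
+        "messages": [
+            {"role": "system", "content": "You are a helpful assistant."},
+            {"role": "user", "content": "Write me a 2 sentence poem about the moon"},
+        ],
+        "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512},
+    },
+    timeout=60,
+)
+response.raise_for_status()
+# Print just the generated text from the completion message.
+print(response.json()["completion_message"]["content"])
+```
+
+The request body mirrors the `curl` example further below, and the completion you get back should look like the example output that follows.
+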
 Here is an example output:
 ```
@@ -417,6 +191,29 @@
 The moon glows softly in the midnight sky,
 A beacon of wonder, as it passes by.
 ```
+You may also send a POST request to the server:
+```
+curl http://localhost:5000/inference/chat_completion \
+-H "Content-Type: application/json" \
+-d '{
+    "model": "Llama3.1-8B-Instruct",
+    "messages": [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Write me a 2 sentence poem about the moon"}
+    ],
+    "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
+}'
+
+Output:
+{'completion_message': {'role': 'assistant',
+  'content': 'The moon glows softly in the midnight sky, \nA beacon of wonder, as it catches the eye.',
+  'stop_reason': 'out_of_tokens',
+  'tool_calls': []},
+ 'logprobs': null}
+
+```
+
+
 Similarly, you can test safety (if you configured llama-guard and/or prompt-guard shields) by:
 ```
@@ -427,3 +224,7 @@
 python -m llama_stack.apis.safety.client localhost 5000
 ```
 Check out our client SDKs for connecting to the Llama Stack server in your preferred language; you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [node](https://github.com/meta-llama/llama-stack-client-node), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications.
 You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
+
+
+## Advanced Guides
+Please see our [Building a Llama Stack Distribution](./building_distro.md) guide for more details on how to assemble your own Llama Stack Distribution.
diff --git a/docs/new_api_provider.md b/docs/source/providers_dev.md
similarity index 100%
rename from docs/new_api_provider.md
rename to docs/source/providers_dev.md