Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-23 00:59:41 +00:00)

Commit 688f42ba3a — Merge remote-tracking branch 'upstream/main' into patch-1

586 changed files with 65,959 additions and 3,793 deletions

@@ -1,3 +1,7 @@
+---
+orphan: true
+---
+
 # inline::meta-reference
 
 ## Description

@@ -1,3 +1,7 @@
+---
+orphan: true
+---
+
 # remote::nvidia
 
 ## Description

@@ -43,7 +43,7 @@ We have built-in functionality to run the supported open-benckmarks using llama-
 
 Spin up llama stack server with 'open-benchmark' template
 ```
-llama stack run llama_stack/templates/open-benchmark/run.yaml
+llama stack run llama_stack/distributions/open-benchmark/run.yaml
 
 ```

@@ -23,7 +23,7 @@ To use the HF SFTTrainer in your Llama Stack project, follow these steps:
 
 You can access the HuggingFace trainer via the `ollama` distribution:
 
 ```bash
-llama stack build --template starter --image-type venv
+llama stack build --distro starter --image-type venv
 llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
 ```

@@ -1,3 +1,7 @@
+---
+orphan: true
+---
+
 # inline::huggingface
 
 ## Description

@@ -1,3 +1,7 @@
+---
+orphan: true
+---
+
 # inline::torchtune
 
 ## Description

@@ -1,3 +1,7 @@
+---
+orphan: true
+---
+
 # remote::nvidia
 
 ## Description

@@ -1,3 +1,7 @@
+---
+orphan: true
+---
+
 # inline::basic
 
 ## Description

@@ -1,3 +1,7 @@
+---
+orphan: true
+---
+
 # inline::braintrust
 
 ## Description

@@ -1,3 +1,7 @@
+---
+orphan: true
+---
+
 # inline::llm-as-judge
 
 ## Description

@@ -355,7 +355,7 @@ server:
 8. Run the server:
 
 ```bash
-python -m llama_stack.distribution.server.server --yaml-config ~/.llama/run-byoa.yaml
+python -m llama_stack.core.server.server --yaml-config ~/.llama/run-byoa.yaml
 ```
 
 9. Test the API:

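Step 9's body falls outside this hunk, but the renamed module path is easy to sanity-check once the server is up. A minimal smoke test, assuming the server listens on the default port 8321 (adjust if `run-byoa.yaml` overrides it):

```bash
# Assumes the default port 8321; lists the models the stack registered
curl -s http://localhost:8321/v1/models
```
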
@@ -97,11 +97,11 @@ To start the Llama Stack Playground, run the following commands:
 
 1. Start up the Llama Stack API server
 
 ```bash
-llama stack build --template together --image-type conda
+llama stack build --distro together --image-type venv
 llama stack run together
 ```
 
 2. Start Streamlit UI
 ```bash
-uv run --with ".[ui]" streamlit run llama_stack/distribution/ui/app.py
+uv run --with ".[ui]" streamlit run llama_stack.core/ui/app.py
 ```

@@ -11,4 +11,5 @@ See the [Adding a New API Provider](new_api_provider.md) which describes how to
 :hidden:
 
 new_api_provider
+testing
 ```

@@ -6,7 +6,7 @@ This guide will walk you through the process of adding a new API provider to Lla
 - Begin by reviewing the [core concepts](../concepts/index.md) of Llama Stack and choose the API your provider belongs to (Inference, Safety, VectorIO, etc.)
 - Determine the provider type ({repopath}`Remote::llama_stack/providers/remote` or {repopath}`Inline::llama_stack/providers/inline`). Remote providers make requests to external services, while inline providers execute implementation locally.
 - Add your provider to the appropriate {repopath}`Registry::llama_stack/providers/registry/`. Specify pip dependencies necessary.
-- Update any distribution {repopath}`Templates::llama_stack/templates/` `build.yaml` and `run.yaml` files if they should include your provider by default. Run {repopath}`./scripts/distro_codegen.py` if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation.
+- Update any distribution {repopath}`Templates::llama_stack/distributions/` `build.yaml` and `run.yaml` files if they should include your provider by default. Run {repopath}`./scripts/distro_codegen.py` if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation.
 
 
 Here are some example PRs to help you get started:

@@ -52,7 +52,7 @@ def get_base_url(self) -> str:
 
 ## Testing the Provider
 
-Before running tests, you must have required dependencies installed. This depends on the providers or distributions you are testing. For example, if you are testing the `together` distribution, you should install dependencies via `llama stack build --template together`.
+Before running tests, you must have required dependencies installed. This depends on the providers or distributions you are testing. For example, if you are testing the `together` distribution, you should install dependencies via `llama stack build --distro together`.
 
 ### 1. Integration Testing

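The integration tests themselves live under `tests/integration`. A sketch of the end-to-end flow, with the exact pytest options left as an assumption — consult `tests/integration/README.md` for the authoritative invocation:

```bash
# Hypothetical sketch: install the distribution's dependencies, then run the
# integration suite; provider-selection flags are described in the tests README.
llama stack build --distro together --image-type venv
pytest tests/integration
```
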
@@ -174,7 +174,7 @@ spec:
       - name: llama-stack
         image: localhost/llama-stack-run-k8s:latest
         imagePullPolicy: IfNotPresent
-        command: ["python", "-m", "llama_stack.distribution.server.server", "--config", "/app/config.yaml"]
+        command: ["python", "-m", "llama_stack.core.server.server", "--config", "/app/config.yaml"]
         ports:
           - containerPort: 5000
         volumeMounts:

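Only the module path in `command` changes; deploying the manifest is unchanged. A sketch, assuming the manifest above is saved as `stack-k8s.yaml`:

```bash
# Hypothetical file name; substitute your own manifest
kubectl apply -f stack-k8s.yaml
kubectl get pods -w   # wait for the llama-stack pod to become Ready
```
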
@@ -47,30 +47,37 @@ pip install -e .
 ```
 Use the CLI to build your distribution.
 The main points to consider are:
-1. **Image Type** - Do you want a Conda / venv environment or a Container (eg. Docker)
+1. **Image Type** - Do you want a venv environment or a Container (eg. Docker)
 2. **Template** - Do you want to use a template to build your distribution? or start from scratch ?
 3. **Config** - Do you want to use a pre-existing config file to build your distribution?
 
 ```
 llama stack build -h
-usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates] [--image-type {conda,container,venv}] [--image-name IMAGE_NAME] [--print-deps-only] [--run]
+usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--distro DISTRIBUTION] [--list-distros] [--image-type {container,venv}] [--image-name IMAGE_NAME] [--print-deps-only]
+                         [--run] [--providers PROVIDERS]
 
 Build a Llama stack container
 
 options:
   -h, --help            show this help message and exit
-  --config CONFIG       Path to a config file to use for the build. You can find example configs in llama_stack/distributions/**/build.yaml. If this argument is not provided, you will
-                        be prompted to enter information interactively (default: None)
-  --template TEMPLATE   Name of the example template config to use for build. You may use `llama stack build --list-templates` to check out the available templates (default: None)
-  --list-templates      Show the available templates for building a Llama Stack distribution (default: False)
-  --image-type {conda,container,venv}
+  --config CONFIG       Path to a config file to use for the build. You can find example configs in llama_stack.cores/**/build.yaml. If this argument is not provided, you will be prompted to
+                        enter information interactively (default: None)
+  --template TEMPLATE   (deprecated) Name of the example template config to use for build. You may use `llama stack build --list-distros` to check out the available distributions (default:
+                        None)
+  --distro DISTRIBUTION, --distribution DISTRIBUTION
+                        Name of the distribution to use for build. You may use `llama stack build --list-distros` to check out the available distributions (default: None)
+  --list-distros, --list-distributions
+                        Show the available distributions for building a Llama Stack distribution (default: False)
+  --image-type {container,venv}
                         Image Type to use for the build. If not specified, will use the image type from the template config. (default: None)
   --image-name IMAGE_NAME
-                        [for image-type=conda|container|venv] Name of the conda or virtual environment to use for the build. If not specified, currently active environment will be used if
-                        found. (default: None)
+                        [for image-type=container|venv] Name of the virtual environment to use for the build. If not specified, currently active environment will be used if found. (default:
+                        None)
   --print-deps-only     Print the dependencies for the stack only, without building the stack (default: False)
   --run                 Run the stack after building using the same image type, name, and other applicable arguments (default: False)
+  --providers PROVIDERS
+                        Build a config for a list of providers and only those providers. This list is formatted like: api1=provider1,api2=provider2. Where there can be multiple providers per
+                        API. (default: None)
 ```
 
 After this step is complete, a file named `<name>-build.yaml` and template file `<name>-run.yaml` will be generated and saved at the output file path specified at the end of the command.

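The new `--providers` flag builds a stack from an explicit provider list rather than a named distribution. A sketch using the `api=provider` format from the help text above (the specific providers chosen here are illustrative):

```bash
# Illustrative: one inference and one safety provider, venv image type
llama stack build \
  --providers inference=remote::ollama,safety=inline::llama-guard \
  --image-type venv --image-name my-providers-env
```
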
@@ -141,7 +148,7 @@ You may then pick a template to build your distribution with providers fitted to
 
 For example, to build a distribution with TGI as the inference provider, you can run:
 ```
-$ llama stack build --template starter
+$ llama stack build --distro starter
 ...
 You can now edit ~/.llama/distributions/llamastack-starter/starter-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-starter/starter-run.yaml`
 ```

@@ -159,7 +166,7 @@ It would be best to start with a template and understand the structure of the co
 llama stack build
 
 > Enter a name for your Llama Stack (e.g. my-local-stack): my-stack
-> Enter the image type you want your Llama Stack to be built as (container or conda or venv): conda
+> Enter the image type you want your Llama Stack to be built as (container or venv): venv
 
 Llama Stack is composed of several APIs working together. Let's select
 the provider types (implementations) you want to use for these APIs.

@@ -184,10 +191,10 @@ You can now edit ~/.llama/distributions/llamastack-my-local-stack/my-local-stack
 :::{tab-item} Building from a pre-existing build config file
 - In addition to templates, you may customize the build to your liking through editing config files and build from config files with the following command.
 
-- The config file will be of contents like the ones in `llama_stack/templates/*build.yaml`.
+- The config file will be of contents like the ones in `llama_stack/distributions/*build.yaml`.
 
 ```
-llama stack build --config llama_stack/templates/starter/build.yaml
+llama stack build --config llama_stack/distributions/starter/build.yaml
 ```
 :::
 

@@ -253,11 +260,11 @@ Podman is supported as an alternative to Docker. Set `CONTAINER_BINARY` to `podm
 To build a container image, you may start off from a template and use the `--image-type container` flag to specify `container` as the build image type.
 
 ```
-llama stack build --template starter --image-type container
+llama stack build --distro starter --image-type container
 ```
 
 ```
-$ llama stack build --template starter --image-type container
+$ llama stack build --distro starter --image-type container
 ...
 Containerfile created successfully in /tmp/tmp.viA3a3Rdsg/ContainerfileFROM python:3.10-slim
 ...

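Once the Containerfile build finishes, the resulting image is started like the pre-built distribution images shown later in this guide. A sketch, assuming the build tagged the image `distribution-starter` (check `docker images` for the actual tag on your machine):

```bash
# Hypothetical image tag and default port; mirrors the docker run pattern used
# for the pre-built distribution images elsewhere in these docs
docker run -p 8321:8321 -v ~/.llama:/root/.llama distribution-starter --port 8321
```
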
@@ -312,7 +319,7 @@ Now, let's start the Llama Stack Distribution Server. You will need the YAML con
 ```
 llama stack run -h
 usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
-                       [--image-type {conda,venv}] [--enable-ui]
+                       [--image-type {venv}] [--enable-ui]
                        [config | template]
 
 Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.

@@ -326,8 +333,8 @@ options:
   --image-name IMAGE_NAME
                         Name of the image to run. Defaults to the current environment (default: None)
   --env KEY=VALUE       Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: None)
-  --image-type {conda,venv}
-                        Image Type used during the build. This can be either conda or venv. (default: None)
+  --image-type {venv}
+                        Image Type used during the build. This should be venv. (default: None)
   --enable-ui           Start the UI server (default: False)
 ```
 

@@ -342,9 +349,6 @@ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-
 
 # Start using a venv
 llama stack run --image-type venv ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
-
-# Start using a conda environment
-llama stack run --image-type conda ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
 ```
 
 ```

@@ -10,7 +10,6 @@ The default `run.yaml` files generated by templates are starting points for your
 
 ```yaml
 version: 2
-conda_env: ollama
 apis:
 - agents
 - inference

@@ -6,11 +6,11 @@ This avoids the overhead of setting up a server.
 ```bash
 # setup
 uv pip install llama-stack
-llama stack build --template starter --image-type venv
+llama stack build --distro starter --image-type venv
 ```
 
 ```python
-from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
+from llama_stack.core.library_client import LlamaStackAsLibraryClient
 
 client = LlamaStackAsLibraryClient(
     "starter",

@@ -9,6 +9,7 @@ This section provides an overview of the distributions available in Llama Stack.
 list_of_distributions
 building_distro
+customizing_run_yaml
 starting_llama_stack_server
 importing_as_library
 configuration
 ```

@@ -34,6 +34,13 @@ data:
       provider_type: remote::chromadb
       config:
         url: ${env.CHROMADB_URL:=}
+        kvstore:
+          type: postgres
+          host: ${env.POSTGRES_HOST:=localhost}
+          port: ${env.POSTGRES_PORT:=5432}
+          db: ${env.POSTGRES_DB:=llamastack}
+          user: ${env.POSTGRES_USER:=llamastack}
+          password: ${env.POSTGRES_PASSWORD:=llamastack}
     safety:
     - provider_id: llama-guard
       provider_type: inline::llama-guard

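The added kvstore block resolves each `${env.VAR:=default}` reference at startup, so the Postgres connection can be pointed elsewhere without editing the config. A sketch with placeholder values:

```bash
# Placeholder values; match these to your actual Postgres deployment
export POSTGRES_HOST=db.internal
export POSTGRES_PORT=5432
export POSTGRES_DB=llamastack
export POSTGRES_USER=llamastack
export POSTGRES_PASSWORD=change-me
```
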
@@ -52,7 +52,7 @@ spec:
           value: "${SAFETY_MODEL}"
         - name: TAVILY_SEARCH_API_KEY
           value: "${TAVILY_SEARCH_API_KEY}"
-        command: ["python", "-m", "llama_stack.distribution.server.server", "--config", "/etc/config/stack_run_config.yaml", "--port", "8321"]
+        command: ["python", "-m", "llama_stack.core.server.server", "--config", "/etc/config/stack_run_config.yaml", "--port", "8321"]
         ports:
           - containerPort: 8321
         volumeMounts:

@@ -31,6 +31,13 @@ providers:
       provider_type: remote::chromadb
       config:
         url: ${env.CHROMADB_URL:=}
+        kvstore:
+          type: postgres
+          host: ${env.POSTGRES_HOST:=localhost}
+          port: ${env.POSTGRES_PORT:=5432}
+          db: ${env.POSTGRES_DB:=llamastack}
+          user: ${env.POSTGRES_USER:=llamastack}
+          password: ${env.POSTGRES_PASSWORD:=llamastack}
   safety:
   - provider_id: llama-guard
     provider_type: inline::llama-guard

@@ -56,12 +56,12 @@ Breaking down the demo app, this section will show the core pieces that are used
 ### Setup Remote Inferencing
 Start a Llama Stack server on localhost. Here is an example of how you can do this using the firework.ai distribution:
 ```
-conda create -n stack-fireworks python=3.10
-conda activate stack-fireworks
+uv venv starter --python 3.12
+source starter/bin/activate # On Windows: starter\Scripts\activate
 pip install --no-cache llama-stack==0.2.2
-llama stack build --template fireworks --image-type conda
+llama stack build --distro starter --image-type venv
 export FIREWORKS_API_KEY=<SOME_KEY>
-llama stack run fireworks --port 5050
+llama stack run starter --port 5050
 ```
 
 Ensure the Llama Stack server version is the same as the Kotlin SDK Library for maximum compatibility.

@@ -57,7 +57,7 @@ Make sure you have access to a watsonx API Key. You can get one by referring [wa
 
 ## Running Llama Stack with watsonx
 
-You can do this via Conda (build code), venv or Docker which has a pre-built image.
+You can do this via venv or Docker which has a pre-built image.
 
 ### Via Docker
 

@@ -76,13 +76,3 @@ docker run \
   --env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
   --env WATSONX_BASE_URL=$WATSONX_BASE_URL
 ```
-
-### Via Conda
-
-```bash
-llama stack build --template watsonx --image-type conda
-llama stack run ./run.yaml \
-  --port $LLAMA_STACK_PORT \
-  --env WATSONX_API_KEY=$WATSONX_API_KEY \
-  --env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID
-```

@@ -114,7 +114,7 @@ podman run --rm -it \
 
 ## Running Llama Stack
 
-Now you are ready to run Llama Stack with TGI as the inference provider. You can do this via Conda (build code) or Docker which has a pre-built image.
+Now you are ready to run Llama Stack with TGI as the inference provider. You can do this via venv or Docker which has a pre-built image.
 
 ### Via Docker
 

@@ -153,7 +153,7 @@ docker run \
   --pull always \
   -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
   -v $HOME/.llama:/root/.llama \
-  -v ./llama_stack/templates/tgi/run-with-safety.yaml:/root/my-run.yaml \
+  -v ./llama_stack/distributions/tgi/run-with-safety.yaml:/root/my-run.yaml \
   llamastack/distribution-dell \
   --config /root/my-run.yaml \
   --port $LLAMA_STACK_PORT \

@@ -164,12 +164,12 @@ docker run \
   --env CHROMA_URL=$CHROMA_URL
 ```
 
-### Via Conda
+### Via venv
 
 Make sure you have done `pip install llama-stack` and have the Llama Stack CLI available.
 
 ```bash
-llama stack build --template dell --image-type conda
+llama stack build --distro dell --image-type venv
 llama stack run dell
   --port $LLAMA_STACK_PORT \
   --env INFERENCE_MODEL=$INFERENCE_MODEL \

@@ -70,7 +70,7 @@ $ llama model list --downloaded
 
 ## Running the Distribution
 
-You can do this via Conda (build code) or Docker which has a pre-built image.
+You can do this via venv or Docker which has a pre-built image.
 
 ### Via Docker
 

@@ -104,12 +104,12 @@ docker run \
   --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
 ```
 
-### Via Conda
+### Via venv
 
 Make sure you have done `uv pip install llama-stack` and have the Llama Stack CLI available.
 
 ```bash
-llama stack build --template meta-reference-gpu --image-type conda
+llama stack build --distro meta-reference-gpu --image-type venv
 llama stack run distributions/meta-reference-gpu/run.yaml \
   --port 8321 \
   --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct

@@ -133,7 +133,7 @@ curl -X DELETE "$NEMO_URL/v1/deployment/model-deployments/meta/llama-3.1-8b-inst
 
 ## Running Llama Stack with NVIDIA
 
-You can do this via Conda or venv (build code), or Docker which has a pre-built image.
+You can do this via venv (build code), or Docker which has a pre-built image.
 
 ### Via Docker
 

@@ -152,24 +152,13 @@ docker run \
   --env NVIDIA_API_KEY=$NVIDIA_API_KEY
 ```
 
-### Via Conda
-
-```bash
-INFERENCE_MODEL=meta-llama/Llama-3.1-8b-Instruct
-llama stack build --template nvidia --image-type conda
-llama stack run ./run.yaml \
-  --port 8321 \
-  --env NVIDIA_API_KEY=$NVIDIA_API_KEY \
-  --env INFERENCE_MODEL=$INFERENCE_MODEL
-```
-
 ### Via venv
 
 If you've set up your local development environment, you can also build the image using your local virtual environment.
 
 ```bash
 INFERENCE_MODEL=meta-llama/Llama-3.1-8b-Instruct
-llama stack build --template nvidia --image-type venv
+llama stack build --distro nvidia --image-type venv
 llama stack run ./run.yaml \
   --port 8321 \
   --env NVIDIA_API_KEY=$NVIDIA_API_KEY \

@@ -100,10 +100,6 @@ The following environment variables can be configured:
 ### Model Configuration
 - `INFERENCE_MODEL`: HuggingFace model for serverless inference
 - `INFERENCE_ENDPOINT_NAME`: HuggingFace endpoint name
-- `OLLAMA_INFERENCE_MODEL`: Ollama model name
-- `OLLAMA_EMBEDDING_MODEL`: Ollama embedding model name
-- `OLLAMA_EMBEDDING_DIMENSION`: Ollama embedding dimension (default: `384`)
-- `VLLM_INFERENCE_MODEL`: vLLM model name
 
 ### Vector Database Configuration
 - `SQLITE_STORE_DIR`: SQLite store directory (default: `~/.llama/distributions/starter`)

@@ -127,47 +123,29 @@ The following environment variables can be configured:
 
 ## Enabling Providers
 
-You can enable specific providers by setting their provider ID to a valid value using environment variables. This is useful when you want to use certain providers or don't have the required API keys.
+You can enable specific providers by setting appropriate environment variables. For example,
 
-### Examples of Enabling Providers
-
-#### Enable FAISS Vector Provider
 ```bash
-export ENABLE_FAISS=faiss
+# self-hosted
+export OLLAMA_URL=http://localhost:11434  # enables the Ollama inference provider
+export VLLM_URL=http://localhost:8000/v1  # enables the vLLM inference provider
+export TGI_URL=http://localhost:8000/v1  # enables the TGI inference provider
+
+# cloud-hosted requiring API key configuration on the server
+export CEREBRAS_API_KEY=your_cerebras_api_key  # enables the Cerebras inference provider
+export NVIDIA_API_KEY=your_nvidia_api_key  # enables the NVIDIA inference provider
+
+# vector providers
+export MILVUS_URL=http://localhost:19530  # enables the Milvus vector provider
+export CHROMADB_URL=http://localhost:8000/v1  # enables the ChromaDB vector provider
+export PGVECTOR_DB=llama_stack_db  # enables the PGVector vector provider
 ```
 
-#### Enable Ollama Models
-```bash
-export ENABLE_OLLAMA=ollama
-```
-
-#### Disable vLLM Models
-```bash
-export VLLM_INFERENCE_MODEL=__disabled__
-```
-
-#### Disable Optional Vector Providers
-```bash
-export ENABLE_SQLITE_VEC=__disabled__
-export ENABLE_CHROMADB=__disabled__
-export ENABLE_PGVECTOR=__disabled__
-```
-
-### Provider ID Patterns
-
-The starter distribution uses several patterns for provider IDs:
-
-1. **Direct provider IDs**: `faiss`, `ollama`, `vllm`
-2. **Environment-based provider IDs**: `${env.ENABLE_SQLITE_VEC:+sqlite-vec}`
-3. **Model-based provider IDs**: `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`
-
-When using the `+` pattern (like `${env.ENABLE_SQLITE_VEC+sqlite-vec}`), the provider is enabled by default and can be disabled by setting the environment variable to `__disabled__`.
-
-When using the `:` pattern (like `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`), the provider is disabled by default and can be enabled by setting the environment variable to a valid value.
+This distribution comes with a default "llama-guard" shield that can be enabled by setting the `SAFETY_MODEL` environment variable to point to an appropriate Llama Guard model id. Use `llama-stack-client models list` to see the list of available models.
 
 ## Running the Distribution
 
-You can run the starter distribution via Docker, Conda, or venv.
+You can run the starter distribution via Docker or venv.
 
 ### Via Docker
 

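The added note about the default shield leaves the enabling step implicit. A sketch, reusing the Llama Guard model id that appears in the meta-reference-gpu hunk earlier in this commit (pick a model id from `llama-stack-client models list` for your own setup):

```bash
# Example model id borrowed from the meta-reference-gpu docs in this same commit
export SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```
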
@@ -186,12 +164,12 @@ docker run \
   --port $LLAMA_STACK_PORT
 ```
 
-### Via Conda or venv
+### Via venv
 
 Ensure you have configured the starter distribution using the environment variables explained above.
 
 ```bash
-uv run --with llama-stack llama stack build --template starter --image-type <conda|venv> --run
+uv run --with llama-stack llama stack build --distro starter --image-type venv --run
 ```
 
 ## Example Usage

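After `--run` brings the server up, the `llama-stack-client` CLI referenced above is the quickest check that the expected providers and models registered. A sketch, assuming the client is installed and the server is on its default endpoint:

```bash
# Assumes `pip install llama-stack-client` and a local server
llama-stack-client models list
llama-stack-client providers list
```
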
@@ -11,12 +11,6 @@ This is the simplest way to get started. Using Llama Stack as a library means yo
 
 Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
 
-
-## Conda:
-
-If you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
-
-
 ## Kubernetes:
 
 If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See [Kubernetes Deployment Guide](kubernetes_deployment) for more details.

@@ -59,10 +59,10 @@ Now let's build and run the Llama Stack config for Ollama.
 We use `starter` as template. By default all providers are disabled, this requires enable ollama by passing environment variables.
 
 ```bash
-llama stack build --template starter --image-type venv --run
+llama stack build --distro starter --image-type venv --run
 ```
 :::
-:::{tab-item} Using `conda`
+:::{tab-item} Using `venv`
 You can use Python to build and run the Llama Stack server, which is useful for testing and development.
 
 Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup,

@@ -70,7 +70,7 @@ which defines the providers and their settings.
 Now let's build and run the Llama Stack config for Ollama.
 
 ```bash
-llama stack build --template starter --image-type conda --run
+llama stack build --distro starter --image-type venv --run
 ```
 :::
 :::{tab-item} Using a Container

@@ -150,13 +150,7 @@ pip install llama-stack-client
 ```
 :::
 
-:::{tab-item} Install with `conda`
-```bash
-yes | conda create -n stack-client python=3.12
-conda activate stack-client
-pip install llama-stack-client
-```
-:::
-
 ::::
 
 Now let's use the `llama-stack-client` [CLI](../references/llama_stack_client_cli_reference.md) to check the

@@ -16,10 +16,13 @@ as the inference [provider](../providers/inference/index) for a Llama Model.
 ```bash
 ollama run llama3.2:3b --keepalive 60m
 ```
 
 #### Step 2: Run the Llama Stack server
 
+We will use `uv` to run the Llama Stack server.
 ```bash
-uv run --with llama-stack llama stack build --template starter --image-type venv --run
+OLLAMA_URL=http://localhost:11434 \
+uv run --with llama-stack llama stack build --distro starter --image-type venv --run
 ```
 #### Step 3: Run the demo
 Now open up a new terminal and copy the following script into a file named `demo_script.py`.

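Before starting the stack, it can save a round-trip to confirm Ollama is actually serving the model pulled in Step 1. A quick check against Ollama's standard model-listing endpoint:

```bash
# llama3.2:3b from Step 1's `ollama run` should appear in the response
curl -s http://localhost:11434/api/tags
```
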
@@ -1,5 +1,13 @@
-# Agents Providers
+# Agents
+
+## Overview
 
 This section contains documentation for all available providers for the **agents** API.
 
-- [inline::meta-reference](inline_meta-reference.md)
+## Providers
+
+```{toctree}
+:maxdepth: 1
+
+inline_meta-reference
+```

@@ -1,7 +1,15 @@
-# Datasetio Providers
+# Datasetio
+
+## Overview
 
 This section contains documentation for all available providers for the **datasetio** API.
 
-- [inline::localfs](inline_localfs.md)
-- [remote::huggingface](remote_huggingface.md)
-- [remote::nvidia](remote_nvidia.md)
+## Providers
+
+```{toctree}
+:maxdepth: 1
+
+inline_localfs
+remote_huggingface
+remote_nvidia
+```

@@ -1,6 +1,14 @@
-# Eval Providers
+# Eval
+
+## Overview
 
 This section contains documentation for all available providers for the **eval** API.
 
-- [inline::meta-reference](inline_meta-reference.md)
-- [remote::nvidia](remote_nvidia.md)
+## Providers
+
+```{toctree}
+:maxdepth: 1
+
+inline_meta-reference
+remote_nvidia
+```

@@ -1,9 +1,4 @@
-# External Providers Guide
-
-Llama Stack supports external providers that live outside of the main codebase. This allows you to:
-- Create and maintain your own providers independently
-- Share providers with others without contributing to the main codebase
-- Keep provider-specific code separate from the core Llama Stack code
+# Creating External Providers
 
 ## Configuration
 

|
|||
1. **Remote Providers**: Providers that communicate with external services (e.g., cloud APIs)
|
||||
2. **Inline Providers**: Providers that run locally within the Llama Stack process
|
||||
|
||||
## Known External Providers
|
||||
|
||||
Here's a list of known external providers that you can use with Llama Stack:
|
||||
|
||||
| Name | Description | API | Type | Repository |
|
||||
|------|-------------|-----|------|------------|
|
||||
| KubeFlow Training | Train models with KubeFlow | Post Training | Remote | [llama-stack-provider-kft](https://github.com/opendatahub-io/llama-stack-provider-kft) |
|
||||
| KubeFlow Pipelines | Train models with KubeFlow Pipelines | Post Training | Inline **and** Remote | [llama-stack-provider-kfp-trainer](https://github.com/opendatahub-io/llama-stack-provider-kfp-trainer) |
|
||||
| RamaLama | Inference models with RamaLama | Inference | Remote | [ramalama-stack](https://github.com/containers/ramalama-stack) |
|
||||
| TrustyAI LM-Eval | Evaluate models with TrustyAI LM-Eval | Eval | Remote | [llama-stack-provider-lmeval](https://github.com/trustyai-explainability/llama-stack-provider-lmeval) |
|
||||
|
||||
### Remote Provider Specification
|
||||
|
||||
Remote providers are used when you need to communicate with external services. Here's an example for a custom Ollama provider:
|
||||
|
|
@@ -119,9 +103,9 @@ container_image: custom-vector-store:latest # optional
 - `provider_data_validator`: Optional validator for provider data
 - `container_image`: Optional container image to use instead of pip packages
 
-## Required Implementation
+## Required Fields
 
-## All Providers
+### All Providers
 
 All providers must contain a `get_provider_spec` function in their `provider` module. This is a standardized structure that Llama Stack expects and is necessary for getting things such as the config class. The `get_provider_spec` method returns a structure identical to the `adapter`. An example function may look like:
 

@@ -146,7 +130,7 @@ def get_provider_spec() -> ProviderSpec:
     )
 ```
 
-### Remote Providers
+#### Remote Providers
 
 Remote providers must expose a `get_adapter_impl()` function in their module that takes two arguments:
 1. `config`: An instance of the provider's config class

@@ -162,7 +146,7 @@ async def get_adapter_impl(
     return OllamaInferenceAdapter(config)
 ```
 
-### Inline Providers
+#### Inline Providers
 
 Inline providers must expose a `get_provider_impl()` function in their module that takes two arguments:
 1. `config`: An instance of the provider's config class

@@ -189,7 +173,40 @@ Version: 0.1.0
 Location: /path/to/venv/lib/python3.10/site-packages
 ```
 
-## Example using `external_providers_dir`: Custom Ollama Provider
+## Best Practices
+
+1. **Package Naming**: Use the prefix `llama-stack-provider-` for your provider packages to make them easily identifiable.
+
+2. **Version Management**: Keep your provider package versioned and compatible with the Llama Stack version you're using.
+
+3. **Dependencies**: Only include the minimum required dependencies in your provider package.
+
+4. **Documentation**: Include clear documentation in your provider package about:
+   - Installation requirements
+   - Configuration options
+   - Usage examples
+   - Any limitations or known issues
+
+5. **Testing**: Include tests in your provider package to ensure it works correctly with Llama Stack.
+   You can refer to the [integration tests
+   guide](https://github.com/meta-llama/llama-stack/blob/main/tests/integration/README.md) for more
+   information. Execute the test for the Provider type you are developing.
+
+## Troubleshooting
+
+If your external provider isn't being loaded:
+
+1. Check that `module` points to a published pip package with a top level `provider` module including `get_provider_spec`.
+1. Check that the `external_providers_dir` path is correct and accessible.
+2. Verify that the YAML files are properly formatted.
+3. Ensure all required Python packages are installed.
+4. Check the Llama Stack server logs for any error messages - turn on debug logging to get more
+   information using `LLAMA_STACK_LOGGING=all=debug`.
+5. Verify that the provider package is installed in your Python environment if using `external_providers_dir`.
+
+## Examples
+
+### Example using `external_providers_dir`: Custom Ollama Provider
 
 Here's a complete example of creating and using a custom Ollama provider:
 

@@ -241,7 +258,7 @@ external_providers_dir: ~/.llama/providers.d/
 The provider will now be available in Llama Stack with the type `remote::custom_ollama`.
 
 
-## Example using `module`: ramalama-stack
+### Example using `module`: ramalama-stack
 
 [ramalama-stack](https://github.com/containers/ramalama-stack) is a recognized external provider that supports installation via module.
 

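For orientation, the `external_providers_dir` used in the custom_ollama example conventionally holds one YAML spec per provider, grouped by provider type and API; treat the exact tree below as an assumption matching that example:

```bash
# Hypothetical layout under ~/.llama/providers.d/
# remote/
#   inference/
#     custom_ollama.yaml
ls -R ~/.llama/providers.d/
```
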
@@ -266,35 +283,4 @@ additional_pip_packages:
 
 No other steps are required other than `llama stack build` and `llama stack run`. The build process will use `module` to install all of the provider dependencies, retrieve the spec, etc.
 
-The provider will now be available in Llama Stack with the type `remote::ramalama`.
-
-## Best Practices
-
-1. **Package Naming**: Use the prefix `llama-stack-provider-` for your provider packages to make them easily identifiable.
-
-2. **Version Management**: Keep your provider package versioned and compatible with the Llama Stack version you're using.
-
-3. **Dependencies**: Only include the minimum required dependencies in your provider package.
-
-4. **Documentation**: Include clear documentation in your provider package about:
-   - Installation requirements
-   - Configuration options
-   - Usage examples
-   - Any limitations or known issues
-
-5. **Testing**: Include tests in your provider package to ensure it works correctly with Llama Stack.
-   You can refer to the [integration tests
-   guide](https://github.com/meta-llama/llama-stack/blob/main/tests/integration/README.md) for more
-   information. Execute the test for the Provider type you are developing.
-
-## Troubleshooting
-
-If your external provider isn't being loaded:
-
-1. Check that `module` points to a published pip package with a top level `provider` module including `get_provider_spec`.
-1. Check that the `external_providers_dir` path is correct and accessible.
-2. Verify that the YAML files are properly formatted.
-3. Ensure all required Python packages are installed.
-4. Check the Llama Stack server logs for any error messages - turn on debug logging to get more
-   information using `LLAMA_STACK_LOGGING=all=debug`.
-5. Verify that the provider package is installed in your Python environment if using `external_providers_dir`.
+The provider will now be available in Llama Stack with the type `remote::ramalama`.

new file: docs/source/providers/external/external-providers-list.md (+10 lines)

@@ -0,0 +1,10 @@
+# Known External Providers
+
+Here's a list of known external providers that you can use with Llama Stack:
+
+| Name | Description | API | Type | Repository |
+|------|-------------|-----|------|------------|
+| KubeFlow Training | Train models with KubeFlow | Post Training | Remote | [llama-stack-provider-kft](https://github.com/opendatahub-io/llama-stack-provider-kft) |
+| KubeFlow Pipelines | Train models with KubeFlow Pipelines | Post Training | Inline **and** Remote | [llama-stack-provider-kfp-trainer](https://github.com/opendatahub-io/llama-stack-provider-kfp-trainer) |
+| RamaLama | Inference models with RamaLama | Inference | Remote | [ramalama-stack](https://github.com/containers/ramalama-stack) |
+| TrustyAI LM-Eval | Evaluate models with TrustyAI LM-Eval | Eval | Remote | [llama-stack-provider-lmeval](https://github.com/trustyai-explainability/llama-stack-provider-lmeval) |

new file: docs/source/providers/external/index.md (+13 lines)

@@ -0,0 +1,13 @@
+# External Providers
+
+Llama Stack supports external providers that live outside of the main codebase. This allows you to:
+- Create and maintain your own providers independently
+- Share providers with others without contributing to the main codebase
+- Keep provider-specific code separate from the core Llama Stack code
+
+```{toctree}
+:maxdepth: 1
+
+external-providers-list
+external-providers-guide
+```

@@ -1,5 +1,13 @@
-# Files Providers
+# Files
+
+## Overview
 
 This section contains documentation for all available providers for the **files** API.
 
-- [inline::localfs](inline_localfs.md)
+## Providers
+
+```{toctree}
+:maxdepth: 1
+
+inline_localfs
+```

@@ -8,7 +8,7 @@ Local filesystem-based file storage provider for managing files and documents lo
 
 | Field | Type | Required | Default | Description |
 |-------|------|----------|---------|-------------|
-| `storage_dir` | `<class 'str'>` | No | PydanticUndefined | Directory to store uploaded files |
+| `storage_dir` | `<class 'str'>` | No | | Directory to store uploaded files |
 | `metadata_store` | `utils.sqlstore.sqlstore.SqliteSqlStoreConfig \| utils.sqlstore.sqlstore.PostgresSqlStoreConfig` | No | sqlite | SQL store configuration for file metadata |
 | `ttl_secs` | `<class 'int'>` | No | 31536000 | |

@@ -1,4 +1,4 @@
-# API Providers Overview
+# API Providers
 
 The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations for the same API. Examples for these include:
 - LLM inference providers (e.g., Meta Reference, Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, OpenAI, Anthropic, Gemini, WatsonX, etc.),

@@ -12,81 +12,17 @@ Providers come in two flavors:
 
 Importantly, Llama Stack always strives to provide at least one fully inline provider for each API so you can iterate on a fully featured environment locally.
 
-## External Providers
-Llama Stack supports external providers that live outside of the main codebase. This allows you to create and maintain your own providers independently.
-
-```{toctree}
-:maxdepth: 1
-
-external.md
-```
-
-```{include} openai.md
-:start-after: ## OpenAI API Compatibility
-```
-
-## Inference
-Runs inference with an LLM.
-
 ```{toctree}
 :maxdepth: 1
 
+external/index
+openai
 inference/index
-```
-
-## Agents
-Run multi-step agentic workflows with LLMs with tool usage, memory (RAG), etc.
-
-```{toctree}
-:maxdepth: 1
-
 agents/index
-```
-
-## DatasetIO
-Interfaces with datasets and data loaders.
-
-```{toctree}
-:maxdepth: 1
-
 datasetio/index
-```
-
-## Safety
-Applies safety policies to the output at a Systems (not only model) level.
-
-```{toctree}
-:maxdepth: 1
-
 safety/index
-```
-
-## Telemetry
-Collects telemetry data from the system.
-
-```{toctree}
-:maxdepth: 1
-
 telemetry/index
-```
-
-## Vector IO
-
-Vector IO refers to operations on vector databases, such as adding documents, searching, and deleting documents.
-Vector IO plays a crucial role in [Retreival Augmented Generation (RAG)](../..//building_applications/rag), where the vector
-io and database are used to store and retrieve documents for retrieval.
-
-```{toctree}
-:maxdepth: 1
-
 vector_io/index
-```
-
-## Tool Runtime
-Is associated with the ToolGroup resources.
-
-```{toctree}
-:maxdepth: 1
-
 tool_runtime/index
+files/index
 ```

@@ -1,26 +1,34 @@
-# Inference Providers
+# Inference
+
+## Overview
 
 This section contains documentation for all available providers for the **inference** API.
 
-- [inline::meta-reference](inline_meta-reference.md)
-- [inline::sentence-transformers](inline_sentence-transformers.md)
-- [remote::anthropic](remote_anthropic.md)
-- [remote::bedrock](remote_bedrock.md)
-- [remote::cerebras](remote_cerebras.md)
-- [remote::databricks](remote_databricks.md)
-- [remote::fireworks](remote_fireworks.md)
-- [remote::gemini](remote_gemini.md)
-- [remote::groq](remote_groq.md)
-- [remote::hf::endpoint](remote_hf_endpoint.md)
-- [remote::hf::serverless](remote_hf_serverless.md)
-- [remote::llama-openai-compat](remote_llama-openai-compat.md)
-- [remote::nvidia](remote_nvidia.md)
-- [remote::ollama](remote_ollama.md)
-- [remote::openai](remote_openai.md)
-- [remote::passthrough](remote_passthrough.md)
-- [remote::runpod](remote_runpod.md)
-- [remote::sambanova](remote_sambanova.md)
-- [remote::tgi](remote_tgi.md)
-- [remote::together](remote_together.md)
-- [remote::vllm](remote_vllm.md)
-- [remote::watsonx](remote_watsonx.md)
+## Providers
+
+```{toctree}
+:maxdepth: 1
+
+inline_meta-reference
+inline_sentence-transformers
+remote_anthropic
+remote_bedrock
+remote_cerebras
+remote_databricks
+remote_fireworks
+remote_gemini
+remote_groq
+remote_hf_endpoint
+remote_hf_serverless
+remote_llama-openai-compat
+remote_nvidia
+remote_ollama
+remote_openai
+remote_passthrough
+remote_runpod
+remote_sambanova
+remote_tgi
+remote_together
+remote_vllm
+remote_watsonx
+```

@@ -1,21 +0,0 @@
-# remote::cerebras-openai-compat
-
-## Description
-
-Cerebras OpenAI-compatible provider for using Cerebras models with OpenAI API format.
-
-## Configuration
-
-| Field | Type | Required | Default | Description |
-|-------|------|----------|---------|-------------|
-| `api_key` | `str \| None` | No | | The Cerebras API key |
-| `openai_compat_api_base` | `<class 'str'>` | No | https://api.cerebras.ai/v1 | The URL for the Cerebras API server |
-
-## Sample Configuration
-
-```yaml
-openai_compat_api_base: https://api.cerebras.ai/v1
-api_key: ${env.CEREBRAS_API_KEY}
-
-```
-

@@ -1,21 +0,0 @@
-# remote::fireworks-openai-compat
-
-## Description
-
-Fireworks AI OpenAI-compatible provider for using Fireworks models with OpenAI API format.
-
-## Configuration
-
-| Field | Type | Required | Default | Description |
-|-------|------|----------|---------|-------------|
-| `api_key` | `str \| None` | No | | The Fireworks API key |
-| `openai_compat_api_base` | `<class 'str'>` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks API server |
-
-## Sample Configuration
-
-```yaml
-openai_compat_api_base: https://api.fireworks.ai/inference/v1
-api_key: ${env.FIREWORKS_API_KEY}
-
-```
-

@@ -1,21 +0,0 @@
-# remote::groq-openai-compat
-
-## Description
-
-Groq OpenAI-compatible provider for using Groq models with OpenAI API format.
-
-## Configuration
-
-| Field | Type | Required | Default | Description |
-|-------|------|----------|---------|-------------|
-| `api_key` | `str \| None` | No | | The Groq API key |
-| `openai_compat_api_base` | `<class 'str'>` | No | https://api.groq.com/openai/v1 | The URL for the Groq API server |
-
-## Sample Configuration
-
-```yaml
-openai_compat_api_base: https://api.groq.com/openai/v1
-api_key: ${env.GROQ_API_KEY}
-
-```
-

@@ -8,7 +8,7 @@ HuggingFace Inference Endpoints provider for dedicated model serving.
 
 | Field | Type | Required | Default | Description |
 |-------|------|----------|---------|-------------|
-| `endpoint_name` | `<class 'str'>` | No | PydanticUndefined | The name of the Hugging Face Inference Endpoint in the format of '{namespace}/{endpoint_name}' (e.g. 'my-cool-org/meta-llama-3-1-8b-instruct-rce'). Namespace is optional and will default to the user account if not provided. |
+| `endpoint_name` | `<class 'str'>` | No | | The name of the Hugging Face Inference Endpoint in the format of '{namespace}/{endpoint_name}' (e.g. 'my-cool-org/meta-llama-3-1-8b-instruct-rce'). Namespace is optional and will default to the user account if not provided. |
 | `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |
 
 ## Sample Configuration

@@ -8,7 +8,7 @@ HuggingFace Inference API serverless provider for on-demand model inference.
 
 | Field | Type | Required | Default | Description |
 |-------|------|----------|---------|-------------|
-| `huggingface_repo` | `<class 'str'>` | No | PydanticUndefined | The model ID of the model on the Hugging Face Hub (e.g. 'meta-llama/Meta-Llama-3.1-70B-Instruct') |
+| `huggingface_repo` | `<class 'str'>` | No | | The model ID of the model on the Hugging Face Hub (e.g. 'meta-llama/Meta-Llama-3.1-70B-Instruct') |
 | `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |
 
 ## Sample Configuration

@@ -8,7 +8,7 @@ Text Generation Inference (TGI) provider for HuggingFace model serving.
 
 | Field | Type | Required | Default | Description |
 |-------|------|----------|---------|-------------|
-| `url` | `<class 'str'>` | No | PydanticUndefined | The URL for the TGI serving endpoint |
+| `url` | `<class 'str'>` | No | | The URL for the TGI serving endpoint |
 
 ## Sample Configuration

@@ -1,21 +0,0 @@
-# remote::together-openai-compat
-
-## Description
-
-Together AI OpenAI-compatible provider for using Together models with OpenAI API format.
-
-## Configuration
-
-| Field | Type | Required | Default | Description |
-|-------|------|----------|---------|-------------|
-| `api_key` | `str \| None` | No | | The Together API key |
-| `openai_compat_api_base` | `<class 'str'>` | No | https://api.together.xyz/v1 | The URL for the Together API server |
-
-## Sample Configuration
-
-```yaml
-openai_compat_api_base: https://api.together.xyz/v1
-api_key: ${env.TOGETHER_API_KEY}
-
-```
-

@@ -1,7 +1,15 @@
-# Post_Training Providers
+# Post_Training
+
+## Overview
 
 This section contains documentation for all available providers for the **post_training** API.
 
-- [inline::huggingface](inline_huggingface.md)
-- [inline::torchtune](inline_torchtune.md)
-- [remote::nvidia](remote_nvidia.md)
+## Providers
+
+```{toctree}
+:maxdepth: 1
+
+inline_huggingface
+inline_torchtune
+remote_nvidia
+```

@@ -24,6 +24,10 @@ HuggingFace-based post-training provider for fine-tuning models using the Huggin
 | `weight_decay` | `<class 'float'>` | No | 0.01 | |
 | `dataloader_num_workers` | `<class 'int'>` | No | 4 | |
 | `dataloader_pin_memory` | `<class 'bool'>` | No | True | |
+| `dpo_beta` | `<class 'float'>` | No | 0.1 | |
+| `use_reference_model` | `<class 'bool'>` | No | True | |
+| `dpo_loss_type` | `Literal['sigmoid', 'hinge', 'ipo', 'kto_pair'` | No | sigmoid | |
+| `dpo_output_dir` | `<class 'str'>` | No | | |
 
 ## Sample Configuration
 

@@ -31,6 +35,7 @@ HuggingFace-based post-training provider for fine-tuning models using the Huggin
 checkpoint_format: huggingface
 distributed_backend: null
 device: cpu
+dpo_output_dir: ~/.llama/dummy/dpo_output
 
 ```

@@ -1,10 +1,18 @@
-# Safety Providers
+# Safety
+
+## Overview
 
 This section contains documentation for all available providers for the **safety** API.
 
-- [inline::code-scanner](inline_code-scanner.md)
-- [inline::llama-guard](inline_llama-guard.md)
-- [inline::prompt-guard](inline_prompt-guard.md)
-- [remote::bedrock](remote_bedrock.md)
-- [remote::nvidia](remote_nvidia.md)
-- [remote::sambanova](remote_sambanova.md)
+## Providers
+
+```{toctree}
+:maxdepth: 1
+
+inline_code-scanner
+inline_llama-guard
+inline_prompt-guard
+remote_bedrock
+remote_nvidia
+remote_sambanova
+```

@@ -1,7 +1,15 @@
-# Scoring Providers
+# Scoring
+
+## Overview
 
 This section contains documentation for all available providers for the **scoring** API.
 
-- [inline::basic](inline_basic.md)
-- [inline::braintrust](inline_braintrust.md)
-- [inline::llm-as-judge](inline_llm-as-judge.md)
+## Providers
+
+```{toctree}
+:maxdepth: 1
+
+inline_basic
+inline_braintrust
+inline_llm-as-judge
+```

@@ -1,5 +1,13 @@
-# Telemetry Providers
+# Telemetry
+
+## Overview
 
 This section contains documentation for all available providers for the **telemetry** API.
 
-- [inline::meta-reference](inline_meta-reference.md)
+## Providers
+
+```{toctree}
+:maxdepth: 1
+
+inline_meta-reference
+```

@@ -1,10 +1,18 @@
-# Tool_Runtime Providers
+# Tool_Runtime
+
+## Overview
 
 This section contains documentation for all available providers for the **tool_runtime** API.
 
-- [inline::rag-runtime](inline_rag-runtime.md)
-- [remote::bing-search](remote_bing-search.md)
-- [remote::brave-search](remote_brave-search.md)
-- [remote::model-context-protocol](remote_model-context-protocol.md)
-- [remote::tavily-search](remote_tavily-search.md)
-- [remote::wolfram-alpha](remote_wolfram-alpha.md)
+## Providers
+
+```{toctree}
+:maxdepth: 1
+
+inline_rag-runtime
+remote_bing-search
+remote_brave-search
+remote_model-context-protocol
+remote_tavily-search
+remote_wolfram-alpha
+```

@@ -1,16 +1,24 @@
-# Vector_Io Providers
+# Vector_Io
+
+## Overview
 
 This section contains documentation for all available providers for the **vector_io** API.
 
-- [inline::chromadb](inline_chromadb.md)
-- [inline::faiss](inline_faiss.md)
-- [inline::meta-reference](inline_meta-reference.md)
-- [inline::milvus](inline_milvus.md)
-- [inline::qdrant](inline_qdrant.md)
-- [inline::sqlite-vec](inline_sqlite-vec.md)
-- [inline::sqlite_vec](inline_sqlite_vec.md)
-- [remote::chromadb](remote_chromadb.md)
-- [remote::milvus](remote_milvus.md)
-- [remote::pgvector](remote_pgvector.md)
-- [remote::qdrant](remote_qdrant.md)
-- [remote::weaviate](remote_weaviate.md)
+## Providers
+
+```{toctree}
+:maxdepth: 1
+
+inline_chromadb
+inline_faiss
+inline_meta-reference
+inline_milvus
+inline_qdrant
+inline_sqlite-vec
+inline_sqlite_vec
+remote_chromadb
+remote_milvus
+remote_pgvector
+remote_qdrant
+remote_weaviate
+```

@@ -41,7 +41,7 @@ See [Chroma's documentation](https://docs.trychroma.com/docs/overview/introducti
 
 | Field | Type | Required | Default | Description |
 |-------|------|----------|---------|-------------|
-| `db_path` | `<class 'str'>` | No | PydanticUndefined | |
+| `db_path` | `<class 'str'>` | No | | |
 | `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend |
 
 ## Sample Configuration

@@ -10,7 +10,7 @@ Please refer to the remote provider documentation.

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
-| `db_path` | `<class 'str'>` | No | PydanticUndefined | |
+| `db_path` | `<class 'str'>` | No | | |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend (SQLite only for now) |
| `consistency_level` | `<class 'str'>` | No | Strong | The consistency level of the Milvus server |
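For reference, an inline Milvus configuration assembled from the fields above might look like the sketch below. This is a hedged example only: the `MILVUS_DB_PATH` env variable name is illustrative, not taken from a shipped sample.

```yaml
# Hedged sketch from the field table above; env variable names are illustrative.
db_path: ${env.MILVUS_DB_PATH:=~/.llama/dummy}/milvus.db
kvstore:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/milvus_registry.db
consistency_level: Strong
```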
@@ -50,12 +50,16 @@ See the [Qdrant documentation](https://qdrant.tech/documentation/) for more deta

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
-| `path` | `<class 'str'>` | No | PydanticUndefined | |
+| `path` | `<class 'str'>` | No | | |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | |

## Sample Configuration

```yaml
path: ${env.QDRANT_PATH:=~/.llama/~/.llama/dummy}/qdrant.db
kvstore:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/qdrant_registry.db
```
@@ -205,7 +205,7 @@ See [sqlite-vec's GitHub repo](https://github.com/asg017/sqlite-vec/tree/main) f

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
-| `db_path` | `<class 'str'>` | No | PydanticUndefined | Path to the SQLite database file |
+| `db_path` | `<class 'str'>` | No | | Path to the SQLite database file |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend (SQLite only for now) |

## Sample Configuration
@@ -10,7 +10,7 @@ Please refer to the sqlite-vec provider documentation.

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
-| `db_path` | `<class 'str'>` | No | PydanticUndefined | Path to the SQLite database file |
+| `db_path` | `<class 'str'>` | No | | Path to the SQLite database file |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend (SQLite only for now) |

## Sample Configuration
@@ -40,7 +40,7 @@ See [Chroma's documentation](https://docs.trychroma.com/docs/overview/introducti

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
-| `url` | `str \| None` | No | PydanticUndefined | |
+| `url` | `str \| None` | No | | |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend |

## Sample Configuration
@@ -111,8 +111,8 @@ For more details on TLS configuration, refer to the [TLS setup guide](https://mi

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
-| `uri` | `<class 'str'>` | No | PydanticUndefined | The URI of the Milvus server |
-| `token` | `str \| None` | No | PydanticUndefined | The token of the Milvus server |
+| `uri` | `<class 'str'>` | No | | The URI of the Milvus server |
+| `token` | `str \| None` | No | | The token of the Milvus server |
| `consistency_level` | `<class 'str'>` | No | Strong | The consistency level of the Milvus server |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend |
| `config` | `dict` | No | {} | This configuration allows additional fields to be passed through to the underlying Milvus client. See the [Milvus](https://milvus.io/docs/install-overview.md) documentation for more details about Milvus in general. |
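For reference, a minimal remote configuration assembled from the fields above could look like this. A hedged sketch only: the `MILVUS_ENDPOINT` and `MILVUS_TOKEN` env variable names are illustrative, not taken from a shipped sample (19530 is Milvus's default port).

```yaml
# Hedged sketch from the field table above; env variable names are illustrative.
uri: ${env.MILVUS_ENDPOINT:=http://localhost:19530}
token: ${env.MILVUS_TOKEN:=}
consistency_level: Strong
kvstore:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/milvus_remote_registry.db
```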
@@ -20,11 +20,15 @@ Please refer to the inline provider documentation.
| `prefix` | `str \| None` | No | | |
| `timeout` | `int \| None` | No | | |
| `host` | `str \| None` | No | | |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | |

## Sample Configuration

```yaml
-api_key: ${env.QDRANT_API_KEY}
+api_key: ${env.QDRANT_API_KEY:=}
kvstore:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/qdrant_registry.db
```

(The `${env.VAR:=default}` form substitutes a default — here an empty string — when the variable is unset, so the stack can start without `QDRANT_API_KEY` being defined.)
@@ -33,9 +33,19 @@ To install Weaviate see the [Weaviate quickstart documentation](https://weaviate
See [Weaviate's documentation](https://weaviate.io/developers/weaviate) for more details about Weaviate in general.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `weaviate_api_key` | `str \| None` | No | | The API key for the Weaviate instance |
| `weaviate_cluster_url` | `str \| None` | No | localhost:8080 | The URL of the Weaviate cluster |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend (SQLite only for now) |

## Sample Configuration

```yaml
weaviate_api_key: null
weaviate_cluster_url: ${env.WEAVIATE_CLUSTER_URL:=localhost:8080}
kvstore:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/weaviate_registry.db
```
@@ -366,7 +366,7 @@ The purpose of scoring function is to calculate the score for each example based
Firstly, you can check whether the existing [llama stack scoring functions](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline/scoring) fulfill your need. If not, you need to write a new scoring function based on what the benchmark author or other open-source repos describe.

### Add new benchmark into template
-Firstly, you need to add the evaluation dataset associated with your benchmark under the `datasets` resource in the [open-benchmark](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/templates/open-benchmark/run.yaml) template.
+Firstly, you need to add the evaluation dataset associated with your benchmark under the `datasets` resource in the [open-benchmark](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distributions/open-benchmark/run.yaml) template.

Secondly, you need to add the new benchmark you just created under the `benchmarks` resource in the same template (see the sketch after this list). To add the new benchmark, you need to have:
- `benchmark_id`: identifier of the benchmark
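For illustration, the two resources described above take roughly this shape in `run.yaml`. This is a hedged sketch: the ids, dataset URI, and scoring function below are placeholders, and exact keys may differ between releases.

```yaml
datasets:
  - dataset_id: my_eval_dataset            # illustrative id
    provider_id: huggingface
    purpose: eval/messages-answer
    source:
      type: uri
      uri: huggingface://datasets/your-org/your-dataset?split=test   # placeholder URI
benchmarks:
  - benchmark_id: my-new-benchmark         # identifier of the benchmark
    dataset_id: my_eval_dataset            # must match the dataset registered above
    scoring_functions:
      - basic::equality                    # any registered scoring function
```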
@@ -378,7 +378,7 @@ Secondly, you need to add the new benchmark you just created under the `benchmar

Spin up the llama stack server with the 'open-benchmark' template:
```
-llama stack run llama_stack/templates/open-benchmark/run.yaml
+llama stack run llama_stack/distributions/open-benchmark/run.yaml
```
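Once the server is up, the registered benchmark can be exercised from the client. A hedged sketch — the `eval run-benchmark` subcommand and flag spellings are as we understand the `llama-stack-client` CLI and may differ by version:

```bash
# Hedged sketch: run a registered benchmark against a served model.
# Subcommand and flag names may differ across llama-stack-client versions.
llama-stack-client eval run-benchmark my-new-benchmark \
  --model_id meta-llama/Llama-3.3-70B-Instruct \
  --output_dir /tmp/eval_results \
  --num_examples 10
```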
@@ -19,11 +19,11 @@ You have two ways to install Llama Stack:
cd ~/local
git clone git@github.com:meta-llama/llama-stack.git

-conda create -n myenv python=3.10
-conda activate myenv
+uv venv myenv --python 3.12
+source myenv/bin/activate  # On Windows: myenv\Scripts\activate

cd llama-stack
-$CONDA_PREFIX/bin/pip install -e .
+pip install -e .
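Assuming the editable install completed without errors, a quick sanity check is to invoke the CLI it installs:

```bash
llama --help   # should print the available `llama` subcommands
```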

## Downloading models via CLI
@@ -19,11 +19,11 @@ You have two ways to install Llama Stack:
cd ~/local
git clone git@github.com:meta-llama/llama-stack.git

-conda create -n myenv python=3.10
-conda activate myenv
+uv venv myenv --python 3.12
+source myenv/bin/activate  # On Windows: myenv\Scripts\activate

cd llama-stack
-$CONDA_PREFIX/bin/pip install -e .
+pip install -e .

## `llama` subcommands