docs: fix broken links (#3540)

# What does this PR do?


- Fixes broken links and Docusaurus search

Closes #3518

## Test Plan

The following should produce a clean build with no warnings and search enabled:

```bash
npm install
npm run gen-api-docs all
npm run build
npm run serve
```

Alexey Rybak 2025-09-24 14:16:31 -07:00 committed by GitHub
parent 8537ada11b
commit 6101c8e015
52 changed files with 188 additions and 981 deletions


@ -302,4 +302,4 @@ customizer_url: ${env.NVIDIA_CUSTOMIZER_URL:=http://nemo.test}
- Check out the [Building Applications - Fine-tuning](../building_applications/index.mdx) guide for application-level examples
- See the [Providers](../providers/post_training/index.mdx) section for detailed provider documentation
- Review the [API Reference](../api_reference/post_training.mdx) for complete API documentation
- Review the [API Reference](../advanced_apis/post_training.mdx) for complete API documentation


@ -189,5 +189,5 @@ The Scoring API works closely with the [Evaluation](./evaluation.mdx) API to pro
- Check out the [Evaluation](./evaluation.mdx) guide for running complete evaluations
- See the [Building Applications - Evaluation](../building_applications/evals.mdx) guide for application examples
- Review the [Evaluation Reference](../references/evals_reference.mdx) for comprehensive scoring function usage
- Explore the [Evaluation Concepts](../concepts/evaluation_concepts.mdx) for detailed conceptual information
- Review the [Evaluation Reference](../references/evals_reference/) for comprehensive scoring function usage
- Explore the [Evaluation Concepts](../concepts/evaluation_concepts) for detailed conceptual information


@ -8,7 +8,7 @@ sidebar_position: 7
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
This guide walks you through the process of evaluating an LLM application built using Llama Stack. For detailed API reference, check out the [Evaluation Reference](/docs/references/evals-reference) guide that covers the complete set of APIs and developer experience flow.
This guide walks you through the process of evaluating an LLM application built using Llama Stack. For detailed API reference, check out the [Evaluation Reference](../references/evals_reference/) guide that covers the complete set of APIs and developer experience flow.
:::tip[Interactive Examples]
Check out our [Colab notebook](https://colab.research.google.com/drive/10CHyykee9j2OigaIcRv47BKG9mrNm0tJ?usp=sharing) for working examples with evaluations, or try the [Getting Started notebook](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb).
@ -251,6 +251,6 @@ results = client.scoring.score(
- **[Agents](./agent)** - Building agents for evaluation
- **[Tools Integration](./tools)** - Using tools in evaluated agents
- **[Evaluation Reference](/docs/references/evals-reference)** - Complete API reference for evaluations
- **[Evaluation Reference](../references/evals_reference/)** - Complete API reference for evaluations
- **[Getting Started Notebook](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)** - Interactive examples
- **[Evaluation Examples](https://colab.research.google.com/drive/10CHyykee9j2OigaIcRv47BKG9mrNm0tJ?usp=sharing)** - Additional evaluation scenarios


@ -20,23 +20,23 @@ The best way to get started is to look at this comprehensive notebook which walk
Here are the key topics that will help you build effective AI applications:
### 🤖 **Agent Development**
- **[Agent Framework](./agent)** - Understand the components and design patterns of the Llama Stack agent framework
- **[Agent Execution Loop](./agent_execution_loop)** - How agents process information, make decisions, and execute actions
- **[Agents vs Responses API](./responses_vs_agents)** - Learn when to use each API for different use cases
- **[Agent Framework](./agent.mdx)** - Understand the components and design patterns of the Llama Stack agent framework
- **[Agent Execution Loop](./agent_execution_loop.mdx)** - How agents process information, make decisions, and execute actions
- **[Agents vs Responses API](./responses_vs_agents.mdx)** - Learn when to use each API for different use cases
### 📚 **Knowledge Integration**
- **[RAG (Retrieval-Augmented Generation)](./rag)** - Enhance your agents with external knowledge through retrieval mechanisms
- **[RAG (Retrieval-Augmented Generation)](./rag.mdx)** - Enhance your agents with external knowledge through retrieval mechanisms
### 🛠️ **Capabilities & Extensions**
- **[Tools](./tools)** - Extend your agents' capabilities by integrating with external tools and APIs
- **[Tools](./tools.mdx)** - Extend your agents' capabilities by integrating with external tools and APIs
### 📊 **Quality & Monitoring**
- **[Evaluations](./evals)** - Evaluate your agents' effectiveness and identify areas for improvement
- **[Telemetry](./telemetry)** - Monitor and analyze your agents' performance and behavior
- **[Safety](./safety)** - Implement guardrails and safety measures to ensure responsible AI behavior
- **[Evaluations](./evals.mdx)** - Evaluate your agents' effectiveness and identify areas for improvement
- **[Telemetry](./telemetry.mdx)** - Monitor and analyze your agents' performance and behavior
- **[Safety](./safety.mdx)** - Implement guardrails and safety measures to ensure responsible AI behavior
### 🎮 **Interactive Development**
- **[Playground](./playground)** - Interactive environment for testing and developing applications
- **[Playground](./playground.mdx)** - Interactive environment for testing and developing applications
## Application Patterns
@ -77,7 +77,7 @@ Build production-ready systems with:
## Related Resources
- **[Getting Started](/docs/getting-started/)** - Basic setup and concepts
- **[Getting Started](/docs/getting_started/quickstart)** - Basic setup and concepts
- **[Providers](/docs/providers/)** - Available AI service providers
- **[Distributions](/docs/distributions/)** - Pre-configured deployment packages
- **[API Reference](/docs/api/)** - Complete API documentation
- **[API Reference](/docs/api/llama-stack-specification)** - Complete API documentation


@ -291,9 +291,9 @@ llama stack run meta-reference
## Related Resources
- **[Getting Started Guide](/docs/getting-started)** - Complete setup and introduction
- **[Getting Started Guide](../getting_started/quickstart)** - Complete setup and introduction
- **[Core Concepts](/docs/concepts)** - Understanding Llama Stack fundamentals
- **[Agents](./agent)** - Building intelligent agents
- **[RAG (Retrieval Augmented Generation)](./rag)** - Knowledge-enhanced applications
- **[Evaluations](./evals)** - Comprehensive evaluation framework
- **[API Reference](/docs/api-reference)** - Complete API documentation
- **[API Reference](/docs/api/llama-stack-specification)** - Complete API documentation


@ -13,7 +13,7 @@ import TabItem from '@theme/TabItem';
Llama Stack (LLS) provides two different APIs for building AI applications with tool calling capabilities: the **Agents API** and the **OpenAI Responses API**. While both enable AI systems to use tools and maintain full conversation history, they serve different use cases and have distinct characteristics.
:::note
**Note:** For simple and basic inferencing, you may want to use the [Chat Completions API](/docs/providers/openai-compatibility#chat-completions) directly, before progressing to Agents or Responses API.
**Note:** For simple and basic inferencing, you may want to use the [Chat Completions API](../providers/openai#chat-completions) directly, before progressing to Agents or Responses API.
:::
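For instance, a minimal sketch of such a direct call with the Python client (the server URL, model name, and exact parameter names are assumptions; they have shifted across SDK releases):

```python
from llama_stack_client import LlamaStackClient

# Assumption: a Llama Stack server is running locally on the default port
client = LlamaStackClient(base_url="http://localhost:8321")

# A one-shot chat completion: no agent loop, no tool calling
response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",  # any model registered with your server
    messages=[{"role": "user", "content": "What is machine learning?"}],
)
print(response)
```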
## Overview
@ -217,5 +217,5 @@ Use this framework to choose the right API for your use case:
- **[Agents](./agent)** - Understanding the Agents API fundamentals
- **[Agent Execution Loop](./agent_execution_loop)** - How agents process turns and steps
- **[Tools Integration](./tools)** - Adding capabilities to both APIs
- **[OpenAI Compatibility](/docs/providers/openai-compatibility)** - Using OpenAI-compatible endpoints
- **[OpenAI Compatibility](../providers/openai)** - Using OpenAI-compatible endpoints
- **[Safety Guardrails](./safety)** - Implementing safety measures in agents


@ -2,7 +2,7 @@
title: External APIs
description: Understanding external APIs in Llama Stack
sidebar_label: External APIs
sidebar_position: 4
sidebar_position: 3
---
# External APIs


@ -2,7 +2,7 @@
title: Distributions
description: Pre-packaged provider configurations for different deployment scenarios
sidebar_label: Distributions
sidebar_position: 5
sidebar_position: 3
---
# Distributions


@ -0,0 +1,78 @@
---
title: Evaluation Concepts
description: Running evaluations on Llama Stack
sidebar_label: Evaluation Concepts
sidebar_position: 5
---
# Evaluation Concepts
The Llama Stack Evaluation flow allows you to run evaluations on your GenAI application datasets or pre-registered benchmarks.
We introduce a set of APIs in Llama Stack for supporting running evaluations of LLM applications:
- `/datasetio` + `/datasets` API
- `/scoring` + `/scoring_functions` API
- `/eval` + `/benchmarks` API
This guide goes over the sets of APIs and the developer experience flow of using Llama Stack to run evaluations for different use cases. Check out our Colab notebook with working evaluation examples [here](https://colab.research.google.com/drive/10CHyykee9j2OigaIcRv47BKG9mrNm0tJ?usp=sharing).
The Evaluation APIs are associated with a set of Resources. Please visit the Resources section in our [Core Concepts](./index.mdx) guide for a better high-level understanding.
- **DatasetIO**: defines the interface for datasets and data loaders.
  - Associated with the `Dataset` resource.
- **Scoring**: evaluates the outputs of the system.
  - Associated with the `ScoringFunction` resource. We provide a suite of out-of-the-box scoring functions and the ability to add custom evaluators. These scoring functions are the core part of defining an evaluation task to output evaluation metrics.
- **Eval**: generates outputs (via Inference or Agents) and performs scoring.
  - Associated with the `Benchmark` resource.
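As a minimal sketch of how these pieces fit together with the Python client (the scoring function id and the row field names are illustrative assumptions):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Rows produced by your application; the field names are assumptions
rows = [
    {
        "input_query": "What is the capital of France?",
        "generated_answer": "Paris",
        "expected_answer": "Paris",
    }
]

# Score the rows with a built-in scoring function (the id is an assumption)
results = client.scoring.score(
    input_rows=rows,
    scoring_functions={"basic::subset_of": None},
)
print(results)
```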
## Open-benchmark Eval
### List of open-benchmarks Llama Stack supports
Llama Stack pre-registers several popular open-benchmarks so you can easily evaluate model performance via the CLI.
The list of open-benchmarks we currently support:
- [MMLU-COT](https://arxiv.org/abs/2009.03300) (Measuring Massive Multitask Language Understanding): Benchmark designed to comprehensively evaluate the breadth and depth of a model's academic and professional understanding
- [GPQA-COT](https://arxiv.org/abs/2311.12022) (A Graduate-Level Google-Proof Q&A Benchmark): A challenging benchmark of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.
- [SimpleQA](https://openai.com/index/introducing-simpleqa/): Benchmark designed to assess a model's ability to answer short, fact-seeking questions.
- [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI): Benchmark designed to evaluate multimodal models.
You can follow this [contributing guide](../references/evals_reference/#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack.
### Run evaluation on open-benchmarks via CLI
We have built-in functionality to run the supported open-benchmarks using the llama-stack-client CLI.
#### Spin up Llama Stack server
Spin up the Llama Stack server with the 'open-benchmark' template:
```bash
llama stack run llama_stack/distributions/open-benchmark/run.yaml
```
#### Run eval CLI
There are three required inputs to run a benchmark eval:
- `benchmark_ids`: The list of benchmark IDs to run evaluation on
- `model_id`: The ID of the model to evaluate
- `output_dir`: Path to store the evaluation results
```bash
llama-stack-client eval run-benchmark <benchmark_id_1> <benchmark_id_2> ... \
--model_id <model id to evaluate on> \
--output_dir <directory to store the evaluation results>
```
You can run
```bash
llama-stack-client eval run-benchmark help
```
to see descriptions of all the flags that `eval run-benchmark` supports.
In the output log, you can find the path to the file that contains your evaluation results. Open that file to see your aggregate evaluation results.
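For example, a sketch of loading those results, assuming the CLI writes per-benchmark JSON files under the `--output_dir` you passed (the filename and format are assumptions):

```python
import json

# Assumption: one JSON results file per benchmark under --output_dir
with open("results/mmlu_cot.json") as f:
    results = json.load(f)

print(results)
```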
## What's Next?
- Check out our Colab notebook on working examples with running benchmark evaluations [here](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb#scrollTo=mxLCsP4MvFqP).
- Check out our [Building Applications - Evaluation](../building_applications/evals.mdx) guide for more details on how to use the Evaluation APIs to evaluate your applications.
- Check out our [Evaluation Reference](../references/evals_reference/) for more details on the APIs.


@ -1,4 +1,9 @@
# Core Concepts
---
title: Core Concepts
description: Understanding Llama Stack's service-oriented philosophy and key concepts
sidebar_label: Overview
sidebar_position: 1
---
Given Llama Stack's service-oriented philosophy, a few concepts and workflows arise which may not feel completely natural in the LLM landscape, especially if you are coming with a background in other frameworks.
@ -6,38 +11,21 @@ Given Llama Stack's service-oriented philosophy, a few concepts and workflows ar
This section covers the fundamental concepts of Llama Stack:
- **[Architecture](./architecture.md)** - Learn about Llama Stack's architectural design and principles
- **[APIs](./apis/index.mdx)** - Understanding the core APIs and their stability levels
- [API Overview](./apis/index.mdx) - Core APIs available in Llama Stack
- [API Providers](./apis/api_providers.mdx) - How providers implement APIs
- [API Stability Leveling](./apis/api_leveling.mdx) - API stability and versioning
- **[Distributions](./distributions.md)** - Pre-configured deployment packages
- **[Resources](./resources.md)** - Understanding Llama Stack resources and their lifecycle
- **[External Integration](./external.md)** - Integrating with external services and providers
- **[Architecture](architecture.mdx)** - Learn about Llama Stack's architectural design and principles
- **[APIs](/docs/concepts/apis/)** - Understanding the core APIs and their stability levels
- [API Overview](apis/index.mdx) - Core APIs available in Llama Stack
- [API Providers](apis/api_providers.mdx) - How providers implement APIs
- [External APIs](apis/external.mdx) - External APIs available in Llama Stack
- [API Stability Leveling](apis/api_leveling.mdx) - API stability and versioning
- **[Distributions](distributions.mdx)** - Pre-configured deployment packages
- **[Resources](resources.mdx)** - Understanding Llama Stack resources and their lifecycle
## Getting Started
If you're new to Llama Stack, we recommend starting with:
1. **[Architecture](./architecture.md)** - Understand the overall system design
2. **[APIs](./apis/index.mdx)** - Learn about the available APIs and their purpose
3. **[Distributions](./distributions.md)** - Choose a pre-configured setup for your use case
1. **[Architecture](architecture.mdx)** - Understand the overall system design
2. **[APIs](apis/index.mdx)** - Learn about the available APIs and their purpose
3. **[Distributions](distributions.mdx)** - Choose a pre-configured setup for your use case
Each concept builds upon the previous ones to give you a comprehensive understanding of how Llama Stack works and how to use it effectively.

---
title: Core Concepts
description: Understanding Llama Stack's service-oriented philosophy and key concepts
sidebar_label: Overview
sidebar_position: 1
---
# Core Concepts
Given Llama Stack's service-oriented philosophy, a few concepts and workflows arise which may not feel completely natural in the LLM landscape, especially if you are coming with a background in other frameworks.
This section covers the key concepts you need to understand to work effectively with Llama Stack:
- **[Architecture](./architecture)** - Llama Stack's service-oriented design and benefits
- **[APIs](./apis)** - Available REST APIs and planned capabilities
- **[API Providers](./api_providers)** - Remote vs inline provider implementations
- **[Distributions](./distributions)** - Pre-packaged provider configurations
- **[Resources](./resources)** - Resource federation and registration
Each concept builds upon the previous ones to give you a comprehensive understanding of how Llama Stack works and how to use it effectively.


@ -2,7 +2,7 @@
title: Resources
description: Resource federation and registration in Llama Stack
sidebar_label: Resources
sidebar_position: 6
sidebar_position: 4
---
# Resources


@ -148,7 +148,7 @@ As a general guideline:
that describes the configuration. These descriptions will be used to generate the provider
documentation.
* When possible, use keyword arguments only when calling functions (a short sketch follows this list).
* Llama Stack utilizes [custom Exception classes](llama_stack/apis/common/errors.py) for certain Resources that should be used where applicable.
* Llama Stack utilizes custom Exception classes for certain Resources that should be used where applicable.
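As a minimal sketch of the keyword-argument guideline above (`register_model` is a hypothetical function, not a real Llama Stack API):

```python
# Hypothetical function used only to illustrate call style; the bare `*`
# makes every argument keyword-only.
def register_model(*, model_id: str, provider_id: str) -> None:
    print(f"registering {model_id} via provider {provider_id}")

# Keyword arguments keep call sites self-documenting
register_model(model_id="llama3.2:3b", provider_id="ollama")
```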
### License
By contributing to Llama, you agree that your contributions will be licensed
@ -212,35 +212,22 @@ The generated API schema will be available in `docs/static/`. Make sure to revie
## Adding a New Provider
See:
- [Adding a New API Provider Page](new_api_provider.md) which describes how to add new API providers to the Stack.
- [Vector Database Page](new_vector_database.md) which describes how to add a new vector database with Llama Stack.
- [External Provider Page](../providers/external/index.md) which describes how to add external providers to the Stack.
- [Adding a New API Provider Page](./new_api_provider.mdx) which describes how to add new API providers to the Stack.
- [Vector Database Page](./new_vector_database.mdx) which describes how to add a new vector database with Llama Stack.
- [External Provider Page](/docs/providers/external/) which describes how to add external providers to the Stack.
```{toctree}
:maxdepth: 1
:hidden:
new_api_provider
new_vector_database
```
## Testing
```{include} ../../../tests/README.md
```
See the [Testing README](https://github.com/meta-llama/llama-stack/blob/main/tests/README.md) for detailed testing information.
## Advanced Topics
For developers who need deeper understanding of the testing system internals:
```{toctree}
:maxdepth: 1
testing/record-replay
```
- [Record-Replay Testing](./testing/record-replay.mdx)
### Benchmarking
```{include} ../../../benchmarking/k8s-benchmark/README.md
```
See the [Benchmarking README](https://github.com/meta-llama/llama-stack/blob/main/benchmarking/k8s-benchmark/README.md) for benchmarking information.

View file

@ -11,7 +11,7 @@ import TabItem from '@theme/TabItem';
This guide will walk you through the process of adding a new API provider to Llama Stack.
- Begin by reviewing the [core concepts](../concepts/index.md) of Llama Stack and choose the API your provider belongs to (Inference, Safety, VectorIO, etc.)
- Begin by reviewing the [core concepts](../concepts/) of Llama Stack and choose the API your provider belongs to (Inference, Safety, VectorIO, etc.)
- Determine the provider type ([Remote](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/remote) or [Inline](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline)). Remote providers make requests to external services, while inline providers execute their implementation locally.
- Add your provider to the appropriate [Registry](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/registry/). Specify pip dependencies necessary.
- Update any distribution [Templates](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/distributions/) `build.yaml` and `run.yaml` files if they should include your provider by default. Run [./scripts/distro_codegen.py](https://github.com/meta-llama/llama-stack/blob/main/scripts/distro_codegen.py) if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation.


@ -219,6 +219,6 @@ kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http:
## Related Resources
- **[Deployment Overview](./index)** - Overview of deployment options
- **[Deployment Overview](/docs/deploying/)** - Overview of deployment options
- **[Distributions](/docs/distributions)** - Understanding Llama Stack distributions
- **[Configuration](/docs/distributions/configuration)** - Detailed configuration options


@ -251,7 +251,7 @@ directory or a git repository (git must be installed on the build environment).
llama stack build --config my-external-stack.yaml
```
For more information on external providers, including directory structure, provider types, and implementation requirements, see the [External Providers documentation](../providers/external.md).
For more information on external providers, including directory structure, provider types, and implementation requirements, see the [External Providers documentation](../providers/external/).
</TabItem>
<TabItem value="container" label="Building Container">


@ -206,7 +206,7 @@ models:
provider_model_id: null
model_type: llm
```
A Model is an instance of a "Resource" (see [Concepts](../concepts/index)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage clients to register models before using them, some Stack servers may come up with a list of "already known and available" models.
A Model is an instance of a "Resource" (see [Concepts](../concepts/)) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a "pre-registered" model. While we always encourage clients to register models before using them, some Stack servers may come up with a list of "already known and available" models.
What's with the `provider_model_id` field? This is an identifier for the model inside the provider's model catalog. Contrast it with `model_id` which is the identifier for the same model for Llama Stack's purposes. For example, you may want to name "llama3.2:vision-11b" as "image_captioning_model" when you use it in your Stack interactions. When omitted, the server will set `provider_model_id` to be the same as `model_id`.
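A sketch of registering such an alias with the Python client follows; the method and argument names reflect the llama-stack-client SDK, but treat the exact signature as an assumption:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Register the provider's "llama3.2:vision-11b" under your own alias
client.models.register(
    model_id="image_captioning_model",        # the name your app will use
    provider_model_id="llama3.2:vision-11b",  # the id in the provider's catalog
    provider_id="ollama",
)
```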


@ -33,7 +33,7 @@ Then, you can access the APIs like `models` and `inference` on the client and ca
response = client.models.list()
```
If you've created a [custom distribution](building_distro.md), you can also use the run.yaml configuration file directly:
If you've created a [custom distribution](./building_distro), you can also use the run.yaml configuration file directly:
```python
client = LlamaStackAsLibraryClient(config_path)
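# A fuller, self-contained sketch. Assumptions: the import path and the
# explicit initialize() call have varied across releases.
#
#   from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
#   client = LlamaStackAsLibraryClient("path/to/run.yaml")
#   client.initialize()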


@ -13,9 +13,9 @@ This section provides an overview of the distributions available in Llama Stack.
## Distribution Guides
- **[Available Distributions](./list_of_distributions)** - Complete list and comparison of all distributions
- **[Building Custom Distributions](./building_distro)** - Create your own distribution from scratch
- **[Customizing Configuration](./customizing_run_yaml)** - Customize run.yaml for your needs
- **[Starting Llama Stack Server](./starting_llama_stack_server)** - How to run distributions
- **[Importing as Library](./importing_as_library)** - Use distributions in your code
- **[Configuration Reference](./configuration)** - Configuration file format details
- **[Available Distributions](./list_of_distributions.mdx)** - Complete list and comparison of all distributions
- **[Building Custom Distributions](./building_distro.mdx)** - Create your own distribution from scratch
- **[Customizing Configuration](./customizing_run_yaml.mdx)** - Customize run.yaml for your needs
- **[Starting Llama Stack Server](./starting_llama_stack_server.mdx)** - How to run distributions
- **[Importing as Library](./importing_as_library.mdx)** - Use distributions in your code
- **[Configuration Reference](./configuration.mdx)** - Configuration file format details


@ -62,7 +62,7 @@ docker pull llama-stack/distribution-meta-reference-gpu
**Partners:** [Fireworks.ai](https://fireworks.ai) and [Together.xyz](https://together.xyz)
**Guides:** [Remote-Hosted Endpoints](remote_hosted_distro/index)
**Guides:** [Remote-Hosted Endpoints](./remote_hosted_distro/)
### 📱 Mobile Development
@ -81,7 +81,7 @@ docker pull llama-stack/distribution-meta-reference-gpu
- You need custom configurations
- You want to optimize for your specific use case
**Guides:** [Building Custom Distributions](building_distro.md)
**Guides:** [Building Custom Distributions](./building_distro)
## Detailed Documentation
@ -131,4 +131,4 @@ graph TD
3. **Configure your providers** with API keys or local models
4. **Start building** with Llama Stack!
For help choosing or troubleshooting, check our [Getting Started Guide](../getting_started/index.md) or [Community Support](https://github.com/llama-stack/llama-stack/discussions).
For help choosing or troubleshooting, check our [Getting Started Guide](/docs/getting_started/quickstart) or [Community Support](https://github.com/llama-stack/llama-stack/discussions).


@ -66,7 +66,7 @@ llama stack run starter --port 5050
Ensure the Llama Stack server version is the same as the Kotlin SDK Library for maximum compatibility.
Other inference providers: [Table](../../index.md#supported-llama-stack-implementations)
Other inference providers: [Table](/docs/)
How to set remote localhost in Demo App: [Settings](https://github.com/meta-llama/llama-stack-client-kotlin/tree/latest-release/examples/android_app#settings)


@ -36,25 +36,25 @@ The starter distribution includes a comprehensive set of inference providers:
### Hosted Providers
- **[OpenAI](https://openai.com/api/)**: GPT-4, GPT-3.5, O1, O3, O4 models and text embeddings -
provider ID: `openai` - reference documentation: [openai](../../providers/inference/remote_openai.md)
provider ID: `openai` - reference documentation: [openai](../../providers/inference/remote_openai)
- **[Fireworks](https://fireworks.ai/)**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and
embeddings - provider ID: `fireworks` - reference documentation: [fireworks](../../providers/inference/remote_fireworks.md)
embeddings - provider ID: `fireworks` - reference documentation: [fireworks](../../providers/inference/remote_fireworks)
- **[Together](https://together.ai/)**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and
embeddings - provider ID: `together` - reference documentation: [together](../../providers/inference/remote_together.md)
- **[Anthropic](https://www.anthropic.com/)**: Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude 3.5 Haiku, and Voyage embeddings - provider ID: `anthropic` - reference documentation: [anthropic](../../providers/inference/remote_anthropic.md)
- **[Gemini](https://gemini.google.com/)**: Gemini 1.5, 2.0, 2.5 models and text embeddings - provider ID: `gemini` - reference documentation: [gemini](../../providers/inference/remote_gemini.md)
- **[Groq](https://groq.com/)**: Fast Llama models (3.1, 3.2, 3.3, 4 Scout, 4 Maverick) - provider ID: `groq` - reference documentation: [groq](../../providers/inference/remote_groq.md)
- **[SambaNova](https://www.sambanova.ai/)**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models - provider ID: `sambanova` - reference documentation: [sambanova](../../providers/inference/remote_sambanova.md)
- **[Cerebras](https://www.cerebras.ai/)**: Cerebras AI models - provider ID: `cerebras` - reference documentation: [cerebras](../../providers/inference/remote_cerebras.md)
- **[NVIDIA](https://www.nvidia.com/)**: NVIDIA NIM - provider ID: `nvidia` - reference documentation: [nvidia](../../providers/inference/remote_nvidia.md)
- **[HuggingFace](https://huggingface.co/)**: Serverless and endpoint models - provider ID: `hf::serverless` and `hf::endpoint` - reference documentation: [huggingface-serverless](../../providers/inference/remote_hf_serverless.md) and [huggingface-endpoint](../../providers/inference/remote_hf_endpoint.md)
- **[Bedrock](https://aws.amazon.com/bedrock/)**: AWS Bedrock models - provider ID: `bedrock` - reference documentation: [bedrock](../../providers/inference/remote_bedrock.md)
embeddings - provider ID: `together` - reference documentation: [together](../../providers/inference/remote_together)
- **[Anthropic](https://www.anthropic.com/)**: Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude 3.5 Haiku, and Voyage embeddings - provider ID: `anthropic` - reference documentation: [anthropic](../../providers/inference/remote_anthropic)
- **[Gemini](https://gemini.google.com/)**: Gemini 1.5, 2.0, 2.5 models and text embeddings - provider ID: `gemini` - reference documentation: [gemini](../../providers/inference/remote_gemini)
- **[Groq](https://groq.com/)**: Fast Llama models (3.1, 3.2, 3.3, 4 Scout, 4 Maverick) - provider ID: `groq` - reference documentation: [groq](../../providers/inference/remote_groq)
- **[SambaNova](https://www.sambanova.ai/)**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models - provider ID: `sambanova` - reference documentation: [sambanova](../../providers/inference/remote_sambanova)
- **[Cerebras](https://www.cerebras.ai/)**: Cerebras AI models - provider ID: `cerebras` - reference documentation: [cerebras](../../providers/inference/remote_cerebras)
- **[NVIDIA](https://www.nvidia.com/)**: NVIDIA NIM - provider ID: `nvidia` - reference documentation: [nvidia](../../providers/inference/remote_nvidia)
- **[HuggingFace](https://huggingface.co/)**: Serverless and endpoint models - provider ID: `hf::serverless` and `hf::endpoint` - reference documentation: [huggingface-serverless](../../providers/inference/remote_hf_serverless) and [huggingface-endpoint](../../providers/inference/remote_hf_endpoint)
- **[Bedrock](https://aws.amazon.com/bedrock/)**: AWS Bedrock models - provider ID: `bedrock` - reference documentation: [bedrock](../../providers/inference/remote_bedrock)
### Local/Remote Providers
- **[Ollama](https://ollama.ai/)**: Local Ollama models - provider ID: `ollama` - reference documentation: [ollama](../../providers/inference/remote_ollama.md)
- **[vLLM](https://docs.vllm.ai/en/latest/)**: Local or remote vLLM server - provider ID: `vllm` - reference documentation: [vllm](../../providers/inference/remote_vllm.md)
- **[TGI](https://github.com/huggingface/text-generation-inference)**: Text Generation Inference server, including Dell Enterprise Hub's custom TGI container (use `DEH_URL`) - provider ID: `tgi` - reference documentation: [tgi](../../providers/inference/remote_tgi.md)
- **[Sentence Transformers](https://www.sbert.net/)**: Local embedding models - provider ID: `sentence-transformers` - reference documentation: [sentence-transformers](../../providers/inference/inline_sentence-transformers.md)
- **[Ollama](https://ollama.ai/)**: Local Ollama models - provider ID: `ollama` - reference documentation: [ollama](../../providers/inference/remote_ollama)
- **[vLLM](https://docs.vllm.ai/en/latest/)**: Local or remote vLLM server - provider ID: `vllm` - reference documentation: [vllm](../../providers/inference/remote_vllm)
- **[TGI](https://github.com/huggingface/text-generation-inference)**: Text Generation Inference server, including Dell Enterprise Hub's custom TGI container (use `DEH_URL`) - provider ID: `tgi` - reference documentation: [tgi](../../providers/inference/remote_tgi)
- **[Sentence Transformers](https://www.sbert.net/)**: Local embedding models - provider ID: `sentence-transformers` - reference documentation: [sentence-transformers](../../providers/inference/inline_sentence-transformers)
All providers are disabled by default, so you need to enable the ones you want by setting the corresponding environment variables.


@ -16,11 +16,11 @@ This is the simplest way to get started. Using Llama Stack as a library means yo
## Container:
Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](./list_of_distributions) for more details.
## Kubernetes:
If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally, see the [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally, see the [Kubernetes Deployment Guide](../deploying/kubernetes_deployment) for more details.
```{toctree}


@ -18,7 +18,7 @@ In Llama Stack, we provide a server exposing multiple APIs. These APIs are backe
Llama Stack is a stateful service with REST APIs to support seamless transition of AI applications across different environments. The server can be run in a variety of ways, including as a standalone binary, Docker container, or hosted service. You can build and test using a local server first and deploy to a hosted endpoint for production.
In this guide, we'll walk through how to build a RAG agent locally using Llama Stack with [Ollama](https://ollama.com/)
as the inference [provider](../providers/index.md#inference) for a Llama Model.
as the inference [provider](/docs/providers/inference/) for a Llama Model.
### Step 1: Installation and Setup
@ -60,8 +60,8 @@ Llama Stack is a server that exposes multiple APIs, you connect with it using th
<TabItem value="venv" label="Using venv">
You can use Python to build and run the Llama Stack server, which is useful for testing and development.
Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup,
which defines the providers and their settings. The generated configuration serves as a starting point that you can [customize for your specific needs](../distributions/customizing_run_yaml.md).
Llama Stack uses a [YAML configuration file](../distributions/configuration) to specify the stack setup,
which defines the providers and their settings. The generated configuration serves as a starting point that you can [customize for your specific needs](../distributions/customizing_run_yaml).
Now let's build and run the Llama Stack config for Ollama.
We use `starter` as the template. By default all providers are disabled, so you need to enable Ollama by passing environment variables.
@ -73,7 +73,7 @@ llama stack build --distro starter --image-type venv --run
You can use a container image to run the Llama Stack server. We provide several container images for the server
component that works with different inference providers out of the box. For this guide, we will use
`llamastack/distribution-starter` as the container image. If you'd like to build your own image or customize the
configurations, please check out [this guide](../distributions/building_distro.md).
configurations, please check out [this guide](../distributions/building_distro).
First, let's set up some environment variables and create a local directory to mount into the container's file system.
```bash
export LLAMA_STACK_PORT=8321
@ -145,7 +145,7 @@ pip install llama-stack-client
</TabItem>
</Tabs>
Now let's use the `llama-stack-client` [CLI](../references/llama_stack_client_cli_reference.md) to check the
Now let's use the `llama-stack-client` [CLI](../references/llama_stack_client_cli_reference) to check the
connectivity to the server.
```bash
@ -216,8 +216,8 @@ OpenAIChatCompletion(
### Step 4: Run the Demos
Note that these demos show the [Python Client SDK](../references/python_sdk_reference/index.md).
Other SDKs are also available, please refer to the [Client SDK](../index.md#client-sdks) list for the complete options.
Note that these demos show the [Python Client SDK](../references/python_sdk_reference/).
Other SDKs are also available, please refer to the [Client SDK](/docs/) list for the complete options.
<Tabs>
<TabItem value="inference" label="Basic Inference">
@ -538,4 +538,4 @@ uv run python rag_agent.py
**You're Ready to Build Your Own Apps!**
Congrats! 🥳 Now you're ready to [build your own Llama Stack applications](../building_applications/index)! 🚀
Congrats! 🥳 Now you're ready to [build your own Llama Stack applications](../building_applications/)! 🚀


@ -140,7 +140,7 @@ If you are getting a **401 Client Error** from HuggingFace for the **all-MiniLM-
### Next Steps
Now you're ready to dive deeper into Llama Stack!
- Explore the [Detailed Tutorial](/docs/detailed_tutorial).
- Explore the [Detailed Tutorial](./detailed_tutorial).
- Try the [Getting Started Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb).
- Browse more [Notebooks on GitHub](https://github.com/meta-llama/llama-stack/tree/main/docs/notebooks).
- Learn about Llama Stack [Concepts](/docs/concepts).


@ -25,7 +25,3 @@ Agents API for creating and interacting with agentic systems.
- Agents can also use Memory to retrieve information from knowledge bases. See the RAG Tool and Vector IO APIs for more details.
This section contains documentation for all available providers for the **agents** API.
## Providers
- [Meta-Reference](./inline_meta-reference)


@ -29,7 +29,3 @@ The Batches API enables efficient processing of multiple requests in a single op
Note: This API is currently under active development and may undergo changes.
This section contains documentation for all available providers for the **batches** API.
## Providers
- [Reference](./inline_reference)


@ -1,16 +0,0 @@
---
sidebar_label: Datasetio
title: Datasetio
---
# Datasetio
## Overview
This section contains documentation for all available providers for the **datasetio** API.
## Providers
- [Localfs](./inline_localfs)
- [Remote - Huggingface](./remote_huggingface)
- [Remote - Nvidia](./remote_nvidia)


@ -8,9 +8,3 @@ title: Datasetio
## Overview
This section contains documentation for all available providers for the **datasetio** API.
## Providers
- [Localfs](./inline_localfs)
- [Remote - Huggingface](./remote_huggingface)
- [Remote - Nvidia](./remote_nvidia)


@ -11,8 +11,3 @@ title: Eval
Llama Stack Evaluation API for running evaluations on model and agent candidates.
This section contains documentation for all available providers for the **eval** API.
## Providers
- [Meta-Reference](./inline_meta-reference)
- [Remote - Nvidia](./remote_nvidia)


@ -7,5 +7,5 @@ Llama Stack supports external providers that live outside of the main codebase.
## External Provider Documentation
- [Known External Providers](external-providers-list)
- [Creating External Providers](external-providers-guide)
- [Known External Providers](./external-providers-list.mdx)
- [Creating External Providers](./external-providers-guide.mdx)


@ -8,8 +8,3 @@ title: Files
## Overview
This section contains documentation for all available providers for the **files** API.
## Providers
- [Localfs](./inline_localfs)
- [Remote - S3](./remote_s3)


@ -21,13 +21,13 @@ Importantly, Llama Stack always strives to provide at least one fully inline pro
## Provider Categories
- **[External Providers](./external/)** - Guide for building and using external providers
- **[OpenAI Compatibility](./openai)** - OpenAI API compatibility layer
- **[Inference](./inference/)** - LLM and embedding model providers
- **[Agents](./agents/)** - Agentic system providers
- **[DatasetIO](./datasetio/)** - Dataset and data loader providers
- **[Safety](./safety/)** - Content moderation and safety providers
- **[Telemetry](./telemetry/)** - Monitoring and observability providers
- **[Vector IO](./vector-io/)** - Vector database providers
- **[Tool Runtime](./tool-runtime/)** - Tool and protocol providers
- **[Files](./files/)** - File system and storage providers
- **[External Providers](external/index.mdx)** - Guide for building and using external providers
- **[OpenAI Compatibility](./openai.mdx)** - OpenAI API compatibility layer
- **[Inference](inference/index.mdx)** - LLM and embedding model providers
- **[Agents](agents/index.mdx)** - Agentic system providers
- **[DatasetIO](datasetio/index.mdx)** - Dataset and data loader providers
- **[Safety](safety/index.mdx)** - Content moderation and safety providers
- **[Telemetry](telemetry/index.mdx)** - Monitoring and observability providers
- **[Vector IO](vector_io/index.mdx)** - Vector database providers
- **[Tool Runtime](tool_runtime/index.mdx)** - Tool and protocol providers
- **[Files](files/index.mdx)** - File system and storage providers


@ -19,30 +19,3 @@ Llama Stack Inference API for generating completions, chat completions, and embe
- Embedding models: these models generate embeddings to be used for semantic search.
This section contains documentation for all available providers for the **inference** API.
## Providers
- [Meta-Reference](./inline_meta-reference)
- [Sentence-Transformers](./inline_sentence-transformers)
- [Remote - Anthropic](./remote_anthropic)
- [Remote - Azure](./remote_azure)
- [Remote - Bedrock](./remote_bedrock)
- [Remote - Cerebras](./remote_cerebras)
- [Remote - Databricks](./remote_databricks)
- [Remote - Fireworks](./remote_fireworks)
- [Remote - Gemini](./remote_gemini)
- [Remote - Groq](./remote_groq)
- [Remote - Hf - Endpoint](./remote_hf_endpoint)
- [Remote - Hf - Serverless](./remote_hf_serverless)
- [Remote - Llama-Openai-Compat](./remote_llama-openai-compat)
- [Remote - Nvidia](./remote_nvidia)
- [Remote - Ollama](./remote_ollama)
- [Remote - Openai](./remote_openai)
- [Remote - Passthrough](./remote_passthrough)
- [Remote - Runpod](./remote_runpod)
- [Remote - Sambanova](./remote_sambanova)
- [Remote - Tgi](./remote_tgi)
- [Remote - Together](./remote_together)
- [Remote - Vertexai](./remote_vertexai)
- [Remote - Vllm](./remote_vllm)
- [Remote - Watsonx](./remote_watsonx)


@ -1,3 +1,8 @@
---
title: OpenAI Compatibility
description: OpenAI API Compatibility
sidebar_label: OpenAI Compatibility
sidebar_position: 1
---
## OpenAI API Compatibility
### Server path


@ -8,10 +8,3 @@ title: Post_Training
## Overview
This section contains documentation for all available providers for the **post_training** API.
## Providers
- [Huggingface-Gpu](./inline_huggingface-gpu)
- [Torchtune-Cpu](./inline_torchtune-cpu)
- [Torchtune-Gpu](./inline_torchtune-gpu)
- [Remote - Nvidia](./remote_nvidia)


@ -8,12 +8,3 @@ title: Safety
## Overview
This section contains documentation for all available providers for the **safety** API.
## Providers
- [Code-Scanner](./inline_code-scanner)
- [Llama-Guard](./inline_llama-guard)
- [Prompt-Guard](./inline_prompt-guard)
- [Remote - Bedrock](./remote_bedrock)
- [Remote - Nvidia](./remote_nvidia)
- [Remote - Sambanova](./remote_sambanova)


@ -8,9 +8,3 @@ title: Scoring
## Overview
This section contains documentation for all available providers for the **scoring** API.
## Providers
- [Basic](./inline_basic)
- [Braintrust](./inline_braintrust)
- [Llm-As-Judge](./inline_llm-as-judge)


@ -8,7 +8,3 @@ title: Telemetry
## Overview
This section contains documentation for all available providers for the **telemetry** API.
## Providers
- [Meta-Reference](./inline_meta-reference)


@ -8,12 +8,3 @@ title: Tool_Runtime
## Overview
This section contains documentation for all available providers for the **tool_runtime** API.
## Providers
- [Rag-Runtime](./inline_rag-runtime)
- [Remote - Bing-Search](./remote_bing-search)
- [Remote - Brave-Search](./remote_brave-search)
- [Remote - Model-Context-Protocol](./remote_model-context-protocol)
- [Remote - Tavily-Search](./remote_tavily-search)
- [Remote - Wolfram-Alpha](./remote_wolfram-alpha)


@ -8,18 +8,3 @@ title: Vector_Io
## Overview
This section contains documentation for all available providers for the **vector_io** API.
## Providers
- [Chromadb](./inline_chromadb)
- [Faiss](./inline_faiss)
- [Meta-Reference](./inline_meta-reference)
- [Milvus](./inline_milvus)
- [Qdrant](./inline_qdrant)
- [Sqlite-Vec](./inline_sqlite-vec)
- [Sqlite Vec](./inline_sqlite_vec)
- [Remote - Chromadb](./remote_chromadb)
- [Remote - Milvus](./remote_milvus)
- [Remote - Pgvector](./remote_pgvector)
- [Remote - Qdrant](./remote_qdrant)
- [Remote - Weaviate](./remote_weaviate)


@ -7,6 +7,6 @@ sidebar_position: 1
# References
- [Python SDK Reference](python_sdk_reference/index)
- [Llama CLI](llama_cli_reference/index) for building and running your Llama Stack server
- [Llama Stack Client CLI](llama_stack_client_cli_reference) for interacting with your Llama Stack server
- [Python SDK Reference](/docs/references/python_sdk_reference/)
- [Llama CLI](/docs/references/llama_cli_reference/) for building and running your Llama Stack server
- [Llama Stack Client CLI](./llama_stack_client_cli_reference.md) for interacting with your Llama Stack server


@ -29,7 +29,7 @@ You have two ways to install Llama Stack:
## `llama` subcommands
1. `download`: Supports downloading models from Meta or Hugging Face. [Downloading models](#downloading-models)
2. `model`: Lists available models and their properties. [Understanding models](#understand-the-models)
3. `stack`: Allows you to build a stack using the `llama stack` distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the [Build your own Distribution](../../distributions/building_distro) documentation.
3. `stack`: Allows you to build a stack using the `llama stack` distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the [Build your own Distribution](../distributions/building_distro) documentation.
### Sample Usage


@ -217,11 +217,6 @@ const config: Config = {
ignoreFiles: [
"node_modules/**/*",
],
// Exclude OpenAPI generated docs from search to avoid duplicates
searchContextByPaths: [
"docs",
],
},
],
],

BIN docs/static/img/llama-stack-logo.png vendored (new file; binary, 18 KiB, not shown)
BIN (removed image; binary, 70 KiB, not shown)

@ -1,64 +0,0 @@
import React from 'react';
import clsx from 'clsx';
import styles from './styles.module.css';
const FeatureList = [
{
title: 'Easy to Use',
Svg: require('@site/static/img/undraw_docusaurus_mountain.svg').default,
description: (
<>
Docusaurus was designed from the ground up to be easily installed and
used to get your website up and running quickly.
</>
),
},
{
title: 'Focus on What Matters',
Svg: require('@site/static/img/undraw_docusaurus_tree.svg').default,
description: (
<>
Docusaurus lets you focus on your docs, and we&apos;ll do the chores. Go
ahead and move your docs into the <code>docs</code> directory.
</>
),
},
{
title: 'Powered by React',
Svg: require('@site/static/img/undraw_docusaurus_react.svg').default,
description: (
<>
Extend or customize your website layout by reusing React. Docusaurus can
be extended while reusing the same header and footer.
</>
),
},
];
function Feature({Svg, title, description}) {
return (
<div className={clsx('col col--4')}>
<div className="text--center">
<Svg className={styles.featureSvg} role="img" />
</div>
<div className="text--center padding-horiz--md">
<h3>{title}</h3>
<p>{description}</p>
</div>
</div>
);
}
export default function HomepageFeatures() {
return (
<section className={styles.features}>
<div className="container">
<div className="row">
{FeatureList.map((props, idx) => (
<Feature key={idx} {...props} />
))}
</div>
</div>
</section>
);
}


@ -1,11 +0,0 @@
.features {
display: flex;
align-items: center;
padding: 2rem 0;
width: 100%;
}
.featureSvg {
height: 200px;
width: 200px;
}


@ -1,191 +0,0 @@
/**
* Any CSS included here will be global. The classic template
* bundles Infima by default. Infima is a CSS framework designed to
* work well for content-centric websites.
*/
/* You can override the default Infima variables here. */
:root {
/* Llama Stack Original Theme - Based on llamastack.github.io */
--ifm-color-primary: #4a4a68;
--ifm-color-primary-dark: #3a3a52;
--ifm-color-primary-darker: #332735;
--ifm-color-primary-darkest: #2b2129;
--ifm-color-primary-light: #5a5a7e;
--ifm-color-primary-lighter: #6a6a94;
--ifm-color-primary-lightest: #8080aa;
/* Additional theme colors */
--ifm-color-secondary: #1b263c;
--ifm-color-info: #2980b9;
--ifm-color-success: #16a085;
--ifm-color-warning: #f39c12;
--ifm-color-danger: #e74c3c;
/* Background colors */
--ifm-background-color: #ffffff;
--ifm-background-surface-color: #f8f9fa;
/* Code and syntax highlighting */
--ifm-code-font-size: 95%;
--ifm-pre-background: #1b263c;
--ifm-pre-color: #e1e5e9;
--docusaurus-highlighted-code-line-bg: rgba(51, 39, 53, 0.1);
/* Link colors */
--ifm-link-color: var(--ifm-color-primary);
--ifm-link-hover-color: var(--ifm-color-primary-darker);
/* Navbar */
--ifm-navbar-background-color: rgba(255, 255, 255, 0.95);
--ifm-navbar-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
/* Hero section gradient - matching original theme */
--hero-gradient: linear-gradient(90deg, #332735 0%, #1b263c 100%);
/* OpenAPI method colors */
--openapi-code-blue: #2980b9;
--openapi-code-green: #16a085;
--openapi-code-orange: #f39c12;
--openapi-code-red: #e74c3c;
--openapi-code-purple: #332735;
}
/* For readability concerns, you should choose a lighter palette in dark mode. */
[data-theme='dark'] {
/* Dark theme primary colors - lighter versions of original theme */
--ifm-color-primary: #8080aa;
--ifm-color-primary-dark: #6a6a94;
--ifm-color-primary-darker: #5a5a7e;
--ifm-color-primary-darkest: #4a4a68;
--ifm-color-primary-light: #9090ba;
--ifm-color-primary-lighter: #a0a0ca;
--ifm-color-primary-lightest: #b0b0da;
/* Dark theme background colors */
--ifm-background-color: #1a1a1a;
--ifm-background-surface-color: #2a2a2a;
/* Dark theme navbar */
--ifm-navbar-background-color: rgba(26, 26, 26, 0.95);
/* Dark theme code highlighting */
--docusaurus-highlighted-code-line-bg: rgba(51, 39, 53, 0.3);
/* Dark theme text colors */
--ifm-font-color-base: #e1e5e9;
--ifm-font-color-secondary: #a0a6ac;
}
/* Sidebar Method labels */
.api-method>.menu__link {
align-items: center;
justify-content: start;
}
.api-method>.menu__link::before {
width: 50px;
height: 20px;
font-size: 12px;
line-height: 20px;
text-transform: uppercase;
font-weight: 600;
border-radius: 0.25rem;
border: 1px solid;
margin-right: var(--ifm-spacing-horizontal);
text-align: center;
flex-shrink: 0;
border-color: transparent;
color: white;
}
.get>.menu__link::before {
content: "get";
background-color: var(--ifm-color-primary);
}
.put>.menu__link::before {
content: "put";
background-color: var(--openapi-code-blue);
}
.post>.menu__link::before {
content: "post";
background-color: var(--openapi-code-green);
}
.delete>.menu__link::before {
content: "del";
background-color: var(--openapi-code-red);
}
.patch>.menu__link::before {
content: "patch";
background-color: var(--openapi-code-orange);
}
.footer--dark {
--ifm-footer-link-color: #ffffff;
--ifm-footer-title-color: #ffffff;
}
.footer--dark .footer__link-item {
color: #ffffff;
}
.footer--dark .footer__title {
color: #ffffff;
}
/* OpenAPI theme fixes for light mode readability */
/* Version badge fixes */
.openapi__version-badge,
.theme-doc-version-badge,
[class*="version-badge"],
[class*="versionBadge"] {
background-color: #ffffff !important;
color: #333333 !important;
border: 1px solid #d1d5db !important;
}
/* OpenAPI method badges in light mode */
.openapi__method-badge,
[class*="method-badge"] {
color: #ffffff !important;
}
/* Button fixes for light mode */
.openapi__button,
.theme-api-docs-demo-panel button,
[class*="api-docs"] button,
button[class*="button"],
.openapi-explorer__response-schema button,
.openapi-tabs__operation button {
color: #ffffff !important;
}
.openapi__button:hover,
.theme-api-docs-demo-panel button:hover,
[class*="api-docs"] button:hover,
button[class*="button"]:hover,
.openapi-explorer__response-schema button:hover,
.openapi-tabs__operation button:hover {
color: #ffffff !important;
}
/* Navigation buttons (Next/Previous) */
.pagination-nav__link,
.pagination-nav__label {
color: #333333 !important;
}
.pagination-nav__link--next,
.pagination-nav__link--prev {
background-color: #ffffff !important;
border: 1px solid #d1d5db !important;
}
.pagination-nav__link--next:hover,
.pagination-nav__link--prev:hover {
background-color: #f3f4f6 !important;
}


@ -1,163 +0,0 @@
import React from 'react';
import clsx from 'clsx';
import Layout from '@theme/Layout';
import Link from '@docusaurus/Link';
import useDocusaurusContext from '@docusaurus/useDocusaurusContext';
import styles from './index.module.css';
function HomepageHeader() {
const {siteConfig} = useDocusaurusContext();
return (
<header className={clsx('hero hero--primary', styles.heroBanner)}>
<div className="container">
<div className={styles.heroContent}>
<h1 className={styles.heroTitle}>Build AI Applications with Llama Stack</h1>
<p className={styles.heroSubtitle}>
Unified APIs for Inference, RAG, Agents, Tools, Safety, and Telemetry
</p>
<div className={styles.buttons}>
<Link
className={clsx('button button--primary button--lg', styles.getStartedButton)}
to="/docs/getting-started">
🚀 Get Started
</Link>
<Link
className={clsx('button button--primary button--lg', styles.apiButton)}
to="/docs/category/llama-stack-api">
📚 API Reference
</Link>
</div>
</div>
</div>
</header>
);
}
function QuickStart() {
return (
<section className={styles.quickStart}>
<div className="container">
<div className="row">
<div className="col col--6">
<h2 className={styles.sectionTitle}>Quick Start</h2>
<p className={styles.sectionDescription}>
Get up and running with Llama Stack in just a few commands. Build your first RAG application locally.
</p>
<div className={styles.codeBlock}>
<pre><code>{`# Install uv and start Ollama
ollama run llama3.2:3b --keepalive 60m
# Run Llama Stack server
OLLAMA_URL=http://localhost:11434 \\
uv run --with llama-stack \\
llama stack build --distro starter \\
--image-type venv --run
# Try the Python SDK
from llama_stack_client import LlamaStackClient
client = LlamaStackClient(
base_url="http://localhost:8321"
)
response = client.inference.chat_completion(
model="Llama3.2-3B-Instruct",
messages=[{
"role": "user",
"content": "What is machine learning?"
}]
)`}</code></pre>
</div>
</div>
<div className="col col--6">
<h2 className={styles.sectionTitle}>Why Llama Stack?</h2>
<div className={styles.features}>
<div className={styles.feature}>
<div className={styles.featureIcon}>🔗</div>
<div>
<h4>Unified APIs</h4>
<p>One consistent interface for all your AI needs - inference, safety, agents, and more.</p>
</div>
</div>
<div className={styles.feature}>
<div className={styles.featureIcon}>🔄</div>
<div>
<h4>Provider Flexibility</h4>
<p>Swap between providers without code changes. Start local, deploy anywhere.</p>
</div>
</div>
<div className={styles.feature}>
<div className={styles.featureIcon}>🛡</div>
<div>
<h4>Production Ready</h4>
<p>Built-in safety, monitoring, and evaluation tools for enterprise applications.</p>
</div>
</div>
<div className={styles.feature}>
<div className={styles.featureIcon}>📱</div>
<div>
<h4>Multi-Platform</h4>
<p>SDKs for Python, Node.js, iOS, Android, and REST APIs for any language.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
);
}
function CommunityLinks() {
return (
<section className={styles.community}>
<div className="container">
<div className={styles.communityContent}>
<h2 className={styles.sectionTitle}>Join the Community</h2>
<p className={styles.sectionDescription}>
Connect with developers building the future of AI applications
</p>
<div className={styles.communityLinks}>
<a
href="https://github.com/llamastack/llama-stack"
className={clsx('button button--outline button--lg', styles.communityButton)}
target="_blank"
rel="noopener noreferrer">
<span className={styles.communityIcon}></span>
Star on GitHub
</a>
<a
href="https://discord.gg/llama-stack"
className={clsx('button button--outline button--lg', styles.communityButton)}
target="_blank"
rel="noopener noreferrer">
<span className={styles.communityIcon}>💬</span>
Join Discord
</a>
<Link
to="/docs/intro"
className={clsx('button button--outline button--lg', styles.communityButton)}>
<span className={styles.communityIcon}>📚</span>
Read Docs
</Link>
</div>
</div>
</div>
</section>
);
}
export default function Home() {
const {siteConfig} = useDocusaurusContext();
return (
<Layout
title="Build AI Applications"
description="The open-source framework for building generative AI applications with unified APIs for Inference, RAG, Agents, Tools, Safety, and Telemetry.">
<HomepageHeader />
<main>
<QuickStart />
<CommunityLinks />
</main>
</Layout>
);
}


@ -1,283 +0,0 @@
/**
* CSS files with the .module.css suffix will be treated as CSS modules
* and scoped locally.
*/
.heroBanner {
padding: 4rem 0;
text-align: center;
position: relative;
overflow: hidden;
background: var(--hero-gradient);
color: white;
display: flex;
align-items: center;
}
.heroBanner::before {
content: '';
position: absolute;
top: 0;
left: 0;
right: 0;
bottom: 0;
background: radial-gradient(circle at 30% 20%, rgba(255, 255, 255, 0.1) 0%, transparent 50%),
radial-gradient(circle at 70% 80%, rgba(255, 255, 255, 0.05) 0%, transparent 50%);
pointer-events: none;
}
.heroContent {
max-width: 800px;
margin: 0 auto;
}
.heroLogo {
height: 48px;
width: auto;
margin-bottom: 1.5rem;
}
.heroTitle {
font-size: 2.8rem;
font-weight: 700;
margin-bottom: 1rem;
line-height: 1.2;
}
.heroSubtitle {
font-size: 1.1rem;
font-weight: 400;
margin-bottom: 2rem;
opacity: 0.9;
line-height: 1.5;
max-width: 600px;
margin-left: auto;
margin-right: auto;
}
.buttons {
display: flex;
align-items: center;
justify-content: center;
gap: 1rem;
}
.heroBanner .getStartedButton {
background: white;
color: #332735;
border: 2px solid white;
font-weight: 600;
transition: all 0.3s ease;
}
.heroBanner .getStartedButton:hover {
background: rgba(255, 255, 255, 0.9);
color: #2b2129;
border-color: rgba(255, 255, 255, 0.9);
transform: translateY(-2px);
box-shadow: 0 8px 25px rgba(0, 0, 0, 0.15);
}
.heroBanner .apiButton {
background: transparent;
color: white;
border: 2px solid white;
font-weight: 600;
transition: all 0.3s ease;
}
.heroBanner .apiButton:hover {
background: white;
border-color: white;
color: #332735;
transform: translateY(-2px);
}
/* Quick Start Section */
.quickStart {
padding: 4rem 0;
background: var(--ifm-background-color);
}
.sectionTitle {
font-size: 2rem;
font-weight: 600;
margin-bottom: 0.75rem;
color: var(--ifm-color-emphasis-800);
}
.sectionDescription {
font-size: 1rem;
color: var(--ifm-color-emphasis-600);
margin-bottom: 1.5rem;
line-height: 1.5;
}
.codeBlock {
background: var(--ifm-color-gray-900);
border-radius: 8px;
padding: 1.5rem;
margin-top: 1.5rem;
box-shadow: 0 2px 10px rgba(0, 0, 0, 0.1);
}
.codeBlock pre {
margin: 0;
padding: 0;
background: none;
border: none;
}
.codeBlock code {
color: var(--ifm-color-gray-100);
font-family: 'Fira Code', 'Consolas', 'Monaco', monospace;
font-size: 0.9rem;
line-height: 1.6;
}
/* Features */
.features {
display: flex;
flex-direction: column;
gap: 1rem;
margin-top: 1.5rem;
}
.feature {
display: flex;
align-items: flex-start;
gap: 1rem;
padding: 1rem;
border-radius: 8px;
background: var(--ifm-color-gray-50);
border: 1px solid var(--ifm-color-gray-200);
transition: all 0.2s ease;
}
.feature:hover {
transform: translateY(-2px);
box-shadow: 0 8px 25px rgba(0, 0, 0, 0.1);
border-color: var(--ifm-color-primary-lighter);
}
.featureIcon {
font-size: 2rem;
width: 3rem;
height: 3rem;
display: flex;
align-items: center;
justify-content: center;
background: var(--ifm-color-primary-lightest);
border-radius: 50%;
flex-shrink: 0;
}
.feature h4 {
margin: 0 0 0.5rem 0;
font-size: 1.1rem;
font-weight: 600;
color: var(--ifm-color-emphasis-800);
}
.feature p {
margin: 0;
color: var(--ifm-color-emphasis-600);
line-height: 1.5;
}
/* Community Section */
.community {
padding: 3rem 0;
background: var(--ifm-color-gray-50);
border-top: 1px solid var(--ifm-color-gray-200);
}
.communityContent {
text-align: center;
max-width: 600px;
margin: 0 auto;
}
.communityLinks {
display: flex;
justify-content: center;
gap: 1rem;
margin-top: 2rem;
}
.communityButton {
display: flex;
align-items: center;
gap: 0.5rem;
font-weight: 600;
transition: all 0.3s ease;
}
.communityButton:hover {
transform: translateY(-2px);
box-shadow: 0 8px 25px rgba(0, 0, 0, 0.1);
}
.communityIcon {
font-size: 1.2rem;
}
/* Responsive Design */
@media screen and (max-width: 996px) {
.heroBanner {
padding: 3rem 2rem;
}
.heroTitle {
font-size: 2.2rem;
}
.heroSubtitle {
font-size: 1rem;
}
.buttons {
flex-direction: column;
gap: 1rem;
}
.quickStart {
padding: 3rem 0;
}
.sectionTitle {
font-size: 1.75rem;
}
.communityLinks {
flex-direction: column;
align-items: center;
}
.communityButton {
width: 200px;
justify-content: center;
}
}
@media screen and (max-width: 768px) {
.heroLogo {
height: 40px;
}
.heroTitle {
font-size: 1.8rem;
}
.codeBlock {
padding: 1rem;
}
.codeBlock code {
font-size: 0.8rem;
}
.feature {
padding: 0.75rem;
}
}


@ -1,7 +0,0 @@
---
title: Markdown page example
---
# Markdown page example
You don't need React to write simple standalone pages.


@ -358,16 +358,6 @@ def generate_index_docs(api_name: str, api_docstring: str | None, provider_entri
md_lines.append("")
md_lines.append(f"This section contains documentation for all available providers for the **{api_name}** API.")
md_lines.append("")
md_lines.append("## Providers")
md_lines.append("")
# For Docusaurus, create a simple list of links instead of toctree
for entry in provider_entries:
provider_name = entry["display_name"]
filename = entry["filename"]
md_lines.append(f"- [{provider_name}](./{filename})")
return "\n".join(md_lines) + "\n"