docs naming update

Alexey Rybak 2025-09-22 17:17:07 -07:00
parent 584f3592ce
commit 9a8652ee30
42 changed files with 377 additions and 81 deletions

View file

@@ -15,7 +15,7 @@ The Evaluation API works with several related APIs to provide comprehensive eval
- `/eval` + `/benchmarks` API - Generate outputs and perform scoring
:::tip
-For conceptual information about evaluations, see our [Evaluation Concepts](../concepts/evaluation-concepts.mdx) guide.
+For conceptual information about evaluations, see our [Evaluation Concepts](../concepts/evaluation_concepts.mdx) guide.
:::
## Meta Reference
@@ -112,7 +112,7 @@ Llama Stack pre-registers several popular open-benchmarks for easy model evaluat
## Next Steps
-- Check out the [Evaluation Concepts](../concepts/evaluation-concepts.mdx) guide for detailed conceptual information
+- Check out the [Evaluation Concepts](../concepts/evaluation_concepts.mdx) guide for detailed conceptual information
-- See the [Building Applications - Evaluation](../building-applications/evals.mdx) guide for application examples
+- See the [Building Applications - Evaluation](../building_applications/evals.mdx) guide for application examples
-- Review the [Evaluation Reference](../references/evals-reference.mdx) for comprehensive CLI and API usage
+- Review the [Evaluation Reference](../references/evals_reference.mdx) for comprehensive CLI and API usage
- Explore the [Scoring](./scoring.mdx) documentation for available scoring functions

View file

@@ -300,6 +300,6 @@ customizer_url: ${env.NVIDIA_CUSTOMIZER_URL:=http://nemo.test}
## Next Steps
-- Check out the [Building Applications - Fine-tuning](../building-applications/index.mdx) guide for application-level examples
+- Check out the [Building Applications - Fine-tuning](../building_applications/index.mdx) guide for application-level examples
- See the [Providers](../providers/post_training/index.mdx) section for detailed provider documentation
-- Review the [API Reference](../api-reference/post-training.mdx) for complete API documentation
+- Review the [API Reference](../api_reference/post_training.mdx) for complete API documentation

View file

@@ -188,6 +188,6 @@ The Scoring API works closely with the [Evaluation](./evaluation.mdx) API to pro
## Next Steps
- Check out the [Evaluation](./evaluation.mdx) guide for running complete evaluations
-- See the [Building Applications - Evaluation](../building-applications/evals.mdx) guide for application examples
+- See the [Building Applications - Evaluation](../building_applications/evals.mdx) guide for application examples
-- Review the [Evaluation Reference](../references/evals-reference.mdx) for comprehensive scoring function usage
+- Review the [Evaluation Reference](../references/evals_reference.mdx) for comprehensive scoring function usage
-- Explore the [Evaluation Concepts](../concepts/evaluation-concepts.mdx) for detailed conceptual information
+- Explore the [Evaluation Concepts](../concepts/evaluation_concepts.mdx) for detailed conceptual information

View file

@@ -102,11 +102,11 @@ Each turn consists of multiple steps that represent the agent's thought process:
## Agent Execution Loop
-Refer to the [Agent Execution Loop](./agent-execution-loop) for more details on what happens within an agent turn.
+Refer to the [Agent Execution Loop](./agent_execution_loop) for more details on what happens within an agent turn.
## Related Resources
-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding the internal processing flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding the internal processing flow
- **[RAG (Retrieval Augmented Generation)](./rag)** - Building knowledge-enhanced agents
- **[Tools Integration](./tools)** - Extending agent capabilities with external tools
- **[Safety Guardrails](./safety)** - Implementing responsible AI practices

View file

@@ -21,8 +21,8 @@ Here are the key topics that will help you build effective AI applications:
### 🤖 **Agent Development**
- **[Agent Framework](./agent)** - Understand the components and design patterns of the Llama Stack agent framework
-- **[Agent Execution Loop](./agent-execution-loop)** - How agents process information, make decisions, and execute actions
+- **[Agent Execution Loop](./agent_execution_loop)** - How agents process information, make decisions, and execute actions
-- **[Agents vs Responses API](./responses-vs-agents)** - Learn when to use each API for different use cases
+- **[Agents vs Responses API](./responses_vs_agents)** - Learn when to use each API for different use cases
### 📚 **Knowledge Integration**
- **[RAG (Retrieval-Augmented Generation)](./rag)** - Enhance your agents with external knowledge through retrieval mechanisms

View file

@@ -215,7 +215,7 @@ Use this framework to choose the right API for your use case:
## Related Resources
- **[Agents](./agent)** - Understanding the Agents API fundamentals
-- **[Agent Execution Loop](./agent-execution-loop)** - How agents process turns and steps
+- **[Agent Execution Loop](./agent_execution_loop)** - How agents process turns and steps
- **[Tools Integration](./tools)** - Adding capabilities to both APIs
- **[OpenAI Compatibility](/docs/providers/openai-compatibility)** - Using OpenAI-compatible endpoints
- **[Safety Guardrails](./safety)** - Implementing safety measures in agents

View file

@@ -389,7 +389,7 @@ client.shields.register(
## Related Resources
- **[Agents](./agent)** - Integrating safety shields with intelligent agents
-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding safety in the execution flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding safety in the execution flow
- **[Evaluations](./evals)** - Evaluating safety shield effectiveness
- **[Telemetry](./telemetry)** - Monitoring safety violations and metrics
- **[Llama Guard Documentation](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)** - Advanced safety model details

View file

@@ -335,6 +335,6 @@ response = agent.create_turn(
- **[Agents](./agent)** - Building intelligent agents with tools
- **[RAG (Retrieval Augmented Generation)](./rag)** - Using knowledge retrieval tools
-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding tool execution flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding tool execution flow
- **[Building AI Applications Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)** - Comprehensive examples
- **[Llama Stack Apps Examples](https://github.com/meta-llama/llama-stack-apps)** - Real-world tool implementations

View file

@@ -30,7 +30,7 @@ The list of open-benchmarks we currently support:
- [SimpleQA](https://openai.com/index/introducing-simpleqa/): Benchmark designed to assess models' ability to answer short, fact-seeking questions.
- [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI): Benchmark designed to evaluate multimodal models.
-You can follow this [contributing guide](../references/evals-reference.mdx#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack
+You can follow this [contributing guide](../references/evals_reference.mdx#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack
### Run evaluation on open-benchmarks via CLI
@@ -67,5 +67,5 @@ evaluation results over there.
## What's Next?
- Check out our Colab notebook on working examples with running benchmark evaluations [here](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb#scrollTo=mxLCsP4MvFqP).
-- Check out our [Building Applications - Evaluation](../building-applications/evals.mdx) guide for more details on how to use the Evaluation APIs to evaluate your applications.
+- Check out our [Building Applications - Evaluation](../building_applications/evals.mdx) guide for more details on how to use the Evaluation APIs to evaluate your applications.
-- Check out our [Evaluation Reference](../references/evals-reference.mdx) for more details on the APIs.
+- Check out our [Evaluation Reference](../references/evals_reference.mdx) for more details on the APIs.

View file

@@ -13,6 +13,6 @@ This section covers the key concepts you need to understand to work effectively
- **[Architecture](./architecture)** - Llama Stack's service-oriented design and benefits
- **[APIs](./apis)** - Available REST APIs and planned capabilities
-- **[API Providers](./api-providers)** - Remote vs inline provider implementations
+- **[API Providers](./api_providers)** - Remote vs inline provider implementations
- **[Distributions](./distributions)** - Pre-packaged provider configurations
- **[Resources](./resources)** - Resource federation and registration

View file

@@ -133,8 +133,8 @@ Keep PRs small and focused. Split large changes into logically grouped, smaller
Learn how to extend Llama Stack with new capabilities:
-- **[Adding a New API Provider](./new-api-provider)** - Add new API providers to the Stack
+- **[Adding a New API Provider](./new_api_provider)** - Add new API providers to the Stack
-- **[Adding a Vector Database](./new-vector-database)** - Add new vector databases
+- **[Adding a Vector Database](./new_vector_database)** - Add new vector databases
- **[External Providers](/docs/providers/external)** - Add external providers to the Stack
## Testing
@@ -304,12 +304,12 @@ By contributing to Llama Stack, you agree that your contributions will be licens
## Advanced Topics
-- **[Testing Record-Replay System](./testing-record-replay)** - Deep dive into testing internals
+- **[Testing Record-Replay System](./testing_record_replay)** - Deep dive into testing internals
## Related Resources
-- **[Adding API Providers](./new-api-provider)** - Extend Llama Stack with new providers
+- **[Adding API Providers](./new_api_provider)** - Extend Llama Stack with new providers
-- **[Vector Database Integration](./new-vector-database)** - Add vector database support
+- **[Vector Database Integration](./new_vector_database)** - Add vector database support
- **[External Providers](/docs/providers/external)** - External provider development
- **[GitHub Discussions](https://github.com/meta-llama/llama-stack/discussions)** - Community discussion
- **[Discord](https://discord.gg/llama-stack)** - Real-time community chat

View file

@@ -279,5 +279,5 @@ class YourProvider:
- **[Core Concepts](/docs/concepts)** - Understanding Llama Stack architecture
- **[External Providers](/docs/providers/external)** - Alternative implementation approach
-- **[Vector Database Guide](./new-vector-database)** - Specialized provider implementation
+- **[Vector Database Guide](./new_vector_database)** - Specialized provider implementation
-- **[Testing Record-Replay](./testing-record-replay)** - Advanced testing techniques
+- **[Testing Record-Replay](./testing_record_replay)** - Advanced testing techniques

View file

@@ -489,5 +489,5 @@ async def add_chunks(self, chunks: List[Chunk]) -> List[str]:
- **[Vector IO Providers](/docs/providers/vector_io)** - Existing provider implementations
- **[Core Concepts](/docs/concepts)** - Understanding Llama Stack architecture
-- **[New API Provider Guide](./new-api-provider)** - General provider development
+- **[New API Provider Guide](./new_api_provider)** - General provider development
-- **[Testing Guide](./testing-record-replay)** - Advanced testing techniques
+- **[Testing Guide](./testing_record_replay)** - Advanced testing techniques

View file

@@ -167,7 +167,7 @@ You can now edit ~/.llama/distributions/llamastack-starter/starter-run.yaml and
```
:::tip
-The generated `run.yaml` file is a starting point for your configuration. For comprehensive guidance on customizing it for your specific needs, infrastructure, and deployment scenarios, see [Customizing Your run.yaml Configuration](./customizing-run-yaml).
+The generated `run.yaml` file is a starting point for your configuration. For comprehensive guidance on customizing it for your specific needs, infrastructure, and deployment scenarios, see [Customizing Your run.yaml Configuration](./customizing_run_yaml).
:::
</TabItem>

View file

@@ -13,7 +13,7 @@ import TabItem from '@theme/TabItem';
The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution:
:::note
-The default `run.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your run.yaml Configuration](./customizing-run-yaml).
+The default `run.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your run.yaml Configuration](./customizing_run_yaml).
:::
<details>

View file

@@ -51,5 +51,5 @@ The goal is to take the generated template and adapt it to your specific infrast
## Related Guides
- **[Configuration Reference](./configuration)** - Detailed configuration file format and options
-- **[Starting Llama Stack Server](./starting-llama-stack-server)** - How to run with your custom configuration
+- **[Starting Llama Stack Server](./starting_llama_stack_server)** - How to run with your custom configuration
-- **[Building Custom Distributions](./building-distro)** - Create distributions with your preferred providers
+- **[Building Custom Distributions](./building_distro)** - Create distributions with your preferred providers

View file

@@ -36,7 +36,7 @@ Then, you can access the APIs like `models` and `inference` on the client and ca
response = client.models.list()
```
-If you've created a [custom distribution](./building-distro), you can also use the run.yaml configuration file directly:
+If you've created a [custom distribution](./building_distro), you can also use the run.yaml configuration file directly:
```python
client = LlamaStackAsLibraryClient(config_path)
@@ -60,6 +60,6 @@ Library mode is ideal when:
## Related Guides
-- **[Building Custom Distributions](./building-distro)** - Create your own distribution for library use
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution for library use
- **[Configuration Reference](./configuration)** - Understanding the configuration format
-- **[Starting Llama Stack Server](./starting-llama-stack-server)** - Alternative server-based deployment
+- **[Starting Llama Stack Server](./starting_llama_stack_server)** - Alternative server-based deployment

View file

@@ -13,9 +13,9 @@ This section provides an overview of the distributions available in Llama Stack.
## Distribution Guides
-- **[Available Distributions](./list-of-distributions)** - Complete list and comparison of all distributions
+- **[Available Distributions](./list_of_distributions)** - Complete list and comparison of all distributions
-- **[Building Custom Distributions](./building-distro)** - Create your own distribution from scratch
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution from scratch
-- **[Customizing Configuration](./customizing-run-yaml)** - Customize run.yaml for your needs
+- **[Customizing Configuration](./customizing_run_yaml)** - Customize run.yaml for your needs
-- **[Starting Llama Stack Server](./starting-llama-stack-server)** - How to run distributions
+- **[Starting Llama Stack Server](./starting_llama_stack_server)** - How to run distributions
-- **[Importing as Library](./importing-as-library)** - Use distributions in your code
+- **[Importing as Library](./importing_as_library)** - Use distributions in your code
- **[Configuration Reference](./configuration)** - Configuration file format details

View file

@@ -34,7 +34,7 @@ Llama Stack provides several pre-configured distributions to help you get starte
docker pull llama-stack/distribution-starter
```
-**Guides:** [Starter Distribution Guide](./self-hosted-distro/starter)
+**Guides:** [Starter Distribution Guide](./self_hosted_distro/starter)
### 🖥️ Self-Hosted with GPU
@@ -47,14 +47,14 @@ docker pull llama-stack/distribution-starter
docker pull llama-stack/distribution-meta-reference-gpu
```
-**Guides:** [Meta Reference GPU Guide](./self-hosted-distro/meta-reference-gpu)
+**Guides:** [Meta Reference GPU Guide](./self_hosted_distro/meta_reference_gpu)
### 🖥️ Self-Hosted with NVIDIA NeMo Microservices
**Use `nvidia` if you:**
- Want to use Llama Stack with NVIDIA NeMo Microservices
-**Guides:** [NVIDIA Distribution Guide](./self-hosted-distro/nvidia)
+**Guides:** [NVIDIA Distribution Guide](./self_hosted_distro/nvidia)
### ☁️ Managed Hosting
@@ -65,7 +65,7 @@ docker pull llama-stack/distribution-meta-reference-gpu
**Partners:** [Fireworks.ai](https://fireworks.ai) and [Together.xyz](https://together.xyz)
-**Guides:** [Remote-Hosted Endpoints](./remote-hosted-distro/)
+**Guides:** [Remote-Hosted Endpoints](./remote_hosted_distro/)
### 📱 Mobile Development
@@ -74,8 +74,8 @@ docker pull llama-stack/distribution-meta-reference-gpu
- Need on-device inference capabilities
- Want offline functionality
-- [iOS SDK](./ondevice-distro/ios-sdk)
+- [iOS SDK](./ondevice_distro/ios_sdk)
-- [Android SDK](./ondevice-distro/android-sdk)
+- [Android SDK](./ondevice_distro/android_sdk)
### 🔧 Custom Solutions
@@ -84,23 +84,23 @@ docker pull llama-stack/distribution-meta-reference-gpu
- You need custom configurations
- You want to optimize for your specific use case
-**Guides:** [Building Custom Distributions](./building-distro)
+**Guides:** [Building Custom Distributions](./building_distro)
## Detailed Documentation
### Self-Hosted Distributions
-- **[Starter Distribution](./self-hosted-distro/starter)** - General purpose template
+- **[Starter Distribution](./self_hosted_distro/starter)** - General purpose template
-- **[Meta Reference GPU](./self-hosted-distro/meta-reference-gpu)** - High-performance GPU inference
+- **[Meta Reference GPU](./self_hosted_distro/meta_reference_gpu)** - High-performance GPU inference
### Remote-Hosted Solutions
-- **[Remote-Hosted Overview](./remote-hosted-distro/)** - Managed hosting options
+- **[Remote-Hosted Overview](./remote_hosted_distro/)** - Managed hosting options
### Mobile SDKs
-- **[iOS SDK](./ondevice-distro/ios-sdk)** - Native iOS development
+- **[iOS SDK](./ondevice_distro/ios_sdk)** - Native iOS development
-- **[Android SDK](./ondevice-distro/android-sdk)** - Native Android development
+- **[Android SDK](./ondevice_distro/android_sdk)** - Native Android development
## Decision Flow

View file

@@ -306,4 +306,4 @@ The API interface is generated using the OpenAPI standard with [Stainless](https
- **[llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin)** - Official Kotlin SDK repository
- **[Android Demo App](https://github.com/meta-llama/llama-stack-client-kotlin/tree/latest-release/examples/android_app)** - Complete example app
- **[ExecuTorch](https://github.com/pytorch/executorch/)** - PyTorch on-device inference library
-- **[iOS SDK](./ios-sdk)** - iOS development guide
+- **[iOS SDK](./ios_sdk)** - iOS development guide

View file

@@ -176,4 +176,4 @@ The iOS SDK is ideal for:
- **[llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift/)** - Official Swift SDK repository
- **[iOS Calendar Assistant](https://github.com/meta-llama/llama-stack-client-swift/tree/main/examples/ios_calendar_assistant)** - Complete example app
- **[executorch](https://github.com/pytorch/executorch/)** - PyTorch on-device inference library
-- **[Android SDK](./android-sdk)** - Android development guide
+- **[Android SDK](./android_sdk)** - Android development guide

View file

@@ -48,6 +48,6 @@ $ llama-stack-client models list
## Related Guides
-- **[Available Distributions](../list-of-distributions)** - Compare with other distribution types
+- **[Available Distributions](../list_of_distributions)** - Compare with other distribution types
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Using as Library](../importing-as-library)** - Alternative deployment approach
+- **[Using as Library](../importing_as_library)** - Alternative deployment approach

View file

@@ -105,6 +105,6 @@ The watsonx distribution is ideal for:
## Related Guides
- **[Remote-Hosted Overview](./index)** - Overview of remote-hosted distributions
-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

View file

@@ -217,6 +217,6 @@ The Dell distribution is ideal for:
## Related Guides
-- **[Dell-TGI Distribution](./dell-tgi)** - Dell's TGI-specific distribution
+- **[Dell-TGI Distribution](./dell_tgi)** - Dell's TGI-specific distribution
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

View file

@@ -98,4 +98,4 @@ The Dell-TGI distribution is ideal for:
- **[Dell Distribution](./dell)** - Dell's standard distribution
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

View file

@@ -0,0 +1,125 @@
---
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# Meta Reference GPU Distribution
```{toctree}
:maxdepth: 2
:hidden:
self
```
The `llamastack/distribution-meta-reference-gpu` distribution consists of the following provider configurations:
| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| inference | `inline::meta-reference` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
Note that you need access to NVIDIA GPUs to run this distribution. This distribution is not compatible with CPU-only machines or machines with AMD GPUs.
### Environment Variables
The following environment variables can be configured:
- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)
- `INFERENCE_MODEL`: Inference model loaded into the Meta Reference server (default: `meta-llama/Llama-3.2-3B-Instruct`)
- `INFERENCE_CHECKPOINT_DIR`: Directory containing the Meta Reference model checkpoint (default: `null`)
- `SAFETY_MODEL`: Name of the safety (Llama-Guard) model to use (default: `meta-llama/Llama-Guard-3-1B`)
- `SAFETY_CHECKPOINT_DIR`: Directory containing the Llama-Guard model checkpoint (default: `null`)
## Prerequisite: Downloading Models
Please use `llama model list --downloaded` to check that you have Llama model checkpoints downloaded in `~/.llama` before proceeding. See the [installation guide](../../references/llama_cli_reference/download_models.md) for how to download the models: run `llama model list` to see the available models, and `llama model download` to download the checkpoints.
```
$ llama model list --downloaded
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Model ┃ Size ┃ Modified Time ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ Llama3.2-1B-Instruct:int4-qlora-eo8 │ 1.53 GB │ 2025-02-26 11:22:28 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-1B │ 2.31 GB │ 2025-02-18 21:48:52 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Prompt-Guard-86M │ 0.02 GB │ 2025-02-26 11:29:28 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB │ 2025-02-26 11:37:41 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-3B │ 5.99 GB │ 2025-02-18 21:51:26 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.1-8B │ 14.97 GB │ 2025-02-16 10:36:37 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB │ 2025-02-26 11:35:02 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama-Guard-3-1B │ 2.80 GB │ 2025-02-26 11:20:46 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama-Guard-3-1B:int4 │ 0.43 GB │ 2025-02-26 11:33:33 │
└─────────────────────────────────────────┴──────────┴─────────────────────┘
```
## Running the Distribution
You can run the distribution either via a venv or via Docker, which has a pre-built image.
### Via Docker
This method allows you to get started quickly without having to build the distribution code.
```bash
LLAMA_STACK_PORT=8321
docker run \
-it \
--pull always \
--gpus all \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
llamastack/distribution-meta-reference-gpu \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```
If you are using Llama Stack Safety / Shield APIs, use:
```bash
docker run \
-it \
--pull always \
--gpus all \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
llamastack/distribution-meta-reference-gpu \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
--env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```
### Via venv
Make sure you have done `uv pip install llama-stack` and have the Llama Stack CLI available.
```bash
llama stack build --distro meta-reference-gpu --image-type venv
llama stack run distributions/meta-reference-gpu/run.yaml \
--port 8321 \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```
If you are using Llama Stack Safety / Shield APIs, use:
```bash
llama stack run distributions/meta-reference-gpu/run-with-safety.yaml \
--port 8321 \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
--env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```

View file

@@ -149,6 +149,6 @@ The Meta Reference GPU distribution is ideal for:
## Related Guides
-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

View file

@@ -0,0 +1,171 @@
---
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# NVIDIA Distribution
The `llamastack/distribution-nvidia` distribution consists of the following provider configurations.
| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `inline::localfs`, `remote::nvidia` |
| eval | `remote::nvidia` |
| files | `inline::localfs` |
| inference | `remote::nvidia` |
| post_training | `remote::nvidia` |
| safety | `remote::nvidia` |
| scoring | `inline::basic` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `inline::rag-runtime` |
| vector_io | `inline::faiss` |
### Environment Variables
The following environment variables can be configured:
- `NVIDIA_API_KEY`: NVIDIA API Key (default: ``)
- `NVIDIA_APPEND_API_VERSION`: Whether to append the API version to the base_url (default: `True`)
- `NVIDIA_DATASET_NAMESPACE`: NVIDIA Dataset Namespace (default: `default`)
- `NVIDIA_PROJECT_ID`: NVIDIA Project ID (default: `test-project`)
- `NVIDIA_CUSTOMIZER_URL`: NVIDIA Customizer URL (default: `https://customizer.api.nvidia.com`)
- `NVIDIA_OUTPUT_MODEL_DIR`: NVIDIA Output Model Directory (default: `test-example-model@v1`)
- `GUARDRAILS_SERVICE_URL`: URL for the NeMo Guardrails Service (default: `http://0.0.0.0:7331`)
- `NVIDIA_GUARDRAILS_CONFIG_ID`: NVIDIA Guardrail Configuration ID (default: `self-check`)
- `NVIDIA_EVALUATOR_URL`: URL for the NeMo Evaluator Service (default: `http://0.0.0.0:7331`)
- `INFERENCE_MODEL`: Inference model (default: `Llama3.1-8B-Instruct`)
- `SAFETY_MODEL`: Name of the model to use for safety (default: `meta/llama-3.1-8b-instruct`)
### Models
The following models are available by default:
- `meta/llama3-8b-instruct`
- `meta/llama3-70b-instruct`
- `meta/llama-3.1-8b-instruct`
- `meta/llama-3.1-70b-instruct`
- `meta/llama-3.1-405b-instruct`
- `meta/llama-3.2-1b-instruct`
- `meta/llama-3.2-3b-instruct`
- `meta/llama-3.2-11b-vision-instruct`
- `meta/llama-3.2-90b-vision-instruct`
- `meta/llama-3.3-70b-instruct`
- `nvidia/vila`
- `nvidia/llama-3.2-nv-embedqa-1b-v2`
- `nvidia/nv-embedqa-e5-v5`
- `nvidia/nv-embedqa-mistral-7b-v2`
- `snowflake/arctic-embed-l`
## Prerequisites
### NVIDIA API Keys
Make sure you have access to an NVIDIA API key. You can get one by visiting [https://build.nvidia.com/](https://build.nvidia.com/). Use this key for the `NVIDIA_API_KEY` environment variable.
### Deploy NeMo Microservices Platform
The NVIDIA NeMo microservices platform supports end-to-end microservice deployment of a complete AI flywheel on your Kubernetes cluster through the NeMo Microservices Helm Chart. Please reference the [NVIDIA NeMo Microservices documentation](https://docs.nvidia.com/nemo/microservices/latest/about/index.html) for platform prerequisites and instructions to install and deploy the platform.
## Supported Services
Each Llama Stack API corresponds to a specific NeMo microservice. The core microservices (Customizer, Evaluator, Guardrails) are exposed by the same endpoint. The platform components (Data Store) are each exposed by separate endpoints.
### Inference: NVIDIA NIM
NVIDIA NIM is used for running inference with registered models. There are two ways to access NVIDIA NIMs:
1. Hosted (default): Preview APIs hosted at https://integrate.api.nvidia.com (Requires an API key)
2. Self-hosted: NVIDIA NIMs that run on your own infrastructure.
The deployed platform includes the NIM Proxy microservice, which is the service that provides access to your NIMs (for example, to run inference on a model). Set the `NVIDIA_BASE_URL` environment variable to point to your NVIDIA NIM Proxy deployment.
### Datasetio API: NeMo Data Store
The NeMo Data Store microservice serves as the default file storage solution for the NeMo microservices platform. It exposes APIs compatible with the Hugging Face Hub client (`HfApi`), so you can use that client to interact with the Data Store. The `NVIDIA_DATASETS_URL` environment variable should point to your NeMo Data Store endpoint.
See the {repopath}`NVIDIA Datasetio docs::llama_stack/providers/remote/datasetio/nvidia/README.md` for supported features and example usage.
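As a rough sketch of how this can look in practice (the environment variable is assumed to be set as described above, and the listing call is purely illustrative), the standard `huggingface_hub` client can be pointed at the Data Store endpoint:
```python
import os

from huggingface_hub import HfApi

# Point the Hugging Face Hub client at the NeMo Data Store endpoint
# (NVIDIA_DATASETS_URL is assumed to be set to your Data Store URL).
hf_api = HfApi(endpoint=os.environ["NVIDIA_DATASETS_URL"])

# List a few of the dataset repositories stored in the Data Store.
for dataset in hf_api.list_datasets(limit=5):
    print(dataset.id)
```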
### Eval API: NeMo Evaluator
The NeMo Evaluator microservice supports evaluation of LLMs. Launching an Evaluation job with NeMo Evaluator requires an Evaluation Config (an object that contains metadata needed by the job). A Llama Stack Benchmark maps to an Evaluation Config, so registering a Benchmark creates an Evaluation Config in NeMo Evaluator. The `NVIDIA_EVALUATOR_URL` environment variable should point to your NeMo Microservices endpoint.
See the {repopath}`NVIDIA Eval docs::llama_stack/providers/remote/eval/nvidia/README.md` for supported features and example usage.
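As an illustrative sketch, registering a benchmark against a running stack can look like the following; the benchmark id, dataset id, and scoring function below are placeholder values and assume the dataset has already been registered:
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Registering a benchmark creates the corresponding Evaluation Config
# in NeMo Evaluator (the ids below are placeholders).
client.benchmarks.register(
    benchmark_id="my-benchmark",
    dataset_id="my-eval-dataset",
    scoring_functions=["basic::equality"],
)
```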
### Post-Training API: NeMo Customizer
The NeMo Customizer microservice supports fine-tuning models. You can reference {repopath}`this list of supported models::llama_stack/providers/remote/post_training/nvidia/models.py` that can be fine-tuned using Llama Stack. The `NVIDIA_CUSTOMIZER_URL` environment variable should point to your NeMo Microservices endpoint.
See the {repopath}`NVIDIA Post-Training docs::llama_stack/providers/remote/post_training/nvidia/README.md` for supported features and example usage.
### Safety API: NeMo Guardrails
The NeMo Guardrails microservice sits between your application and the LLM, and adds checks and content moderation to a model. The `GUARDRAILS_SERVICE_URL` environment variable should point to your NeMo Microservices endpoint.
See the {repopath}`NVIDIA Safety docs::llama_stack/providers/remote/safety/nvidia/README.md` for supported features and example usage.
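As a hedged sketch of invoking a guardrail through the Safety API (the shield id below is a placeholder and assumes a shield has already been registered with the stack):
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Run a registered shield against a user message; a non-None violation
# means the guardrail flagged the content.
response = client.safety.run_shield(
    shield_id="self-check",  # placeholder: use the shield id you registered
    messages=[{"role": "user", "content": "Tell me how to hotwire a car."}],
    params={},
)
print(response.violation)
```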
## Deploying models
In order to use a registered model with the Llama Stack APIs, ensure the corresponding NIM is deployed to your environment. For example, you can use the NIM Proxy microservice to deploy `meta/llama-3.2-1b-instruct`.
Note: For improved inference speeds, use NIM with the `fast_outlines` guided decoding backend (specified in the request body). This is the default if you deployed the platform with the NeMo Microservices Helm Chart.
```sh
# URL to NeMo NIM Proxy service
export NEMO_URL="http://nemo.test"
curl --location "$NEMO_URL/v1/deployment/model-deployments" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "llama-3.2-1b-instruct",
"namespace": "meta",
"config": {
"model": "meta/llama-3.2-1b-instruct",
"nim_deployment": {
"image_name": "nvcr.io/nim/meta/llama-3.2-1b-instruct",
"image_tag": "1.8.3",
"pvc_size": "25Gi",
"gpu": 1,
"additional_envs": {
"NIM_GUIDED_DECODING_BACKEND": "fast_outlines"
}
}
}
}'
```
This NIM deployment should take approximately 10 minutes to go live. [See the docs](https://docs.nvidia.com/nemo/microservices/latest/get-started/tutorials/deploy-nims.html) for more information on how to deploy a NIM and verify it's available for inference.
You can also remove a deployed NIM to free up GPU resources, if needed.
```sh
export NEMO_URL="http://nemo.test"
curl -X DELETE "$NEMO_URL/v1/deployment/model-deployments/meta/llama-3.1-8b-instruct"
```
## Running Llama Stack with NVIDIA
You can do this via a venv (building the code locally) or via Docker, which has a pre-built image.
### Via Docker
This method allows you to get started quickly without having to build the distribution code.
```bash
LLAMA_STACK_PORT=8321
docker run \
-it \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ./run.yaml:/root/my-run.yaml \
llamastack/distribution-nvidia \
--config /root/my-run.yaml \
--port $LLAMA_STACK_PORT \
--env NVIDIA_API_KEY=$NVIDIA_API_KEY
```
### Via venv
If you've set up your local development environment, you can also build and run the distribution from your local virtual environment.
```bash
INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
llama stack build --distro nvidia --image-type venv
llama stack run ./run.yaml \
--port 8321 \
--env NVIDIA_API_KEY=$NVIDIA_API_KEY \
--env INFERENCE_MODEL=$INFERENCE_MODEL
```
## Example Notebooks
For examples of how to use the NVIDIA Distribution to run inference, fine-tune, evaluate, and run safety checks on your LLMs, you can reference the example notebooks in {repopath}`docs/notebooks/nvidia`.

View file

@@ -192,6 +192,6 @@ The NVIDIA distribution is ideal for:
## Related Guides
-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

View file

@@ -57,6 +57,6 @@ The Passthrough distribution is ideal for:
## Related Guides
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Starting Llama Stack Server](../starting-llama-stack-server)** - How to run distributions
+- **[Starting Llama Stack Server](../starting_llama_stack_server)** - How to run distributions

View file

@@ -231,6 +231,6 @@ The starter distribution is ideal for developers who want to experiment with dif
## Related Guides
-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

View file

@@ -13,13 +13,13 @@ You can run a Llama Stack server in one of the following ways:
This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (e.g. fireworks, together, groq, etc.)
-**See:** [Using Llama Stack as a Library](./importing-as-library)
+**See:** [Using Llama Stack as a Library](./importing_as_library)
## Container
Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have.
-**See:** [Available Distributions](./list-of-distributions) for more details on selecting the right distribution.
+**See:** [Available Distributions](./list_of_distributions) for more details on selecting the right distribution.
## Kubernetes
@@ -69,7 +69,7 @@ If you have built a container image and want to deploy it in a Kubernetes cluste
## Related Guides
-- **[Available Distributions](./list-of-distributions)** - Choose the right distribution
+- **[Available Distributions](./list_of_distributions)** - Choose the right distribution
-- **[Building Custom Distributions](./building-distro)** - Create your own distribution
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution
- **[Configuration Reference](./configuration)** - Understanding configuration options
-- **[Customizing run.yaml](./customizing-run-yaml)** - Adapt configurations to your environment
+- **[Customizing run.yaml](./customizing_run_yaml)** - Adapt configurations to your environment

View file

@@ -199,7 +199,7 @@ pprint(response)
Llama Stack offers a library of scoring functions and the `/scoring` API, allowing you to run evaluations on your pre-annotated AI application datasets.
-In this example, we will work with an example RAG dataset you have built previously, label with an annotation, and use LLM-As-Judge with custom judge prompt for scoring. Please checkout our [Llama Stack Playground](../building-applications/playground) for an interactive interface to upload datasets and run scorings.
+In this example, we will work with an example RAG dataset you have built previously, label with an annotation, and use LLM-As-Judge with custom judge prompt for scoring. Please checkout our [Llama Stack Playground](../building_applications/playground) for an interactive interface to upload datasets and run scorings.
```python
judge_model_id = "meta-llama/Llama-3.1-405B-Instruct-FP8"

View file

@@ -1,6 +1,6 @@
# References
-- [Python SDK Reference](python-sdk) for the Llama Stack Python SDK
+- [Python SDK Reference](python_sdk) for the Llama Stack Python SDK
-- [Llama CLI](llama-cli) for building and running your Llama Stack server
+- [Llama CLI](llama_cli) for building and running your Llama Stack server
-- [Llama Stack Client CLI](client-cli) for interacting with your Llama Stack server
+- [Llama Stack Client CLI](client_cli) for interacting with your Llama Stack server
-- [Evaluations Reference](evals-reference) for running evaluations and benchmarks
+- [Evaluations Reference](evals_reference) for running evaluations and benchmarks

View file

@@ -30,7 +30,7 @@ You have two ways to install Llama Stack:
1. `download`: Supports downloading models from Meta or Hugging Face.
2. `model`: Lists available models and their properties.
-3. `stack`: Allows you to build a stack using the `llama stack` distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the [Build your own Distribution](../distributions/building-distro) documentation.
+3. `stack`: Allows you to build a stack using the `llama stack` distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the [Build your own Distribution](../distributions/building_distro) documentation.
### Sample Usage