Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-10-03 19:57:35 +00:00)

# docs naming update

Commit 9a8652ee30, parent 584f3592ce

42 changed files with 377 additions and 81 deletions

@@ -15,7 +15,7 @@ The Evaluation API works with several related APIs to provide comprehensive eval
 - `/eval` + `/benchmarks` API - Generate outputs and perform scoring

 :::tip
-For conceptual information about evaluations, see our [Evaluation Concepts](../concepts/evaluation-concepts.mdx) guide.
+For conceptual information about evaluations, see our [Evaluation Concepts](../concepts/evaluation_concepts.mdx) guide.
 :::

 ## Meta Reference

@@ -112,7 +112,7 @@ Llama Stack pre-registers several popular open-benchmarks for easy model evaluat

 ## Next Steps

-- Check out the [Evaluation Concepts](../concepts/evaluation-concepts.mdx) guide for detailed conceptual information
-- See the [Building Applications - Evaluation](../building-applications/evals.mdx) guide for application examples
-- Review the [Evaluation Reference](../references/evals-reference.mdx) for comprehensive CLI and API usage
+- Check out the [Evaluation Concepts](../concepts/evaluation_concepts.mdx) guide for detailed conceptual information
+- See the [Building Applications - Evaluation](../building_applications/evals.mdx) guide for application examples
+- Review the [Evaluation Reference](../references/evals_reference.mdx) for comprehensive CLI and API usage
 - Explore the [Scoring](./scoring.mdx) documentation for available scoring functions

@@ -300,6 +300,6 @@ customizer_url: ${env.NVIDIA_CUSTOMIZER_URL:=http://nemo.test}

 ## Next Steps

-- Check out the [Building Applications - Fine-tuning](../building-applications/index.mdx) guide for application-level examples
+- Check out the [Building Applications - Fine-tuning](../building_applications/index.mdx) guide for application-level examples
 - See the [Providers](../providers/post_training/index.mdx) section for detailed provider documentation
-- Review the [API Reference](../api-reference/post-training.mdx) for complete API documentation
+- Review the [API Reference](../api_reference/post_training.mdx) for complete API documentation

@@ -188,6 +188,6 @@ The Scoring API works closely with the [Evaluation](./evaluation.mdx) API to pro
 ## Next Steps

 - Check out the [Evaluation](./evaluation.mdx) guide for running complete evaluations
-- See the [Building Applications - Evaluation](../building-applications/evals.mdx) guide for application examples
-- Review the [Evaluation Reference](../references/evals-reference.mdx) for comprehensive scoring function usage
-- Explore the [Evaluation Concepts](../concepts/evaluation-concepts.mdx) for detailed conceptual information
+- See the [Building Applications - Evaluation](../building_applications/evals.mdx) guide for application examples
+- Review the [Evaluation Reference](../references/evals_reference.mdx) for comprehensive scoring function usage
+- Explore the [Evaluation Concepts](../concepts/evaluation_concepts.mdx) for detailed conceptual information

@@ -102,11 +102,11 @@ Each turn consists of multiple steps that represent the agent's thought process:

 ## Agent Execution Loop

-Refer to the [Agent Execution Loop](./agent-execution-loop) for more details on what happens within an agent turn.
+Refer to the [Agent Execution Loop](./agent_execution_loop) for more details on what happens within an agent turn.

 ## Related Resources

-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding the internal processing flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding the internal processing flow
 - **[RAG (Retrieval Augmented Generation)](./rag)** - Building knowledge-enhanced agents
 - **[Tools Integration](./tools)** - Extending agent capabilities with external tools
 - **[Safety Guardrails](./safety)** - Implementing responsible AI practices

@@ -21,8 +21,8 @@ Here are the key topics that will help you build effective AI applications:

 ### 🤖 **Agent Development**
 - **[Agent Framework](./agent)** - Understand the components and design patterns of the Llama Stack agent framework
-- **[Agent Execution Loop](./agent-execution-loop)** - How agents process information, make decisions, and execute actions
-- **[Agents vs Responses API](./responses-vs-agents)** - Learn when to use each API for different use cases
+- **[Agent Execution Loop](./agent_execution_loop)** - How agents process information, make decisions, and execute actions
+- **[Agents vs Responses API](./responses_vs_agents)** - Learn when to use each API for different use cases

 ### 📚 **Knowledge Integration**
 - **[RAG (Retrieval-Augmented Generation)](./rag)** - Enhance your agents with external knowledge through retrieval mechanisms

@@ -215,7 +215,7 @@ Use this framework to choose the right API for your use case:
 ## Related Resources

 - **[Agents](./agent)** - Understanding the Agents API fundamentals
-- **[Agent Execution Loop](./agent-execution-loop)** - How agents process turns and steps
+- **[Agent Execution Loop](./agent_execution_loop)** - How agents process turns and steps
 - **[Tools Integration](./tools)** - Adding capabilities to both APIs
 - **[OpenAI Compatibility](/docs/providers/openai-compatibility)** - Using OpenAI-compatible endpoints
 - **[Safety Guardrails](./safety)** - Implementing safety measures in agents

@@ -389,7 +389,7 @@ client.shields.register(
 ## Related Resources

 - **[Agents](./agent)** - Integrating safety shields with intelligent agents
-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding safety in the execution flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding safety in the execution flow
 - **[Evaluations](./evals)** - Evaluating safety shield effectiveness
 - **[Telemetry](./telemetry)** - Monitoring safety violations and metrics
 - **[Llama Guard Documentation](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)** - Advanced safety model details

@@ -335,6 +335,6 @@ response = agent.create_turn(

 - **[Agents](./agent)** - Building intelligent agents with tools
 - **[RAG (Retrieval Augmented Generation)](./rag)** - Using knowledge retrieval tools
-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding tool execution flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding tool execution flow
 - **[Building AI Applications Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)** - Comprehensive examples
 - **[Llama Stack Apps Examples](https://github.com/meta-llama/llama-stack-apps)** - Real-world tool implementations

@@ -30,7 +30,7 @@ The list of open-benchmarks we currently support:
 - [SimpleQA](https://openai.com/index/introducing-simpleqa/): Benchmark designed to assess models' ability to answer short, fact-seeking questions.
 - [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI): Benchmark designed to evaluate multimodal models.

-You can follow this [contributing guide](../references/evals-reference.mdx#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack.
+You can follow this [contributing guide](../references/evals_reference.mdx#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack.

 ### Run evaluation on open-benchmarks via CLI

@@ -67,5 +67,5 @@ evaluation results over there.
 ## What's Next?

 - Check out our Colab notebook on working examples with running benchmark evaluations [here](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb#scrollTo=mxLCsP4MvFqP).
-- Check out our [Building Applications - Evaluation](../building-applications/evals.mdx) guide for more details on how to use the Evaluation APIs to evaluate your applications.
-- Check out our [Evaluation Reference](../references/evals-reference.mdx) for more details on the APIs.
+- Check out our [Building Applications - Evaluation](../building_applications/evals.mdx) guide for more details on how to use the Evaluation APIs to evaluate your applications.
+- Check out our [Evaluation Reference](../references/evals_reference.mdx) for more details on the APIs.

@@ -13,6 +13,6 @@ This section covers the key concepts you need to understand to work effectively

 - **[Architecture](./architecture)** - Llama Stack's service-oriented design and benefits
 - **[APIs](./apis)** - Available REST APIs and planned capabilities
-- **[API Providers](./api-providers)** - Remote vs inline provider implementations
+- **[API Providers](./api_providers)** - Remote vs inline provider implementations
 - **[Distributions](./distributions)** - Pre-packaged provider configurations
 - **[Resources](./resources)** - Resource federation and registration

@@ -133,8 +133,8 @@ Keep PRs small and focused. Split large changes into logically grouped, smaller

 Learn how to extend Llama Stack with new capabilities:

-- **[Adding a New API Provider](./new-api-provider)** - Add new API providers to the Stack
-- **[Adding a Vector Database](./new-vector-database)** - Add new vector databases
+- **[Adding a New API Provider](./new_api_provider)** - Add new API providers to the Stack
+- **[Adding a Vector Database](./new_vector_database)** - Add new vector databases
 - **[External Providers](/docs/providers/external)** - Add external providers to the Stack

 ## Testing

@@ -304,12 +304,12 @@ By contributing to Llama Stack, you agree that your contributions will be licens

 ## Advanced Topics

-- **[Testing Record-Replay System](./testing-record-replay)** - Deep dive into testing internals
+- **[Testing Record-Replay System](./testing_record_replay)** - Deep dive into testing internals

 ## Related Resources

-- **[Adding API Providers](./new-api-provider)** - Extend Llama Stack with new providers
-- **[Vector Database Integration](./new-vector-database)** - Add vector database support
+- **[Adding API Providers](./new_api_provider)** - Extend Llama Stack with new providers
+- **[Vector Database Integration](./new_vector_database)** - Add vector database support
 - **[External Providers](/docs/providers/external)** - External provider development
 - **[GitHub Discussions](https://github.com/meta-llama/llama-stack/discussions)** - Community discussion
 - **[Discord](https://discord.gg/llama-stack)** - Real-time community chat

@@ -279,5 +279,5 @@ class YourProvider:

 - **[Core Concepts](/docs/concepts)** - Understanding Llama Stack architecture
 - **[External Providers](/docs/providers/external)** - Alternative implementation approach
-- **[Vector Database Guide](./new-vector-database)** - Specialized provider implementation
-- **[Testing Record-Replay](./testing-record-replay)** - Advanced testing techniques
+- **[Vector Database Guide](./new_vector_database)** - Specialized provider implementation
+- **[Testing Record-Replay](./testing_record_replay)** - Advanced testing techniques

@@ -489,5 +489,5 @@ async def add_chunks(self, chunks: List[Chunk]) -> List[str]:

 - **[Vector IO Providers](/docs/providers/vector_io)** - Existing provider implementations
 - **[Core Concepts](/docs/concepts)** - Understanding Llama Stack architecture
-- **[New API Provider Guide](./new-api-provider)** - General provider development
-- **[Testing Guide](./testing-record-replay)** - Advanced testing techniques
+- **[New API Provider Guide](./new_api_provider)** - General provider development
+- **[Testing Guide](./testing_record_replay)** - Advanced testing techniques

@@ -167,7 +167,7 @@ You can now edit ~/.llama/distributions/llamastack-starter/starter-run.yaml and
 ```

 :::tip
-The generated `run.yaml` file is a starting point for your configuration. For comprehensive guidance on customizing it for your specific needs, infrastructure, and deployment scenarios, see [Customizing Your run.yaml Configuration](./customizing-run-yaml).
+The generated `run.yaml` file is a starting point for your configuration. For comprehensive guidance on customizing it for your specific needs, infrastructure, and deployment scenarios, see [Customizing Your run.yaml Configuration](./customizing_run_yaml).
 :::

 </TabItem>

@@ -13,7 +13,7 @@ import TabItem from '@theme/TabItem';
 The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution:

 :::note
-The default `run.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your run.yaml Configuration](./customizing-run-yaml).
+The default `run.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your run.yaml Configuration](./customizing_run_yaml).
 :::

 <details>

@@ -51,5 +51,5 @@ The goal is to take the generated template and adapt it to your specific infrast
 ## Related Guides

 - **[Configuration Reference](./configuration)** - Detailed configuration file format and options
-- **[Starting Llama Stack Server](./starting-llama-stack-server)** - How to run with your custom configuration
-- **[Building Custom Distributions](./building-distro)** - Create distributions with your preferred providers
+- **[Starting Llama Stack Server](./starting_llama_stack_server)** - How to run with your custom configuration
+- **[Building Custom Distributions](./building_distro)** - Create distributions with your preferred providers

@@ -36,7 +36,7 @@ Then, you can access the APIs like `models` and `inference` on the client and ca
 response = client.models.list()
 ```

-If you've created a [custom distribution](./building-distro), you can also use the run.yaml configuration file directly:
+If you've created a [custom distribution](./building_distro), you can also use the run.yaml configuration file directly:

 ```python
 client = LlamaStackAsLibraryClient(config_path)

@@ -60,6 +60,6 @@ Library mode is ideal when:

 ## Related Guides

-- **[Building Custom Distributions](./building-distro)** - Create your own distribution for library use
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution for library use
 - **[Configuration Reference](./configuration)** - Understanding the configuration format
-- **[Starting Llama Stack Server](./starting-llama-stack-server)** - Alternative server-based deployment
+- **[Starting Llama Stack Server](./starting_llama_stack_server)** - Alternative server-based deployment

@@ -13,9 +13,9 @@ This section provides an overview of the distributions available in Llama Stack.

 ## Distribution Guides

-- **[Available Distributions](./list-of-distributions)** - Complete list and comparison of all distributions
-- **[Building Custom Distributions](./building-distro)** - Create your own distribution from scratch
-- **[Customizing Configuration](./customizing-run-yaml)** - Customize run.yaml for your needs
-- **[Starting Llama Stack Server](./starting-llama-stack-server)** - How to run distributions
-- **[Importing as Library](./importing-as-library)** - Use distributions in your code
+- **[Available Distributions](./list_of_distributions)** - Complete list and comparison of all distributions
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution from scratch
+- **[Customizing Configuration](./customizing_run_yaml)** - Customize run.yaml for your needs
+- **[Starting Llama Stack Server](./starting_llama_stack_server)** - How to run distributions
+- **[Importing as Library](./importing_as_library)** - Use distributions in your code
 - **[Configuration Reference](./configuration)** - Configuration file format details

@@ -34,7 +34,7 @@ Llama Stack provides several pre-configured distributions to help you get starte
 docker pull llama-stack/distribution-starter
 ```

-**Guides:** [Starter Distribution Guide](./self-hosted-distro/starter)
+**Guides:** [Starter Distribution Guide](./self_hosted_distro/starter)

 ### 🖥️ Self-Hosted with GPU

@@ -47,14 +47,14 @@ docker pull llama-stack/distribution-starter
 docker pull llama-stack/distribution-meta-reference-gpu
 ```

-**Guides:** [Meta Reference GPU Guide](./self-hosted-distro/meta-reference-gpu)
+**Guides:** [Meta Reference GPU Guide](./self_hosted_distro/meta_reference_gpu)

 ### 🖥️ Self-Hosted with NVIDIA NeMo Microservices

 **Use `nvidia` if you:**
 - Want to use Llama Stack with NVIDIA NeMo Microservices

-**Guides:** [NVIDIA Distribution Guide](./self-hosted-distro/nvidia)
+**Guides:** [NVIDIA Distribution Guide](./self_hosted_distro/nvidia)

 ### ☁️ Managed Hosting

@@ -65,7 +65,7 @@ docker pull llama-stack/distribution-meta-reference-gpu

 **Partners:** [Fireworks.ai](https://fireworks.ai) and [Together.xyz](https://together.xyz)

-**Guides:** [Remote-Hosted Endpoints](./remote-hosted-distro/)
+**Guides:** [Remote-Hosted Endpoints](./remote_hosted_distro/)

 ### 📱 Mobile Development

@@ -74,8 +74,8 @@ docker pull llama-stack/distribution-meta-reference-gpu
 - Need on-device inference capabilities
 - Want offline functionality

-- [iOS SDK](./ondevice-distro/ios-sdk)
-- [Android SDK](./ondevice-distro/android-sdk)
+- [iOS SDK](./ondevice_distro/ios_sdk)
+- [Android SDK](./ondevice_distro/android_sdk)

 ### 🔧 Custom Solutions

@@ -84,23 +84,23 @@ docker pull llama-stack/distribution-meta-reference-gpu
 - You need custom configurations
 - You want to optimize for your specific use case

-**Guides:** [Building Custom Distributions](./building-distro)
+**Guides:** [Building Custom Distributions](./building_distro)

 ## Detailed Documentation

 ### Self-Hosted Distributions

-- **[Starter Distribution](./self-hosted-distro/starter)** - General purpose template
-- **[Meta Reference GPU](./self-hosted-distro/meta-reference-gpu)** - High-performance GPU inference
+- **[Starter Distribution](./self_hosted_distro/starter)** - General purpose template
+- **[Meta Reference GPU](./self_hosted_distro/meta_reference_gpu)** - High-performance GPU inference

 ### Remote-Hosted Solutions

-- **[Remote-Hosted Overview](./remote-hosted-distro/)** - Managed hosting options
+- **[Remote-Hosted Overview](./remote_hosted_distro/)** - Managed hosting options

 ### Mobile SDKs

-- **[iOS SDK](./ondevice-distro/ios-sdk)** - Native iOS development
-- **[Android SDK](./ondevice-distro/android-sdk)** - Native Android development
+- **[iOS SDK](./ondevice_distro/ios_sdk)** - Native iOS development
+- **[Android SDK](./ondevice_distro/android_sdk)** - Native Android development

 ## Decision Flow

@@ -306,4 +306,4 @@ The API interface is generated using the OpenAPI standard with [Stainless](https
 - **[llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin)** - Official Kotlin SDK repository
 - **[Android Demo App](https://github.com/meta-llama/llama-stack-client-kotlin/tree/latest-release/examples/android_app)** - Complete example app
 - **[ExecuTorch](https://github.com/pytorch/executorch/)** - PyTorch on-device inference library
-- **[iOS SDK](./ios-sdk)** - iOS development guide
+- **[iOS SDK](./ios_sdk)** - iOS development guide

@@ -176,4 +176,4 @@ The iOS SDK is ideal for:
 - **[llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift/)** - Official Swift SDK repository
 - **[iOS Calendar Assistant](https://github.com/meta-llama/llama-stack-client-swift/tree/main/examples/ios_calendar_assistant)** - Complete example app
 - **[executorch](https://github.com/pytorch/executorch/)** - PyTorch on-device inference library
-- **[Android SDK](./android-sdk)** - Android development guide
+- **[Android SDK](./android_sdk)** - Android development guide

@@ -48,6 +48,6 @@ $ llama-stack-client models list

 ## Related Guides

-- **[Available Distributions](../list-of-distributions)** - Compare with other distribution types
+- **[Available Distributions](../list_of_distributions)** - Compare with other distribution types
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Using as Library](../importing-as-library)** - Alternative deployment approach
+- **[Using as Library](../importing_as_library)** - Alternative deployment approach

@@ -105,6 +105,6 @@ The watsonx distribution is ideal for:
 ## Related Guides

 - **[Remote-Hosted Overview](./index)** - Overview of remote-hosted distributions
-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

@@ -217,6 +217,6 @@ The Dell distribution is ideal for:

 ## Related Guides

-- **[Dell-TGI Distribution](./dell-tgi)** - Dell's TGI-specific distribution
+- **[Dell-TGI Distribution](./dell_tgi)** - Dell's TGI-specific distribution
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

@@ -98,4 +98,4 @@ The Dell-TGI distribution is ideal for:

 - **[Dell Distribution](./dell)** - Dell's standard distribution
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md (new file, 125 lines)

@@ -0,0 +1,125 @@
---
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# Meta Reference GPU Distribution

```{toctree}
:maxdepth: 2
:hidden:

self
```

The `llamastack/distribution-meta-reference-gpu` distribution consists of the following provider configurations:

| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| inference | `inline::meta-reference` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |

Note that you need access to NVIDIA GPUs to run this distribution. This distribution is not compatible with CPU-only machines or machines with AMD GPUs.

### Environment Variables

The following environment variables can be configured:

- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)
- `INFERENCE_MODEL`: Inference model loaded into the Meta Reference server (default: `meta-llama/Llama-3.2-3B-Instruct`)
- `INFERENCE_CHECKPOINT_DIR`: Directory containing the Meta Reference model checkpoint (default: `null`)
- `SAFETY_MODEL`: Name of the safety (Llama-Guard) model to use (default: `meta-llama/Llama-Guard-3-1B`)
- `SAFETY_CHECKPOINT_DIR`: Directory containing the Llama-Guard model checkpoint (default: `null`)

## Prerequisite: Downloading Models

Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See the [installation guide](../../references/llama_cli_reference/download_models.md) to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.

```
$ llama model list --downloaded
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Model                                   ┃ Size     ┃ Modified Time       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ Llama3.2-1B-Instruct:int4-qlora-eo8     │ 1.53 GB  │ 2025-02-26 11:22:28 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-1B                             │ 2.31 GB  │ 2025-02-18 21:48:52 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Prompt-Guard-86M                        │ 0.02 GB  │ 2025-02-26 11:29:28 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB  │ 2025-02-26 11:37:41 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-3B                             │ 5.99 GB  │ 2025-02-18 21:51:26 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.1-8B                             │ 14.97 GB │ 2025-02-16 10:36:37 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB  │ 2025-02-26 11:35:02 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama-Guard-3-1B                        │ 2.80 GB  │ 2025-02-26 11:20:46 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama-Guard-3-1B:int4                   │ 0.43 GB  │ 2025-02-26 11:33:33 │
└─────────────────────────────────────────┴──────────┴─────────────────────┘
```

## Running the Distribution

You can do this via venv or via Docker, which has a pre-built image.

### Via Docker

This method allows you to get started quickly without having to build the distribution code.

```bash
LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  --gpus all \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-meta-reference-gpu \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```

If you are using Llama Stack Safety / Shield APIs, use:

```bash
docker run \
  -it \
  --pull always \
  --gpus all \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-meta-reference-gpu \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```

### Via venv

Make sure you have done `uv pip install llama-stack` and have the Llama Stack CLI available.

```bash
llama stack build --distro meta-reference-gpu --image-type venv
llama stack run distributions/meta-reference-gpu/run.yaml \
  --port 8321 \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```

If you are using Llama Stack Safety / Shield APIs, use:

```bash
llama stack run distributions/meta-reference-gpu/run-with-safety.yaml \
  --port 8321 \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```

@@ -149,6 +149,6 @@ The Meta Reference GPU distribution is ideal for:

 ## Related Guides

-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

docs/docs/distributions/self_hosted_distro/nvidia.md (new file, 171 lines)

@@ -0,0 +1,171 @@
---
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# NVIDIA Distribution

The `llamastack/distribution-nvidia` distribution consists of the following provider configurations.

| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `inline::localfs`, `remote::nvidia` |
| eval | `remote::nvidia` |
| files | `inline::localfs` |
| inference | `remote::nvidia` |
| post_training | `remote::nvidia` |
| safety | `remote::nvidia` |
| scoring | `inline::basic` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `inline::rag-runtime` |
| vector_io | `inline::faiss` |

### Environment Variables

The following environment variables can be configured:

- `NVIDIA_API_KEY`: NVIDIA API Key (default: ``)
- `NVIDIA_APPEND_API_VERSION`: Whether to append the API version to the base_url (default: `True`)
- `NVIDIA_DATASET_NAMESPACE`: NVIDIA Dataset Namespace (default: `default`)
- `NVIDIA_PROJECT_ID`: NVIDIA Project ID (default: `test-project`)
- `NVIDIA_CUSTOMIZER_URL`: NVIDIA Customizer URL (default: `https://customizer.api.nvidia.com`)
- `NVIDIA_OUTPUT_MODEL_DIR`: NVIDIA Output Model Directory (default: `test-example-model@v1`)
- `GUARDRAILS_SERVICE_URL`: URL for the NeMo Guardrails Service (default: `http://0.0.0.0:7331`)
- `NVIDIA_GUARDRAILS_CONFIG_ID`: NVIDIA Guardrail Configuration ID (default: `self-check`)
- `NVIDIA_EVALUATOR_URL`: URL for the NeMo Evaluator Service (default: `http://0.0.0.0:7331`)
- `INFERENCE_MODEL`: Inference model (default: `Llama3.1-8B-Instruct`)
- `SAFETY_MODEL`: Name of the model to use for safety (default: `meta/llama-3.1-8b-instruct`)

### Models

The following models are available by default:

- `meta/llama3-8b-instruct`
- `meta/llama3-70b-instruct`
- `meta/llama-3.1-8b-instruct`
- `meta/llama-3.1-70b-instruct`
- `meta/llama-3.1-405b-instruct`
- `meta/llama-3.2-1b-instruct`
- `meta/llama-3.2-3b-instruct`
- `meta/llama-3.2-11b-vision-instruct`
- `meta/llama-3.2-90b-vision-instruct`
- `meta/llama-3.3-70b-instruct`
- `nvidia/vila`
- `nvidia/llama-3.2-nv-embedqa-1b-v2`
- `nvidia/nv-embedqa-e5-v5`
- `nvidia/nv-embedqa-mistral-7b-v2`
- `snowflake/arctic-embed-l`

## Prerequisites

### NVIDIA API Keys

Make sure you have access to an NVIDIA API Key. You can get one by visiting [https://build.nvidia.com/](https://build.nvidia.com/). Use this key for the `NVIDIA_API_KEY` environment variable.

### Deploy NeMo Microservices Platform

The NVIDIA NeMo microservices platform supports end-to-end microservice deployment of a complete AI flywheel on your Kubernetes cluster through the NeMo Microservices Helm Chart. Please reference the [NVIDIA NeMo Microservices documentation](https://docs.nvidia.com/nemo/microservices/latest/about/index.html) for platform prerequisites and instructions to install and deploy the platform.

## Supported Services

Each Llama Stack API corresponds to a specific NeMo microservice. The core microservices (Customizer, Evaluator, Guardrails) are exposed by the same endpoint. The platform components (Data Store) are each exposed by separate endpoints.

### Inference: NVIDIA NIM

NVIDIA NIM is used for running inference with registered models. There are two ways to access NVIDIA NIMs:

1. Hosted (default): Preview APIs hosted at https://integrate.api.nvidia.com (requires an API key)
2. Self-hosted: NVIDIA NIMs that run on your own infrastructure.

The deployed platform includes the NIM Proxy microservice, which is the service that provides access to your NIMs (for example, to run inference on a model). Set the `NVIDIA_BASE_URL` environment variable to use your NVIDIA NIM Proxy deployment.

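For illustration, once the NIM for your model is live and the stack server is running, a minimal inference call through the Llama Stack client might look like the sketch below. The client method names follow the Llama Stack client examples used elsewhere in these docs; the base URL and model id are placeholders, and newer client versions may prefer the OpenAI-compatible chat endpoints instead.

```python
# Rough sketch: chat completion against the NVIDIA distribution (assumptions noted above).
from llama_stack_client import LlamaStackClient

# Assumes a locally running stack server on the default port.
client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta/llama-3.2-1b-instruct",  # placeholder; must be a deployed NIM
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.completion_message.content)
```
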
### Datasetio API: NeMo Data Store

The NeMo Data Store microservice serves as the default file storage solution for the NeMo microservices platform. It exposes APIs compatible with the Hugging Face Hub client (`HfApi`), so you can use the client to interact with the Data Store. The `NVIDIA_DATASETS_URL` environment variable should point to your NeMo Data Store endpoint.

See the {repopath}`NVIDIA Datasetio docs::llama_stack/providers/remote/datasetio/nvidia/README.md` for supported features and example usage.

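Because the Data Store speaks the Hugging Face Hub API, a rough sketch of pointing `HfApi` at it could look like the following. The repository name and file path are placeholders, and which Hub operations the Data Store actually supports is covered in the README linked above.

```python
# Rough sketch: use the Hugging Face Hub client against the NeMo Data Store endpoint.
import os
from huggingface_hub import HfApi

hf_api = HfApi(
    endpoint=os.environ["NVIDIA_DATASETS_URL"],  # your Data Store URL
    token="",  # the Data Store may not require a real Hub token (assumption)
)

# Create a dataset repo and upload a file (names are illustrative).
hf_api.create_repo(repo_id="default/sample-dataset", repo_type="dataset", exist_ok=True)
hf_api.upload_file(
    path_or_fileobj="./training_data.jsonl",
    path_in_repo="training_data.jsonl",
    repo_id="default/sample-dataset",
    repo_type="dataset",
)
```
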
### Eval API: NeMo Evaluator

The NeMo Evaluator microservice supports evaluation of LLMs. Launching an Evaluation job with NeMo Evaluator requires an Evaluation Config (an object that contains metadata needed by the job). A Llama Stack Benchmark maps to an Evaluation Config, so registering a Benchmark creates an Evaluation Config in NeMo Evaluator. The `NVIDIA_EVALUATOR_URL` environment variable should point to your NeMo Microservices endpoint.

See the {repopath}`NVIDIA Eval docs::llama_stack/providers/remote/eval/nvidia/README.md` for supported features and example usage.

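As a rough sketch, registering a benchmark (which becomes an Evaluation Config) and launching a job through the Llama Stack client might look like this. The benchmark, dataset, and model identifiers are placeholders, and the exact shape of `benchmark_config` is an assumption; the README linked above documents the provider's supported options.

```python
# Rough sketch: Benchmark registration maps to an Evaluation Config in NeMo Evaluator.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

client.benchmarks.register(
    benchmark_id="my-benchmark",      # placeholder; becomes the Evaluation Config
    dataset_id="my-eval-dataset",     # placeholder dataset registered earlier
    scoring_functions=[],             # provider-defined scoring
)

job = client.eval.run_eval(
    benchmark_id="my-benchmark",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "meta/llama-3.2-1b-instruct",
            "sampling_params": {"max_tokens": 100},
        }
    },
)
print(job)
```
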
### Post-Training API: NeMo Customizer

The NeMo Customizer microservice supports fine-tuning models. You can reference {repopath}`this list of supported models::llama_stack/providers/remote/post_training/nvidia/models.py` that can be fine-tuned using Llama Stack. The `NVIDIA_CUSTOMIZER_URL` environment variable should point to your NeMo Microservices endpoint.

See the {repopath}`NVIDIA Post-Training docs::llama_stack/providers/remote/post_training/nvidia/README.md` for supported features and example usage.

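To make the flow concrete, a heavily simplified sketch of kicking off a fine-tuning job via the post-training API is shown below. Treat the nested config dictionaries (LoRA settings, optimizer, dataset id) as assumptions for illustration only; the README linked above has the provider's actual accepted fields.

```python
# Rough sketch: launch a supervised fine-tuning job (config shapes are assumptions).
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

job = client.post_training.supervised_fine_tune(
    job_uuid="",                                  # let the provider assign one
    model="meta-llama/Llama-3.1-8B-Instruct",     # must be in the supported-models list
    training_config={
        "n_epochs": 1,
        "data_config": {"dataset_id": "my-training-dataset", "batch_size": 8},
        "optimizer_config": {"lr": 1e-4},
    },
    algorithm_config={"type": "LoRA", "adapter_dim": 16},  # illustrative LoRA settings
    hyperparam_search_config={},
    logger_config={},
    checkpoint_dir="",
)
print(job)
```
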
### Safety API: NeMo Guardrails

The NeMo Guardrails microservice sits between your application and the LLM, and adds checks and content moderation to a model. The `GUARDRAILS_SERVICE_URL` environment variable should point to your NeMo Microservices endpoint.

See the {repopath}`NVIDIA Safety docs::llama_stack/providers/remote/safety/nvidia/README.md` for supported features and example usage.

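A minimal sketch of exercising the safety API against the Guardrails-backed provider is shown below. The shield id mirrors the default `NVIDIA_GUARDRAILS_CONFIG_ID` above; the `provider_id` value and exact registration arguments are assumptions.

```python
# Rough sketch: register a shield and run a safety check through it.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# "self-check" mirrors the default NVIDIA_GUARDRAILS_CONFIG_ID; provider_id is an assumption.
client.shields.register(shield_id="self-check", provider_id="nvidia")

result = client.safety.run_shield(
    shield_id="self-check",
    messages=[{"role": "user", "content": "Tell me how to hot-wire a car."}],
    params={},
)
print(result.violation)
```
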
## Deploying models

In order to use a registered model with the Llama Stack APIs, ensure the corresponding NIM is deployed to your environment. For example, you can use the NIM Proxy microservice to deploy `meta/llama-3.2-1b-instruct`.

Note: For improved inference speeds, use NIM with the `fast_outlines` guided decoding system (specified in the request body). This is the default if you deployed the platform with the NeMo Microservices Helm Chart.

```sh
# URL to NeMo NIM Proxy service
export NEMO_URL="http://nemo.test"

curl --location "$NEMO_URL/v1/deployment/model-deployments" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "llama-3.2-1b-instruct",
    "namespace": "meta",
    "config": {
      "model": "meta/llama-3.2-1b-instruct",
      "nim_deployment": {
        "image_name": "nvcr.io/nim/meta/llama-3.2-1b-instruct",
        "image_tag": "1.8.3",
        "pvc_size": "25Gi",
        "gpu": 1,
        "additional_envs": {
          "NIM_GUIDED_DECODING_BACKEND": "fast_outlines"
        }
      }
    }
  }'
```

This NIM deployment should take approximately 10 minutes to go live. [See the docs](https://docs.nvidia.com/nemo/microservices/latest/get-started/tutorials/deploy-nims.html) for more information on how to deploy a NIM and verify it's available for inference.

You can also remove a deployed NIM to free up GPU resources, if needed.

```sh
export NEMO_URL="http://nemo.test"

curl -X DELETE "$NEMO_URL/v1/deployment/model-deployments/meta/llama-3.1-8b-instruct"
```

## Running Llama Stack with NVIDIA

You can do this via venv (building the code) or via Docker, which has a pre-built image.

### Via Docker

This method allows you to get started quickly without having to build the distribution code.

```bash
LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-nvidia \
  --config /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env NVIDIA_API_KEY=$NVIDIA_API_KEY
```

### Via venv

If you've set up your local development environment, you can also build and run the distribution from your local virtual environment.

```bash
INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
llama stack build --distro nvidia --image-type venv
llama stack run ./run.yaml \
  --port 8321 \
  --env NVIDIA_API_KEY=$NVIDIA_API_KEY \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```

## Example Notebooks

For examples of how to use the NVIDIA Distribution to run inference, fine-tune, evaluate, and run safety checks on your LLMs, you can reference the example notebooks in {repopath}`docs/notebooks/nvidia`.

@@ -192,6 +192,6 @@ The NVIDIA distribution is ideal for:

 ## Related Guides

-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

@@ -57,6 +57,6 @@ The Passthrough distribution is ideal for:

 ## Related Guides

-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Starting Llama Stack Server](../starting-llama-stack-server)** - How to run distributions
+- **[Starting Llama Stack Server](../starting_llama_stack_server)** - How to run distributions

@@ -231,6 +231,6 @@ The starter distribution is ideal for developers who want to experiment with dif

 ## Related Guides

-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

@@ -13,13 +13,13 @@ You can run a Llama Stack server in one of the following ways:

 This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (e.g. fireworks, together, groq, etc.)

-**See:** [Using Llama Stack as a Library](./importing-as-library)
+**See:** [Using Llama Stack as a Library](./importing_as_library)

 ## Container

 Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have.

-**See:** [Available Distributions](./list-of-distributions) for more details on selecting the right distribution.
+**See:** [Available Distributions](./list_of_distributions) for more details on selecting the right distribution.

 ## Kubernetes

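For reference, a minimal library-mode sketch looks roughly like the block below. The import path is an assumption and has moved between releases, so check your installed `llama_stack` version; the distribution name is a placeholder.

```python
# Rough sketch: run Llama Stack in-process instead of starting a server.
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient  # import path may differ by version

client = LlamaStackAsLibraryClient("starter")  # a distribution name or a path to a run.yaml
client.initialize()

print([m.identifier for m in client.models.list()])
```
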
@@ -69,7 +69,7 @@ If you have built a container image and want to deploy it in a Kubernetes cluste

 ## Related Guides

-- **[Available Distributions](./list-of-distributions)** - Choose the right distribution
-- **[Building Custom Distributions](./building-distro)** - Create your own distribution
+- **[Available Distributions](./list_of_distributions)** - Choose the right distribution
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution
 - **[Configuration Reference](./configuration)** - Understanding configuration options
-- **[Customizing run.yaml](./customizing-run-yaml)** - Adapt configurations to your environment
+- **[Customizing run.yaml](./customizing_run_yaml)** - Adapt configurations to your environment

@@ -199,7 +199,7 @@ pprint(response)

 Llama Stack offers a library of scoring functions and the `/scoring` API, allowing you to run evaluations on your pre-annotated AI application datasets.

-In this example, we will work with an example RAG dataset you have built previously, label it with annotations, and use LLM-As-Judge with a custom judge prompt for scoring. Please check out our [Llama Stack Playground](../building-applications/playground) for an interactive interface to upload datasets and run scorings.
+In this example, we will work with an example RAG dataset you have built previously, label it with annotations, and use LLM-As-Judge with a custom judge prompt for scoring. Please check out our [Llama Stack Playground](../building_applications/playground) for an interactive interface to upload datasets and run scorings.

 ```python
 judge_model_id = "meta-llama/Llama-3.1-405B-Instruct-FP8"

@@ -1,6 +1,6 @@
 # References

-- [Python SDK Reference](python-sdk) for the Llama Stack Python SDK
-- [Llama CLI](llama-cli) for building and running your Llama Stack server
-- [Llama Stack Client CLI](client-cli) for interacting with your Llama Stack server
-- [Evaluations Reference](evals-reference) for running evaluations and benchmarks
+- [Python SDK Reference](python_sdk) for the Llama Stack Python SDK
+- [Llama CLI](llama_cli) for building and running your Llama Stack server
+- [Llama Stack Client CLI](client_cli) for interacting with your Llama Stack server
+- [Evaluations Reference](evals_reference) for running evaluations and benchmarks

@@ -30,7 +30,7 @@ You have two ways to install Llama Stack:

 1. `download`: Supports downloading models from Meta or Hugging Face.
 2. `model`: Lists available models and their properties.
-3. `stack`: Allows you to build a stack using the `llama stack` distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the [Build your own Distribution](../distributions/building-distro) documentation.
+3. `stack`: Allows you to build a stack using the `llama stack` distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the [Build your own Distribution](../distributions/building_distro) documentation.

 ### Sample Usage