diff --git a/docs/docs/advanced_apis/evaluation.mdx b/docs/docs/advanced_apis/evaluation.mdx index c199fc755..f4037a380 100644 --- a/docs/docs/advanced_apis/evaluation.mdx +++ b/docs/docs/advanced_apis/evaluation.mdx @@ -15,7 +15,7 @@ The Evaluation API works with several related APIs to provide comprehensive eval - `/eval` + `/benchmarks` API - Generate outputs and perform scoring :::tip -For conceptual information about evaluations, see our [Evaluation Concepts](../concepts/evaluation-concepts.mdx) guide. +For conceptual information about evaluations, see our [Evaluation Concepts](../concepts/evaluation_concepts.mdx) guide. ::: ## Meta Reference @@ -112,7 +112,7 @@ Llama Stack pre-registers several popular open-benchmarks for easy model evaluat ## Next Steps -- Check out the [Evaluation Concepts](../concepts/evaluation-concepts.mdx) guide for detailed conceptual information -- See the [Building Applications - Evaluation](../building-applications/evals.mdx) guide for application examples -- Review the [Evaluation Reference](../references/evals-reference.mdx) for comprehensive CLI and API usage +- Check out the [Evaluation Concepts](../concepts/evaluation_concepts.mdx) guide for detailed conceptual information +- See the [Building Applications - Evaluation](../building_applications/evals.mdx) guide for application examples +- Review the [Evaluation Reference](../references/evals_reference.mdx) for comprehensive CLI and API usage - Explore the [Scoring](./scoring.mdx) documentation for available scoring functions diff --git a/docs/docs/advanced_apis/post-training.mdx b/docs/docs/advanced_apis/post_training.mdx similarity index 98% rename from docs/docs/advanced_apis/post-training.mdx rename to docs/docs/advanced_apis/post_training.mdx index 1440f090a..43359d741 100644 --- a/docs/docs/advanced_apis/post-training.mdx +++ b/docs/docs/advanced_apis/post_training.mdx @@ -300,6 +300,6 @@ customizer_url: ${env.NVIDIA_CUSTOMIZER_URL:=http://nemo.test} ## Next Steps -- Check out the [Building Applications - Fine-tuning](../building-applications/index.mdx) guide for application-level examples +- Check out the [Building Applications - Fine-tuning](../building_applications/index.mdx) guide for application-level examples - See the [Providers](../providers/post_training/index.mdx) section for detailed provider documentation -- Review the [API Reference](../api-reference/post-training.mdx) for complete API documentation +- Review the [API Reference](../api_reference/post_training.mdx) for complete API documentation diff --git a/docs/docs/advanced_apis/scoring.mdx b/docs/docs/advanced_apis/scoring.mdx index 44f8cf163..15c09fa8a 100644 --- a/docs/docs/advanced_apis/scoring.mdx +++ b/docs/docs/advanced_apis/scoring.mdx @@ -188,6 +188,6 @@ The Scoring API works closely with the [Evaluation](./evaluation.mdx) API to pro ## Next Steps - Check out the [Evaluation](./evaluation.mdx) guide for running complete evaluations -- See the [Building Applications - Evaluation](../building-applications/evals.mdx) guide for application examples -- Review the [Evaluation Reference](../references/evals-reference.mdx) for comprehensive scoring function usage -- Explore the [Evaluation Concepts](../concepts/evaluation-concepts.mdx) for detailed conceptual information +- See the [Building Applications - Evaluation](../building_applications/evals.mdx) guide for application examples +- Review the [Evaluation Reference](../references/evals_reference.mdx) for comprehensive scoring function usage +- Explore the [Evaluation 
Concepts](../concepts/evaluation_concepts.mdx) for detailed conceptual information
diff --git a/docs/docs/building_applications/agent.mdx b/docs/docs/building_applications/agent.mdx
index a623eaa7e..33e98ea8d 100644
--- a/docs/docs/building_applications/agent.mdx
+++ b/docs/docs/building_applications/agent.mdx
@@ -102,11 +102,11 @@ Each turn consists of multiple steps that represent the agent's thought process:
## Agent Execution Loop
-Refer to the [Agent Execution Loop](./agent-execution-loop) for more details on what happens within an agent turn.
+Refer to the [Agent Execution Loop](./agent_execution_loop) for more details on what happens within an agent turn.
## Related Resources
-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding the internal processing flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding the internal processing flow
- **[RAG (Retrieval Augmented Generation)](./rag)** - Building knowledge-enhanced agents
- **[Tools Integration](./tools)** - Extending agent capabilities with external tools
- **[Safety Guardrails](./safety)** - Implementing responsible AI practices
diff --git a/docs/docs/building_applications/index.mdx b/docs/docs/building_applications/index.mdx
index f9bbff915..0b9cb20fb 100644
--- a/docs/docs/building_applications/index.mdx
+++ b/docs/docs/building_applications/index.mdx
@@ -21,8 +21,8 @@ Here are the key topics that will help you build effective AI applications:
### 🤖 **Agent Development**
- **[Agent Framework](./agent)** - Understand the components and design patterns of the Llama Stack agent framework
-- **[Agent Execution Loop](./agent-execution-loop)** - How agents process information, make decisions, and execute actions
-- **[Agents vs Responses API](./responses-vs-agents)** - Learn when to use each API for different use cases
+- **[Agent Execution Loop](./agent_execution_loop)** - How agents process information, make decisions, and execute actions
+- **[Agents vs Responses API](./responses_vs_agents)** - Learn when to use each API for different use cases
### 📚 **Knowledge Integration**
- **[RAG (Retrieval-Augmented Generation)](./rag)** - Enhance your agents with external knowledge through retrieval mechanisms
diff --git a/docs/docs/building_applications/responses_vs_agents.mdx b/docs/docs/building_applications/responses_vs_agents.mdx
index 638ebf317..c04cd2ce8 100644
--- a/docs/docs/building_applications/responses_vs_agents.mdx
+++ b/docs/docs/building_applications/responses_vs_agents.mdx
@@ -215,7 +215,7 @@ Use this framework to choose the right API for your use case:
## Related Resources
- **[Agents](./agent)** - Understanding the Agents API fundamentals
-- **[Agent Execution Loop](./agent-execution-loop)** - How agents process turns and steps
+- **[Agent Execution Loop](./agent_execution_loop)** - How agents process turns and steps
- **[Tools Integration](./tools)** - Adding capabilities to both APIs
- **[OpenAI Compatibility](/docs/providers/openai-compatibility)** - Using OpenAI-compatible endpoints
- **[Safety Guardrails](./safety)** - Implementing safety measures in agents
diff --git a/docs/docs/building_applications/safety.mdx b/docs/docs/building_applications/safety.mdx
index e4f9dee27..16fe5f6f8 100644
--- a/docs/docs/building_applications/safety.mdx
+++ b/docs/docs/building_applications/safety.mdx
@@ -389,7 +389,7 @@ client.shields.register(
## Related Resources
- **[Agents](./agent)** - Integrating safety shields with intelligent agents
-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding safety in the execution flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding safety in the execution flow
- **[Evaluations](./evals)** - Evaluating safety shield effectiveness
- **[Telemetry](./telemetry)** - Monitoring safety violations and metrics
- **[Llama Guard Documentation](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)** - Advanced safety model details
diff --git a/docs/docs/building_applications/tools.mdx b/docs/docs/building_applications/tools.mdx
index 8f7097e98..be60a1639 100644
--- a/docs/docs/building_applications/tools.mdx
+++ b/docs/docs/building_applications/tools.mdx
@@ -335,6 +335,6 @@ response = agent.create_turn(
- **[Agents](./agent)** - Building intelligent agents with tools
- **[RAG (Retrieval Augmented Generation)](./rag)** - Using knowledge retrieval tools
-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding tool execution flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding tool execution flow
- **[Building AI Applications Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)** - Comprehensive examples
- **[Llama Stack Apps Examples](https://github.com/meta-llama/llama-stack-apps)** - Real-world tool implementations
diff --git a/docs/docs/concepts/api-providers.mdx b/docs/docs/concepts/api_providers.mdx
similarity index 100%
rename from docs/docs/concepts/api-providers.mdx
rename to docs/docs/concepts/api_providers.mdx
diff --git a/docs/docs/concepts/evaluation-concepts.mdx b/docs/docs/concepts/evaluation_concepts.mdx
similarity index 95%
rename from docs/docs/concepts/evaluation-concepts.mdx
rename to docs/docs/concepts/evaluation_concepts.mdx
index 1cf2e2361..f091815e0 100644
--- a/docs/docs/concepts/evaluation-concepts.mdx
+++ b/docs/docs/concepts/evaluation_concepts.mdx
@@ -30,7 +30,7 @@ The list of open-benchmarks we currently support:
- [SimpleQA](https://openai.com/index/introducing-simpleqa/): Benchmark designed to assess a model's ability to answer short, fact-seeking questions.
- [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI): Benchmark designed to evaluate multimodal models.
-You can follow this [contributing guide](../references/evals-reference.mdx#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack
+You can follow this [contributing guide](../references/evals_reference.mdx#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack.
### Run evaluation on open-benchmarks via CLI
@@ -67,5 +67,5 @@ evaluation results over there.
## What's Next?
- Check out our Colab notebook with working examples of running benchmark evaluations [here](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb#scrollTo=mxLCsP4MvFqP).
-- Check out our [Building Applications - Evaluation](../building-applications/evals.mdx) guide for more details on how to use the Evaluation APIs to evaluate your applications.
-- Check out our [Evaluation Reference](../references/evals-reference.mdx) for more details on the APIs.
+- Check out our [Building Applications - Evaluation](../building_applications/evals.mdx) guide for more details on how to use the Evaluation APIs to evaluate your applications.
+- Check out our [Evaluation Reference](../references/evals_reference.mdx) for more details on the APIs.
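+
+To make the CLI flow mentioned above concrete, a benchmark run looks roughly like the sketch below. Treat it as a hedged example: the exact subcommand and flag names here are assumptions, so confirm them with `llama-stack-client --help` before relying on them.
+
+```bash
+# Sketch: run a pre-registered open benchmark against a served model.
+# Subcommand and flags are illustrative assumptions; verify via --help.
+llama-stack-client eval run-benchmark meta-reference-mmlu \
+  --model-id meta-llama/Llama-3.1-8B-Instruct \
+  --output-dir ./eval_results \
+  --num-examples 10
+```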
diff --git a/docs/docs/concepts/index.mdx b/docs/docs/concepts/index.mdx index 100ff717d..1779cef41 100644 --- a/docs/docs/concepts/index.mdx +++ b/docs/docs/concepts/index.mdx @@ -13,6 +13,6 @@ This section covers the key concepts you need to understand to work effectively - **[Architecture](./architecture)** - Llama Stack's service-oriented design and benefits - **[APIs](./apis)** - Available REST APIs and planned capabilities -- **[API Providers](./api-providers)** - Remote vs inline provider implementations +- **[API Providers](./api_providers)** - Remote vs inline provider implementations - **[Distributions](./distributions)** - Pre-packaged provider configurations - **[Resources](./resources)** - Resource federation and registration diff --git a/docs/docs/contributing/index.mdx b/docs/docs/contributing/index.mdx index 464e02b3f..52d940c4a 100644 --- a/docs/docs/contributing/index.mdx +++ b/docs/docs/contributing/index.mdx @@ -133,8 +133,8 @@ Keep PRs small and focused. Split large changes into logically grouped, smaller Learn how to extend Llama Stack with new capabilities: -- **[Adding a New API Provider](./new-api-provider)** - Add new API providers to the Stack -- **[Adding a Vector Database](./new-vector-database)** - Add new vector databases +- **[Adding a New API Provider](./new_api_provider)** - Add new API providers to the Stack +- **[Adding a Vector Database](./new_vector_database)** - Add new vector databases - **[External Providers](/docs/providers/external)** - Add external providers to the Stack ## Testing @@ -304,12 +304,12 @@ By contributing to Llama Stack, you agree that your contributions will be licens ## Advanced Topics -- **[Testing Record-Replay System](./testing-record-replay)** - Deep dive into testing internals +- **[Testing Record-Replay System](./testing_record_replay)** - Deep dive into testing internals ## Related Resources -- **[Adding API Providers](./new-api-provider)** - Extend Llama Stack with new providers -- **[Vector Database Integration](./new-vector-database)** - Add vector database support +- **[Adding API Providers](./new_api_provider)** - Extend Llama Stack with new providers +- **[Vector Database Integration](./new_vector_database)** - Add vector database support - **[External Providers](/docs/providers/external)** - External provider development - **[GitHub Discussions](https://github.com/meta-llama/llama-stack/discussions)** - Community discussion - **[Discord](https://discord.gg/llama-stack)** - Real-time community chat diff --git a/docs/docs/contributing/new-api-provider.mdx b/docs/docs/contributing/new_api_provider.mdx similarity index 98% rename from docs/docs/contributing/new-api-provider.mdx rename to docs/docs/contributing/new_api_provider.mdx index 612350fa6..9f06a4204 100644 --- a/docs/docs/contributing/new-api-provider.mdx +++ b/docs/docs/contributing/new_api_provider.mdx @@ -279,5 +279,5 @@ class YourProvider: - **[Core Concepts](/docs/concepts)** - Understanding Llama Stack architecture - **[External Providers](/docs/providers/external)** - Alternative implementation approach -- **[Vector Database Guide](./new-vector-database)** - Specialized provider implementation -- **[Testing Record-Replay](./testing-record-replay)** - Advanced testing techniques +- **[Vector Database Guide](./new_vector_database)** - Specialized provider implementation +- **[Testing Record-Replay](./testing_record_replay)** - Advanced testing techniques diff --git a/docs/docs/contributing/new-vector-database.mdx 
b/docs/docs/contributing/new_vector_database.mdx similarity index 98% rename from docs/docs/contributing/new-vector-database.mdx rename to docs/docs/contributing/new_vector_database.mdx index 8d860c1c5..607c10935 100644 --- a/docs/docs/contributing/new-vector-database.mdx +++ b/docs/docs/contributing/new_vector_database.mdx @@ -489,5 +489,5 @@ async def add_chunks(self, chunks: List[Chunk]) -> List[str]: - **[Vector IO Providers](/docs/providers/vector_io)** - Existing provider implementations - **[Core Concepts](/docs/concepts)** - Understanding Llama Stack architecture -- **[New API Provider Guide](./new-api-provider)** - General provider development -- **[Testing Guide](./testing-record-replay)** - Advanced testing techniques +- **[New API Provider Guide](./new_api_provider)** - General provider development +- **[Testing Guide](./testing_record_replay)** - Advanced testing techniques diff --git a/docs/docs/contributing/testing-record-replay.mdx b/docs/docs/contributing/testing_record_replay.mdx similarity index 100% rename from docs/docs/contributing/testing-record-replay.mdx rename to docs/docs/contributing/testing_record_replay.mdx diff --git a/docs/docs/distributions/building-distro.mdx b/docs/docs/distributions/building_distro.mdx similarity index 99% rename from docs/docs/distributions/building-distro.mdx rename to docs/docs/distributions/building_distro.mdx index 7df746a48..002b4ebc3 100644 --- a/docs/docs/distributions/building-distro.mdx +++ b/docs/docs/distributions/building_distro.mdx @@ -167,7 +167,7 @@ You can now edit ~/.llama/distributions/llamastack-starter/starter-run.yaml and ``` :::tip -The generated `run.yaml` file is a starting point for your configuration. For comprehensive guidance on customizing it for your specific needs, infrastructure, and deployment scenarios, see [Customizing Your run.yaml Configuration](./customizing-run-yaml). +The generated `run.yaml` file is a starting point for your configuration. For comprehensive guidance on customizing it for your specific needs, infrastructure, and deployment scenarios, see [Customizing Your run.yaml Configuration](./customizing_run_yaml). ::: diff --git a/docs/docs/distributions/configuration.mdx b/docs/docs/distributions/configuration.mdx index cfc519abe..4f7fc481f 100644 --- a/docs/docs/distributions/configuration.mdx +++ b/docs/docs/distributions/configuration.mdx @@ -13,7 +13,7 @@ import TabItem from '@theme/TabItem'; The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution: :::note -The default `run.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your run.yaml Configuration](./customizing-run-yaml). +The default `run.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your run.yaml Configuration](./customizing_run_yaml). :::
diff --git a/docs/docs/distributions/customizing-run-yaml.mdx b/docs/docs/distributions/customizing_run_yaml.mdx similarity index 94% rename from docs/docs/distributions/customizing-run-yaml.mdx rename to docs/docs/distributions/customizing_run_yaml.mdx index da1823d99..1150907a4 100644 --- a/docs/docs/distributions/customizing-run-yaml.mdx +++ b/docs/docs/distributions/customizing_run_yaml.mdx @@ -51,5 +51,5 @@ The goal is to take the generated template and adapt it to your specific infrast ## Related Guides - **[Configuration Reference](./configuration)** - Detailed configuration file format and options -- **[Starting Llama Stack Server](./starting-llama-stack-server)** - How to run with your custom configuration -- **[Building Custom Distributions](./building-distro)** - Create distributions with your preferred providers +- **[Starting Llama Stack Server](./starting_llama_stack_server)** - How to run with your custom configuration +- **[Building Custom Distributions](./building_distro)** - Create distributions with your preferred providers diff --git a/docs/docs/distributions/importing-as-library.mdx b/docs/docs/distributions/importing_as_library.mdx similarity index 91% rename from docs/docs/distributions/importing-as-library.mdx rename to docs/docs/distributions/importing_as_library.mdx index 50e846897..7ff9d3f5e 100644 --- a/docs/docs/distributions/importing-as-library.mdx +++ b/docs/docs/distributions/importing_as_library.mdx @@ -36,7 +36,7 @@ Then, you can access the APIs like `models` and `inference` on the client and ca response = client.models.list() ``` -If you've created a [custom distribution](./building-distro), you can also use the run.yaml configuration file directly: +If you've created a [custom distribution](./building_distro), you can also use the run.yaml configuration file directly: ```python client = LlamaStackAsLibraryClient(config_path) @@ -60,6 +60,6 @@ Library mode is ideal when: ## Related Guides -- **[Building Custom Distributions](./building-distro)** - Create your own distribution for library use +- **[Building Custom Distributions](./building_distro)** - Create your own distribution for library use - **[Configuration Reference](./configuration)** - Understanding the configuration format -- **[Starting Llama Stack Server](./starting-llama-stack-server)** - Alternative server-based deployment +- **[Starting Llama Stack Server](./starting_llama_stack_server)** - Alternative server-based deployment diff --git a/docs/docs/distributions/index.mdx b/docs/docs/distributions/index.mdx index f2101c794..97be14910 100644 --- a/docs/docs/distributions/index.mdx +++ b/docs/docs/distributions/index.mdx @@ -13,9 +13,9 @@ This section provides an overview of the distributions available in Llama Stack. 
## Distribution Guides
-- **[Available Distributions](./list-of-distributions)** - Complete list and comparison of all distributions
-- **[Building Custom Distributions](./building-distro)** - Create your own distribution from scratch
-- **[Customizing Configuration](./customizing-run-yaml)** - Customize run.yaml for your needs
-- **[Starting Llama Stack Server](./starting-llama-stack-server)** - How to run distributions
-- **[Importing as Library](./importing-as-library)** - Use distributions in your code
+- **[Available Distributions](./list_of_distributions)** - Complete list and comparison of all distributions
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution from scratch
+- **[Customizing Configuration](./customizing_run_yaml)** - Customize run.yaml for your needs
+- **[Starting Llama Stack Server](./starting_llama_stack_server)** - How to run distributions
+- **[Importing as Library](./importing_as_library)** - Use distributions in your code
- **[Configuration Reference](./configuration)** - Configuration file format details
diff --git a/docs/docs/distributions/list-of-distributions.mdx b/docs/docs/distributions/list_of_distributions.mdx
similarity index 81%
rename from docs/docs/distributions/list-of-distributions.mdx
rename to docs/docs/distributions/list_of_distributions.mdx
index f15af5d62..03d2df9ca 100644
--- a/docs/docs/distributions/list-of-distributions.mdx
+++ b/docs/docs/distributions/list_of_distributions.mdx
@@ -34,7 +34,7 @@ Llama Stack provides several pre-configured distributions to help you get starte
docker pull llama-stack/distribution-starter
```
-**Guides:** [Starter Distribution Guide](./self-hosted-distro/starter)
+**Guides:** [Starter Distribution Guide](./self_hosted_distro/starter)
### 🖥️ Self-Hosted with GPU
@@ -47,14 +47,14 @@ docker pull llama-stack/distribution-starter
docker pull llama-stack/distribution-meta-reference-gpu
```
-**Guides:** [Meta Reference GPU Guide](./self-hosted-distro/meta-reference-gpu)
+**Guides:** [Meta Reference GPU Guide](./self_hosted_distro/meta_reference_gpu)
### 🖥️ Self-Hosted with NVIDIA NeMo Microservices
**Use `nvidia` if you:**
- Want to use Llama Stack with NVIDIA NeMo Microservices
-**Guides:** [NVIDIA Distribution Guide](./self-hosted-distro/nvidia)
+**Guides:** [NVIDIA Distribution Guide](./self_hosted_distro/nvidia)
### ☁️ Managed Hosting
@@ -65,7 +65,7 @@ docker pull llama-stack/distribution-meta-reference-gpu
**Partners:** [Fireworks.ai](https://fireworks.ai) and [Together.xyz](https://together.xyz)
-**Guides:** [Remote-Hosted Endpoints](./remote-hosted-distro/)
+**Guides:** [Remote-Hosted Endpoints](./remote_hosted_distro/)
### 📱 Mobile Development
@@ -74,8 +74,8 @@ docker pull llama-stack/distribution-meta-reference-gpu
- Need on-device inference capabilities
- Want offline functionality
-- [iOS SDK](./ondevice-distro/ios-sdk)
-- [Android SDK](./ondevice-distro/android-sdk)
+- [iOS SDK](./ondevice_distro/ios_sdk)
+- [Android SDK](./ondevice_distro/android_sdk)
### 🔧 Custom Solutions
@@ -84,23 +84,23 @@ docker pull llama-stack/distribution-meta-reference-gpu
- You need custom configurations
- You want to optimize for your specific use case
-**Guides:** [Building Custom Distributions](./building-distro)
+**Guides:** [Building Custom Distributions](./building_distro)
## Detailed Documentation
### Self-Hosted Distributions
-- **[Starter Distribution](./self-hosted-distro/starter)** - General purpose template
-- **[Meta Reference 
GPU](./self-hosted-distro/meta-reference-gpu)** - High-performance GPU inference +- **[Starter Distribution](./self_hosted_distro/starter)** - General purpose template +- **[Meta Reference GPU](./self_hosted_distro/meta_reference_gpu)** - High-performance GPU inference ### Remote-Hosted Solutions -- **[Remote-Hosted Overview](./remote-hosted-distro/)** - Managed hosting options +- **[Remote-Hosted Overview](./remote_hosted_distro/)** - Managed hosting options ### Mobile SDKs -- **[iOS SDK](./ondevice-distro/ios-sdk)** - Native iOS development -- **[Android SDK](./ondevice-distro/android-sdk)** - Native Android development +- **[iOS SDK](./ondevice_distro/ios_sdk)** - Native iOS development +- **[Android SDK](./ondevice_distro/android_sdk)** - Native Android development ## Decision Flow diff --git a/docs/docs/distributions/ondevice-distro/android-sdk.mdx b/docs/docs/distributions/ondevice_distro/android_sdk.mdx similarity index 99% rename from docs/docs/distributions/ondevice-distro/android-sdk.mdx rename to docs/docs/distributions/ondevice_distro/android_sdk.mdx index 622f614c2..3948fb175 100644 --- a/docs/docs/distributions/ondevice-distro/android-sdk.mdx +++ b/docs/docs/distributions/ondevice_distro/android_sdk.mdx @@ -306,4 +306,4 @@ The API interface is generated using the OpenAPI standard with [Stainless](https - **[llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin)** - Official Kotlin SDK repository - **[Android Demo App](https://github.com/meta-llama/llama-stack-client-kotlin/tree/latest-release/examples/android_app)** - Complete example app - **[ExecuTorch](https://github.com/pytorch/executorch/)** - PyTorch on-device inference library -- **[iOS SDK](./ios-sdk)** - iOS development guide +- **[iOS SDK](./ios_sdk)** - iOS development guide diff --git a/docs/docs/distributions/ondevice-distro/ios-sdk.mdx b/docs/docs/distributions/ondevice_distro/ios_sdk.mdx similarity index 99% rename from docs/docs/distributions/ondevice-distro/ios-sdk.mdx rename to docs/docs/distributions/ondevice_distro/ios_sdk.mdx index 43734622b..b15e67bce 100644 --- a/docs/docs/distributions/ondevice-distro/ios-sdk.mdx +++ b/docs/docs/distributions/ondevice_distro/ios_sdk.mdx @@ -176,4 +176,4 @@ The iOS SDK is ideal for: - **[llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift/)** - Official Swift SDK repository - **[iOS Calendar Assistant](https://github.com/meta-llama/llama-stack-client-swift/tree/main/examples/ios_calendar_assistant)** - Complete example app - **[executorch](https://github.com/pytorch/executorch/)** - PyTorch on-device inference library -- **[Android SDK](./android-sdk)** - Android development guide +- **[Android SDK](./android_sdk)** - Android development guide diff --git a/docs/docs/distributions/remote-hosted-distro/index.mdx b/docs/docs/distributions/remote_hosted_distro/index.mdx similarity index 94% rename from docs/docs/distributions/remote-hosted-distro/index.mdx rename to docs/docs/distributions/remote_hosted_distro/index.mdx index bbe094a0d..d63616e21 100644 --- a/docs/docs/distributions/remote-hosted-distro/index.mdx +++ b/docs/docs/distributions/remote_hosted_distro/index.mdx @@ -48,6 +48,6 @@ $ llama-stack-client models list ## Related Guides -- **[Available Distributions](../list-of-distributions)** - Compare with other distribution types +- **[Available Distributions](../list_of_distributions)** - Compare with other distribution types - **[Configuration Reference](../configuration)** - Understanding configuration 
options -- **[Using as Library](../importing-as-library)** - Alternative deployment approach +- **[Using as Library](../importing_as_library)** - Alternative deployment approach diff --git a/docs/docs/distributions/remote-hosted-distro/watsonx.mdx b/docs/docs/distributions/remote_hosted_distro/watsonx.mdx similarity index 96% rename from docs/docs/distributions/remote-hosted-distro/watsonx.mdx rename to docs/docs/distributions/remote_hosted_distro/watsonx.mdx index b0ef05641..5ba6714fd 100644 --- a/docs/docs/distributions/remote-hosted-distro/watsonx.mdx +++ b/docs/docs/distributions/remote_hosted_distro/watsonx.mdx @@ -105,6 +105,6 @@ The watsonx distribution is ideal for: ## Related Guides - **[Remote-Hosted Overview](./index)** - Overview of remote-hosted distributions -- **[Available Distributions](../list-of-distributions)** - Compare with other distributions +- **[Available Distributions](../list_of_distributions)** - Compare with other distributions - **[Configuration Reference](../configuration)** - Understanding configuration options -- **[Building Custom Distributions](../building-distro)** - Create your own distribution +- **[Building Custom Distributions](../building_distro)** - Create your own distribution diff --git a/docs/docs/distributions/self-hosted-distro/dell.mdx b/docs/docs/distributions/self_hosted_distro/dell.mdx similarity index 98% rename from docs/docs/distributions/self-hosted-distro/dell.mdx rename to docs/docs/distributions/self_hosted_distro/dell.mdx index a7373592a..0f8d05864 100644 --- a/docs/docs/distributions/self-hosted-distro/dell.mdx +++ b/docs/docs/distributions/self_hosted_distro/dell.mdx @@ -217,6 +217,6 @@ The Dell distribution is ideal for: ## Related Guides -- **[Dell-TGI Distribution](./dell-tgi)** - Dell's TGI-specific distribution +- **[Dell-TGI Distribution](./dell_tgi)** - Dell's TGI-specific distribution - **[Configuration Reference](../configuration)** - Understanding configuration options -- **[Building Custom Distributions](../building-distro)** - Create your own distribution +- **[Building Custom Distributions](../building_distro)** - Create your own distribution diff --git a/docs/docs/distributions/self-hosted-distro/dell-tgi.mdx b/docs/docs/distributions/self_hosted_distro/dell_tgi.mdx similarity index 98% rename from docs/docs/distributions/self-hosted-distro/dell-tgi.mdx rename to docs/docs/distributions/self_hosted_distro/dell_tgi.mdx index 53188b7bf..52e1678f9 100644 --- a/docs/docs/distributions/self-hosted-distro/dell-tgi.mdx +++ b/docs/docs/distributions/self_hosted_distro/dell_tgi.mdx @@ -98,4 +98,4 @@ The Dell-TGI distribution is ideal for: - **[Dell Distribution](./dell)** - Dell's standard distribution - **[Configuration Reference](../configuration)** - Understanding configuration options -- **[Building Custom Distributions](../building-distro)** - Create your own distribution +- **[Building Custom Distributions](../building_distro)** - Create your own distribution diff --git a/docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md b/docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md new file mode 100644 index 000000000..84b85b91c --- /dev/null +++ b/docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md @@ -0,0 +1,125 @@ +--- +orphan: true +--- + +# Meta Reference GPU Distribution + +```{toctree} +:maxdepth: 2 +:hidden: + +self +``` + +The `llamastack/distribution-meta-reference-gpu` distribution consists of the following provider configurations: + +| API | Provider(s) | 
+|-----|-------------|
+| agents | `inline::meta-reference` |
+| datasetio | `remote::huggingface`, `inline::localfs` |
+| eval | `inline::meta-reference` |
+| inference | `inline::meta-reference` |
+| safety | `inline::llama-guard` |
+| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
+| telemetry | `inline::meta-reference` |
+| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
+| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
+
+
+Note that you need access to NVIDIA GPUs to run this distribution. This distribution is not compatible with CPU-only machines or machines with AMD GPUs.
+
+### Environment Variables
+
+The following environment variables can be configured:
+
+- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)
+- `INFERENCE_MODEL`: Inference model loaded into the Meta Reference server (default: `meta-llama/Llama-3.2-3B-Instruct`)
+- `INFERENCE_CHECKPOINT_DIR`: Directory containing the Meta Reference model checkpoint (default: `null`)
+- `SAFETY_MODEL`: Name of the safety (Llama-Guard) model to use (default: `meta-llama/Llama-Guard-3-1B`)
+- `SAFETY_CHECKPOINT_DIR`: Directory containing the Llama-Guard model checkpoint (default: `null`)
+
+
+## Prerequisite: Downloading Models
+
+Please use `llama model list --downloaded` to check that you have Llama model checkpoints downloaded in `~/.llama` before proceeding. See the [installation guide](../../references/llama_cli_reference/download_models.md) to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
+
+```
+$ llama model list --downloaded
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
+┃ Model                                   ┃ Size     ┃ Modified Time       ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
+│ Llama3.2-1B-Instruct:int4-qlora-eo8     │ 1.53 GB  │ 2025-02-26 11:22:28 │
+├─────────────────────────────────────────┼──────────┼─────────────────────┤
+│ Llama3.2-1B                             │ 2.31 GB  │ 2025-02-18 21:48:52 │
+├─────────────────────────────────────────┼──────────┼─────────────────────┤
+│ Prompt-Guard-86M                        │ 0.02 GB  │ 2025-02-26 11:29:28 │
+├─────────────────────────────────────────┼──────────┼─────────────────────┤
+│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB  │ 2025-02-26 11:37:41 │
+├─────────────────────────────────────────┼──────────┼─────────────────────┤
+│ Llama3.2-3B                             │ 5.99 GB  │ 2025-02-18 21:51:26 │
+├─────────────────────────────────────────┼──────────┼─────────────────────┤
+│ Llama3.1-8B                             │ 14.97 GB │ 2025-02-16 10:36:37 │
+├─────────────────────────────────────────┼──────────┼─────────────────────┤
+│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB  │ 2025-02-26 11:35:02 │
+├─────────────────────────────────────────┼──────────┼─────────────────────┤
+│ Llama-Guard-3-1B                        │ 2.80 GB  │ 2025-02-26 11:20:46 │
+├─────────────────────────────────────────┼──────────┼─────────────────────┤
+│ Llama-Guard-3-1B:int4                   │ 0.43 GB  │ 2025-02-26 11:33:33 │
+└─────────────────────────────────────────┴──────────┴─────────────────────┘
+```
+
+## Running the Distribution
+
+You can run this distribution via venv, or via Docker, which has a pre-built image.
+
+### Via Docker
+
+This method allows you to get started quickly without having to build the distribution code.
+
+```bash
+LLAMA_STACK_PORT=8321
+docker run \
+  -it \
+  --pull always \
+  --gpus all \
+  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
+  -v ~/.llama:/root/.llama \
+  llamastack/distribution-meta-reference-gpu \
+  --port $LLAMA_STACK_PORT \
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
+```
+
+If you are using Llama Stack Safety / Shield APIs, use:
+
+```bash
+docker run \
+  -it \
+  --pull always \
+  --gpus all \
+  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
+  -v ~/.llama:/root/.llama \
+  llamastack/distribution-meta-reference-gpu \
+  --port $LLAMA_STACK_PORT \
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
+  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
+```
+
+### Via venv
+
+Make sure you have done `uv pip install llama-stack` and have the Llama Stack CLI available.
+ +```bash +llama stack build --distro meta-reference-gpu --image-type venv +llama stack run distributions/meta-reference-gpu/run.yaml \ + --port 8321 \ + --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct +``` + +If you are using Llama Stack Safety / Shield APIs, use: + +```bash +llama stack run distributions/meta-reference-gpu/run-with-safety.yaml \ + --port 8321 \ + --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \ + --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B +``` diff --git a/docs/docs/distributions/self-hosted-distro/meta-reference-gpu.mdx b/docs/docs/distributions/self_hosted_distro/meta_reference_gpu.mdx similarity index 98% rename from docs/docs/distributions/self-hosted-distro/meta-reference-gpu.mdx rename to docs/docs/distributions/self_hosted_distro/meta_reference_gpu.mdx index 284fec78e..ec37d290e 100644 --- a/docs/docs/distributions/self-hosted-distro/meta-reference-gpu.mdx +++ b/docs/docs/distributions/self_hosted_distro/meta_reference_gpu.mdx @@ -149,6 +149,6 @@ The Meta Reference GPU distribution is ideal for: ## Related Guides -- **[Available Distributions](../list-of-distributions)** - Compare with other distributions +- **[Available Distributions](../list_of_distributions)** - Compare with other distributions - **[Configuration Reference](../configuration)** - Understanding configuration options -- **[Building Custom Distributions](../building-distro)** - Create your own distribution +- **[Building Custom Distributions](../building_distro)** - Create your own distribution diff --git a/docs/docs/distributions/self_hosted_distro/nvidia.md b/docs/docs/distributions/self_hosted_distro/nvidia.md new file mode 100644 index 000000000..d4f070075 --- /dev/null +++ b/docs/docs/distributions/self_hosted_distro/nvidia.md @@ -0,0 +1,171 @@ +--- +orphan: true +--- + +# NVIDIA Distribution + +The `llamastack/distribution-nvidia` distribution consists of the following provider configurations. 
+
+| API | Provider(s) |
+|-----|-------------|
+| agents | `inline::meta-reference` |
+| datasetio | `inline::localfs`, `remote::nvidia` |
+| eval | `remote::nvidia` |
+| files | `inline::localfs` |
+| inference | `remote::nvidia` |
+| post_training | `remote::nvidia` |
+| safety | `remote::nvidia` |
+| scoring | `inline::basic` |
+| telemetry | `inline::meta-reference` |
+| tool_runtime | `inline::rag-runtime` |
+| vector_io | `inline::faiss` |
+
+
+### Environment Variables
+
+The following environment variables can be configured:
+
+- `NVIDIA_API_KEY`: NVIDIA API Key (default: ``)
+- `NVIDIA_APPEND_API_VERSION`: Whether to append the API version to the base_url (default: `True`)
+- `NVIDIA_DATASET_NAMESPACE`: NVIDIA Dataset Namespace (default: `default`)
+- `NVIDIA_PROJECT_ID`: NVIDIA Project ID (default: `test-project`)
+- `NVIDIA_CUSTOMIZER_URL`: NVIDIA Customizer URL (default: `https://customizer.api.nvidia.com`)
+- `NVIDIA_OUTPUT_MODEL_DIR`: NVIDIA Output Model Directory (default: `test-example-model@v1`)
+- `GUARDRAILS_SERVICE_URL`: URL for the NeMo Guardrails Service (default: `http://0.0.0.0:7331`)
+- `NVIDIA_GUARDRAILS_CONFIG_ID`: NVIDIA Guardrail Configuration ID (default: `self-check`)
+- `NVIDIA_EVALUATOR_URL`: URL for the NeMo Evaluator Service (default: `http://0.0.0.0:7331`)
+- `INFERENCE_MODEL`: Inference model (default: `Llama3.1-8B-Instruct`)
+- `SAFETY_MODEL`: Name of the model to use for safety (default: `meta/llama-3.1-8b-instruct`)
+
+### Models
+
+The following models are available by default:
+
+- `meta/llama3-8b-instruct`
+- `meta/llama3-70b-instruct`
+- `meta/llama-3.1-8b-instruct`
+- `meta/llama-3.1-70b-instruct`
+- `meta/llama-3.1-405b-instruct`
+- `meta/llama-3.2-1b-instruct`
+- `meta/llama-3.2-3b-instruct`
+- `meta/llama-3.2-11b-vision-instruct`
+- `meta/llama-3.2-90b-vision-instruct`
+- `meta/llama-3.3-70b-instruct`
+- `nvidia/vila`
+- `nvidia/llama-3.2-nv-embedqa-1b-v2`
+- `nvidia/nv-embedqa-e5-v5`
+- `nvidia/nv-embedqa-mistral-7b-v2`
+- `snowflake/arctic-embed-l`
+
+
+## Prerequisites
+### NVIDIA API Keys
+
+Make sure you have access to an NVIDIA API Key. You can get one by visiting [https://build.nvidia.com/](https://build.nvidia.com/). Use this key for the `NVIDIA_API_KEY` environment variable.
+
+### Deploy NeMo Microservices Platform
+The NVIDIA NeMo microservices platform supports end-to-end microservice deployment of a complete AI flywheel on your Kubernetes cluster through the NeMo Microservices Helm Chart. Please reference the [NVIDIA NeMo Microservices documentation](https://docs.nvidia.com/nemo/microservices/latest/about/index.html) for platform prerequisites and instructions to install and deploy the platform.
+
+## Supported Services
+Each Llama Stack API corresponds to a specific NeMo microservice. The core microservices (Customizer, Evaluator, Guardrails) are exposed by the same endpoint. The platform components (Data Store) are each exposed by separate endpoints.
+
+### Inference: NVIDIA NIM
+NVIDIA NIM is used for running inference with registered models. There are two ways to access NVIDIA NIMs:
+ 1. Hosted (default): Preview APIs hosted at https://integrate.api.nvidia.com (requires an API key)
+ 2. Self-hosted: NVIDIA NIMs that run on your own infrastructure.
+
+The deployed platform includes the NIM Proxy microservice, which provides access to your NIMs (for example, to run inference on a model). Set the `NVIDIA_BASE_URL` environment variable to use your NVIDIA NIM Proxy deployment.
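+
+As a small illustration, switching from the hosted preview APIs to a self-hosted NIM Proxy is purely a matter of environment configuration. This is a minimal sketch; the URL below is a placeholder for your own deployment, not a real endpoint:
+
+```bash
+# Point Llama Stack at a self-hosted NIM Proxy deployment.
+# The URL is a placeholder for your own infrastructure.
+export NVIDIA_BASE_URL="http://nim.test"
+```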
+
+### Datasetio API: NeMo Data Store
+The NeMo Data Store microservice serves as the default file storage solution for the NeMo microservices platform. It exposes APIs compatible with the Hugging Face Hub client (`HfApi`), so you can use the client to interact with the Data Store. The `NVIDIA_DATASETS_URL` environment variable should point to your NeMo Data Store endpoint.
+
+See the {repopath}`NVIDIA Datasetio docs::llama_stack/providers/remote/datasetio/nvidia/README.md` for supported features and example usage.
+
+### Eval API: NeMo Evaluator
+The NeMo Evaluator microservice supports evaluation of LLMs. Launching an Evaluation job with NeMo Evaluator requires an Evaluation Config (an object that contains metadata needed by the job). A Llama Stack Benchmark maps to an Evaluation Config, so registering a Benchmark creates an Evaluation Config in NeMo Evaluator. The `NVIDIA_EVALUATOR_URL` environment variable should point to your NeMo Microservices endpoint.
+
+See the {repopath}`NVIDIA Eval docs::llama_stack/providers/remote/eval/nvidia/README.md` for supported features and example usage.
+
+### Post-Training API: NeMo Customizer
+The NeMo Customizer microservice supports fine-tuning models. You can reference {repopath}`this list of supported models::llama_stack/providers/remote/post_training/nvidia/models.py` that can be fine-tuned using Llama Stack. The `NVIDIA_CUSTOMIZER_URL` environment variable should point to your NeMo Microservices endpoint.
+
+See the {repopath}`NVIDIA Post-Training docs::llama_stack/providers/remote/post_training/nvidia/README.md` for supported features and example usage.
+
+### Safety API: NeMo Guardrails
+The NeMo Guardrails microservice sits between your application and the LLM, and adds checks and content moderation to a model. The `GUARDRAILS_SERVICE_URL` environment variable should point to your NeMo Microservices endpoint.
+
+See the {repopath}`NVIDIA Safety docs::llama_stack/providers/remote/safety/nvidia/README.md` for supported features and example usage.
+
+## Deploying models
+In order to use a registered model with the Llama Stack APIs, ensure the corresponding NIM is deployed to your environment. For example, you can use the NIM Proxy microservice to deploy `meta/llama-3.2-1b-instruct`.
+
+Note: For improved inference speeds, we need to use NIM with the `fast_outlines` guided decoding system (specified in the request body). This is the default if you deployed the platform with the NeMo Microservices Helm Chart.
+```sh
+# URL to NeMo NIM Proxy service
+export NEMO_URL="http://nemo.test"
+
+curl --location "$NEMO_URL/v1/deployment/model-deployments" \
+  -H 'accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "name": "llama-3.2-1b-instruct",
+    "namespace": "meta",
+    "config": {
+      "model": "meta/llama-3.2-1b-instruct",
+      "nim_deployment": {
+        "image_name": "nvcr.io/nim/meta/llama-3.2-1b-instruct",
+        "image_tag": "1.8.3",
+        "pvc_size": "25Gi",
+        "gpu": 1,
+        "additional_envs": {
+          "NIM_GUIDED_DECODING_BACKEND": "fast_outlines"
+        }
+      }
+    }
+  }'
+```
+This NIM deployment should take approximately 10 minutes to go live. [See the docs](https://docs.nvidia.com/nemo/microservices/latest/get-started/tutorials/deploy-nims.html) for more information on how to deploy a NIM and verify it's available for inference.
+
+You can also remove a deployed NIM to free up GPU resources, if needed.
+```sh
+export NEMO_URL="http://nemo.test"
+
+curl -X DELETE "$NEMO_URL/v1/deployment/model-deployments/meta/llama-3.1-8b-instruct"
+```
+
+## Running Llama Stack with NVIDIA
+
+You can do this via venv (building the code yourself), or via Docker, which has a pre-built image.
+
+### Via Docker
+
+This method allows you to get started quickly without having to build the distribution code.
+
+```bash
+LLAMA_STACK_PORT=8321
+docker run \
+  -it \
+  --pull always \
+  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
+  -v ./run.yaml:/root/my-run.yaml \
+  llamastack/distribution-nvidia \
+  --config /root/my-run.yaml \
+  --port $LLAMA_STACK_PORT \
+  --env NVIDIA_API_KEY=$NVIDIA_API_KEY
+```
+
+### Via venv
+
+If you've set up your local development environment, you can also build the image using your local virtual environment.
+
+```bash
+INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
+llama stack build --distro nvidia --image-type venv
+llama stack run ./run.yaml \
+  --port 8321 \
+  --env NVIDIA_API_KEY=$NVIDIA_API_KEY \
+  --env INFERENCE_MODEL=$INFERENCE_MODEL
+```
+
+## Example Notebooks
+For examples of how to use the NVIDIA Distribution to run inference, fine-tune, evaluate, and run safety checks on your LLMs, you can reference the example notebooks in {repopath}`docs/notebooks/nvidia`.
diff --git a/docs/docs/distributions/self-hosted-distro/nvidia.mdx b/docs/docs/distributions/self_hosted_distro/nvidia.mdx
similarity index 98%
rename from docs/docs/distributions/self-hosted-distro/nvidia.mdx
rename to docs/docs/distributions/self_hosted_distro/nvidia.mdx
index 2b44dc42a..135df0c95 100644
--- a/docs/docs/distributions/self-hosted-distro/nvidia.mdx
+++ b/docs/docs/distributions/self_hosted_distro/nvidia.mdx
@@ -192,6 +192,6 @@ The NVIDIA distribution is ideal for:
## Related Guides
-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution
diff --git a/docs/docs/distributions/self-hosted-distro/passthrough.mdx b/docs/docs/distributions/self_hosted_distro/passthrough.mdx
similarity index 93%
rename from docs/docs/distributions/self-hosted-distro/passthrough.mdx
rename to docs/docs/distributions/self_hosted_distro/passthrough.mdx
index deea6564a..8a7304a44 100644
--- a/docs/docs/distributions/self-hosted-distro/passthrough.mdx
+++ b/docs/docs/distributions/self_hosted_distro/passthrough.mdx
@@ -57,6 +57,6 @@ The Passthrough distribution is ideal for:
## Related Guides
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Starting Llama Stack Server](../starting-llama-stack-server)** - How to run distributions
+- **[Starting Llama Stack Server](../starting_llama_stack_server)** - How to run distributions
diff --git a/docs/docs/distributions/self-hosted-distro/starter.mdx b/docs/docs/distributions/self_hosted_distro/starter.mdx
similarity index 98%
rename from docs/docs/distributions/self-hosted-distro/starter.mdx
rename to docs/docs/distributions/self_hosted_distro/starter.mdx
index 62eabed41..224c140c2 100644
---
a/docs/docs/distributions/self-hosted-distro/starter.mdx +++ b/docs/docs/distributions/self_hosted_distro/starter.mdx @@ -231,6 +231,6 @@ The starter distribution is ideal for developers who want to experiment with dif ## Related Guides -- **[Available Distributions](../list-of-distributions)** - Compare with other distributions +- **[Available Distributions](../list_of_distributions)** - Compare with other distributions - **[Configuration Reference](../configuration)** - Understanding configuration options -- **[Building Custom Distributions](../building-distro)** - Create your own distribution +- **[Building Custom Distributions](../building_distro)** - Create your own distribution diff --git a/docs/docs/distributions/starting-llama-stack-server.mdx b/docs/docs/distributions/starting_llama_stack_server.mdx similarity index 88% rename from docs/docs/distributions/starting-llama-stack-server.mdx rename to docs/docs/distributions/starting_llama_stack_server.mdx index 9db11ef2c..3826b7642 100644 --- a/docs/docs/distributions/starting-llama-stack-server.mdx +++ b/docs/docs/distributions/starting_llama_stack_server.mdx @@ -13,13 +13,13 @@ You can run a Llama Stack server in one of the following ways: This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (e.g. fireworks, together, groq, etc.) -**See:** [Using Llama Stack as a Library](./importing-as-library) +**See:** [Using Llama Stack as a Library](./importing_as_library) ## Container Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. -**See:** [Available Distributions](./list-of-distributions) for more details on selecting the right distribution. +**See:** [Available Distributions](./list_of_distributions) for more details on selecting the right distribution. 
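+
+As a minimal sketch, spinning up a pre-built container follows the same pattern as the distribution-specific guides; the image name and port below mirror the conventions used elsewhere in these docs and should be adjusted for the distribution you choose:
+
+```bash
+# Sketch: run a pre-built distribution container and expose the server port.
+# Swap the image for the distribution you selected above.
+docker run -it --pull always \
+  -p 8321:8321 \
+  -v ~/.llama:/root/.llama \
+  llamastack/distribution-starter \
+  --port 8321
+```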
## Kubernetes
@@ -69,7 +69,7 @@ If you have built a container image and want to deploy it in a Kubernetes cluste
## Related Guides
-- **[Available Distributions](./list-of-distributions)** - Choose the right distribution
-- **[Building Custom Distributions](./building-distro)** - Create your own distribution
+- **[Available Distributions](./list_of_distributions)** - Choose the right distribution
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution
- **[Configuration Reference](./configuration)** - Understanding configuration options
-- **[Customizing run.yaml](./customizing-run-yaml)** - Adapt configurations to your environment
+- **[Customizing run.yaml](./customizing_run_yaml)** - Adapt configurations to your environment
diff --git a/docs/docs/getting-started/detailed-tutorial.mdx b/docs/docs/getting_started/detailed_tutorial.mdx
similarity index 100%
rename from docs/docs/getting-started/detailed-tutorial.mdx
rename to docs/docs/getting_started/detailed_tutorial.mdx
diff --git a/docs/docs/getting-started/index.mdx b/docs/docs/getting_started/index.mdx
similarity index 100%
rename from docs/docs/getting-started/index.mdx
rename to docs/docs/getting_started/index.mdx
diff --git a/docs/docs/getting-started/libraries.mdx b/docs/docs/getting_started/libraries.mdx
similarity index 100%
rename from docs/docs/getting-started/libraries.mdx
rename to docs/docs/getting_started/libraries.mdx
diff --git a/docs/docs/references/client-cli.mdx b/docs/docs/references/client_cli.mdx
similarity index 100%
rename from docs/docs/references/client-cli.mdx
rename to docs/docs/references/client_cli.mdx
diff --git a/docs/docs/references/evals-reference.mdx b/docs/docs/references/evals_reference.mdx
similarity index 99%
rename from docs/docs/references/evals-reference.mdx
rename to docs/docs/references/evals_reference.mdx
index ff06b1c35..0ec555e66 100644
--- a/docs/docs/references/evals-reference.mdx
+++ b/docs/docs/references/evals_reference.mdx
@@ -199,7 +199,7 @@ pprint(response)
Llama Stack offers a library of scoring functions and the `/scoring` API, allowing you to run evaluations on your pre-annotated AI application datasets.
-In this example, we will work with an example RAG dataset you have built previously, label with an annotation, and use LLM-As-Judge with custom judge prompt for scoring. Please checkout our [Llama Stack Playground](../building-applications/playground) for an interactive interface to upload datasets and run scorings.
+In this example, we will work with an example RAG dataset you built previously, label it with annotations, and use LLM-As-Judge with a custom judge prompt for scoring. Please check out our [Llama Stack Playground](../building_applications/playground) for an interactive interface to upload datasets and run scoring.
```python judge_model_id = "meta-llama/Llama-3.1-405B-Instruct-FP8" diff --git a/docs/docs/references/index.mdx b/docs/docs/references/index.mdx index 2e158217f..8e1323e4c 100644 --- a/docs/docs/references/index.mdx +++ b/docs/docs/references/index.mdx @@ -1,6 +1,6 @@ # References -- [Python SDK Reference](python-sdk) for the Llama Stack Python SDK -- [Llama CLI](llama-cli) for building and running your Llama Stack server -- [Llama Stack Client CLI](client-cli) for interacting with your Llama Stack server -- [Evaluations Reference](evals-reference) for running evaluations and benchmarks +- [Python SDK Reference](python_sdk) for the Llama Stack Python SDK +- [Llama CLI](llama_cli) for building and running your Llama Stack server +- [Llama Stack Client CLI](client_cli) for interacting with your Llama Stack server +- [Evaluations Reference](evals_reference) for running evaluations and benchmarks diff --git a/docs/docs/references/llama-cli.mdx b/docs/docs/references/llama_cli.mdx similarity index 99% rename from docs/docs/references/llama-cli.mdx rename to docs/docs/references/llama_cli.mdx index 392384556..7d5db401a 100644 --- a/docs/docs/references/llama-cli.mdx +++ b/docs/docs/references/llama_cli.mdx @@ -30,7 +30,7 @@ You have two ways to install Llama Stack: 1. `download`: Supports downloading models from Meta or Hugging Face. 2. `model`: Lists available models and their properties. -3. `stack`: Allows you to build a stack using the `llama stack` distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the [Build your own Distribution](../distributions/building-distro) documentation. +3. `stack`: Allows you to build a stack using the `llama stack` distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the [Build your own Distribution](../distributions/building_distro) documentation. ### Sample Usage diff --git a/docs/docs/references/python-sdk.mdx b/docs/docs/references/python_sdk.mdx similarity index 100% rename from docs/docs/references/python-sdk.mdx rename to docs/docs/references/python_sdk.mdx