docs naming update

Alexey Rybak 2025-09-22 17:17:07 -07:00
parent 584f3592ce
commit 9a8652ee30
42 changed files with 377 additions and 81 deletions

View file

@@ -15,7 +15,7 @@ The Evaluation API works with several related APIs to provide comprehensive eval
- `/eval` + `/benchmarks` API - Generate outputs and perform scoring
:::tip
-For conceptual information about evaluations, see our [Evaluation Concepts](../concepts/evaluation-concepts.mdx) guide.
+For conceptual information about evaluations, see our [Evaluation Concepts](../concepts/evaluation_concepts.mdx) guide.
:::
## Meta Reference
@@ -112,7 +112,7 @@ Llama Stack pre-registers several popular open-benchmarks for easy model evaluat
## Next Steps
-- Check out the [Evaluation Concepts](../concepts/evaluation-concepts.mdx) guide for detailed conceptual information
+- Check out the [Evaluation Concepts](../concepts/evaluation_concepts.mdx) guide for detailed conceptual information
-- See the [Building Applications - Evaluation](../building-applications/evals.mdx) guide for application examples
+- See the [Building Applications - Evaluation](../building_applications/evals.mdx) guide for application examples
-- Review the [Evaluation Reference](../references/evals-reference.mdx) for comprehensive CLI and API usage
+- Review the [Evaluation Reference](../references/evals_reference.mdx) for comprehensive CLI and API usage
- Explore the [Scoring](./scoring.mdx) documentation for available scoring functions

View file

@@ -300,6 +300,6 @@ customizer_url: ${env.NVIDIA_CUSTOMIZER_URL:=http://nemo.test}
## Next Steps
-- Check out the [Building Applications - Fine-tuning](../building-applications/index.mdx) guide for application-level examples
+- Check out the [Building Applications - Fine-tuning](../building_applications/index.mdx) guide for application-level examples
- See the [Providers](../providers/post_training/index.mdx) section for detailed provider documentation
-- Review the [API Reference](../api-reference/post-training.mdx) for complete API documentation
+- Review the [API Reference](../api_reference/post_training.mdx) for complete API documentation

View file

@@ -188,6 +188,6 @@ The Scoring API works closely with the [Evaluation](./evaluation.mdx) API to pro
## Next Steps
- Check out the [Evaluation](./evaluation.mdx) guide for running complete evaluations
-- See the [Building Applications - Evaluation](../building-applications/evals.mdx) guide for application examples
+- See the [Building Applications - Evaluation](../building_applications/evals.mdx) guide for application examples
-- Review the [Evaluation Reference](../references/evals-reference.mdx) for comprehensive scoring function usage
+- Review the [Evaluation Reference](../references/evals_reference.mdx) for comprehensive scoring function usage
-- Explore the [Evaluation Concepts](../concepts/evaluation-concepts.mdx) for detailed conceptual information
+- Explore the [Evaluation Concepts](../concepts/evaluation_concepts.mdx) for detailed conceptual information

View file

@@ -102,11 +102,11 @@ Each turn consists of multiple steps that represent the agent's thought process:
## Agent Execution Loop
-Refer to the [Agent Execution Loop](./agent-execution-loop) for more details on what happens within an agent turn.
+Refer to the [Agent Execution Loop](./agent_execution_loop) for more details on what happens within an agent turn.
## Related Resources
-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding the internal processing flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding the internal processing flow
- **[RAG (Retrieval Augmented Generation)](./rag)** - Building knowledge-enhanced agents
- **[Tools Integration](./tools)** - Extending agent capabilities with external tools
- **[Safety Guardrails](./safety)** - Implementing responsible AI practices

View file

@@ -21,8 +21,8 @@ Here are the key topics that will help you build effective AI applications:
### 🤖 **Agent Development**
- **[Agent Framework](./agent)** - Understand the components and design patterns of the Llama Stack agent framework
-- **[Agent Execution Loop](./agent-execution-loop)** - How agents process information, make decisions, and execute actions
+- **[Agent Execution Loop](./agent_execution_loop)** - How agents process information, make decisions, and execute actions
-- **[Agents vs Responses API](./responses-vs-agents)** - Learn when to use each API for different use cases
+- **[Agents vs Responses API](./responses_vs_agents)** - Learn when to use each API for different use cases
### 📚 **Knowledge Integration**
- **[RAG (Retrieval-Augmented Generation)](./rag)** - Enhance your agents with external knowledge through retrieval mechanisms

View file

@@ -215,7 +215,7 @@ Use this framework to choose the right API for your use case:
## Related Resources
- **[Agents](./agent)** - Understanding the Agents API fundamentals
-- **[Agent Execution Loop](./agent-execution-loop)** - How agents process turns and steps
+- **[Agent Execution Loop](./agent_execution_loop)** - How agents process turns and steps
- **[Tools Integration](./tools)** - Adding capabilities to both APIs
- **[OpenAI Compatibility](/docs/providers/openai-compatibility)** - Using OpenAI-compatible endpoints
- **[Safety Guardrails](./safety)** - Implementing safety measures in agents

View file

@@ -389,7 +389,7 @@ client.shields.register(
## Related Resources
- **[Agents](./agent)** - Integrating safety shields with intelligent agents
-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding safety in the execution flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding safety in the execution flow
- **[Evaluations](./evals)** - Evaluating safety shield effectiveness
- **[Telemetry](./telemetry)** - Monitoring safety violations and metrics
- **[Llama Guard Documentation](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)** - Advanced safety model details

View file

@@ -335,6 +335,6 @@ response = agent.create_turn(
- **[Agents](./agent)** - Building intelligent agents with tools
- **[RAG (Retrieval Augmented Generation)](./rag)** - Using knowledge retrieval tools
-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding tool execution flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding tool execution flow
- **[Building AI Applications Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)** - Comprehensive examples
- **[Llama Stack Apps Examples](https://github.com/meta-llama/llama-stack-apps)** - Real-world tool implementations

View file

@@ -30,7 +30,7 @@ The list of open-benchmarks we currently support:
- [SimpleQA](https://openai.com/index/introducing-simpleqa/): Benchmark designed to assess models' ability to answer short, fact-seeking questions.
- [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI): Benchmark designed to evaluate multimodal models.
-You can follow this [contributing guide](../references/evals-reference.mdx#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack
+You can follow this [contributing guide](../references/evals_reference.mdx#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack
### Run evaluation on open-benchmarks via CLI
@@ -67,5 +67,5 @@ evaluation results over there.
## What's Next?
- Check out our Colab notebook on working examples with running benchmark evaluations [here](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb#scrollTo=mxLCsP4MvFqP).
-- Check out our [Building Applications - Evaluation](../building-applications/evals.mdx) guide for more details on how to use the Evaluation APIs to evaluate your applications.
+- Check out our [Building Applications - Evaluation](../building_applications/evals.mdx) guide for more details on how to use the Evaluation APIs to evaluate your applications.
-- Check out our [Evaluation Reference](../references/evals-reference.mdx) for more details on the APIs.
+- Check out our [Evaluation Reference](../references/evals_reference.mdx) for more details on the APIs.

View file

@@ -13,6 +13,6 @@ This section covers the key concepts you need to understand to work effectively
- **[Architecture](./architecture)** - Llama Stack's service-oriented design and benefits
- **[APIs](./apis)** - Available REST APIs and planned capabilities
-- **[API Providers](./api-providers)** - Remote vs inline provider implementations
+- **[API Providers](./api_providers)** - Remote vs inline provider implementations
- **[Distributions](./distributions)** - Pre-packaged provider configurations
- **[Resources](./resources)** - Resource federation and registration

View file

@@ -133,8 +133,8 @@ Keep PRs small and focused. Split large changes into logically grouped, smaller
Learn how to extend Llama Stack with new capabilities:
-- **[Adding a New API Provider](./new-api-provider)** - Add new API providers to the Stack
+- **[Adding a New API Provider](./new_api_provider)** - Add new API providers to the Stack
-- **[Adding a Vector Database](./new-vector-database)** - Add new vector databases
+- **[Adding a Vector Database](./new_vector_database)** - Add new vector databases
- **[External Providers](/docs/providers/external)** - Add external providers to the Stack
## Testing
@@ -304,12 +304,12 @@ By contributing to Llama Stack, you agree that your contributions will be licens
## Advanced Topics
-- **[Testing Record-Replay System](./testing-record-replay)** - Deep dive into testing internals
+- **[Testing Record-Replay System](./testing_record_replay)** - Deep dive into testing internals
## Related Resources
-- **[Adding API Providers](./new-api-provider)** - Extend Llama Stack with new providers
+- **[Adding API Providers](./new_api_provider)** - Extend Llama Stack with new providers
-- **[Vector Database Integration](./new-vector-database)** - Add vector database support
+- **[Vector Database Integration](./new_vector_database)** - Add vector database support
- **[External Providers](/docs/providers/external)** - External provider development
- **[GitHub Discussions](https://github.com/meta-llama/llama-stack/discussions)** - Community discussion
- **[Discord](https://discord.gg/llama-stack)** - Real-time community chat

View file

@@ -279,5 +279,5 @@ class YourProvider:
- **[Core Concepts](/docs/concepts)** - Understanding Llama Stack architecture
- **[External Providers](/docs/providers/external)** - Alternative implementation approach
-- **[Vector Database Guide](./new-vector-database)** - Specialized provider implementation
+- **[Vector Database Guide](./new_vector_database)** - Specialized provider implementation
-- **[Testing Record-Replay](./testing-record-replay)** - Advanced testing techniques
+- **[Testing Record-Replay](./testing_record_replay)** - Advanced testing techniques

View file

@@ -489,5 +489,5 @@ async def add_chunks(self, chunks: List[Chunk]) -> List[str]:
- **[Vector IO Providers](/docs/providers/vector_io)** - Existing provider implementations
- **[Core Concepts](/docs/concepts)** - Understanding Llama Stack architecture
-- **[New API Provider Guide](./new-api-provider)** - General provider development
+- **[New API Provider Guide](./new_api_provider)** - General provider development
-- **[Testing Guide](./testing-record-replay)** - Advanced testing techniques
+- **[Testing Guide](./testing_record_replay)** - Advanced testing techniques

View file

@@ -167,7 +167,7 @@ You can now edit ~/.llama/distributions/llamastack-starter/starter-run.yaml and
```
:::tip
-The generated `run.yaml` file is a starting point for your configuration. For comprehensive guidance on customizing it for your specific needs, infrastructure, and deployment scenarios, see [Customizing Your run.yaml Configuration](./customizing-run-yaml).
+The generated `run.yaml` file is a starting point for your configuration. For comprehensive guidance on customizing it for your specific needs, infrastructure, and deployment scenarios, see [Customizing Your run.yaml Configuration](./customizing_run_yaml).
:::
</TabItem>

View file

@@ -13,7 +13,7 @@ import TabItem from '@theme/TabItem';
The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution:
:::note
-The default `run.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your run.yaml Configuration](./customizing-run-yaml).
+The default `run.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your run.yaml Configuration](./customizing_run_yaml).
:::
<details>

View file

@@ -51,5 +51,5 @@ The goal is to take the generated template and adapt it to your specific infrast
## Related Guides
- **[Configuration Reference](./configuration)** - Detailed configuration file format and options
-- **[Starting Llama Stack Server](./starting-llama-stack-server)** - How to run with your custom configuration
+- **[Starting Llama Stack Server](./starting_llama_stack_server)** - How to run with your custom configuration
-- **[Building Custom Distributions](./building-distro)** - Create distributions with your preferred providers
+- **[Building Custom Distributions](./building_distro)** - Create distributions with your preferred providers

View file

@@ -36,7 +36,7 @@ Then, you can access the APIs like `models` and `inference` on the client and ca
response = client.models.list()
```
-If you've created a [custom distribution](./building-distro), you can also use the run.yaml configuration file directly:
+If you've created a [custom distribution](./building_distro), you can also use the run.yaml configuration file directly:
```python
client = LlamaStackAsLibraryClient(config_path)
@@ -60,6 +60,6 @@ Library mode is ideal when:
## Related Guides
-- **[Building Custom Distributions](./building-distro)** - Create your own distribution for library use
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution for library use
- **[Configuration Reference](./configuration)** - Understanding the configuration format
-- **[Starting Llama Stack Server](./starting-llama-stack-server)** - Alternative server-based deployment
+- **[Starting Llama Stack Server](./starting_llama_stack_server)** - Alternative server-based deployment

View file

@@ -13,9 +13,9 @@ This section provides an overview of the distributions available in Llama Stack.
## Distribution Guides
-- **[Available Distributions](./list-of-distributions)** - Complete list and comparison of all distributions
+- **[Available Distributions](./list_of_distributions)** - Complete list and comparison of all distributions
-- **[Building Custom Distributions](./building-distro)** - Create your own distribution from scratch
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution from scratch
-- **[Customizing Configuration](./customizing-run-yaml)** - Customize run.yaml for your needs
+- **[Customizing Configuration](./customizing_run_yaml)** - Customize run.yaml for your needs
-- **[Starting Llama Stack Server](./starting-llama-stack-server)** - How to run distributions
+- **[Starting Llama Stack Server](./starting_llama_stack_server)** - How to run distributions
-- **[Importing as Library](./importing-as-library)** - Use distributions in your code
+- **[Importing as Library](./importing_as_library)** - Use distributions in your code
- **[Configuration Reference](./configuration)** - Configuration file format details

View file

@@ -34,7 +34,7 @@ Llama Stack provides several pre-configured distributions to help you get starte
docker pull llama-stack/distribution-starter
```
-**Guides:** [Starter Distribution Guide](./self-hosted-distro/starter)
+**Guides:** [Starter Distribution Guide](./self_hosted_distro/starter)
### 🖥️ Self-Hosted with GPU
@@ -47,14 +47,14 @@ docker pull llama-stack/distribution-starter
docker pull llama-stack/distribution-meta-reference-gpu
```
-**Guides:** [Meta Reference GPU Guide](./self-hosted-distro/meta-reference-gpu)
+**Guides:** [Meta Reference GPU Guide](./self_hosted_distro/meta_reference_gpu)
### 🖥️ Self-Hosted with NVIDIA NeMo Microservices
**Use `nvidia` if you:**
- Want to use Llama Stack with NVIDIA NeMo Microservices
-**Guides:** [NVIDIA Distribution Guide](./self-hosted-distro/nvidia)
+**Guides:** [NVIDIA Distribution Guide](./self_hosted_distro/nvidia)
### ☁️ Managed Hosting
@@ -65,7 +65,7 @@ docker pull llama-stack/distribution-meta-reference-gpu
**Partners:** [Fireworks.ai](https://fireworks.ai) and [Together.xyz](https://together.xyz)
-**Guides:** [Remote-Hosted Endpoints](./remote-hosted-distro/)
+**Guides:** [Remote-Hosted Endpoints](./remote_hosted_distro/)
### 📱 Mobile Development
@@ -74,8 +74,8 @@ docker pull llama-stack/distribution-meta-reference-gpu
- Need on-device inference capabilities
- Want offline functionality
-- [iOS SDK](./ondevice-distro/ios-sdk)
+- [iOS SDK](./ondevice_distro/ios_sdk)
-- [Android SDK](./ondevice-distro/android-sdk)
+- [Android SDK](./ondevice_distro/android_sdk)
### 🔧 Custom Solutions
@@ -84,23 +84,23 @@ docker pull llama-stack/distribution-meta-reference-gpu
- You need custom configurations
- You want to optimize for your specific use case
-**Guides:** [Building Custom Distributions](./building-distro)
+**Guides:** [Building Custom Distributions](./building_distro)
## Detailed Documentation
### Self-Hosted Distributions
-- **[Starter Distribution](./self-hosted-distro/starter)** - General purpose template
+- **[Starter Distribution](./self_hosted_distro/starter)** - General purpose template
-- **[Meta Reference GPU](./self-hosted-distro/meta-reference-gpu)** - High-performance GPU inference
+- **[Meta Reference GPU](./self_hosted_distro/meta_reference_gpu)** - High-performance GPU inference
### Remote-Hosted Solutions
-- **[Remote-Hosted Overview](./remote-hosted-distro/)** - Managed hosting options
+- **[Remote-Hosted Overview](./remote_hosted_distro/)** - Managed hosting options
### Mobile SDKs
-- **[iOS SDK](./ondevice-distro/ios-sdk)** - Native iOS development
+- **[iOS SDK](./ondevice_distro/ios_sdk)** - Native iOS development
-- **[Android SDK](./ondevice-distro/android-sdk)** - Native Android development
+- **[Android SDK](./ondevice_distro/android_sdk)** - Native Android development
## Decision Flow

View file

@@ -306,4 +306,4 @@ The API interface is generated using the OpenAPI standard with [Stainless](https
- **[llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin)** - Official Kotlin SDK repository
- **[Android Demo App](https://github.com/meta-llama/llama-stack-client-kotlin/tree/latest-release/examples/android_app)** - Complete example app
- **[ExecuTorch](https://github.com/pytorch/executorch/)** - PyTorch on-device inference library
-- **[iOS SDK](./ios-sdk)** - iOS development guide
+- **[iOS SDK](./ios_sdk)** - iOS development guide

View file

@@ -176,4 +176,4 @@ The iOS SDK is ideal for:
- **[llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift/)** - Official Swift SDK repository
- **[iOS Calendar Assistant](https://github.com/meta-llama/llama-stack-client-swift/tree/main/examples/ios_calendar_assistant)** - Complete example app
- **[executorch](https://github.com/pytorch/executorch/)** - PyTorch on-device inference library
-- **[Android SDK](./android-sdk)** - Android development guide
+- **[Android SDK](./android_sdk)** - Android development guide

View file

@@ -48,6 +48,6 @@ $ llama-stack-client models list
## Related Guides
-- **[Available Distributions](../list-of-distributions)** - Compare with other distribution types
+- **[Available Distributions](../list_of_distributions)** - Compare with other distribution types
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Using as Library](../importing-as-library)** - Alternative deployment approach
+- **[Using as Library](../importing_as_library)** - Alternative deployment approach

View file

@@ -105,6 +105,6 @@ The watsonx distribution is ideal for:
## Related Guides
- **[Remote-Hosted Overview](./index)** - Overview of remote-hosted distributions
-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

View file

@@ -217,6 +217,6 @@ The Dell distribution is ideal for:
## Related Guides
-- **[Dell-TGI Distribution](./dell-tgi)** - Dell's TGI-specific distribution
+- **[Dell-TGI Distribution](./dell_tgi)** - Dell's TGI-specific distribution
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

View file

@@ -98,4 +98,4 @@ The Dell-TGI distribution is ideal for:
- **[Dell Distribution](./dell)** - Dell's standard distribution
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

View file

@@ -0,0 +1,125 @@
---
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# Meta Reference GPU Distribution
```{toctree}
:maxdepth: 2
:hidden:
self
```
The `llamastack/distribution-meta-reference-gpu` distribution consists of the following provider configurations:
| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| inference | `inline::meta-reference` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
Note that you need access to NVIDIA GPUs to run this distribution. This distribution is not compatible with CPU-only machines or machines with AMD GPUs.
### Environment Variables
The following environment variables can be configured:
- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)
- `INFERENCE_MODEL`: Inference model loaded into the Meta Reference server (default: `meta-llama/Llama-3.2-3B-Instruct`)
- `INFERENCE_CHECKPOINT_DIR`: Directory containing the Meta Reference model checkpoint (default: `null`)
- `SAFETY_MODEL`: Name of the safety (Llama-Guard) model to use (default: `meta-llama/Llama-Guard-3-1B`)
- `SAFETY_CHECKPOINT_DIR`: Directory containing the Llama-Guard model checkpoint (default: `null`)
## Prerequisite: Downloading Models
Please use `llama model list --downloaded` to check that you have Llama model checkpoints downloaded in `~/.llama` before proceeding. See the [installation guide](../../references/llama_cli_reference/download_models.md) for how to download the models: run `llama model list` to see the available models, and `llama model download` to download the checkpoints.
```
$ llama model list --downloaded
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Model ┃ Size ┃ Modified Time ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ Llama3.2-1B-Instruct:int4-qlora-eo8 │ 1.53 GB │ 2025-02-26 11:22:28 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-1B │ 2.31 GB │ 2025-02-18 21:48:52 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Prompt-Guard-86M │ 0.02 GB │ 2025-02-26 11:29:28 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB │ 2025-02-26 11:37:41 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-3B │ 5.99 GB │ 2025-02-18 21:51:26 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.1-8B │ 14.97 GB │ 2025-02-16 10:36:37 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB │ 2025-02-26 11:35:02 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama-Guard-3-1B │ 2.80 GB │ 2025-02-26 11:20:46 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama-Guard-3-1B:int4 │ 0.43 GB │ 2025-02-26 11:33:33 │
└─────────────────────────────────────────┴──────────┴─────────────────────┘
```
## Running the Distribution
You can run the distribution either via a venv or via Docker, which has a pre-built image.
### Via Docker
This method allows you to get started quickly without having to build the distribution code.
```bash
LLAMA_STACK_PORT=8321
docker run \
-it \
--pull always \
--gpus all \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
llamastack/distribution-meta-reference-gpu \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```
If you are using Llama Stack Safety / Shield APIs, use:
```bash
docker run \
-it \
--pull always \
--gpus all \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
llamastack/distribution-meta-reference-gpu \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
--env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```
### Via venv
Make sure you have done `uv pip install llama-stack` and have the Llama Stack CLI available.
```bash
llama stack build --distro meta-reference-gpu --image-type venv
llama stack run distributions/meta-reference-gpu/run.yaml \
--port 8321 \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```
If you are using Llama Stack Safety / Shield APIs, use:
```bash
llama stack run distributions/meta-reference-gpu/run-with-safety.yaml \
--port 8321 \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
--env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```

View file

@@ -149,6 +149,6 @@ The Meta Reference GPU distribution is ideal for:
## Related Guides
-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

View file

@@ -0,0 +1,171 @@
---
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# NVIDIA Distribution
The `llamastack/distribution-nvidia` distribution consists of the following provider configurations.
| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `inline::localfs`, `remote::nvidia` |
| eval | `remote::nvidia` |
| files | `inline::localfs` |
| inference | `remote::nvidia` |
| post_training | `remote::nvidia` |
| safety | `remote::nvidia` |
| scoring | `inline::basic` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `inline::rag-runtime` |
| vector_io | `inline::faiss` |
### Environment Variables
The following environment variables can be configured:
- `NVIDIA_API_KEY`: NVIDIA API Key (default: ``)
- `NVIDIA_APPEND_API_VERSION`: Whether to append the API version to the base_url (default: `True`)
- `NVIDIA_DATASET_NAMESPACE`: NVIDIA Dataset Namespace (default: `default`)
- `NVIDIA_PROJECT_ID`: NVIDIA Project ID (default: `test-project`)
- `NVIDIA_CUSTOMIZER_URL`: NVIDIA Customizer URL (default: `https://customizer.api.nvidia.com`)
- `NVIDIA_OUTPUT_MODEL_DIR`: NVIDIA Output Model Directory (default: `test-example-model@v1`)
- `GUARDRAILS_SERVICE_URL`: URL for the NeMo Guardrails Service (default: `http://0.0.0.0:7331`)
- `NVIDIA_GUARDRAILS_CONFIG_ID`: NVIDIA Guardrail Configuration ID (default: `self-check`)
- `NVIDIA_EVALUATOR_URL`: URL for the NeMo Evaluator Service (default: `http://0.0.0.0:7331`)
- `INFERENCE_MODEL`: Inference model (default: `Llama3.1-8B-Instruct`)
- `SAFETY_MODEL`: Name of the model to use for safety (default: `meta/llama-3.1-8b-instruct`)
### Models
The following models are available by default:
- `meta/llama3-8b-instruct`
- `meta/llama3-70b-instruct`
- `meta/llama-3.1-8b-instruct`
- `meta/llama-3.1-70b-instruct`
- `meta/llama-3.1-405b-instruct`
- `meta/llama-3.2-1b-instruct`
- `meta/llama-3.2-3b-instruct`
- `meta/llama-3.2-11b-vision-instruct`
- `meta/llama-3.2-90b-vision-instruct`
- `meta/llama-3.3-70b-instruct`
- `nvidia/vila`
- `nvidia/llama-3.2-nv-embedqa-1b-v2`
- `nvidia/nv-embedqa-e5-v5`
- `nvidia/nv-embedqa-mistral-7b-v2`
- `snowflake/arctic-embed-l`
## Prerequisites
### NVIDIA API Keys
Make sure you have access to an NVIDIA API key. You can get one by visiting [https://build.nvidia.com/](https://build.nvidia.com/). Use this key for the `NVIDIA_API_KEY` environment variable.
### Deploy NeMo Microservices Platform
The NVIDIA NeMo microservices platform supports end-to-end microservice deployment of a complete AI flywheel on your Kubernetes cluster through the NeMo Microservices Helm Chart. Please reference the [NVIDIA NeMo Microservices documentation](https://docs.nvidia.com/nemo/microservices/latest/about/index.html) for platform prerequisites and instructions to install and deploy the platform.
## Supported Services
Each Llama Stack API corresponds to a specific NeMo microservice. The core microservices (Customizer, Evaluator, Guardrails) are exposed by the same endpoint. The platform components (Data Store) are each exposed by separate endpoints.
### Inference: NVIDIA NIM
NVIDIA NIM is used for running inference with registered models. There are two ways to access NVIDIA NIMs:
1. Hosted (default): Preview APIs hosted at https://integrate.api.nvidia.com (Requires an API key)
2. Self-hosted: NVIDIA NIMs that run on your own infrastructure.
The deployed platform includes the NIM Proxy microservice, which is the service that provides access to your NIMs (for example, to run inference on a model). Set the `NVIDIA_BASE_URL` environment variable to point to your NVIDIA NIM Proxy deployment.
### Datasetio API: NeMo Data Store
The NeMo Data Store microservice serves as the default file storage solution for the NeMo microservices platform. It exposes APIs compatible with the Hugging Face Hub client (`HfApi`), so you can use that client to interact with the Data Store. The `NVIDIA_DATASETS_URL` environment variable should point to your NeMo Data Store endpoint.
See the {repopath}`NVIDIA Datasetio docs::llama_stack/providers/remote/datasetio/nvidia/README.md` for supported features and example usage.
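As a rough sketch of how this can look in practice (the environment variable is assumed to be set as described above, and the listing call is purely illustrative), the standard `huggingface_hub` client can be pointed at the Data Store endpoint:
```python
import os

from huggingface_hub import HfApi

# Point the Hugging Face Hub client at the NeMo Data Store endpoint
# (NVIDIA_DATASETS_URL is assumed to be set to your Data Store URL).
hf_api = HfApi(endpoint=os.environ["NVIDIA_DATASETS_URL"])

# List a few of the dataset repositories stored in the Data Store.
for dataset in hf_api.list_datasets(limit=5):
    print(dataset.id)
```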
### Eval API: NeMo Evaluator
The NeMo Evaluator microservice supports evaluation of LLMs. Launching an Evaluation job with NeMo Evaluator requires an Evaluation Config (an object that contains metadata needed by the job). A Llama Stack Benchmark maps to an Evaluation Config, so registering a Benchmark creates an Evaluation Config in NeMo Evaluator. The `NVIDIA_EVALUATOR_URL` environment variable should point to your NeMo Microservices endpoint.
See the {repopath}`NVIDIA Eval docs::llama_stack/providers/remote/eval/nvidia/README.md` for supported features and example usage.
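As an illustrative sketch, registering a benchmark against a running stack can look like the following; the benchmark id, dataset id, and scoring function below are placeholder values and assume the dataset has already been registered:
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Registering a benchmark creates the corresponding Evaluation Config
# in NeMo Evaluator (the ids below are placeholders).
client.benchmarks.register(
    benchmark_id="my-benchmark",
    dataset_id="my-eval-dataset",
    scoring_functions=["basic::equality"],
)
```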
### Post-Training API: NeMo Customizer
The NeMo Customizer microservice supports fine-tuning models. You can reference {repopath}`this list of supported models::llama_stack/providers/remote/post_training/nvidia/models.py` that can be fine-tuned using Llama Stack. The `NVIDIA_CUSTOMIZER_URL` environment variable should point to your NeMo Microservices endpoint.
See the {repopath}`NVIDIA Post-Training docs::llama_stack/providers/remote/post_training/nvidia/README.md` for supported features and example usage.
### Safety API: NeMo Guardrails
The NeMo Guardrails microservice sits between your application and the LLM, and adds checks and content moderation to a model. The `GUARDRAILS_SERVICE_URL` environment variable should point to your NeMo Microservices endpoint.
See the {repopath}`NVIDIA Safety docs::llama_stack/providers/remote/safety/nvidia/README.md` for supported features and example usage.
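As a hedged sketch of invoking a guardrail through the Safety API (the shield id below is a placeholder and assumes a shield has already been registered with the stack):
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Run a registered shield against a user message; a non-None violation
# means the guardrail flagged the content.
response = client.safety.run_shield(
    shield_id="self-check",  # placeholder: use the shield id you registered
    messages=[{"role": "user", "content": "Tell me how to hotwire a car."}],
    params={},
)
print(response.violation)
```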
## Deploying models
In order to use a registered model with the Llama Stack APIs, ensure the corresponding NIM is deployed to your environment. For example, you can use the NIM Proxy microservice to deploy `meta/llama-3.2-1b-instruct`.
Note: For improved inference speeds, use NIM with the `fast_outlines` guided decoding backend (specified in the request body). This is the default if you deployed the platform with the NeMo Microservices Helm Chart.
```sh
# URL to NeMo NIM Proxy service
export NEMO_URL="http://nemo.test"
curl --location "$NEMO_URL/v1/deployment/model-deployments" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "llama-3.2-1b-instruct",
"namespace": "meta",
"config": {
"model": "meta/llama-3.2-1b-instruct",
"nim_deployment": {
"image_name": "nvcr.io/nim/meta/llama-3.2-1b-instruct",
"image_tag": "1.8.3",
"pvc_size": "25Gi",
"gpu": 1,
"additional_envs": {
"NIM_GUIDED_DECODING_BACKEND": "fast_outlines"
}
}
}
}'
```
This NIM deployment should take approximately 10 minutes to go live. [See the docs](https://docs.nvidia.com/nemo/microservices/latest/get-started/tutorials/deploy-nims.html) for more information on how to deploy a NIM and verify it's available for inference.
You can also remove a deployed NIM to free up GPU resources, if needed.
```sh
export NEMO_URL="http://nemo.test"
curl -X DELETE "$NEMO_URL/v1/deployment/model-deployments/meta/llama-3.1-8b-instruct"
```
## Running Llama Stack with NVIDIA
You can do this via a venv (building the code locally) or via Docker, which has a pre-built image.
### Via Docker
This method allows you to get started quickly without having to build the distribution code.
```bash
LLAMA_STACK_PORT=8321
docker run \
-it \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ./run.yaml:/root/my-run.yaml \
llamastack/distribution-nvidia \
--config /root/my-run.yaml \
--port $LLAMA_STACK_PORT \
--env NVIDIA_API_KEY=$NVIDIA_API_KEY
```
### Via venv
If you've set up your local development environment, you can also build and run the distribution from your local virtual environment.
```bash
INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
llama stack build --distro nvidia --image-type venv
llama stack run ./run.yaml \
--port 8321 \
--env NVIDIA_API_KEY=$NVIDIA_API_KEY \
--env INFERENCE_MODEL=$INFERENCE_MODEL
```
## Example Notebooks
For examples of how to use the NVIDIA Distribution to run inference, fine-tune, evaluate, and run safety checks on your LLMs, you can reference the example notebooks in {repopath}`docs/notebooks/nvidia`.

View file

@@ -192,6 +192,6 @@ The NVIDIA distribution is ideal for:
## Related Guides
-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

View file

@@ -57,6 +57,6 @@ The Passthrough distribution is ideal for:
## Related Guides
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Starting Llama Stack Server](../starting-llama-stack-server)** - How to run distributions
+- **[Starting Llama Stack Server](../starting_llama_stack_server)** - How to run distributions

View file

@@ -231,6 +231,6 @@ The starter distribution is ideal for developers who want to experiment with dif
## Related Guides
-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
- **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

View file

@@ -13,13 +13,13 @@ You can run a Llama Stack server in one of the following ways:
This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (e.g. fireworks, together, groq, etc.)
-**See:** [Using Llama Stack as a Library](./importing-as-library)
+**See:** [Using Llama Stack as a Library](./importing_as_library)
## Container
Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have.
-**See:** [Available Distributions](./list-of-distributions) for more details on selecting the right distribution.
+**See:** [Available Distributions](./list_of_distributions) for more details on selecting the right distribution.
## Kubernetes
@@ -69,7 +69,7 @@ If you have built a container image and want to deploy it in a Kubernetes cluste
## Related Guides
-- **[Available Distributions](./list-of-distributions)** - Choose the right distribution
+- **[Available Distributions](./list_of_distributions)** - Choose the right distribution
-- **[Building Custom Distributions](./building-distro)** - Create your own distribution
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution
- **[Configuration Reference](./configuration)** - Understanding configuration options
-- **[Customizing run.yaml](./customizing-run-yaml)** - Adapt configurations to your environment
+- **[Customizing run.yaml](./customizing_run_yaml)** - Adapt configurations to your environment

View file

@@ -199,7 +199,7 @@ pprint(response)
Llama Stack offers a library of scoring functions and the `/scoring` API, allowing you to run evaluations on your pre-annotated AI application datasets.
-In this example, we will work with an example RAG dataset you have built previously, label with an annotation, and use LLM-As-Judge with custom judge prompt for scoring. Please checkout our [Llama Stack Playground](../building-applications/playground) for an interactive interface to upload datasets and run scorings.
+In this example, we will work with an example RAG dataset you have built previously, label with an annotation, and use LLM-As-Judge with custom judge prompt for scoring. Please checkout our [Llama Stack Playground](../building_applications/playground) for an interactive interface to upload datasets and run scorings.
```python
judge_model_id = "meta-llama/Llama-3.1-405B-Instruct-FP8"

View file

@@ -1,6 +1,6 @@
# References
-- [Python SDK Reference](python-sdk) for the Llama Stack Python SDK
+- [Python SDK Reference](python_sdk) for the Llama Stack Python SDK
-- [Llama CLI](llama-cli) for building and running your Llama Stack server
+- [Llama CLI](llama_cli) for building and running your Llama Stack server
-- [Llama Stack Client CLI](client-cli) for interacting with your Llama Stack server
+- [Llama Stack Client CLI](client_cli) for interacting with your Llama Stack server
-- [Evaluations Reference](evals-reference) for running evaluations and benchmarks
+- [Evaluations Reference](evals_reference) for running evaluations and benchmarks

View file

@@ -30,7 +30,7 @@ You have two ways to install Llama Stack:
1. `download`: Supports downloading models from Meta or Hugging Face.
2. `model`: Lists available models and their properties.
-3. `stack`: Allows you to build a stack using the `llama stack` distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the [Build your own Distribution](../distributions/building-distro) documentation.
+3. `stack`: Allows you to build a stack using the `llama stack` distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the [Build your own Distribution](../distributions/building_distro) documentation.
### Sample Usage