Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-10-03 19:57:35 +00:00)

# docs naming update

Commit 9a8652ee30, parent 584f3592ce

42 changed files with 377 additions and 81 deletions

@@ -15,7 +15,7 @@ The Evaluation API works with several related APIs to provide comprehensive eval
 - `/eval` + `/benchmarks` API - Generate outputs and perform scoring

 :::tip
-For conceptual information about evaluations, see our [Evaluation Concepts](../concepts/evaluation-concepts.mdx) guide.
+For conceptual information about evaluations, see our [Evaluation Concepts](../concepts/evaluation_concepts.mdx) guide.
 :::

 ## Meta Reference

@@ -112,7 +112,7 @@ Llama Stack pre-registers several popular open-benchmarks for easy model evaluat

 ## Next Steps

-- Check out the [Evaluation Concepts](../concepts/evaluation-concepts.mdx) guide for detailed conceptual information
-- See the [Building Applications - Evaluation](../building-applications/evals.mdx) guide for application examples
-- Review the [Evaluation Reference](../references/evals-reference.mdx) for comprehensive CLI and API usage
+- Check out the [Evaluation Concepts](../concepts/evaluation_concepts.mdx) guide for detailed conceptual information
+- See the [Building Applications - Evaluation](../building_applications/evals.mdx) guide for application examples
+- Review the [Evaluation Reference](../references/evals_reference.mdx) for comprehensive CLI and API usage
 - Explore the [Scoring](./scoring.mdx) documentation for available scoring functions

@@ -300,6 +300,6 @@ customizer_url: ${env.NVIDIA_CUSTOMIZER_URL:=http://nemo.test}

 ## Next Steps

-- Check out the [Building Applications - Fine-tuning](../building-applications/index.mdx) guide for application-level examples
+- Check out the [Building Applications - Fine-tuning](../building_applications/index.mdx) guide for application-level examples
 - See the [Providers](../providers/post_training/index.mdx) section for detailed provider documentation
-- Review the [API Reference](../api-reference/post-training.mdx) for complete API documentation
+- Review the [API Reference](../api_reference/post_training.mdx) for complete API documentation

@@ -188,6 +188,6 @@ The Scoring API works closely with the [Evaluation](./evaluation.mdx) API to pro
 ## Next Steps

 - Check out the [Evaluation](./evaluation.mdx) guide for running complete evaluations
-- See the [Building Applications - Evaluation](../building-applications/evals.mdx) guide for application examples
-- Review the [Evaluation Reference](../references/evals-reference.mdx) for comprehensive scoring function usage
-- Explore the [Evaluation Concepts](../concepts/evaluation-concepts.mdx) for detailed conceptual information
+- See the [Building Applications - Evaluation](../building_applications/evals.mdx) guide for application examples
+- Review the [Evaluation Reference](../references/evals_reference.mdx) for comprehensive scoring function usage
+- Explore the [Evaluation Concepts](../concepts/evaluation_concepts.mdx) for detailed conceptual information

@@ -102,11 +102,11 @@ Each turn consists of multiple steps that represent the agent's thought process:

 ## Agent Execution Loop

-Refer to the [Agent Execution Loop](./agent-execution-loop) for more details on what happens within an agent turn.
+Refer to the [Agent Execution Loop](./agent_execution_loop) for more details on what happens within an agent turn.

 ## Related Resources

-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding the internal processing flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding the internal processing flow
 - **[RAG (Retrieval Augmented Generation)](./rag)** - Building knowledge-enhanced agents
 - **[Tools Integration](./tools)** - Extending agent capabilities with external tools
 - **[Safety Guardrails](./safety)** - Implementing responsible AI practices

@@ -21,8 +21,8 @@ Here are the key topics that will help you build effective AI applications:

 ### 🤖 **Agent Development**
 - **[Agent Framework](./agent)** - Understand the components and design patterns of the Llama Stack agent framework
-- **[Agent Execution Loop](./agent-execution-loop)** - How agents process information, make decisions, and execute actions
-- **[Agents vs Responses API](./responses-vs-agents)** - Learn when to use each API for different use cases
+- **[Agent Execution Loop](./agent_execution_loop)** - How agents process information, make decisions, and execute actions
+- **[Agents vs Responses API](./responses_vs_agents)** - Learn when to use each API for different use cases

 ### 📚 **Knowledge Integration**
 - **[RAG (Retrieval-Augmented Generation)](./rag)** - Enhance your agents with external knowledge through retrieval mechanisms

@@ -215,7 +215,7 @@ Use this framework to choose the right API for your use case:
 ## Related Resources

 - **[Agents](./agent)** - Understanding the Agents API fundamentals
-- **[Agent Execution Loop](./agent-execution-loop)** - How agents process turns and steps
+- **[Agent Execution Loop](./agent_execution_loop)** - How agents process turns and steps
 - **[Tools Integration](./tools)** - Adding capabilities to both APIs
 - **[OpenAI Compatibility](/docs/providers/openai-compatibility)** - Using OpenAI-compatible endpoints
 - **[Safety Guardrails](./safety)** - Implementing safety measures in agents

@@ -389,7 +389,7 @@ client.shields.register(
 ## Related Resources

 - **[Agents](./agent)** - Integrating safety shields with intelligent agents
-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding safety in the execution flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding safety in the execution flow
 - **[Evaluations](./evals)** - Evaluating safety shield effectiveness
 - **[Telemetry](./telemetry)** - Monitoring safety violations and metrics
 - **[Llama Guard Documentation](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)** - Advanced safety model details

@@ -335,6 +335,6 @@ response = agent.create_turn(

 - **[Agents](./agent)** - Building intelligent agents with tools
 - **[RAG (Retrieval Augmented Generation)](./rag)** - Using knowledge retrieval tools
-- **[Agent Execution Loop](./agent-execution-loop)** - Understanding tool execution flow
+- **[Agent Execution Loop](./agent_execution_loop)** - Understanding tool execution flow
 - **[Building AI Applications Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)** - Comprehensive examples
 - **[Llama Stack Apps Examples](https://github.com/meta-llama/llama-stack-apps)** - Real-world tool implementations

@@ -30,7 +30,7 @@ The list of open-benchmarks we currently support:
 - [SimpleQA](https://openai.com/index/introducing-simpleqa/): Benchmark designed to assess models' ability to answer short, fact-seeking questions.
 - [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI): Benchmark designed to evaluate multimodal models.

-You can follow this [contributing guide](../references/evals-reference.mdx#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack.
+You can follow this [contributing guide](../references/evals_reference.mdx#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack.

 ### Run evaluation on open-benchmarks via CLI

@@ -67,5 +67,5 @@ evaluation results over there.
 ## What's Next?

 - Check out our Colab notebook on working examples with running benchmark evaluations [here](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb#scrollTo=mxLCsP4MvFqP).
-- Check out our [Building Applications - Evaluation](../building-applications/evals.mdx) guide for more details on how to use the Evaluation APIs to evaluate your applications.
-- Check out our [Evaluation Reference](../references/evals-reference.mdx) for more details on the APIs.
+- Check out our [Building Applications - Evaluation](../building_applications/evals.mdx) guide for more details on how to use the Evaluation APIs to evaluate your applications.
+- Check out our [Evaluation Reference](../references/evals_reference.mdx) for more details on the APIs.

@@ -13,6 +13,6 @@ This section covers the key concepts you need to understand to work effectively

 - **[Architecture](./architecture)** - Llama Stack's service-oriented design and benefits
 - **[APIs](./apis)** - Available REST APIs and planned capabilities
-- **[API Providers](./api-providers)** - Remote vs inline provider implementations
+- **[API Providers](./api_providers)** - Remote vs inline provider implementations
 - **[Distributions](./distributions)** - Pre-packaged provider configurations
 - **[Resources](./resources)** - Resource federation and registration

@@ -133,8 +133,8 @@ Keep PRs small and focused. Split large changes into logically grouped, smaller

 Learn how to extend Llama Stack with new capabilities:

-- **[Adding a New API Provider](./new-api-provider)** - Add new API providers to the Stack
-- **[Adding a Vector Database](./new-vector-database)** - Add new vector databases
+- **[Adding a New API Provider](./new_api_provider)** - Add new API providers to the Stack
+- **[Adding a Vector Database](./new_vector_database)** - Add new vector databases
 - **[External Providers](/docs/providers/external)** - Add external providers to the Stack

 ## Testing

@@ -304,12 +304,12 @@ By contributing to Llama Stack, you agree that your contributions will be licens

 ## Advanced Topics

-- **[Testing Record-Replay System](./testing-record-replay)** - Deep dive into testing internals
+- **[Testing Record-Replay System](./testing_record_replay)** - Deep dive into testing internals

 ## Related Resources

-- **[Adding API Providers](./new-api-provider)** - Extend Llama Stack with new providers
-- **[Vector Database Integration](./new-vector-database)** - Add vector database support
+- **[Adding API Providers](./new_api_provider)** - Extend Llama Stack with new providers
+- **[Vector Database Integration](./new_vector_database)** - Add vector database support
 - **[External Providers](/docs/providers/external)** - External provider development
 - **[GitHub Discussions](https://github.com/meta-llama/llama-stack/discussions)** - Community discussion
 - **[Discord](https://discord.gg/llama-stack)** - Real-time community chat

@@ -279,5 +279,5 @@ class YourProvider:

 - **[Core Concepts](/docs/concepts)** - Understanding Llama Stack architecture
 - **[External Providers](/docs/providers/external)** - Alternative implementation approach
-- **[Vector Database Guide](./new-vector-database)** - Specialized provider implementation
-- **[Testing Record-Replay](./testing-record-replay)** - Advanced testing techniques
+- **[Vector Database Guide](./new_vector_database)** - Specialized provider implementation
+- **[Testing Record-Replay](./testing_record_replay)** - Advanced testing techniques

@@ -489,5 +489,5 @@ async def add_chunks(self, chunks: List[Chunk]) -> List[str]:

 - **[Vector IO Providers](/docs/providers/vector_io)** - Existing provider implementations
 - **[Core Concepts](/docs/concepts)** - Understanding Llama Stack architecture
-- **[New API Provider Guide](./new-api-provider)** - General provider development
-- **[Testing Guide](./testing-record-replay)** - Advanced testing techniques
+- **[New API Provider Guide](./new_api_provider)** - General provider development
+- **[Testing Guide](./testing_record_replay)** - Advanced testing techniques

@@ -167,7 +167,7 @@ You can now edit ~/.llama/distributions/llamastack-starter/starter-run.yaml and
 ```

 :::tip
-The generated `run.yaml` file is a starting point for your configuration. For comprehensive guidance on customizing it for your specific needs, infrastructure, and deployment scenarios, see [Customizing Your run.yaml Configuration](./customizing-run-yaml).
+The generated `run.yaml` file is a starting point for your configuration. For comprehensive guidance on customizing it for your specific needs, infrastructure, and deployment scenarios, see [Customizing Your run.yaml Configuration](./customizing_run_yaml).
 :::

 </TabItem>

@@ -13,7 +13,7 @@ import TabItem from '@theme/TabItem';
 The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution:

 :::note
-The default `run.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your run.yaml Configuration](./customizing-run-yaml).
+The default `run.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your run.yaml Configuration](./customizing_run_yaml).
 :::

 <details>

@@ -51,5 +51,5 @@ The goal is to take the generated template and adapt it to your specific infrast
 ## Related Guides

 - **[Configuration Reference](./configuration)** - Detailed configuration file format and options
-- **[Starting Llama Stack Server](./starting-llama-stack-server)** - How to run with your custom configuration
-- **[Building Custom Distributions](./building-distro)** - Create distributions with your preferred providers
+- **[Starting Llama Stack Server](./starting_llama_stack_server)** - How to run with your custom configuration
+- **[Building Custom Distributions](./building_distro)** - Create distributions with your preferred providers

@@ -36,7 +36,7 @@ Then, you can access the APIs like `models` and `inference` on the client and ca
 response = client.models.list()
 ```

-If you've created a [custom distribution](./building-distro), you can also use the run.yaml configuration file directly:
+If you've created a [custom distribution](./building_distro), you can also use the run.yaml configuration file directly:

 ```python
 client = LlamaStackAsLibraryClient(config_path)

@@ -60,6 +60,6 @@ Library mode is ideal when:

 ## Related Guides

-- **[Building Custom Distributions](./building-distro)** - Create your own distribution for library use
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution for library use
 - **[Configuration Reference](./configuration)** - Understanding the configuration format
-- **[Starting Llama Stack Server](./starting-llama-stack-server)** - Alternative server-based deployment
+- **[Starting Llama Stack Server](./starting_llama_stack_server)** - Alternative server-based deployment

@@ -13,9 +13,9 @@ This section provides an overview of the distributions available in Llama Stack.

 ## Distribution Guides

-- **[Available Distributions](./list-of-distributions)** - Complete list and comparison of all distributions
-- **[Building Custom Distributions](./building-distro)** - Create your own distribution from scratch
-- **[Customizing Configuration](./customizing-run-yaml)** - Customize run.yaml for your needs
-- **[Starting Llama Stack Server](./starting-llama-stack-server)** - How to run distributions
-- **[Importing as Library](./importing-as-library)** - Use distributions in your code
+- **[Available Distributions](./list_of_distributions)** - Complete list and comparison of all distributions
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution from scratch
+- **[Customizing Configuration](./customizing_run_yaml)** - Customize run.yaml for your needs
+- **[Starting Llama Stack Server](./starting_llama_stack_server)** - How to run distributions
+- **[Importing as Library](./importing_as_library)** - Use distributions in your code
 - **[Configuration Reference](./configuration)** - Configuration file format details

@@ -34,7 +34,7 @@ Llama Stack provides several pre-configured distributions to help you get starte
 docker pull llama-stack/distribution-starter
 ```

-**Guides:** [Starter Distribution Guide](./self-hosted-distro/starter)
+**Guides:** [Starter Distribution Guide](./self_hosted_distro/starter)

 ### 🖥️ Self-Hosted with GPU

@@ -47,14 +47,14 @@ docker pull llama-stack/distribution-starter
 docker pull llama-stack/distribution-meta-reference-gpu
 ```

-**Guides:** [Meta Reference GPU Guide](./self-hosted-distro/meta-reference-gpu)
+**Guides:** [Meta Reference GPU Guide](./self_hosted_distro/meta_reference_gpu)

 ### 🖥️ Self-Hosted with NVIDIA NeMo Microservices

 **Use `nvidia` if you:**
 - Want to use Llama Stack with NVIDIA NeMo Microservices

-**Guides:** [NVIDIA Distribution Guide](./self-hosted-distro/nvidia)
+**Guides:** [NVIDIA Distribution Guide](./self_hosted_distro/nvidia)

 ### ☁️ Managed Hosting

@@ -65,7 +65,7 @@ docker pull llama-stack/distribution-meta-reference-gpu

 **Partners:** [Fireworks.ai](https://fireworks.ai) and [Together.xyz](https://together.xyz)

-**Guides:** [Remote-Hosted Endpoints](./remote-hosted-distro/)
+**Guides:** [Remote-Hosted Endpoints](./remote_hosted_distro/)

 ### 📱 Mobile Development

@@ -74,8 +74,8 @@ docker pull llama-stack/distribution-meta-reference-gpu
 - Need on-device inference capabilities
 - Want offline functionality

-- [iOS SDK](./ondevice-distro/ios-sdk)
-- [Android SDK](./ondevice-distro/android-sdk)
+- [iOS SDK](./ondevice_distro/ios_sdk)
+- [Android SDK](./ondevice_distro/android_sdk)

 ### 🔧 Custom Solutions

@@ -84,23 +84,23 @@ docker pull llama-stack/distribution-meta-reference-gpu
 - You need custom configurations
 - You want to optimize for your specific use case

-**Guides:** [Building Custom Distributions](./building-distro)
+**Guides:** [Building Custom Distributions](./building_distro)

 ## Detailed Documentation

 ### Self-Hosted Distributions

-- **[Starter Distribution](./self-hosted-distro/starter)** - General purpose template
-- **[Meta Reference GPU](./self-hosted-distro/meta-reference-gpu)** - High-performance GPU inference
+- **[Starter Distribution](./self_hosted_distro/starter)** - General purpose template
+- **[Meta Reference GPU](./self_hosted_distro/meta_reference_gpu)** - High-performance GPU inference

 ### Remote-Hosted Solutions

-- **[Remote-Hosted Overview](./remote-hosted-distro/)** - Managed hosting options
+- **[Remote-Hosted Overview](./remote_hosted_distro/)** - Managed hosting options

 ### Mobile SDKs

-- **[iOS SDK](./ondevice-distro/ios-sdk)** - Native iOS development
-- **[Android SDK](./ondevice-distro/android-sdk)** - Native Android development
+- **[iOS SDK](./ondevice_distro/ios_sdk)** - Native iOS development
+- **[Android SDK](./ondevice_distro/android_sdk)** - Native Android development

 ## Decision Flow

@@ -306,4 +306,4 @@ The API interface is generated using the OpenAPI standard with [Stainless](https
 - **[llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin)** - Official Kotlin SDK repository
 - **[Android Demo App](https://github.com/meta-llama/llama-stack-client-kotlin/tree/latest-release/examples/android_app)** - Complete example app
 - **[ExecuTorch](https://github.com/pytorch/executorch/)** - PyTorch on-device inference library
-- **[iOS SDK](./ios-sdk)** - iOS development guide
+- **[iOS SDK](./ios_sdk)** - iOS development guide

@@ -176,4 +176,4 @@ The iOS SDK is ideal for:
 - **[llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift/)** - Official Swift SDK repository
 - **[iOS Calendar Assistant](https://github.com/meta-llama/llama-stack-client-swift/tree/main/examples/ios_calendar_assistant)** - Complete example app
 - **[executorch](https://github.com/pytorch/executorch/)** - PyTorch on-device inference library
-- **[Android SDK](./android-sdk)** - Android development guide
+- **[Android SDK](./android_sdk)** - Android development guide

@@ -48,6 +48,6 @@ $ llama-stack-client models list

 ## Related Guides

-- **[Available Distributions](../list-of-distributions)** - Compare with other distribution types
+- **[Available Distributions](../list_of_distributions)** - Compare with other distribution types
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Using as Library](../importing-as-library)** - Alternative deployment approach
+- **[Using as Library](../importing_as_library)** - Alternative deployment approach

@@ -105,6 +105,6 @@ The watsonx distribution is ideal for:
 ## Related Guides

 - **[Remote-Hosted Overview](./index)** - Overview of remote-hosted distributions
-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

@@ -217,6 +217,6 @@ The Dell distribution is ideal for:

 ## Related Guides

-- **[Dell-TGI Distribution](./dell-tgi)** - Dell's TGI-specific distribution
+- **[Dell-TGI Distribution](./dell_tgi)** - Dell's TGI-specific distribution
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

@@ -98,4 +98,4 @@ The Dell-TGI distribution is ideal for:

 - **[Dell Distribution](./dell)** - Dell's standard distribution
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md (new file, 125 lines)

@@ -0,0 +1,125 @@
---
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# Meta Reference GPU Distribution

```{toctree}
:maxdepth: 2
:hidden:

self
```

The `llamastack/distribution-meta-reference-gpu` distribution consists of the following provider configurations:

| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| inference | `inline::meta-reference` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |

Note that you need access to NVIDIA GPUs to run this distribution. This distribution is not compatible with CPU-only machines or machines with AMD GPUs.

### Environment Variables

The following environment variables can be configured:

- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)
- `INFERENCE_MODEL`: Inference model loaded into the Meta Reference server (default: `meta-llama/Llama-3.2-3B-Instruct`)
- `INFERENCE_CHECKPOINT_DIR`: Directory containing the Meta Reference model checkpoint (default: `null`)
- `SAFETY_MODEL`: Name of the safety (Llama-Guard) model to use (default: `meta-llama/Llama-Guard-3-1B`)
- `SAFETY_CHECKPOINT_DIR`: Directory containing the Llama-Guard model checkpoint (default: `null`)

## Prerequisite: Downloading Models

Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See the [installation guide](../../references/llama_cli_reference/download_models.md) to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.

```
$ llama model list --downloaded
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Model                                   ┃ Size     ┃ Modified Time       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ Llama3.2-1B-Instruct:int4-qlora-eo8     │ 1.53 GB  │ 2025-02-26 11:22:28 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-1B                             │ 2.31 GB  │ 2025-02-18 21:48:52 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Prompt-Guard-86M                        │ 0.02 GB  │ 2025-02-26 11:29:28 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB  │ 2025-02-26 11:37:41 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-3B                             │ 5.99 GB  │ 2025-02-18 21:51:26 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.1-8B                             │ 14.97 GB │ 2025-02-16 10:36:37 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB  │ 2025-02-26 11:35:02 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama-Guard-3-1B                        │ 2.80 GB  │ 2025-02-26 11:20:46 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama-Guard-3-1B:int4                   │ 0.43 GB  │ 2025-02-26 11:33:33 │
└─────────────────────────────────────────┴──────────┴─────────────────────┘
```

## Running the Distribution

You can do this via venv or via Docker, which has a pre-built image.

### Via Docker

This method allows you to get started quickly without having to build the distribution code.

```bash
LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  --gpus all \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-meta-reference-gpu \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```

If you are using Llama Stack Safety / Shield APIs, use:

```bash
docker run \
  -it \
  --pull always \
  --gpus all \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-meta-reference-gpu \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```

### Via venv

Make sure you have done `uv pip install llama-stack` and have the Llama Stack CLI available.

```bash
llama stack build --distro meta-reference-gpu --image-type venv
llama stack run distributions/meta-reference-gpu/run.yaml \
  --port 8321 \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```

If you are using Llama Stack Safety / Shield APIs, use:

```bash
llama stack run distributions/meta-reference-gpu/run-with-safety.yaml \
  --port 8321 \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```

@@ -149,6 +149,6 @@ The Meta Reference GPU distribution is ideal for:

 ## Related Guides

-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

docs/docs/distributions/self_hosted_distro/nvidia.md (new file, 171 lines)

@@ -0,0 +1,171 @@
---
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# NVIDIA Distribution

The `llamastack/distribution-nvidia` distribution consists of the following provider configurations.

| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `inline::localfs`, `remote::nvidia` |
| eval | `remote::nvidia` |
| files | `inline::localfs` |
| inference | `remote::nvidia` |
| post_training | `remote::nvidia` |
| safety | `remote::nvidia` |
| scoring | `inline::basic` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `inline::rag-runtime` |
| vector_io | `inline::faiss` |

### Environment Variables

The following environment variables can be configured:

- `NVIDIA_API_KEY`: NVIDIA API Key (default: ``)
- `NVIDIA_APPEND_API_VERSION`: Whether to append the API version to the base_url (default: `True`)
- `NVIDIA_DATASET_NAMESPACE`: NVIDIA Dataset Namespace (default: `default`)
- `NVIDIA_PROJECT_ID`: NVIDIA Project ID (default: `test-project`)
- `NVIDIA_CUSTOMIZER_URL`: NVIDIA Customizer URL (default: `https://customizer.api.nvidia.com`)
- `NVIDIA_OUTPUT_MODEL_DIR`: NVIDIA Output Model Directory (default: `test-example-model@v1`)
- `GUARDRAILS_SERVICE_URL`: URL for the NeMo Guardrails Service (default: `http://0.0.0.0:7331`)
- `NVIDIA_GUARDRAILS_CONFIG_ID`: NVIDIA Guardrail Configuration ID (default: `self-check`)
- `NVIDIA_EVALUATOR_URL`: URL for the NeMo Evaluator Service (default: `http://0.0.0.0:7331`)
- `INFERENCE_MODEL`: Inference model (default: `Llama3.1-8B-Instruct`)
- `SAFETY_MODEL`: Name of the model to use for safety (default: `meta/llama-3.1-8b-instruct`)

### Models

The following models are available by default:

- `meta/llama3-8b-instruct`
- `meta/llama3-70b-instruct`
- `meta/llama-3.1-8b-instruct`
- `meta/llama-3.1-70b-instruct`
- `meta/llama-3.1-405b-instruct`
- `meta/llama-3.2-1b-instruct`
- `meta/llama-3.2-3b-instruct`
- `meta/llama-3.2-11b-vision-instruct`
- `meta/llama-3.2-90b-vision-instruct`
- `meta/llama-3.3-70b-instruct`
- `nvidia/vila`
- `nvidia/llama-3.2-nv-embedqa-1b-v2`
- `nvidia/nv-embedqa-e5-v5`
- `nvidia/nv-embedqa-mistral-7b-v2`
- `snowflake/arctic-embed-l`

## Prerequisites

### NVIDIA API Keys

Make sure you have access to an NVIDIA API Key. You can get one by visiting [https://build.nvidia.com/](https://build.nvidia.com/). Use this key for the `NVIDIA_API_KEY` environment variable.

### Deploy NeMo Microservices Platform

The NVIDIA NeMo microservices platform supports end-to-end microservice deployment of a complete AI flywheel on your Kubernetes cluster through the NeMo Microservices Helm Chart. Please reference the [NVIDIA NeMo Microservices documentation](https://docs.nvidia.com/nemo/microservices/latest/about/index.html) for platform prerequisites and instructions to install and deploy the platform.

## Supported Services

Each Llama Stack API corresponds to a specific NeMo microservice. The core microservices (Customizer, Evaluator, Guardrails) are exposed by the same endpoint. The platform components (Data Store) are each exposed by separate endpoints.

### Inference: NVIDIA NIM

NVIDIA NIM is used for running inference with registered models. There are two ways to access NVIDIA NIMs:

1. Hosted (default): Preview APIs hosted at https://integrate.api.nvidia.com (requires an API key)
2. Self-hosted: NVIDIA NIMs that run on your own infrastructure.

The deployed platform includes the NIM Proxy microservice, which is the service that provides access to your NIMs (for example, to run inference on a model). Set the `NVIDIA_BASE_URL` environment variable to use your NVIDIA NIM Proxy deployment.

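For illustration, once the NIM for your model is live and the stack server is running, a minimal inference call through the Llama Stack client might look like the sketch below. The client method names follow the Llama Stack client examples used elsewhere in these docs; the base URL and model id are placeholders, and newer client versions may prefer the OpenAI-compatible chat endpoints instead.

```python
# Rough sketch: chat completion against the NVIDIA distribution (assumptions noted above).
from llama_stack_client import LlamaStackClient

# Assumes a locally running stack server on the default port.
client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta/llama-3.2-1b-instruct",  # placeholder; must be a deployed NIM
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.completion_message.content)
```
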
### Datasetio API: NeMo Data Store

The NeMo Data Store microservice serves as the default file storage solution for the NeMo microservices platform. It exposes APIs compatible with the Hugging Face Hub client (`HfApi`), so you can use the client to interact with the Data Store. The `NVIDIA_DATASETS_URL` environment variable should point to your NeMo Data Store endpoint.

See the {repopath}`NVIDIA Datasetio docs::llama_stack/providers/remote/datasetio/nvidia/README.md` for supported features and example usage.

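Because the Data Store speaks the Hugging Face Hub API, a rough sketch of pointing `HfApi` at it could look like the following. The repository name and file path are placeholders, and which Hub operations the Data Store actually supports is covered in the README linked above.

```python
# Rough sketch: use the Hugging Face Hub client against the NeMo Data Store endpoint.
import os
from huggingface_hub import HfApi

hf_api = HfApi(
    endpoint=os.environ["NVIDIA_DATASETS_URL"],  # your Data Store URL
    token="",  # the Data Store may not require a real Hub token (assumption)
)

# Create a dataset repo and upload a file (names are illustrative).
hf_api.create_repo(repo_id="default/sample-dataset", repo_type="dataset", exist_ok=True)
hf_api.upload_file(
    path_or_fileobj="./training_data.jsonl",
    path_in_repo="training_data.jsonl",
    repo_id="default/sample-dataset",
    repo_type="dataset",
)
```
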
### Eval API: NeMo Evaluator

The NeMo Evaluator microservice supports evaluation of LLMs. Launching an Evaluation job with NeMo Evaluator requires an Evaluation Config (an object that contains metadata needed by the job). A Llama Stack Benchmark maps to an Evaluation Config, so registering a Benchmark creates an Evaluation Config in NeMo Evaluator. The `NVIDIA_EVALUATOR_URL` environment variable should point to your NeMo Microservices endpoint.

See the {repopath}`NVIDIA Eval docs::llama_stack/providers/remote/eval/nvidia/README.md` for supported features and example usage.

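As a rough sketch, registering a benchmark (which becomes an Evaluation Config) and launching a job through the Llama Stack client might look like this. The benchmark, dataset, and model identifiers are placeholders, and the exact shape of `benchmark_config` is an assumption; the README linked above documents the provider's supported options.

```python
# Rough sketch: Benchmark registration maps to an Evaluation Config in NeMo Evaluator.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

client.benchmarks.register(
    benchmark_id="my-benchmark",      # placeholder; becomes the Evaluation Config
    dataset_id="my-eval-dataset",     # placeholder dataset registered earlier
    scoring_functions=[],             # provider-defined scoring
)

job = client.eval.run_eval(
    benchmark_id="my-benchmark",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "meta/llama-3.2-1b-instruct",
            "sampling_params": {"max_tokens": 100},
        }
    },
)
print(job)
```
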
### Post-Training API: NeMo Customizer

The NeMo Customizer microservice supports fine-tuning models. You can reference {repopath}`this list of supported models::llama_stack/providers/remote/post_training/nvidia/models.py` that can be fine-tuned using Llama Stack. The `NVIDIA_CUSTOMIZER_URL` environment variable should point to your NeMo Microservices endpoint.

See the {repopath}`NVIDIA Post-Training docs::llama_stack/providers/remote/post_training/nvidia/README.md` for supported features and example usage.

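To make the flow concrete, a heavily simplified sketch of kicking off a fine-tuning job via the post-training API is shown below. Treat the nested config dictionaries (LoRA settings, optimizer, dataset id) as assumptions for illustration only; the README linked above has the provider's actual accepted fields.

```python
# Rough sketch: launch a supervised fine-tuning job (config shapes are assumptions).
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

job = client.post_training.supervised_fine_tune(
    job_uuid="",                                  # let the provider assign one
    model="meta-llama/Llama-3.1-8B-Instruct",     # must be in the supported-models list
    training_config={
        "n_epochs": 1,
        "data_config": {"dataset_id": "my-training-dataset", "batch_size": 8},
        "optimizer_config": {"lr": 1e-4},
    },
    algorithm_config={"type": "LoRA", "adapter_dim": 16},  # illustrative LoRA settings
    hyperparam_search_config={},
    logger_config={},
    checkpoint_dir="",
)
print(job)
```
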
### Safety API: NeMo Guardrails

The NeMo Guardrails microservice sits between your application and the LLM, and adds checks and content moderation to a model. The `GUARDRAILS_SERVICE_URL` environment variable should point to your NeMo Microservices endpoint.

See the {repopath}`NVIDIA Safety docs::llama_stack/providers/remote/safety/nvidia/README.md` for supported features and example usage.

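A minimal sketch of exercising the safety API against the Guardrails-backed provider is shown below. The shield id mirrors the default `NVIDIA_GUARDRAILS_CONFIG_ID` above; the `provider_id` value and exact registration arguments are assumptions.

```python
# Rough sketch: register a shield and run a safety check through it.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# "self-check" mirrors the default NVIDIA_GUARDRAILS_CONFIG_ID; provider_id is an assumption.
client.shields.register(shield_id="self-check", provider_id="nvidia")

result = client.safety.run_shield(
    shield_id="self-check",
    messages=[{"role": "user", "content": "Tell me how to hot-wire a car."}],
    params={},
)
print(result.violation)
```
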
## Deploying models

In order to use a registered model with the Llama Stack APIs, ensure the corresponding NIM is deployed to your environment. For example, you can use the NIM Proxy microservice to deploy `meta/llama-3.2-1b-instruct`.

Note: For improved inference speeds, use NIM with the `fast_outlines` guided decoding system (specified in the request body). This is the default if you deployed the platform with the NeMo Microservices Helm Chart.

```sh
# URL to NeMo NIM Proxy service
export NEMO_URL="http://nemo.test"

curl --location "$NEMO_URL/v1/deployment/model-deployments" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "llama-3.2-1b-instruct",
    "namespace": "meta",
    "config": {
      "model": "meta/llama-3.2-1b-instruct",
      "nim_deployment": {
        "image_name": "nvcr.io/nim/meta/llama-3.2-1b-instruct",
        "image_tag": "1.8.3",
        "pvc_size": "25Gi",
        "gpu": 1,
        "additional_envs": {
          "NIM_GUIDED_DECODING_BACKEND": "fast_outlines"
        }
      }
    }
  }'
```

This NIM deployment should take approximately 10 minutes to go live. [See the docs](https://docs.nvidia.com/nemo/microservices/latest/get-started/tutorials/deploy-nims.html) for more information on how to deploy a NIM and verify it's available for inference.

You can also remove a deployed NIM to free up GPU resources, if needed.

```sh
export NEMO_URL="http://nemo.test"

curl -X DELETE "$NEMO_URL/v1/deployment/model-deployments/meta/llama-3.1-8b-instruct"
```

## Running Llama Stack with NVIDIA

You can do this via venv (building the code) or via Docker, which has a pre-built image.

### Via Docker

This method allows you to get started quickly without having to build the distribution code.

```bash
LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-nvidia \
  --config /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env NVIDIA_API_KEY=$NVIDIA_API_KEY
```

### Via venv

If you've set up your local development environment, you can also build and run the distribution from your local virtual environment.

```bash
INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
llama stack build --distro nvidia --image-type venv
llama stack run ./run.yaml \
  --port 8321 \
  --env NVIDIA_API_KEY=$NVIDIA_API_KEY \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```

## Example Notebooks

For examples of how to use the NVIDIA Distribution to run inference, fine-tune, evaluate, and run safety checks on your LLMs, you can reference the example notebooks in {repopath}`docs/notebooks/nvidia`.

@@ -192,6 +192,6 @@ The NVIDIA distribution is ideal for:

 ## Related Guides

-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

@@ -57,6 +57,6 @@ The Passthrough distribution is ideal for:

 ## Related Guides

-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Starting Llama Stack Server](../starting-llama-stack-server)** - How to run distributions
+- **[Starting Llama Stack Server](../starting_llama_stack_server)** - How to run distributions

@@ -231,6 +231,6 @@ The starter distribution is ideal for developers who want to experiment with dif

 ## Related Guides

-- **[Available Distributions](../list-of-distributions)** - Compare with other distributions
+- **[Available Distributions](../list_of_distributions)** - Compare with other distributions
 - **[Configuration Reference](../configuration)** - Understanding configuration options
-- **[Building Custom Distributions](../building-distro)** - Create your own distribution
+- **[Building Custom Distributions](../building_distro)** - Create your own distribution

@@ -13,13 +13,13 @@ You can run a Llama Stack server in one of the following ways:

 This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (e.g. fireworks, together, groq, etc.)

-**See:** [Using Llama Stack as a Library](./importing-as-library)
+**See:** [Using Llama Stack as a Library](./importing_as_library)

 ## Container

 Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have.

-**See:** [Available Distributions](./list-of-distributions) for more details on selecting the right distribution.
+**See:** [Available Distributions](./list_of_distributions) for more details on selecting the right distribution.

 ## Kubernetes

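For reference, a minimal library-mode sketch looks roughly like the block below. The import path is an assumption and has moved between releases, so check your installed `llama_stack` version; the distribution name is a placeholder.

```python
# Rough sketch: run Llama Stack in-process instead of starting a server.
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient  # import path may differ by version

client = LlamaStackAsLibraryClient("starter")  # a distribution name or a path to a run.yaml
client.initialize()

print([m.identifier for m in client.models.list()])
```
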
@@ -69,7 +69,7 @@ If you have built a container image and want to deploy it in a Kubernetes cluste

 ## Related Guides

-- **[Available Distributions](./list-of-distributions)** - Choose the right distribution
-- **[Building Custom Distributions](./building-distro)** - Create your own distribution
+- **[Available Distributions](./list_of_distributions)** - Choose the right distribution
+- **[Building Custom Distributions](./building_distro)** - Create your own distribution
 - **[Configuration Reference](./configuration)** - Understanding configuration options
-- **[Customizing run.yaml](./customizing-run-yaml)** - Adapt configurations to your environment
+- **[Customizing run.yaml](./customizing_run_yaml)** - Adapt configurations to your environment

@@ -199,7 +199,7 @@ pprint(response)

 Llama Stack offers a library of scoring functions and the `/scoring` API, allowing you to run evaluations on your pre-annotated AI application datasets.

-In this example, we will work with an example RAG dataset you have built previously, label it with annotations, and use LLM-As-Judge with a custom judge prompt for scoring. Please check out our [Llama Stack Playground](../building-applications/playground) for an interactive interface to upload datasets and run scorings.
+In this example, we will work with an example RAG dataset you have built previously, label it with annotations, and use LLM-As-Judge with a custom judge prompt for scoring. Please check out our [Llama Stack Playground](../building_applications/playground) for an interactive interface to upload datasets and run scorings.

 ```python
 judge_model_id = "meta-llama/Llama-3.1-405B-Instruct-FP8"

@@ -1,6 +1,6 @@
 # References

-- [Python SDK Reference](python-sdk) for the Llama Stack Python SDK
-- [Llama CLI](llama-cli) for building and running your Llama Stack server
-- [Llama Stack Client CLI](client-cli) for interacting with your Llama Stack server
-- [Evaluations Reference](evals-reference) for running evaluations and benchmarks
+- [Python SDK Reference](python_sdk) for the Llama Stack Python SDK
+- [Llama CLI](llama_cli) for building and running your Llama Stack server
+- [Llama Stack Client CLI](client_cli) for interacting with your Llama Stack server
+- [Evaluations Reference](evals_reference) for running evaluations and benchmarks

@@ -30,7 +30,7 @@ You have two ways to install Llama Stack:

 1. `download`: Supports downloading models from Meta or Hugging Face.
 2. `model`: Lists available models and their properties.
-3. `stack`: Allows you to build a stack using the `llama stack` distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the [Build your own Distribution](../distributions/building-distro) documentation.
+3. `stack`: Allows you to build a stack using the `llama stack` distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the [Build your own Distribution](../distributions/building_distro) documentation.

 ### Sample Usage