docs: provider and distro codegen migration (#3531)

# What does this PR do?

- Updates provider and distro codegen to handle the new format
- Migrates provider and distro files to the new format

## Test Plan

- Manual testing

Alexey Rybak 2025-09-24 14:01:29 -07:00 committed by GitHub
parent 45da31801c
commit d23865757f
103 changed files with 1796 additions and 423 deletions

@ -0,0 +1,16 @@
---
sidebar_label: Datasetio
title: Datasetio
---
# Datasetio
## Overview
This section contains documentation for all available providers for the **datasetio** API.
## Providers
- [Localfs](./inline_localfs)
- [Remote - Huggingface](./remote_huggingface)
- [Remote - Nvidia](./remote_nvidia)

@ -0,0 +1,25 @@
---
description: "Local filesystem-based dataset I/O provider for reading and writing datasets to local storage."
sidebar_label: Localfs
title: inline::localfs
---
# inline::localfs
## Description
Local filesystem-based dataset I/O provider for reading and writing datasets to local storage.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | |
## Sample Configuration
```yaml
kvstore:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/localfs_datasetio.db
```
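
The `db_path` value above uses a `${env.NAME:=default}` placeholder. As a minimal sketch of how such a placeholder could be expanded, assuming that `:=` means "use the default when the variable is unset or empty" (an assumption, not confirmed by this page), the hypothetical helper `resolve_env_placeholders` below is illustrative only and is not the project's own resolver:

```python
import os
import re
from typing import Mapping, Optional

# Matches ${env.NAME:=default}; the default may be empty.
_ENV_PATTERN = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*):=([^}]*)\}")


def resolve_env_placeholders(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Expand ${env.NAME:=default} placeholders in a string (hypothetical helper)."""
    env = os.environ if env is None else env

    def _substitute(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        # Assumption: an unset or empty variable falls back to the default.
        return env.get(name) or default

    return _ENV_PATTERN.sub(_substitute, value)


template = "${env.SQLITE_STORE_DIR:=~/.llama/dummy}/localfs_datasetio.db"

# Variable unset: the default directory is used.
unset_path = resolve_env_placeholders(template, env={})

# Variable set: its value replaces the whole placeholder.
set_path = resolve_env_placeholders(template, env={"SQLITE_STORE_DIR": "/var/lib/llama"})
```

Note that the sketch leaves `~` unexpanded; whether the real loader expands it is not stated here.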

@ -0,0 +1,25 @@
---
description: "HuggingFace datasets provider for accessing and managing datasets from the HuggingFace Hub."
sidebar_label: Remote - Huggingface
title: remote::huggingface
---
# remote::huggingface
## Description
HuggingFace datasets provider for accessing and managing datasets from the HuggingFace Hub.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | |
## Sample Configuration
```yaml
kvstore:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/huggingface_datasetio.db
```

@ -0,0 +1,29 @@
---
description: "NVIDIA's dataset I/O provider for accessing datasets from NVIDIA's data platform."
sidebar_label: Remote - Nvidia
title: remote::nvidia
---
# remote::nvidia
## Description
NVIDIA's dataset I/O provider for accessing datasets from NVIDIA's data platform.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The NVIDIA API key. |
| `dataset_namespace` | `str \| None` | No | default | The NVIDIA dataset namespace. |
| `project_id` | `str \| None` | No | test-project | The NVIDIA project ID. |
| `datasets_url` | `str` | No | http://nemo.test | Base URL for the NeMo Dataset API. |
## Sample Configuration
```yaml
api_key: ${env.NVIDIA_API_KEY:=}
dataset_namespace: ${env.NVIDIA_DATASET_NAMESPACE:=default}
project_id: ${env.NVIDIA_PROJECT_ID:=test-project}
datasets_url: ${env.NVIDIA_DATASETS_URL:=http://nemo.test}
```
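
To make the defaults in the table concrete, here is a rough sketch that mirrors the four fields in a plain dataclass and fills them from the environment variables named in the sample configuration. The class name `NvidiaDatasetIOSettings` and the function `settings_from_env` are hypothetical and are not the provider's actual config class. Mapping an empty `NVIDIA_API_KEY` to `None` is likewise an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class NvidiaDatasetIOSettings:
    """Hypothetical mirror of the configuration table; not the real config class."""

    api_key: Optional[str] = None
    dataset_namespace: Optional[str] = "default"
    project_id: Optional[str] = "test-project"
    datasets_url: str = "http://nemo.test"


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> NvidiaDatasetIOSettings:
    """Build settings from the variables used in the sample configuration."""
    env = os.environ if env is None else env
    return NvidiaDatasetIOSettings(
        # Assumption: an unset or empty variable falls back to the documented default.
        api_key=env.get("NVIDIA_API_KEY") or None,
        dataset_namespace=env.get("NVIDIA_DATASET_NAMESPACE") or "default",
        project_id=env.get("NVIDIA_PROJECT_ID") or "test-project",
        datasets_url=env.get("NVIDIA_DATASETS_URL") or "http://nemo.test",
    )


# With no variables set, every field takes its documented default.
defaults = settings_from_env(env={})

# Overriding a single variable leaves the other defaults in place.
custom = settings_from_env(env={"NVIDIA_PROJECT_ID": "my-project"})
```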