mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-18 16:19:49 +00:00

NVIDIA DatasetIO Provider for LlamaStack

This provider enables dataset management using NVIDIA's NeMo Customizer service.

Features

Register datasets for fine-tuning LLMs
Unregister datasets

Getting Started

Prerequisites

LlamaStack with NVIDIA configuration
Access to a hosted NVIDIA NeMo Microservice
API key for authentication with the NVIDIA service

Setup

Build the NVIDIA environment:

llama stack build --distro nvidia --image-type venv

Basic Usage with the LlamaStack Python Client

Initialize the client

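import os

# Set the provider's environment variables before creating the client.
# The values below are placeholders; replace them with your own.
os.environ["NVIDIA_API_KEY"] = "your-api-key"
os.environ["NVIDIA_CUSTOMIZER_URL"] = "http://nemo.test"
os.environ["NVIDIA_DATASET_NAMESPACE"] = "default"
os.environ["NVIDIA_PROJECT_ID"] = "test-project"

from llama_stack.core.library_client import LlamaStackAsLibraryClient

# "nvidia" refers to the distribution built in the Setup step.
client = LlamaStackAsLibraryClient("nvidia")
client.initialize()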
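
Because the client reads its settings from the environment, it can help to check that the variables are set before initializing. The following is a minimal sketch, not part of the provider: the helper name `missing_env_vars` is hypothetical, and treating all four variables as required is an assumption based on the snippet above.

```python
import os

# Variable names are taken from the snippet above. Whether the provider
# strictly requires each of them is an assumption of this sketch.
REQUIRED_VARS = (
    "NVIDIA_API_KEY",
    "NVIDIA_CUSTOMIZER_URL",
    "NVIDIA_DATASET_NAMESPACE",
    "NVIDIA_PROJECT_ID",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Report anything that is missing before creating the client.
missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Running this check first gives a clearer message than a configuration error raised later during client startup.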

Register a dataset

client.datasets.register(
    purpose="post-training/messages",
    dataset_id="my-training-dataset",
    source={"type": "uri", "uri": "hf://datasets/default/sample-dataset"},
    metadata={
        "format": "json",
        "description": "Dataset for LLM fine-tuning",
        "provider": "nvidia",
    },
)
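
The `source` argument in the example follows the pattern `hf://datasets/<namespace>/<name>`. As a sketch only, a small helper can build that dictionary and reject obviously malformed parts. The helper name `hf_dataset_source` is hypothetical, and the URI layout is inferred from the single example above rather than from a documented specification.

```python
def hf_dataset_source(namespace: str, name: str) -> dict:
    """Build a `source` dict of the form used in the register example.

    Assumes the URI layout hf://datasets/<namespace>/<name>, inferred
    from the example above.
    """
    for label, value in (("namespace", namespace), ("name", name)):
        # Reject empty parts and parts that would add extra path segments.
        if not value or "/" in value:
            raise ValueError(
                f"{label} must be a non-empty string without '/': {value!r}"
            )
    return {"type": "uri", "uri": f"hf://datasets/{namespace}/{name}"}


source = hf_dataset_source("default", "sample-dataset")
print(source["uri"])
```

The resulting dictionary can be passed as `source=` to `client.datasets.register(...)` in place of the literal shown above.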

Get a list of all registered datasets

datasets = client.datasets.list()
for dataset in datasets:
    print(f"Dataset ID: {dataset.identifier}")
    print(f"Description: {dataset.metadata.get('description', '')}")
    print(f"Source: {dataset.source.uri}")
    print("---")
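
When several providers register datasets, it may be useful to narrow the list by a metadata field. The sketch below filters on the `metadata` attribute used in the loop above. The helper name `datasets_with_metadata` is hypothetical, and the sample objects are stand-ins so that the snippet runs without a live service.

```python
from types import SimpleNamespace


def datasets_with_metadata(datasets, key, value):
    """Return the datasets whose metadata maps `key` to `value`."""
    return [d for d in datasets if (d.metadata or {}).get(key) == value]


# Stand-in objects that mimic the attributes read in the loop above.
# With a real client, pass the result of client.datasets.list() instead.
sample = [
    SimpleNamespace(identifier="my-training-dataset", metadata={"provider": "nvidia"}),
    SimpleNamespace(identifier="other-dataset", metadata={"provider": "other"}),
]

nvidia_only = datasets_with_metadata(sample, "provider", "nvidia")
print([d.identifier for d in nvidia_only])
```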

Unregister a dataset

client.datasets.unregister(dataset_id="my-training-dataset")