mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-08-16 06:27:58 +00:00
As the title says. Distributions is in, Templates is out. `llama stack build --template` --> `llama stack build --distro`. For backward compatibility, the previous option is kept but results in a warning. Updated `server.py` to remove the "config_or_template" backward compatibility since it has been a couple releases since that change. |
||
---|---|---|
.. | ||
__init__.py | ||
config.py | ||
datasetio.py | ||
README.md |
NVIDIA DatasetIO Provider for LlamaStack
This provider enables dataset management using NVIDIA's NeMo Customizer service.
Features
- Register datasets for fine-tuning LLMs
- Unregister datasets
Getting Started
Prerequisites
- LlamaStack with NVIDIA configuration
- Access to Hosted NVIDIA NeMo Microservice
- API key for authentication with the NVIDIA service
Setup
Build the NVIDIA environment:
llama stack build --distro nvidia --image-type venv
Basic Usage using the LlamaStack Python Client
Initialize the client
import os
os.environ["NVIDIA_API_KEY"] = "your-api-key"
os.environ["NVIDIA_CUSTOMIZER_URL"] = "http://nemo.test"
os.environ["NVIDIA_DATASET_NAMESPACE"] = "default"
os.environ["NVIDIA_PROJECT_ID"] = "test-project"
from llama_stack.core.library_client import LlamaStackAsLibraryClient
client = LlamaStackAsLibraryClient("nvidia")
client.initialize()
Register a dataset
client.datasets.register(
purpose="post-training/messages",
dataset_id="my-training-dataset",
source={"type": "uri", "uri": "hf://datasets/default/sample-dataset"},
metadata={
"format": "json",
"description": "Dataset for LLM fine-tuning",
"provider": "nvidia",
},
)
Get a list of all registered datasets
datasets = client.datasets.list()
for dataset in datasets:
print(f"Dataset ID: {dataset.identifier}")
print(f"Description: {dataset.metadata.get('description', '')}")
print(f"Source: {dataset.source.uri}")
print("---")
Unregister a dataset
client.datasets.unregister(dataset_id="my-training-dataset")