datastore documentation

raspawar 2025-04-09 12:31:46 +00:00 committed by Rashmi Pawar
parent a3c07ac10a
commit 60bf0eb532
4 changed files with 78 additions and 3 deletions

@@ -0,0 +1,75 @@
# NVIDIA DatasetIO Provider for LlamaStack
This provider enables dataset management using NVIDIA's NeMo Customizer service.
## Features
- Register datasets for fine-tuning LLMs
- Unregister datasets
## Getting Started
### Prerequisites
- LlamaStack with NVIDIA configuration
- Access to a hosted NVIDIA NeMo microservice
- API key for authentication with the NVIDIA service
### Setup
Build the NVIDIA environment:
```bash
llama stack build --template nvidia --image-type conda
```
### Basic Usage with the LlamaStack Python Client
#### Initialize the client
```python
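import os

# Credentials and endpoints for the NVIDIA services (placeholder values).
os.environ["NVIDIA_API_KEY"] = "your-api-key"
os.environ["NVIDIA_CUSTOMIZER_URL"] = "http://nemo.test"
# The DatasetIO provider config reads its endpoint from NVIDIA_DATASETS_URL.
os.environ["NVIDIA_DATASETS_URL"] = "http://nemo.test"
os.environ["NVIDIA_USER_ID"] = "llama-stack-user"
os.environ["NVIDIA_DATASET_NAMESPACE"] = "default"
os.environ["NVIDIA_PROJECT_ID"] = "test-project"

from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

# Run the "nvidia" distribution in-process as a library client.
client = LlamaStackAsLibraryClient("nvidia")
client.initialize()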
```
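In the provider's sample run config, `api_key` is the only setting whose default is empty, so a missing `NVIDIA_API_KEY` is a likely misconfiguration. The following is a minimal sketch of a pre-flight check to run before `client.initialize()`. The helper name `missing_env_vars` is hypothetical and not part of LlamaStack, and which variables your deployment treats as required is an assumption to adjust.

```python
import os


def missing_env_vars(env=None, required=("NVIDIA_API_KEY",)):
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to os.environ; pass a plain dict to check other mappings.
    The default `required` tuple is an assumption for this sketch.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Set these variables before initializing the client:", ", ".join(missing))
```

Passing a longer `required` tuple lets the same check cover any other variables your setup depends on.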
#### Register a dataset
```python
client.datasets.register(
    purpose="post-training/messages",
    dataset_id="my-training-dataset",
    source={
        "type": "uri",
        "uri": "hf://datasets/default/sample-dataset",
    },
    metadata={
        "format": "json",
        "description": "Dataset for LLM fine-tuning",
        "provider": "nvidia",
    },
)
```
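The example URI above has the shape `hf://datasets/<namespace>/<name>`, with `default` matching the dataset namespace used elsewhere in this README. Assuming that pattern holds in general (it is inferred from the single example, not from a documented specification), the sketch below builds the URI from its parts. The function name `build_hf_dataset_uri` is hypothetical.

```python
def build_hf_dataset_uri(namespace, name):
    """Build a URI of the form hf://datasets/<namespace>/<name>.

    The pattern is inferred from the README example; treat it as an assumption.
    """
    for label, part in (("namespace", namespace), ("name", name)):
        # Reject empty parts and embedded slashes, which would change the path shape.
        if not part or "/" in part:
            raise ValueError(f"invalid {label}: {part!r}")
    return f"hf://datasets/{namespace}/{name}"


if __name__ == "__main__":
    print(build_hf_dataset_uri("default", "sample-dataset"))
```

The result can be passed as the `uri` value of the `source` argument when registering a dataset.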
#### Get a list of all registered datasets
```python
datasets = client.datasets.list()
for dataset in datasets:
    print(f"Dataset ID: {dataset.identifier}")
    print(f"Description: {dataset.metadata.get('description', '')}")
    print(f"Source: {dataset.source.uri}")
    print("---")
```
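The loop above assumes every dataset exposes `metadata` and a `source.uri`. If some registered datasets lack one of these (an assumption; this README only shows URI-based sources), a more defensive summary could look like the sketch below. The helper name `describe_dataset` is hypothetical.

```python
def describe_dataset(dataset):
    """Summarize a dataset object without failing on missing optional fields."""
    metadata = getattr(dataset, "metadata", None) or {}
    source = getattr(dataset, "source", None)
    return {
        "id": dataset.identifier,
        "description": metadata.get("description", ""),
        # None when the source is absent or has no `uri` attribute.
        "uri": getattr(source, "uri", None),
    }
```

With a client in hand, `[describe_dataset(d) for d in client.datasets.list()]` gives a list of plain dictionaries that is easy to print or log.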
#### Unregister a dataset
```python
client.datasets.unregister(dataset_id="my-training-dataset")
```
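How the service responds when asked to unregister an unknown dataset is not described here. A cautious sketch, using only the `list()` and `unregister()` calls shown above, checks for the identifier first. The helper name `unregister_if_present` is hypothetical.

```python
def unregister_if_present(client, dataset_id):
    """Unregister `dataset_id` only if it is currently registered.

    Returns True if the dataset was found and unregistered, False otherwise.
    """
    registered = {d.identifier for d in client.datasets.list()}
    if dataset_id not in registered:
        return False
    client.datasets.unregister(dataset_id=dataset_id)
    return True
```

Note that the check and the unregister call are two separate requests, so another process could remove the dataset in between.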

@@ -55,7 +55,7 @@ class NvidiaDatasetIOConfig(BaseModel):
    def sample_run_config(cls, **kwargs) -> Dict[str, Any]:
        return {
            "api_key": "${env.NVIDIA_API_KEY:}",
            "user_id": "${env.NVIDIA_USER_ID:llama-stack-user}",
            "dataset_namespace": "${env.NVIDIA_DATASET_NAMESPACE:default}",
            "project_id": "${env.NVIDIA_PROJECT_ID:test-project}",
            "datasets_url": "${env.NVIDIA_DATASETS_URL:http://nemo.test}",
        }
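
The values above use placeholders of the form `${env.NAME:default}`. Assuming this means "use the environment variable `NAME` if it is set, otherwise the text after the first colon", the sketch below shows how such a string could be resolved. It illustrates the assumed semantics only and is not LlamaStack's own implementation. The function name `resolve_placeholders` is hypothetical.

```python
import os
import re

# Matches ${env.NAME:default}. The default may itself contain colons (as in a URL).
_PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")


def resolve_placeholders(value, env=None):
    """Replace each ${env.NAME:default} in `value` with env[NAME] or the default."""
    env = os.environ if env is None else env

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        return env.get(name, default)

    return _PLACEHOLDER.sub(_substitute, value)


if __name__ == "__main__":
    print(resolve_placeholders("${env.NVIDIA_DATASETS_URL:http://nemo.test}"))
```

Under these assumed semantics, `${env.NVIDIA_API_KEY:}` resolves to an empty string when the variable is unset.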

@@ -78,9 +78,9 @@ providers:
    provider_type: remote::nvidia
    config:
      api_key: ${env.NVIDIA_API_KEY:}
      user_id: ${env.NVIDIA_USER_ID:llama-stack-user}
      dataset_namespace: ${env.NVIDIA_DATASET_NAMESPACE:default}
      project_id: ${env.NVIDIA_PROJECT_ID:test-project}
      datasets_url: ${env.NVIDIA_DATASETS_URL:http://nemo.test}
  scoring:
  - provider_id: basic
    provider_type: inline::basic

@@ -73,9 +73,9 @@ providers:
    provider_type: remote::nvidia
    config:
      api_key: ${env.NVIDIA_API_KEY:}
      user_id: ${env.NVIDIA_USER_ID:llama-stack-user}
      dataset_namespace: ${env.NVIDIA_DATASET_NAMESPACE:default}
      project_id: ${env.NVIDIA_PROJECT_ID:test-project}
      datasets_url: ${env.NVIDIA_DATASETS_URL:http://nemo.test}
  scoring:
  - provider_id: basic
    provider_type: inline::basic