feat: Add NVIDIA NeMo datastore (#1852)

# What does this PR do?
Implementation of the NeMo Datastore register and unregister APIs.

Open Issues:
- `provider_id` gets set to `localfs` in `client.datasets.register()`
because that value is specified in `routing_tables.py`
(`DatasetsRoutingTable`). See: #1860

As a quick workaround, I currently pass `"provider_id": "nvidia"` in the
dataset metadata and parse it in `DatasetsRoutingTable`. This is not the
best approach, but it makes registration work for now.
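
For reviewers, the idea behind the workaround can be sketched roughly as below. This is an illustration only, not the code added to `DatasetsRoutingTable`: the helper `resolve_provider_id` and the constant `DEFAULT_PROVIDER_ID` are hypothetical names, and the only behavior assumed is "use `metadata["provider_id"]` when present, otherwise fall back to `localfs`".

```python
from typing import Any, Mapping, Optional

# Hypothetical constant: the provider the routing table falls back to today.
DEFAULT_PROVIDER_ID = "localfs"


def resolve_provider_id(
    metadata: Optional[Mapping[str, Any]],
    default: str = DEFAULT_PROVIDER_ID,
) -> str:
    """Pick the provider from dataset metadata, else use the default.

    Hypothetical helper: mirrors the workaround described above, where the
    caller smuggles the provider through the metadata dict.
    """
    if metadata:
        provider_id = metadata.get("provider_id")
        # Only accept a non-empty string; anything else falls through.
        if isinstance(provider_id, str) and provider_id:
            return provider_id
    return default


if __name__ == "__main__":
    # Metadata carrying the override routes to the NVIDIA provider.
    print(resolve_provider_id({"provider_id": "nvidia"}))
    # No override: the existing default is kept.
    print(resolve_provider_id({}))
```

A dedicated `provider_id` argument on the register call would avoid overloading metadata, which is presumably what #1860 tracks.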

## Test Plan
- Unit test cases: `pytest tests/unit/providers/nvidia/test_datastore.py`
```bash
========================================================== test session starts ===========================================================
platform linux -- Python 3.10.0, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, asyncio-0.26.0, nbval-0.11.0, metadata-3.1.1, html-4.1.1, cov-6.1.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 2 items                                                                                                                        

tests/unit/providers/nvidia/test_datastore.py ..                                                                                   [100%]

============================================================ warnings summary ============================================================

====================================================== 2 passed, 1 warning in 0.84s ======================================================
```

cc: @dglogo, @mattf, @yanxi0830
Rashmi Pawar 2025-04-28 22:11:59 +05:30 committed by GitHub
parent c149cf2e0f
commit e6bbf8d20b
17 changed files with 514 additions and 9 deletions


@ -394,6 +394,7 @@
"aiosqlite",
"blobfile",
"chardet",
"datasets",
"faiss-cpu",
"fastapi",
"fire",


@ -18,6 +18,7 @@ distribution_spec:
- remote::nvidia
datasetio:
- inline::localfs
- remote::nvidia
scoring:
- inline::basic
tool_runtime:


@ -7,6 +7,7 @@
from pathlib import Path
from llama_stack.distribution.datatypes import ModelInput, Provider, ShieldInput, ToolGroupInput
from llama_stack.providers.remote.datasetio.nvidia import NvidiaDatasetIOConfig
from llama_stack.providers.remote.eval.nvidia import NVIDIAEvalConfig
from llama_stack.providers.remote.inference.nvidia import NVIDIAConfig
from llama_stack.providers.remote.inference.nvidia.models import MODEL_ENTRIES
@ -23,7 +24,7 @@ def get_distribution_template() -> DistributionTemplate:
"telemetry": ["inline::meta-reference"],
"eval": ["remote::nvidia"],
"post_training": ["remote::nvidia"],
"datasetio": ["inline::localfs"],
"datasetio": ["inline::localfs", "remote::nvidia"],
"scoring": ["inline::basic"],
"tool_runtime": ["inline::rag-runtime"],
}
@ -38,6 +39,11 @@ def get_distribution_template() -> DistributionTemplate:
provider_type="remote::nvidia",
config=NVIDIASafetyConfig.sample_run_config(),
)
datasetio_provider = Provider(
provider_id="nvidia",
provider_type="remote::nvidia",
config=NvidiaDatasetIOConfig.sample_run_config(),
)
eval_provider = Provider(
provider_id="nvidia",
provider_type="remote::nvidia",
@ -75,6 +81,7 @@ def get_distribution_template() -> DistributionTemplate:
"run.yaml": RunConfigSettings(
provider_overrides={
"inference": [inference_provider],
"datasetio": [datasetio_provider],
"eval": [eval_provider],
},
default_models=default_models,


@ -74,6 +74,13 @@ providers:
type: sqlite
namespace: null
db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/nvidia}/localfs_datasetio.db
- provider_id: nvidia
provider_type: remote::nvidia
config:
api_key: ${env.NVIDIA_API_KEY:}
dataset_namespace: ${env.NVIDIA_DATASET_NAMESPACE:default}
project_id: ${env.NVIDIA_PROJECT_ID:test-project}
datasets_url: ${env.NVIDIA_DATASETS_URL:http://nemo.test}
scoring:
- provider_id: basic
provider_type: inline::basic


@ -62,13 +62,13 @@ providers:
project_id: ${env.NVIDIA_PROJECT_ID:test-project}
customizer_url: ${env.NVIDIA_CUSTOMIZER_URL:http://nemo.test}
datasetio:
- provider_id: localfs
provider_type: inline::localfs
- provider_id: nvidia
provider_type: remote::nvidia
config:
kvstore:
type: sqlite
namespace: null
db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/nvidia}/localfs_datasetio.db
api_key: ${env.NVIDIA_API_KEY:}
dataset_namespace: ${env.NVIDIA_DATASET_NAMESPACE:default}
project_id: ${env.NVIDIA_PROJECT_ID:test-project}
datasets_url: ${env.NVIDIA_DATASETS_URL:http://nemo.test}
scoring:
- provider_id: basic
provider_type: inline::basic