fix: Pass model parameter as config name to NeMo Customizer (#2218)

# What does this PR do?

When launching a fine-tuning job, an upcoming version of NeMo Customizer will expect the `config` name to be formatted as `namespace/name@version`. Here, `config` is a reference to a model plus additional metadata, and multiple `config`s may reference the same base model.

This PR updates NVIDIA's `supervised_fine_tune` to pass the `model` param as-is to NeMo Customizer. Currently, it expects a specific, allowlisted llama model (i.e. `meta/Llama3.1-8B-Instruct`) and converts it to the provider format (`meta/llama-3.1-8b-instruct`).

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

From a notebook, I built an image with my changes:

```
!llama stack build --template nvidia --image-type venv

from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("nvidia")
client.initialize()
```

And could successfully launch a job:

```
response = client.post_training.supervised_fine_tune(
    job_uuid="",
    model="meta/llama-3.2-1b-instruct@v1.0.0+A100", # Model passed as-is to Customizer
    ...
)
job_id = response.job_uuid
print(f"Created job with ID: {job_id}")

Output:
Created job with ID: cust-Jm4oGmbwcvoufaLU4XkrRU
```

[//]: # (## Documentation)

---------

Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>
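
For illustration only, here is a minimal sketch of how a caller might sanity-check a `namespace/name@version` string before passing it as `model`. The helper `parse_config_name` and the `ConfigRef` type are hypothetical; they are not part of llama-stack or NeMo Customizer, and this PR itself forwards the string without any validation. The sketch also assumes that everything after the first `@` is the version, which is a guess based on the example in the test plan.

```python
from typing import NamedTuple


class ConfigRef(NamedTuple):
    """Hypothetical container for the parts of a Customizer config name."""

    namespace: str
    name: str
    version: str


def parse_config_name(config: str) -> ConfigRef:
    """Split a `namespace/name@version` string into its three parts.

    Assumption: the namespace ends at the first "/" and the version starts
    after the first "@" that follows it. Raises ValueError if a part is missing.
    """
    namespace, slash, rest = config.partition("/")
    if not slash or not namespace:
        raise ValueError(f"missing namespace in config name: {config!r}")

    name, at, version = rest.partition("@")
    if not at or not name or not version:
        raise ValueError(f"missing name or version in config name: {config!r}")

    return ConfigRef(namespace, name, version)
```

With this sketch, the value used in the test plan would split into a namespace, a name, and a version, while the older allowlisted identifier `meta/Llama3.1-8B-Instruct` would be rejected because it carries no `@version` suffix.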
2025-06-27 18:50:41 +00:00 · 2025-05-20 12:51:39 -04:00 · 2025-05-20 12:51:39 -04:00 · 1a770cf8ac
commit 1a770cf8ac
parent 2eae8568e1
3 changed files with 7 additions and 10 deletions
--- a/llama_stack/providers/remote/post_training/nvidia/post_training.py
+++ b/llama_stack/providers/remote/post_training/nvidia/post_training.py
@@ -224,7 +224,7 @@ class NvidiaPostTrainingAdapter(ModelRegistryHelper):

        Parameters:
            training_config: TrainingConfig - Configuration for training
-            model: str - Model identifier
+            model: str - NeMo Customizer configuration name
            algorithm_config: Optional[AlgorithmConfig] - Algorithm-specific configuration
            checkpoint_dir: Optional[str] - Directory containing model checkpoints, ignored atm
            job_uuid: str - Unique identifier for the job, ignored atm
@@ -299,9 +299,6 @@ class NvidiaPostTrainingAdapter(ModelRegistryHelper):

            User is informed about unsupported parameters via warnings.
        """
-        # Map model to nvidia model name
-        # See `_MODEL_ENTRIES` for supported models
-        nvidia_model = self.get_provider_model_id(model)

        # Check for unsupported method parameters
        unsupported_method_params = []
@@ -347,7 +344,7 @@ class NvidiaPostTrainingAdapter(ModelRegistryHelper):

        # Prepare base job configuration
        job_config = {
-            "config": nvidia_model,
+            "config": model,
            "dataset": {
                "name": training_config["data_config"]["dataset_id"],
                "namespace": self.config.dataset_namespace,