From f450a0fd3257fc4b4ef401ba9b438c0f381e51a7 Mon Sep 17 00:00:00 2001
From: Botao Chen
Date: Fri, 3 Jan 2025 08:37:48 -0800
Subject: [PATCH] Change post training run.yaml inference config (#710)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Context

Colab notebooks provide a limited free T4 GPU. Making the post training
template work end to end on a Colab T4 is critical for early adoption of the
stack post training APIs. However, we found that the existing
LlamaModelParallelGenerator
(https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/inline/inference/meta_reference/inference.py#L82)
in the meta-reference inference implementation is not compatible with a T4
machine.

In this PR, we disable create_distributed_process_group for the inference API
in the post training run.yaml config and set up the distributed environment
variables in the notebook (screenshot omitted; see the sketch after the diff),
so that meta-reference inference is compatible with the free T4 machine.

## Test

Tested with the WIP post training showcase Colab notebook:
https://colab.research.google.com/drive/1K4Q2wZq232_Bpy2ud4zL9aRxvCWAwyQs?usp=sharing
---
 llama_stack/templates/experimental-post-training/run.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llama_stack/templates/experimental-post-training/run.yaml b/llama_stack/templates/experimental-post-training/run.yaml
index 3f390d83c..a654c375e 100644
--- a/llama_stack/templates/experimental-post-training/run.yaml
+++ b/llama_stack/templates/experimental-post-training/run.yaml
@@ -19,6 +19,7 @@ providers:
     config:
       max_seq_len: 4096
       checkpoint_dir: null
+      create_distributed_process_group: False
   eval:
   - provider_id: meta-reference
     provider_type: inline::meta-reference
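
For reference, the notebook cell that sets the distributed environment
variables looks roughly like the following. This is a minimal sketch, not the
exact cell from the notebook: the variable names are the standard ones that
torch.distributed reads, and the specific port value is an arbitrary
assumption.

```python
import os

# Present this notebook as a single-process "distributed" job so that
# meta-reference inference can run on one free Colab T4 without
# spawning a model-parallel process group.
os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"  # arbitrary free port (assumption)
os.environ["RANK"] = "0"             # this process is the only rank
os.environ["WORLD_SIZE"] = "1"       # single GPU, single process
os.environ["LOCAL_RANK"] = "0"
```

With create_distributed_process_group: False in run.yaml, the provider no
longer launches its own process group, so it relies on the environment
already carrying these variables when torch.distributed is initialized.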