llama-stack-mirror/llama_stack/templates
Botao Chen f450a0fd32
Change post training run.yaml inference config (#710)
## Context
Colab notebooks provide limited free access to a T4 GPU.

Making the post training template work end to end with a Colab T4 notebook is
critical for early adoption of the stack's post training APIs. However, we
found that the existing LlamaModelParallelGenerator
(https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/inline/inference/meta_reference/inference.py#L82)
in the meta-reference inference implementation isn't compatible with a T4
machine.

In this PR, we disable create_distributed_process_group for the
inference API in the post training run.yaml config and set up the
distributed environment variables in the notebook
<img width="493" alt="Screenshot 2025-01-02 at 3 48 08 PM"
src="https://github.com/user-attachments/assets/dd159f70-4cff-475c-b459-1fc6e2c720ba"
/>

so that meta-reference inference runs on the free T4 machine.
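
Illustrative only: the run.yaml fragment below sketches where the flag could sit. The surrounding provider keys and the model name are assumptions about the meta-reference provider config, not copied from the actual template; only `create_distributed_process_group` comes from this PR.

```yaml
providers:
  inference:
    - provider_id: meta-reference-inference   # assumed provider id
      provider_type: inline::meta-reference
      config:
        model: Llama3.2-3B-Instruct           # assumed model
        create_distributed_process_group: false
```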

## Test
Tested with the WIP post training showcase Colab notebook:
https://colab.research.google.com/drive/1K4Q2wZq232_Bpy2ud4zL9aRxvCWAwyQs?usp=sharing
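
With create_distributed_process_group disabled, the notebook still has to provide the usual torch.distributed rendezvous environment variables for a single-process run. A minimal sketch of that setup; the specific values here are assumptions for a single T4, not copied from the notebook or the screenshot above:

```python
import os

# Single-node, single-GPU rendezvous settings (assumed values).
# With only one process, rank 0 is both the master and the sole worker.
os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"
os.environ["RANK"] = "0"
os.environ["LOCAL_RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
```

These must be set before the inference server initializes torch.distributed, which is why the notebook exports them up front.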
2025-01-03 08:37:48 -08:00
bedrock Dont include 3B / 1B models for bedrock since they arent ondemand 2024-12-18 06:30:02 -08:00
cerebras Update Cerebras from Llama 3.1 to 3.3 (#645) 2024-12-17 16:28:24 -08:00
experimental-post-training Change post training run.yaml inference config (#710) 2025-01-03 08:37:48 -08:00
fireworks Add Llama 70B 3.3 to fireworks (#654) 2024-12-19 17:32:49 -08:00
hf-endpoint add embedding model by default to distribution templates (#617) 2024-12-13 12:48:00 -08:00
hf-serverless add embedding model by default to distribution templates (#617) 2024-12-13 12:48:00 -08:00
meta-reference-gpu add embedding model by default to distribution templates (#617) 2024-12-13 12:48:00 -08:00
meta-reference-quantized-gpu add embedding model by default to distribution templates (#617) 2024-12-13 12:48:00 -08:00
ollama add embedding model by default to distribution templates (#617) 2024-12-13 12:48:00 -08:00
remote-vllm add embedding model by default to distribution templates (#617) 2024-12-13 12:48:00 -08:00
tgi add embedding model by default to distribution templates (#617) 2024-12-13 12:48:00 -08:00
together add embedding model by default to distribution templates (#617) 2024-12-13 12:48:00 -08:00
vllm-gpu add embedding model by default to distribution templates (#617) 2024-12-13 12:48:00 -08:00
__init__.py Auto-generate distro yamls + docs (#468) 2024-11-18 14:57:06 -08:00
template.py add embedding model by default to distribution templates (#617) 2024-12-13 12:48:00 -08:00