Commit graph

18 commits

Author SHA1 Message Date
Botao Chen
d0a72cc288 fix misc 2024-12-13 14:55:01 -08:00
Botao Chen
d55a8343ea merge 2024-12-13 12:55:21 -08:00
Botao Chen
e2a0dce8ad Merge branch 'main' into post_training_v3 2024-12-13 12:09:01 -08:00
Botao Chen
aeb76390fc
[1/n] torchtune <> llama-stack integration skeleton (#540)
### Context
This is the first of a series of PRs that integrate torchtune with
llama-stack as the meta reference post-training implementation. For the
MVP, we focus on single-device LoRA SFT.

Though this PR is still WIP, we want early feedback on the high-level
design of this skeleton while we work out the remaining details.

### Scope
To limit the scope of this PR, we focus on the skeleton of the
implementation.

**What is included?**
- refine the post-training SFT APIs
- skeleton of the supervised_fine_tune implementation. We verified that
we can call the supervised_fine_tune API successfully from the llama
stack client SDK (client-side PR:
https://github.com/meta-llama/llama-stack-client-python/pull/51)
- a very basic single-device LoRA training recipe built on torchtune
core components
- parity check against the torchtune library and a post-training API
unit test

**What is not included?**
- implementation of the remaining job management and training-artifact
retrieval APIs (separate PR)
- refactoring the meta reference inference logic to support eval on the
finetuned model (separate PR)
- several necessary pieces of functionality in the training recipe, such
as logging and validation (separate PR)
- interop with telemetry for tracing and metrics logging; we currently
log to local disk temporarily (separate PR)

### Testing
**e2e test**
Although we haven't added detailed testing and a numerical parity check
with torchtune yet, we did a simple E2E test from client to server:
1. Set up the server with `llama stack build --template
experimental-post-training --image-type conda` and `llama stack run
experimental-post-training`.
2. On the client, run `llama-stack-client --endpoint
http://devgpu018.nha2.facebook.com:5000 post_training
supervised_fine_tune`.
3. Training finishes successfully. On the server side, the finetune
checkpoints appear under the output dir. On the client side, we get the
job uuid.
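As a rough illustration of what the client submits in step 2, here is a
minimal Python sketch of the job payload and a client-side sanity check.
The field names (`job_uuid`, `model`, `algorithm`, `dataset`) are
assumptions based on this PR's scope (single-device LoRA SFT), not the
final API surface:

```python
# Hypothetical shape of a supervised_fine_tune job request; all field
# names are assumptions for illustration, not the confirmed API.
job_config = {
    "job_uuid": "sft-job-001",        # client-chosen job identifier (assumed)
    "model": "Llama3.2-3B-Instruct",  # base model used in the parity check
    "algorithm": "LoRA",              # MVP focuses on single-device LoRA SFT
    "dataset": "alpaca",              # dataset used in the parity check
}

def validate_job_config(cfg: dict) -> bool:
    """Check that all required fields are present before submitting."""
    required = {"job_uuid", "model", "algorithm", "dataset"}
    return required.issubset(cfg)

print(validate_job_config(job_config))  # prints: True
```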

server 
<img width="1110" alt="Screenshot 2024-12-02 at 5 52 32 PM"
src="https://github.com/user-attachments/assets/b548eb90-7a9b-4edc-a858-ee237cc4361d">

client 
<img width="807" alt="Screenshot 2024-12-02 at 5 52 37 PM"
src="https://github.com/user-attachments/assets/1138ffa8-4698-40fa-b190-3d7b99646838">

**parity check**
The torchtune dataloader output and the llama-stack post-training
dataloader output are the same:
<img width="1116" alt="Screenshot 2024-12-04 at 8 18 46 PM"
src="https://github.com/user-attachments/assets/5e295cdc-4c24-4ea6-82c0-ca96ef1bd6ee">

torchtune LoRA SFT and llama-stack post-training LoRA SFT on the alpaca
dataset with the llama3.2 3B instruct model match numerically:

<img width="860" alt="Screenshot 2024-12-04 at 8 17 01 PM"
src="https://github.com/user-attachments/assets/c05cf0a8-c674-4d2e-9f0a-c5d01b2dca99">

<img width="1049" alt="Screenshot 2024-12-04 at 8 17 06 PM"
src="https://github.com/user-attachments/assets/b911d4e2-e7b1-41a9-b62c-d75529b6d443">

**unit test**
2024-12-13 11:05:35 -08:00
Botao Chen
e5993c565e misc 2024-12-10 15:24:46 -08:00
Botao Chen
214d0645ae add unit test 2024-12-10 14:57:03 -08:00
Botao Chen
c9a009b5e7 temp commit 2024-12-09 20:24:30 -08:00
Botao Chen
9c1ae088f9 refine 2024-12-09 13:35:44 -08:00
Botao Chen
9c80a57667 remove unnecessary provider apis from expermental post training template 2024-12-04 20:26:52 -08:00
Botao Chen
12eef58543 address comment 2024-12-04 15:19:54 -08:00
Botao Chen
2a15a8a005 temp commit 2024-12-04 13:59:40 -08:00
Botao Chen
41cf2bb0a7 refine api 2024-12-03 20:01:27 -08:00
Botao Chen
5838b7211d fix pre-commit 2024-12-02 17:59:53 -08:00
Botao Chen
79c525be94 temp commit 2024-12-02 17:24:25 -08:00
Botao Chen
6c709abc4d temp commit 2024-11-27 16:46:29 -08:00
Botao Chen
bfc782c054 temp commit 2024-11-27 15:22:55 -08:00
Botao Chen
9a976bcabd temp commit 2024-11-26 10:49:03 -08:00
Botao Chen
d7598c68d7 temp commit 2024-11-25 17:27:26 -08:00