feat: enable DPO training with HuggingFace inline provider

commit 1c7be17113
parent 874b1cb00f
Author: Ubuntu
Date: 2025-07-23 15:39:36 +00:00
7 changed files with 813 additions and 101 deletions


@@ -24,6 +24,9 @@ HuggingFace-based post-training provider for fine-tuning models using the Huggin
| `weight_decay` | `<class 'float'>` | No | 0.01 | |
| `dataloader_num_workers` | `<class 'int'>` | No | 4 | |
| `dataloader_pin_memory` | `<class 'bool'>` | No | True | |
| `dpo_beta` | `<class 'float'>` | No | 0.1 | |
| `use_reference_model` | `<class 'bool'>` | No | True | |
| `dpo_loss_type` | `Literal['sigmoid', 'hinge', 'ipo', 'kto_pair']` | No | sigmoid | |
## Sample Configuration
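A minimal sketch of what a provider configuration enabling DPO might look like, assuming the YAML keys mirror the parameter names in the table above (the exact top-level layout of the provider config is an assumption here):

```yaml
# Hypothetical sample: keys taken from the parameter table above;
# surrounding structure is illustrative, not authoritative.
post_training:
  provider_type: inline::huggingface
  config:
    weight_decay: 0.01
    dataloader_num_workers: 4
    dataloader_pin_memory: true
    # DPO-specific settings introduced by this commit
    dpo_beta: 0.1                # strength of the KL penalty vs. the reference model
    use_reference_model: true    # compare against a frozen reference policy
    dpo_loss_type: sigmoid       # one of: sigmoid, hinge, ipo, kto_pair
```

All values shown are the defaults from the table; in practice only the keys you want to override need to appear.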