feat: enable DPO training with HuggingFace inline provider

commit 1c7be17113
parent 874b1cb00f
Author: Ubuntu
Date: 2025-07-23 15:39:36 +00:00
7 changed files with 813 additions and 101 deletions


@@ -24,6 +24,9 @@ HuggingFace-based post-training provider for fine-tuning models using the Huggin
| `weight_decay` | `<class 'float'>` | No | 0.01 | |
| `dataloader_num_workers` | `<class 'int'>` | No | 4 | |
| `dataloader_pin_memory` | `<class 'bool'>` | No | True | |
| `dpo_beta` | `<class 'float'>` | No | 0.1 | |
| `use_reference_model` | `<class 'bool'>` | No | True | |
| `dpo_loss_type` | `Literal['sigmoid', 'hinge', 'ipo', 'kto_pair']` | No | sigmoid | |
## Sample Configuration
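A minimal sketch of what a provider configuration enabling DPO might look like, assuming the YAML keys mirror the parameter names in the table above (the exact top-level layout of the provider config is an assumption here):

```yaml
# Hypothetical sample: keys taken from the parameter table above;
# surrounding structure is illustrative, not authoritative.
post_training:
  provider_type: inline::huggingface
  config:
    weight_decay: 0.01
    dataloader_num_workers: 4
    dataloader_pin_memory: true
    # DPO-specific settings introduced by this commit
    dpo_beta: 0.1                # strength of the KL penalty vs. the reference model
    use_reference_model: true    # compare against a frozen reference policy
    dpo_loss_type: sigmoid       # one of: sigmoid, hinge, ipo, kto_pair
```

All values shown are the defaults from the table; in practice only the keys you want to override need to appear.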