Fix DPOAlignmentConfig schema to use correct DPO parameters

- Replace incorrect PPO-like parameters (reward_scale, reward_clip, epsilon, gamma)
- Add proper DPO parameters: beta (KL coefficient) and loss_type
- Update spec to reflect the correct schema
Ubuntu 2025-07-17 19:55:44 +00:00
parent 477bcd4d09
commit 37875a1985
2 changed files with 13 additions and 15 deletions

@@ -10111,20 +10111,20 @@ components:
     DPOAlignmentConfig:
       type: object
       properties:
-        reward_scale:
-          type: number
-        reward_clip:
-          type: number
-        epsilon:
-          type: number
-        gamma:
+        beta:
           type: number
+        loss_type:
+          type: string
+          enum:
+            - sigmoid
+            - hinge
+            - ipo
+            - kto_pair
+          default: sigmoid
       additionalProperties: false
       required:
-        - reward_scale
-        - reward_clip
-        - epsilon
-        - gamma
+        - beta
+        - loss_type
       title: DPOAlignmentConfig
     DataConfig:
       type: object
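
For illustration only, here is a minimal Python sketch of what the updated DPOAlignmentConfig schema accepts and rejects. The function name `validate_dpo_alignment_config` and both constants are hypothetical and not part of this commit. A real service would more likely delegate this to an OpenAPI or JSON Schema validator.

```python
# Hypothetical stand-alone check mirroring the updated DPOAlignmentConfig schema.
# Names here are illustrative assumptions, not code from the repository.
ALLOWED_LOSS_TYPES = {"sigmoid", "hinge", "ipo", "kto_pair"}
REQUIRED_FIELDS = {"beta", "loss_type"}


def validate_dpo_alignment_config(config: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the config is valid."""
    errors = []
    present = set(config)

    # required: [beta, loss_type]
    for name in sorted(REQUIRED_FIELDS - present):
        errors.append(f"missing required property: {name}")

    # additionalProperties: false
    for name in sorted(present - REQUIRED_FIELDS):
        errors.append(f"unexpected property: {name}")

    # beta: type number (bool is rejected explicitly because it subclasses int in Python)
    if "beta" in config:
        beta = config["beta"]
        if isinstance(beta, bool) or not isinstance(beta, (int, float)):
            errors.append("beta must be a number")

    # loss_type: string restricted to the enum values
    if "loss_type" in config:
        loss_type = config["loss_type"]
        if not isinstance(loss_type, str) or loss_type not in ALLOWED_LOSS_TYPES:
            errors.append("loss_type must be one of: " + ", ".join(sorted(ALLOWED_LOSS_TYPES)))

    return errors


new_style = {"beta": 0.1, "loss_type": "sigmoid"}
old_style = {"reward_scale": 1.0, "reward_clip": 5.0, "epsilon": 0.2, "gamma": 0.99}

print(validate_dpo_alignment_config(new_style))
print(validate_dpo_alignment_config(old_style))
```

Under this sketch, a payload using the removed PPO-like fields fails twice over: `beta` and `loss_type` are missing, and each old field is rejected because the schema sets `additionalProperties: false`.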