llama-stack-mirror/llama_stack/apis/post_training
Ubuntu · 37875a1985 · Fix DPOAlignmentConfig schema to use correct DPO parameters
- Replace incorrect PPO-like parameters (reward_scale, reward_clip, epsilon, gamma)
- Add proper DPO parameters: beta (KL coefficient) and loss_type
- Update spec to reflect the correct schema
2025-07-17 19:55:44 +00:00
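A rough sketch of what the corrected model in post_training.py might look like after this commit: the beta and loss_type field names come from the commit message above, while the DPOLossType enum, its members, and the default value are assumptions for illustration (the real module would also tag the model with llama-stack's json_schema_type decorator for spec generation).

from enum import Enum

from pydantic import BaseModel


class DPOLossType(Enum):
    # Loss variants commonly supported by DPO trainers; illustrative values,
    # the exact enum members in post_training.py may differ.
    sigmoid = "sigmoid"
    hinge = "hinge"
    ipo = "ipo"
    kto_pair = "kto_pair"


class DPOAlignmentConfig(BaseModel):
    """DPO-specific alignment parameters, per the commit message above."""

    # Strength of the KL penalty that keeps the policy close to the
    # reference model; larger values give a more conservative update.
    beta: float
    # Which DPO loss variant to optimize (default is an assumption here).
    loss_type: DPOLossType = DPOLossType.sigmoid

Compared with the removed PPO-style fields (reward_scale, reward_clip, epsilon, gamma), these two knobs match what DPO actually exposes: there is no learned reward model, clipping range, or discount factor in DPO, only a preference loss regularized toward the reference policy by beta.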
__init__.py       chore: remove nested imports (#2515)                          2025-06-26 08:01:05 +05:30
post_training.py  Fix DPOAlignmentConfig schema to use correct DPO parameters   2025-07-17 19:55:44 +00:00