mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-22 22:49:43 +00:00
Latest commit:

- Replace incorrect PPO-like parameters (reward_scale, reward_clip, epsilon, gamma)
- Add proper DPO parameters: beta (KL coefficient) and loss_type
- Update the spec to reflect the correct schema

Files:

- __init__.py
- post_training.py
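The commit above swaps PPO-style knobs for the two parameters DPO actually uses. As a rough sketch of what such a schema could look like (the class and field names here are illustrative, not necessarily the ones in post_training.py):

```python
from dataclasses import dataclass
from enum import Enum


class DPOLossType(Enum):
    # Loss variants commonly found in DPO implementations;
    # hypothetical names, check post_training.py for the real ones.
    sigmoid = "sigmoid"
    hinge = "hinge"
    ipo = "ipo"


@dataclass
class DPOAlignmentConfig:
    """Illustrative config after the fix: the PPO-like fields
    (reward_scale, reward_clip, epsilon, gamma) are gone, replaced
    by the two parameters the DPO objective actually takes."""
    beta: float  # KL-regularization coefficient in the DPO loss
    loss_type: DPOLossType = DPOLossType.sigmoid


# Example instantiation with a typical beta value.
config = DPOAlignmentConfig(beta=0.1)
```

The key point of the fix: DPO optimizes a preference loss regularized toward the reference policy by beta, so reward-clipping or discount parameters from PPO have no place in its config.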