# What does this PR do?
This PR adds support for Direct Preference Optimization (DPO) training
via the existing HuggingFace inline provider. It introduces a new DPO
training recipe, config schema updates, dataset integration, and
end-to-end testing to support preference-based fine-tuning with TRL.
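For reviewers less familiar with DPO, here is a minimal sketch of the per-example objective that preference-based fine-tuning optimizes. It is not the recipe code added in this PR and does not call TRL; the function name `dpo_loss`, its parameters, and the default `beta` are assumptions chosen for illustration.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative default, not taken from this PR
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_margin - rejected_margin)))."""
    # How much more likely the policy makes each response than the frozen reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # Numerically stable form of -log(sigmoid(x)) = log(1 + exp(-x)).
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

The loss equals log(2) when the policy and the reference agree, and drops below that as the policy shifts probability toward the chosen response relative to the rejected one.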
## Test Plan
Added integration test:
tests/integration/post_training/test_post_training.py::TestPostTraining::test_preference_optimize
Ran tests on both CPU and CUDA environments
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
A simple approach to getting provider pages into the docs.
Add or update the description fields in each provider configuration
class using Pydantic’s Field, and keep these descriptions clear and
complete: ./scripts/distro_codegen.py uses them to auto-generate the
provider documentation, so the docs no longer need manual edits.
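
As a rough illustration of the pattern, the sketch below declares descriptions with `Field` and reads them back from the JSON schema. It assumes Pydantic v2; `ExampleProviderConfig`, its fields, and the `field_descriptions` helper are hypothetical and are not taken from this repository or from the codegen script.

```python
from pydantic import BaseModel, Field


class ExampleProviderConfig(BaseModel):
    """Hypothetical provider config, used only for illustration."""

    url: str = Field(
        default="http://localhost:8000",
        description="Base URL of the provider's endpoint.",
    )
    timeout_seconds: int = Field(
        default=30,
        description="Request timeout in seconds.",
    )


def field_descriptions(model: type[BaseModel]) -> dict[str, str]:
    """Map each field name to its description via the model's JSON schema (Pydantic v2)."""
    schema = model.model_json_schema()
    return {
        name: prop.get("description", "")
        for name, prop in schema["properties"].items()
    }


if __name__ == "__main__":
    # Print one line per field, roughly what a docs generator could consume.
    for name, description in field_descriptions(ExampleProviderConfig).items():
        print(f"{name}: {description}")
```

A field declared without a description shows up here as an empty string, which makes missing descriptions easy to spot before the docs are regenerated.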
Signed-off-by: Sébastien Han <seb@redhat.com>