mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-08-15 22:18:00 +00:00
fix: remove unused DPO parameters from schema and tests (#2988)
# What does this PR do?
I removed these DPO parameters from the schema in [this
PR](https://github.com/meta-llama/llama-stack/pull/2804), but I may not
have done it correctly, since they were reintroduced in [this
commit](cb7354a9ce (diff-4e9a8cb358213d6118c4b6ec2a76d0367af06441bf0717e13a775ade75e2061dR15081)),
likely due to a pre-commit hook.
I've made the changes again, and the pre-commit hook automatically
updated the spec sheet.
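
A regression like the one described above can be caught with a small check over the generated spec. The following is only a minimal sketch and is not part of this PR: `STALE_FIELDS` lists the four parameters that the hunk sizes in the diff indicate were removed, `find_stale_fields` is a hypothetical helper, and the schema is inlined as a Python dict instead of being loaded from `docs/_static/llama-stack-spec.yaml`. The `loss_type` entry is abbreviated because the diff shows only its description.

```python
# Hypothetical regression check: flag removed DPO fields if they reappear.
STALE_FIELDS = {"reward_scale", "reward_clip", "epsilon", "gamma"}


def find_stale_fields(schema: dict) -> set[str]:
    """Return stale field names found in `properties` or `required`."""
    present = set(schema.get("properties", {})) | set(schema.get("required", []))
    return STALE_FIELDS & present


# Inlined stand-in for the DPOAlignmentConfig schema after this commit.
# The loss_type definition is abbreviated; the diff shows only its description.
trimmed_schema = {
    "type": "object",
    "properties": {
        "beta": {"type": "number"},
        "loss_type": {"description": "The type of loss function to use for DPO"},
    },
    "additionalProperties": False,
    "required": ["beta", "loss_type"],
}

# The same schema with one removed field reintroduced, to mimic the regression.
regressed_schema = {
    **trimmed_schema,
    "properties": {**trimmed_schema["properties"], "gamma": {"type": "number"}},
    "required": [*trimmed_schema["required"], "gamma"],
}

print(find_stale_fields(trimmed_schema))            # set()
print(sorted(find_stale_fields(regressed_schema)))  # ['gamma']
```

Running a check of this shape after the pre-commit hook regenerates the spec would surface a reintroduced field before it is committed.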
parent 5c33bc1353
commit 3a574ef23c
4 changed files with 0 additions and 50 deletions
18
docs/_static/llama-stack-spec.yaml
vendored
@@ -11163,20 +11163,6 @@ components:
     DPOAlignmentConfig:
       type: object
       properties:
-        reward_scale:
-          type: number
-          description: Scaling factor for the reward signal
-        reward_clip:
-          type: number
-          description: >-
-            Maximum absolute value for reward clipping
-        epsilon:
-          type: number
-          description: >-
-            Small value added for numerical stability
-        gamma:
-          type: number
-          description: Discount factor for future rewards
         beta:
           type: number
           description: Temperature parameter for the DPO loss
@@ -11186,10 +11172,6 @@ components:
           description: The type of loss function to use for DPO
       additionalProperties: false
       required:
-        - reward_scale
-        - reward_clip
-        - epsilon
-        - gamma
         - beta
         - loss_type
       title: DPOAlignmentConfig
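
Because the schema keeps `additionalProperties: false`, a strict validator would reject a request body that still sends any of the removed fields. The sketch below illustrates that consequence with a hand-rolled check, not a JSON Schema library. `check_dpo_config` is a hypothetical helper, `ALLOWED` assumes that `beta` and `loss_type` are the only remaining properties (the ones visible in the diff), and `"sigmoid"` is a placeholder value since the diff does not list the allowed loss types.

```python
# Hand-rolled check of `required` and `additionalProperties: false`;
# a sketch only, not a full JSON Schema validator.
ALLOWED = {"beta", "loss_type"}   # assumed: the properties visible in the diff
REQUIRED = {"beta", "loss_type"}  # the trimmed `required` list


def check_dpo_config(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload passes."""
    keys = set(payload)
    problems = [f"missing required field: {name}" for name in sorted(REQUIRED - keys)]
    problems += [f"unexpected field: {name}" for name in sorted(keys - ALLOWED)]
    return problems


# "sigmoid" is a placeholder; the diff does not list the allowed loss types.
new_style = {"beta": 0.1, "loss_type": "sigmoid"}
old_style = {**new_style, "reward_scale": 1.0, "gamma": 0.99}

print(check_dpo_config(new_style))  # []
print(check_dpo_config(old_style))
# ['unexpected field: gamma', 'unexpected field: reward_scale']
```

Callers that were written against the earlier schema would need to drop the four removed keys to pass a check of this kind.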