scoring dataset schemas

Xi Yan 2025-03-12 23:56:19 -07:00
parent c5f2861a7e
commit 10f6528164
3 changed files with 83 additions and 5 deletions
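
The three scoring purposes added in this commit each name the columns a dataset row must carry. As a rough sketch of how that could be checked, the snippet below copies the enum values from the diff and pairs them with a hypothetical `REQUIRED_COLUMNS` table and `missing_columns` helper. Neither the table nor the helper is part of the commit; the column sets are read off the docstrings.

```python
from enum import Enum


class DatasetPurpose(Enum):
    # Values copied from the enum as it stands after this commit.
    post_training_messages = "post-training/messages"
    eval_question_answer = "eval/question-answer"
    eval_messages_answer = "eval/messages-answer"
    scoring_question_generation_answer = "scoring/question-generation-answer"
    scoring_messages_generation_answer = "scoring/messages-generation-answer"
    scoring_generation_answer = "scoring/generation-answer"


# Hypothetical lookup table (an assumption, not in the commit):
# the columns each purpose's docstring says a row contains.
REQUIRED_COLUMNS = {
    DatasetPurpose.post_training_messages: {"messages"},
    DatasetPurpose.eval_question_answer: {"question", "answer"},
    DatasetPurpose.eval_messages_answer: {"messages", "answer"},
    DatasetPurpose.scoring_question_generation_answer: {"question", "generation", "answer"},
    DatasetPurpose.scoring_messages_generation_answer: {"messages", "generation", "answer"},
    DatasetPurpose.scoring_generation_answer: {"generation", "answer"},
}


def missing_columns(purpose, row):
    """Return the required columns that `row` lacks, in sorted order."""
    return sorted(REQUIRED_COLUMNS[purpose] - set(row))


if __name__ == "__main__":
    row = {"generation": "Paris", "answer": "Paris"}
    print(missing_columns(DatasetPurpose.scoring_generation_answer, row))
    print(missing_columns(DatasetPurpose.scoring_question_generation_answer, row))
```

A row that satisfies `scoring/generation-answer` still lacks `question` for `scoring/question-generation-answer`, which is the distinction the new enum values encode.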


@@ -6812,7 +6812,10 @@
"enum": [
"post-training/messages",
"eval/question-answer",
"eval/messages-answer"
"eval/messages-answer",
"scoring/question-generation-answer",
"scoring/messages-generation-answer",
"scoring/generation-answer"
],
"title": "DatasetPurpose",
"description": "Purpose of the dataset. Each purpose has a required input data schema."
@@ -8792,7 +8795,11 @@
"type": "string",
"enum": [
"post-training/messages",
"eval/messages-answer"
"eval/question-answer",
"eval/messages-answer",
"scoring/question-generation-answer",
"scoring/messages-generation-answer",
"scoring/generation-answer"
],
"title": "DatasetPurpose",
"description": "Purpose of the dataset. Each purpose has a required input data schema."
@@ -9887,9 +9894,12 @@
"enum": [
"post-training/messages",
"eval/question-answer",
"eval/messages-answer"
"eval/messages-answer",
"scoring/question-generation-answer",
"scoring/messages-generation-answer",
"scoring/generation-answer"
],
"description": "The purpose of the dataset. One of - \"post-training/messages\": The dataset contains a messages column with list of messages for post-training. { \"messages\": [ {\"role\": \"user\", \"content\": \"Hello, world!\"}, {\"role\": \"assistant\", \"content\": \"Hello, world!\"}, ] } - \"eval/question-answer\": The dataset contains a question column and an answer column for evaluation. { \"question\": \"What is the capital of France?\", \"answer\": \"Paris\" } - \"eval/messages-answer\": The dataset contains a messages column with list of messages and an answer column for evaluation. { \"messages\": [ {\"role\": \"user\", \"content\": \"Hello, my name is John Doe.\"}, {\"role\": \"assistant\", \"content\": \"Hello, John Doe. How can I help you today?\"}, {\"role\": \"user\", \"content\": \"What's my name?\"}, ], \"answer\": \"John Doe\" }"
"description": "The purpose of the dataset. One of - \"post-training/messages\": The dataset contains a messages column with list of messages for post-training. { \"messages\": [ {\"role\": \"user\", \"content\": \"Hello, world!\"}, {\"role\": \"assistant\", \"content\": \"Hello, world!\"}, ] } - \"eval/question-answer\": The dataset contains a question column and an answer column for evaluation. { \"question\": \"What is the capital of France?\", \"answer\": \"Paris\" } - \"eval/messages-answer\": The dataset contains a messages column with list of messages and an answer column for evaluation. { \"messages\": [ {\"role\": \"user\", \"content\": \"Hello, my name is John Doe.\"}, {\"role\": \"assistant\", \"content\": \"Hello, John Doe. How can I help you today?\"}, {\"role\": \"user\", \"content\": \"What's my name?\"}, ], \"answer\": \"John Doe\" } - \"scoring/question-generation-answer\": The dataset contains a question column, a generation column and an answer column for scoring. { \"question\": \"What is the capital of France?\", \"generation\": \"Paris\", \"answer\": \"Paris\" } - \"scoring/messages-generation-answer\": The dataset contains a messages column with list of messages, a generation column and an answer column for scoring. { \"messages\": [ {\"role\": \"user\", \"content\": \"Hello, my name is John Doe.\"}, {\"role\": \"assistant\", \"content\": \"Hello, John Doe. How can I help you today?\"}, {\"role\": \"user\", \"content\": \"What's my name?\"}, ], \"generation\": \"John Doe\", \"answer\": \"John Doe\" } - \"scoring/generation-answer\": The dataset contains a generation column and an answer column for scoring. { \"generation\": \"Paris\", \"answer\": \"Paris\" }"
},
"source": {
"$ref": "#/components/schemas/DataSource",


@@ -4718,6 +4718,9 @@ components:
- post-training/messages
- eval/question-answer
- eval/messages-answer
- scoring/question-generation-answer
- scoring/messages-generation-answer
- scoring/generation-answer
title: DatasetPurpose
description: >-
Purpose of the dataset. Each purpose has a required input data schema.
@@ -6070,7 +6073,11 @@ components:
type: string
enum:
- post-training/messages
- eval/question-answer
- eval/messages-answer
- scoring/question-generation-answer
- scoring/messages-generation-answer
- scoring/generation-answer
title: DatasetPurpose
description: >-
Purpose of the dataset. Each purpose has a required input data schema.
@@ -6779,6 +6786,9 @@ components:
- post-training/messages
- eval/question-answer
- eval/messages-answer
- scoring/question-generation-answer
- scoring/messages-generation-answer
- scoring/generation-answer
description: >-
The purpose of the dataset. One of - "post-training/messages": The dataset
contains a messages column with list of messages for post-training. {
@@ -6790,7 +6800,18 @@ components:
column for evaluation. { "messages": [ {"role": "user", "content": "Hello,
my name is John Doe."}, {"role": "assistant", "content": "Hello, John
Doe. How can I help you today?"}, {"role": "user", "content": "What's
my name?"}, ], "answer": "John Doe" }
my name?"}, ], "answer": "John Doe" } - "scoring/question-generation-answer":
The dataset contains a question column, a generation column and an answer
column for scoring. { "question": "What is the capital of France?", "generation":
"Paris", "answer": "Paris" } - "scoring/messages-generation-answer": The
dataset contains a messages column with list of messages, a generation
column and an answer column for scoring. { "messages": [ {"role": "user",
"content": "Hello, my name is John Doe."}, {"role": "assistant", "content":
"Hello, John Doe. How can I help you today?"}, {"role": "user", "content":
"What's my name?"}, ], "generation": "John Doe", "answer": "John Doe"
} - "scoring/generation-answer": The dataset contains a generation column
and an answer column for scoring. { "generation": "Paris", "answer": "Paris"
}
source:
$ref: '#/components/schemas/DataSource'
description: >-


@@ -38,11 +38,37 @@ class DatasetPurpose(Enum):
],
"answer": "John Doe"
}
:cvar scoring/question-generation-answer: The dataset contains a question column, a generation column and an answer column.
{
"question": "What is the capital of France?",
"generation": "Paris",
"answer": "Paris"
}
:cvar scoring/messages-generation-answer: The dataset contains a messages column with list of messages, a generation column and an answer column.
{
"messages": [
{"role": "user", "content": "Hello, my name is John Doe."},
{"role": "assistant", "content": "Hello, John Doe. How can I help you today?"},
{"role": "user", "content": "What's my name?"},
],
"generation": "John Doe",
"answer": "John Doe"
}
:cvar scoring/generation-answer: The dataset contains a generation column and an answer column.
{
"generation": "Paris",
"answer": "Paris"
}
"""
post_training_messages = "post-training/messages"
eval_question_answer = "eval/question-answer"
eval_messages_answer = "eval/messages-answer"
scoring_question_generation_answer = "scoring/question-generation-answer"
scoring_messages_generation_answer = "scoring/messages-generation-answer"
scoring_generation_answer = "scoring/generation-answer"
# TODO: add more schemas here
@@ -180,6 +206,27 @@ class Datasets(Protocol):
],
"answer": "John Doe"
}
- "scoring/question-generation-answer": The dataset contains a question column, a generation column and an answer column for scoring.
{
"question": "What is the capital of France?",
"generation": "Paris",
"answer": "Paris"
}
- "scoring/messages-generation-answer": The dataset contains a messages column with list of messages, a generation column and an answer column for scoring.
{
"messages": [
{"role": "user", "content": "Hello, my name is John Doe."},
{"role": "assistant", "content": "Hello, John Doe. How can I help you today?"},
{"role": "user", "content": "What's my name?"},
],
"generation": "John Doe",
"answer": "John Doe"
}
- "scoring/generation-answer": The dataset contains a generation column and an answer column for scoring.
{
"generation": "Paris",
"answer": "Paris"
}
:param source: The data source of the dataset. Examples:
- {
"type": "uri",