scoring dataset schemas

Xi Yan 2025-03-12 23:56:19 -07:00
parent c5f2861a7e
commit 10f6528164
3 changed files with 83 additions and 5 deletions


@@ -38,11 +38,37 @@ class DatasetPurpose(Enum):
],
"answer": "John Doe"
}
:cvar scoring/question-generation-answer: The dataset contains a question column, a generation column and an answer column.
{
"question": "What is the capital of France?",
"generation": "Paris",
"answer": "Paris"
}
:cvar scoring/messages-generation-answer: The dataset contains a messages column with a list of messages, a generation column and an answer column.
{
"messages": [
{"role": "user", "content": "Hello, my name is John Doe."},
{"role": "assistant", "content": "Hello, John Doe. How can I help you today?"},
{"role": "user", "content": "What's my name?"},
],
"generation": "John Doe",
"answer": "John Doe"
}
:cvar scoring/generation-answer: The dataset contains a generation column and an answer column.
{
"generation": "Paris",
"answer": "Paris"
}
"""
post_training_messages = "post-training/messages"
eval_question_answer = "eval/question-answer"
eval_messages_answer = "eval/messages-answer"
scoring_question_generation_answer = "scoring/question-generation-answer"
scoring_messages_generation_answer = "scoring/messages-generation-answer"
scoring_generation_answer = "scoring/generation-answer"
# TODO: add more schemas here
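The three scoring schemas differ only in which columns a row carries. The following is a minimal sketch of how a caller might check a row against them. It assumes that every column listed in a schema's docstring is required, and that each message is a dict with "role" and "content" keys, as in the examples above. The trimmed `DatasetPurpose` redeclares only the scoring members added in this commit. `REQUIRED_COLUMNS`, `missing_columns` and `is_valid_messages` are hypothetical helpers for illustration and are not part of the llama-stack API.

```python
from enum import Enum
from typing import Any, Dict, List


# Trimmed stand-in for DatasetPurpose: only the scoring members from this commit.
class DatasetPurpose(Enum):
    scoring_question_generation_answer = "scoring/question-generation-answer"
    scoring_messages_generation_answer = "scoring/messages-generation-answer"
    scoring_generation_answer = "scoring/generation-answer"


# Hypothetical mapping (assumption): every column named in the docstring is required.
REQUIRED_COLUMNS: Dict[DatasetPurpose, List[str]] = {
    DatasetPurpose.scoring_question_generation_answer: ["question", "generation", "answer"],
    DatasetPurpose.scoring_messages_generation_answer: ["messages", "generation", "answer"],
    DatasetPurpose.scoring_generation_answer: ["generation", "answer"],
}


def missing_columns(purpose: DatasetPurpose, row: Dict[str, Any]) -> List[str]:
    """Return the required columns (in schema order) that `row` lacks."""
    return [col for col in REQUIRED_COLUMNS[purpose] if col not in row]


def is_valid_messages(messages: Any) -> bool:
    """Check that `messages` is a list of dicts, each with "role" and "content" keys."""
    return isinstance(messages, list) and all(
        isinstance(m, dict) and all(key in m for key in ("role", "content"))
        for m in messages
    )


row = {"question": "What is the capital of France?", "generation": "Paris", "answer": "Paris"}
print(missing_columns(DatasetPurpose.scoring_question_generation_answer, row))  # []
print(missing_columns(DatasetPurpose.scoring_messages_generation_answer, row))  # ['messages']
```

Because the enum values are the schema strings themselves, a purpose string read from configuration can be resolved with `DatasetPurpose("scoring/generation-answer")` before it is passed to the helpers.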
@@ -180,6 +206,27 @@ class Datasets(Protocol):
],
"answer": "John Doe"
}
- "scoring/question-generation-answer": The dataset contains a question column, a generation column and an answer column for scoring.
{
"question": "What is the capital of France?",
"generation": "Paris",
"answer": "Paris"
}
- "scoring/messages-generation-answer": The dataset contains a messages column with a list of messages, a generation column and an answer column for scoring.
{
"messages": [
{"role": "user", "content": "Hello, my name is John Doe."},
{"role": "assistant", "content": "Hello, John Doe. How can I help you today?"},
{"role": "user", "content": "What's my name?"},
],
"generation": "John Doe",
"answer": "John Doe"
}
- "scoring/generation-answer": The dataset contains a generation column and an answer column for scoring.
{
"generation": "Paris",
"answer": "Paris"
}
:param source: The data source of the dataset. Examples:
- {
"type": "uri",