llama-stack/llama_stack/providers/tests/datasetio/test_dataset.csv at 426d821e7f567e322e82934f8c4a1e6d0508e918 - phoenix-oss/llama-stack - Git for basel.kvant.cloud

phoenix-oss/llama-stack

forked from phoenix-oss/llama-stack-mirror

Xi Yan, commit cb84034567

[Evals API][3/n] scoring_functions / scoring meta-reference implementations (#296)

* wip

* dataset validation

* test_scoring

* cleanup

* clean up test

* comments

* error checking

* dataset client

* test client

* datasetio client

* clean up

* basic scoring function works

* scorer wip

* equality scorer

* score batch impl

* score batch

* update scoring test

* refactor

* validate scorer input

* address comments

* add all rows scores to ScoringResult

* bugfix

* scoring function def rename

2024-10-24 14:52:30 -07:00

6 lines, 310 B, CSV


input_query,generated_answer,expected_answer
What is the capital of France?,London,Paris
Who is the CEO of Meta?,Mark Zuckerberg,Mark Zuckerberg
What is the largest planet in our solar system?,Jupiter,Jupiter
What is the smallest country in the world?,China,Vatican City
What is the currency of Japan?,Yen,Yen
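Each row pairs a query with a generated answer and an expected answer. As a rough illustration of how an exact-match ("equality") scorer, as named in the commit message, could validate and score a file shaped like this one, here is a minimal Python sketch. The helper names (`load_rows`, `equality_score`, `score_batch`) and the required-column check are hypothetical and are not llama-stack's actual datasetio or scoring API.

```python
import csv
import io

# Hypothetical: the columns this sketch expects, taken from the CSV header.
REQUIRED_COLUMNS = {"input_query", "generated_answer", "expected_answer"}


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text and check that the columns a scorer needs are present."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"dataset is missing columns: {sorted(missing)}")
    return list(reader)


def equality_score(row: dict[str, str]) -> float:
    """Return 1.0 on an exact string match, else 0.0 (no normalization)."""
    return 1.0 if row["generated_answer"] == row["expected_answer"] else 0.0


def score_batch(rows: list[dict[str, str]]) -> tuple[list[float], float]:
    """Score every row, then aggregate the per-row scores into an accuracy."""
    scores = [equality_score(r) for r in rows]
    accuracy = sum(scores) / len(scores) if scores else 0.0
    return scores, accuracy


# Same contents as test_dataset.csv above.
DATA = """input_query,generated_answer,expected_answer
What is the capital of France?,London,Paris
Who is the CEO of Meta?,Mark Zuckerberg,Mark Zuckerberg
What is the largest planet in our solar system?,Jupiter,Jupiter
What is the smallest country in the world?,China,Vatican City
What is the currency of Japan?,Yen,Yen
"""

rows = load_rows(DATA)
scores, accuracy = score_batch(rows)
print(scores)    # [0.0, 1.0, 1.0, 0.0, 1.0]
print(accuracy)  # 0.6
```

Under plain string equality, rows 1 and 4 (London vs. Paris, China vs. Vatican City) score 0.0, giving an accuracy of 0.6. A production scorer might normalize case or whitespace before comparing, which this sketch deliberately does not do.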