Commit graph

11 commits

Author SHA1 Message Date
Xi Yan
027ee2335c delete old tests 2024-11-07 18:06:21 -08:00
Xi Yan
51c20f9c29 api refactor 2024-11-07 13:54:26 -08:00
Xi Yan
413a1b6d8b fix eval 2024-11-06 21:10:54 -08:00
Xi Yan
56239fce90 scoring fix 2024-11-06 18:07:16 -08:00
Xi Yan
c5cf9c30be score batch 2024-11-06 17:30:46 -08:00
Xi Yan
0bce74402f scoring test pass 2024-11-06 17:27:55 -08:00
Xi Yan
0351072531 fix scoring register 2024-11-06 17:18:16 -08:00
Xi Yan
def6d5d8ad scoring resolve 2024-11-06 17:04:25 -08:00
Xi Yan
ed833bb758
[Evals API][7/n] braintrust scoring provider (#333)
* wip scoring refactor

* llm as judge, move folders

* test full generation + eval

* extract score regex to llm context

* remove prints, cleanup braintrust in this branch

* braintrust skeleton

* datasetio test fix

* braintrust provider

* remove prints

* dependencies

* change json -> class

* json -> class

* remove initialize

* address nits

* check identifier prefix

* braintrust scoring identifier check, rebase

* update MANIFEST

* manifest

* remove braintrust scoring_fn

* remove comments

* tests

* imports fix
2024-10-28 18:59:35 -07:00
Xi Yan
7b8748c53e
[Evals API][6/n] meta-reference llm as judge, registration for ScoringFnDefs (#330)
* wip scoring refactor

* llm as judge, move folders

* test full generation + eval

* extract score regex to llm context

* remove prints, cleanup braintrust in this branch

* change json -> class

* remove initialize

* address nits

* check identifier prefix

* update MANIFEST
2024-10-28 14:08:42 -07:00
Xi Yan
cb84034567
[Evals API][3/n] scoring_functions / scoring meta-reference implementations (#296)
* wip

* dataset validation

* test_scoring

* cleanup

* clean up test

* comments

* error checking

* dataset client

* test client

* datasetio client

* clean up

* basic scoring function works

* scorer wip

* equality scorer

* score batch impl

* score batch

* update scoring test

* refactor

* validate scorer input

* address comments

* add all rows scores to ScoringResult

* bugfix

* scoring function def rename
2024-10-24 14:52:30 -07:00