Xi Yan
|
b4416b72fd
|
Folder restructure for evals/datasets/scoring (#419)
* rename evals related stuff
* fix datasetio
* fix scoring test
* localfs -> LocalFS
* refactor scoring
* refactor scoring
* remove 8b_correctness scoring_fn from tests
* tests w/ eval params
* scoring fn braintrust fixture
* import
|
2024-11-11 17:35:40 -05:00 |
|
Ashwin Bharambe
|
994732e2e0
|
impls -> inline, adapters -> remote (#381)
|
2024-11-06 14:54:05 -08:00 |
|
Xi Yan
|
ed833bb758
|
[Evals API][7/n] braintrust scoring provider (#333)
* wip scoring refactor
* llm as judge, move folders
* test full generation + eval
* extract score regex to llm context
* remove prints, cleanup braintrust in this branch
* braintrust skeleton
* datasetio test fix
* braintrust provider
* remove prints
* dependencies
* change json -> class
* json -> class
* remove initialize
* address nits
* check identifier prefix
* braintrust scoring identifier check, rebase
* update MANIFEST
* manifest
* remove braintrust scoring_fn
* remove comments
* tests
* imports fix
|
2024-10-28 18:59:35 -07:00 |
|
Xi Yan
|
7b8748c53e
|
[Evals API][6/n] meta-reference llm as judge, registration for ScoringFnDefs (#330)
* wip scoring refactor
* llm as judge, move folders
* test full generation + eval
* extract score regex to llm context
* remove prints, cleanup braintrust in this branch
* change json -> class
* remove initialize
* address nits
* check identifier prefix
* update MANIFEST
|
2024-10-28 14:08:42 -07:00 |
|
Xi Yan
|
cb84034567
|
[Evals API][3/n] scoring_functions / scoring meta-reference implementations (#296)
* wip
* dataset validation
* test_scoring
* cleanup
* clean up test
* comments
* error checking
* dataset client
* test client
* datasetio client
* clean up
* basic scoring function works
* scorer wip
* equality scorer
* score batch impl
* score batch
* update scoring test
* refactor
* validate scorer input
* address comments
* add all rows scores to ScoringResult
* bugfix
* scoring function def rename
|
2024-10-24 14:52:30 -07:00 |
|