Xi Yan
|
027ee2335c
|
delete old tests
|
2024-11-07 18:06:21 -08:00 |
|
Xi Yan
|
51c20f9c29
|
api refactor
|
2024-11-07 13:54:26 -08:00 |
|
Xi Yan
|
413a1b6d8b
|
fix eval
|
2024-11-06 21:10:54 -08:00 |
|
Xi Yan
|
56239fce90
|
scoring fix
|
2024-11-06 18:07:16 -08:00 |
|
Xi Yan
|
c5cf9c30be
|
score batch
|
2024-11-06 17:30:46 -08:00 |
|
Xi Yan
|
0bce74402f
|
scoring test pass
|
2024-11-06 17:27:55 -08:00 |
|
Xi Yan
|
0351072531
|
fix scoring register
|
2024-11-06 17:18:16 -08:00 |
|
Xi Yan
|
def6d5d8ad
|
scoring resolve
|
2024-11-06 17:04:25 -08:00 |
|
Xi Yan
|
ed833bb758
|
[Evals API][7/n] braintrust scoring provider (#333)
* wip scoring refactor
* llm as judge, move folders
* test full generation + eval
* extract score regex to llm context
* remove prints, cleanup braintrust in this branch
* braintrust skeleton
* datasetio test fix
* braintrust provider
* remove prints
* dependencies
* change json -> class
* json -> class
* remove initialize
* address nits
* check identifier prefix
* braintrust scoring identifier check, rebase
* update MANIFEST
* manifest
* remove braintrust scoring_fn
* remove comments
* tests
* imports fix
|
2024-10-28 18:59:35 -07:00 |
|
Xi Yan
|
7b8748c53e
|
[Evals API][6/n] meta-reference llm as judge, registration for ScoringFnDefs (#330)
* wip scoring refactor
* llm as judge, move folders
* test full generation + eval
* extract score regex to llm context
* remove prints, cleanup braintrust in this branch
* change json -> class
* remove initialize
* address nits
* check identifier prefix
* update MANIFEST
|
2024-10-28 14:08:42 -07:00 |
|
Xi Yan
|
cb84034567
|
[Evals API][3/n] scoring_functions / scoring meta-reference implementations (#296)
* wip
* dataset validation
* test_scoring
* cleanup
* clean up test
* comments
* error checking
* dataset client
* test client:
* datasetio client
* clean up
* basic scoring function works
* scorer wip
* equality scorer
* score batch impl
* score batch
* update scoring test
* refactor
* validate scorer input
* address comments
* add all rows scores to ScoringResult
* bugfix
* scoring function def rename
|
2024-10-24 14:52:30 -07:00 |
|