Xi Yan | b4416b72fd
Folder restructure for evals/datasets/scoring (#419)
* rename evals related stuff
* fix datasetio
* fix scoring test
* localfs -> LocalFS
* refactor scoring
* refactor scoring
* remove 8b_correctness scoring_fn from tests
* tests w/ eval params
* scoring fn braintrust fixture
* import
2024-11-11 17:35:40 -05:00

Xi Yan | 2b7d70ba86
[Evals API][11/n] huggingface dataset provider + mmlu scoring fn (#392)
* wip
* scoring fn api
* eval api
* eval task
* evaluate api update
* pre commit
* unwrap context -> config
* config field doc
* typo
* naming fix
* separate benchmark / app eval
* api name
* rename
* wip tests
* wip
* datasetio test
* delete unused
* fixture
* scoring resolve
* fix scoring register
* scoring test pass
* score batch
* scoring fix
* fix eval
* test eval works
* huggingface provider
* datasetdef files
* mmlu scoring fn
* test wip
* remove type ignore
* api refactor
* add default task_eval_id for routing
* add eval_id for jobs
* remove type ignore
* huggingface provider
* wip huggingface register
* only keep 1 run_eval
* fix optional
* register task required
* register task required
* delete old tests
* fix
* mmlu loose
* refactor
* msg
* fix tests
* move benchmark task def to file
* msg
* gen openapi
* openapi gen
* move dataset to hf llamastack repo
* remove todo
* refactor
* add register model to unit test
* rename
* register to client
* delete preregistered dataset/eval task
* comments
* huggingface -> remote adapter
* openapi gen
2024-11-11 14:49:50 -05:00
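
PR #392 above adds an MMLU scoring fn (including a "loose" variant). As a rough illustration of the kind of answer-letter extraction such a grader performs, here is a minimal, self-contained Python sketch; the regex, function names, and loose-matching fallback are assumptions for illustration, not the provider's actual code.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration; the real scoring fn lives in the
# llama-stack scoring provider and differs in detail.
ANSWER_PATTERN = re.compile(r"(?:answer is|Answer:)\s*\(?([A-D])\)?", re.IGNORECASE)

def extract_choice(generation: str) -> Optional[str]:
    """Pull the first A-D letter the model commits to, if any."""
    match = ANSWER_PATTERN.search(generation)
    return match.group(1).upper() if match else None

def score_mmlu_row(generation: str, expected: str, loose: bool = False) -> float:
    """Return 1.0 for a correct choice, 0.0 otherwise.

    loose=True falls back to scanning for any standalone A-D token when the
    strict pattern finds nothing (roughly what a "loose" grader might do).
    """
    choice = extract_choice(generation)
    if choice is None and loose:
        tokens = re.findall(r"\b([A-D])\b", generation.upper())
        choice = tokens[-1] if tokens else None
    return 1.0 if choice == expected.strip().upper() else 0.0

if __name__ == "__main__":
    print(score_mmlu_row("The answer is (C) because ...", "C"))      # 1.0
    print(score_mmlu_row("I think B fits best.", "C", loose=True))   # 0.0
```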

Xi Yan | 6192bf43a4
[Evals API][10/n] API updates for EvalTaskDef + new test migration (#379)
* wip
* scoring fn api
* eval api
* eval task
* evaluate api update
* pre commit
* unwrap context -> config
* config field doc
* typo
* naming fix
* separate benchmark / app eval
* api name
* rename
* wip tests
* wip
* datasetio test
* delete unused
* fixture
* scoring resolve
* fix scoring register
* scoring test pass
* score batch
* scoring fix
* fix eval
* test eval works
* remove type ignore
* api refactor
* add default task_eval_id for routing
* add eval_id for jobs
* remove type ignore
* only keep 1 run_eval
* fix optional
* register task required
* register task required
* delete old tests
* delete old tests
* fixture return impl
2024-11-07 21:24:12 -08:00
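
PR #379 reworks the API around EvalTaskDef and requires tasks to be registered before use ("register task required"). The dataclass and registry below are a hypothetical sketch of that shape; the field names (identifier, dataset_id, scoring_functions) and the register_eval_task helper are assumptions for illustration, not the repo's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class EvalTaskDef:
    """Illustrative shape only; field names are assumptions, not the repo's schema."""
    identifier: str                       # routing key for the eval task
    dataset_id: str                       # dataset the task evaluates over
    scoring_functions: List[str]          # scoring fn identifiers to apply
    metadata: Dict[str, str] = field(default_factory=dict)

# A registry keyed by identifier, mirroring a "register task required" flow.
_EVAL_TASKS: Dict[str, EvalTaskDef] = {}

def register_eval_task(task: EvalTaskDef) -> None:
    """Reject duplicate identifiers; callers must register before running evals."""
    if task.identifier in _EVAL_TASKS:
        raise ValueError(f"eval task {task.identifier!r} already registered")
    _EVAL_TASKS[task.identifier] = task
```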

Xi Yan | 7b8748c53e
[Evals API][6/n] meta-reference llm as judge, registration for ScoringFnDefs (#330)
* wip scoring refactor
* llm as judge, move folders
* test full generation + eval
* extract score regex to llm context
* remove prints, cleanup braintrust in this branch
* change json -> class
* remove initialize
* address nits
* check identifier prefix
* update MANIFEST
2024-10-28 14:08:42 -07:00
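
PR #330 introduces a meta-reference LLM-as-judge scoring fn and moves the score-extraction regex into the scoring fn's context ("extract score regex to llm context"). A toy sketch of that pattern, with the judge injected as a plain callable and a configurable score_regex, might look like the following; the prompt wording, names, and 0-5 scale are illustrative assumptions, not the provider's implementation.

```python
import re
from typing import Callable

# Illustrative only: the real provider wires the judge model through the
# inference API; here the judge is a plain callable so the sketch stays
# self-contained.
JUDGE_PROMPT = (
    "You are grading an answer.\n"
    "Question: {question}\nAnswer: {answer}\n"
    "Reply with 'Score: N' where N is 0-5."
)

def llm_as_judge_score(
    question: str,
    answer: str,
    judge: Callable[[str], str],
    score_regex: str = r"Score:\s*(\d+)",  # kept configurable, per "extract score regex"
) -> float:
    """Ask the judge to grade the answer, then parse the score out of its reply."""
    judgment = judge(JUDGE_PROMPT.format(question=question, answer=answer))
    match = re.search(score_regex, judgment)
    return float(match.group(1)) if match else 0.0

if __name__ == "__main__":
    fake_judge = lambda prompt: "Score: 4 -- mostly correct"
    print(llm_as_judge_score("What is 2+2?", "4", fake_judge))  # 4.0
```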

Xi Yan | abdf7cddf3
[Evals API][4/n] evals with generation meta-reference impl (#303)
* wip
* dataset validation
* test_scoring
* cleanup
* clean up test
* comments
* error checking
* dataset client
* test client
* datasetio client
* clean up
* basic scoring function works
* scorer wip
* equality scorer
* score batch impl
* score batch
* update scoring test
* refactor
* validate scorer input
* address comments
* evals with generation
* add all rows scores to ScoringResult
* minor typing
* bugfix
* scoring function def rename
* rebase name
* refactor
* address comments
* Update iOS inference instructions for new quantization
* Small updates to quantization config
* Fix score threshold in faiss
* Bump version to 0.0.45
* Handle both ipv6 and ipv4 interfaces together
* update manifest for build templates
* Update getting_started.md
* chatcompletion & completion input type validation
* inclusion->subsetof
* error checking
* scoring_function -> scoring_fn rename, scorer -> scoring_fn rename
* address comments
* [Evals API][5/n] fixes to generate openapi spec (#323)
* generate openapi
* typing comment, dataset -> dataset_id
* remove custom type
* sample eval run.yaml
---------
Co-authored-by: Dalton Flanagan <6599399+dltn@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2024-10-25 13:12:39 -07:00
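
PR #303 brings up the first scoring functions (e.g. an equality scorer), a score_batch path, and per-row scores on ScoringResult. The sketch below shows the general shape of such a batch scorer; the ScoringResult fields and the accuracy aggregation are assumptions for illustration rather than the repo's actual types.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class ScoringResult:
    """Toy stand-in: per-row scores plus an aggregate, not the repo's actual type."""
    score_rows: List[Dict[str, Any]]
    aggregated: Dict[str, float]

def equality_score(row: Dict[str, Any]) -> float:
    """1.0 when the generated answer matches the expected answer exactly."""
    generated = str(row["generated_answer"]).strip()
    expected = str(row["expected_answer"]).strip()
    return 1.0 if generated == expected else 0.0

def score_batch(rows: List[Dict[str, Any]]) -> ScoringResult:
    """Score every row and report both per-row results and an aggregate accuracy."""
    score_rows = [{"score": equality_score(row)} for row in rows]
    accuracy = sum(r["score"] for r in score_rows) / len(score_rows) if score_rows else 0.0
    return ScoringResult(score_rows=score_rows, aggregated={"accuracy": accuracy})

if __name__ == "__main__":
    batch = [
        {"generated_answer": "Paris", "expected_answer": "Paris"},
        {"generated_answer": "Lyon", "expected_answer": "Paris"},
    ]
    print(score_batch(batch).aggregated)  # {'accuracy': 0.5}
```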