Commit graph

31 commits

Author SHA1 Message Date
Xi Yan
9ff903e63b delete preregistered dataset/eval task 2024-11-11 11:05:47 -05:00
Xi Yan
75ccc05296 rename 2024-11-11 10:48:47 -05:00
Xi Yan
e690eb7ad3 Merge branch 'main' into mmlu_benchmark 2024-11-11 10:22:32 -05:00
Dinesh Yeduguru
d800a16acd
Resource oriented design for shields (#399)
* init

* working bedrock tests

* bedrock test for inference fixes

* use env vars for bedrock guardrail vars

* add register in meta reference

* use correct shield impl in meta ref

* dont add together fixture

* right naming

* minor updates

* improved registration flow

* address feedback

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-08 12:16:11 -08:00
Xi Yan
58c6138df1 move dataset to hf llamastack repo 2024-11-08 11:42:16 -08:00
Xi Yan
989f070bc0 move benchmark task def to file 2024-11-07 21:35:02 -08:00
Xi Yan
6192bf43a4
[Evals API][10/n] API updates for EvalTaskDef + new test migration (#379)
* wip

* scoring fn api

* eval api

* eval task

* evaluate api update

* pre commit

* unwrap context -> config

* config field doc

* typo

* naming fix

* separate benchmark / app eval

* api name

* rename

* wip tests

* wip

* datasetio test

* delete unused

* fixture

* scoring resolve

* fix scoring register

* scoring test pass

* score batch

* scoring fix

* fix eval

* test eval works

* remove type ignore

* api refactor

* add default task_eval_id for routing

* add eval_id for jobs

* remove type ignore

* only keep 1 run_eval

* fix optional

* register task required

* register task required

* delete old tests

* delete old tests

* fixture return impl
2024-11-07 21:24:12 -08:00
Xi Yan
6525b43906 refactor 2024-11-07 18:41:33 -08:00
Xi Yan
edeb6dcf04 mmlu loose 2024-11-07 18:36:41 -08:00
Xi Yan
6ee02ca23b fix 2024-11-07 18:25:39 -08:00
Xi Yan
33b6d9b7b7 merge 2024-11-07 18:15:13 -08:00
Xi Yan
3c17853d79 register task required 2024-11-07 16:42:44 -08:00
Xi Yan
94a56cc3f3 register task required 2024-11-07 16:41:23 -08:00
Xi Yan
7ca479f400 fix optional 2024-11-07 16:22:33 -08:00
Xi Yan
fd581c3d88 only keep 1 run_eval 2024-11-07 16:17:49 -08:00
Xi Yan
d1633dc412 huggingface provider 2024-11-07 15:20:22 -08:00
Xi Yan
cc6edf6287 Merge branch 'eval_task_register' into mmlu_benchmark 2024-11-07 14:41:50 -08:00
Xi Yan
f05db9a25c add eval_id for jobs 2024-11-07 14:30:46 -08:00
Xi Yan
ea80f623fb add default task_eval_id for routing 2024-11-07 14:19:33 -08:00
Xi Yan
51c20f9c29 api refactor 2024-11-07 13:54:26 -08:00
Xi Yan
97dcd5704c Merge branch 'main' into eval_task_register 2024-11-07 13:08:58 -08:00
Ashwin Bharambe
694c142b89
Add provider deprecation support; change directory structure (#397)
* Add provider deprecation support; change directory structure

* fix a couple dangling imports

* move the meta_reference safety dir also
2024-11-07 13:04:53 -08:00
Xi Yan
93995ecc4c test wip 2024-11-07 11:11:27 -08:00
Xi Yan
3322aa9ee4 mmlu scoring fn 2024-11-07 10:54:00 -08:00
Xi Yan
3f1ac29d57 test eval works 2024-11-06 21:40:38 -08:00
Xi Yan
413a1b6d8b fix eval 2024-11-06 21:10:54 -08:00
Xi Yan
56239fce90 scoring fix 2024-11-06 18:07:16 -08:00
Xi Yan
0bce74402f scoring test pass 2024-11-06 17:27:55 -08:00
Xi Yan
def6d5d8ad scoring resolve 2024-11-06 17:04:25 -08:00
Xi Yan
8fc2d212a2
fix safety signature mismatch (#388)
* fix safety sig

* shield_type->identifier
2024-11-06 16:30:47 -08:00
Ashwin Bharambe
994732e2e0
impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00