Xi Yan
9ff903e63b
delete preregistered dataset/eval task
2024-11-11 11:05:47 -05:00
Xi Yan
75ccc05296
rename
2024-11-11 10:48:47 -05:00
Xi Yan
1031f1404b
add register model to unit test
2024-11-11 10:35:59 -05:00
Xi Yan
e690eb7ad3
Merge branch 'main' into mmlu_benchmark
2024-11-11 10:22:32 -05:00
Xi Yan
ba82021d4b
precommit
2024-11-08 17:58:58 -08:00
Xi Yan
1ebf6447c5
add missing inits
2024-11-08 17:54:24 -08:00
Xi Yan
89c3129f0b
add missing inits
2024-11-08 17:49:29 -08:00
Dinesh Yeduguru
ec644d3418
migrate model to Resource and new registration signature ( #410 )
...
* resource oriented object design for models
* add back llama_model field
* working tests
* register signature fix
* address feedback
---------
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-08 16:12:57 -08:00
Xi Yan
490f7e9a75
refactor
2024-11-08 14:19:55 -08:00
Dinesh Yeduguru
d800a16acd
Resource oriented design for shields ( #399 )
...
* init
* working bedrock tests
* bedrock test for inference fixes
* use env vars for bedrock guardrail vars
* add register in meta reference
* use correct shield impl in meta ref
* dont add together fixture
* right naming
* minor updates
* improved registration flow
* address feedback
---------
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-08 12:16:11 -08:00
Xi Yan
9d04f11543
remove todo
2024-11-08 11:43:15 -08:00
Xi Yan
58c6138df1
move dataset to hf llamastack repo
2024-11-08 11:42:16 -08:00
Xi Yan
d42774c41b
msg
2024-11-07 21:36:49 -08:00
Xi Yan
989f070bc0
move benchmark task def to file
2024-11-07 21:35:02 -08:00
Xi Yan
6192bf43a4
[Evals API][10/n] API updates for EvalTaskDef + new test migration ( #379 )
...
* wip
* scoring fn api
* eval api
* eval task
* evaluate api update
* pre commit
* unwrap context -> config
* config field doc
* typo
* naming fix
* separate benchmark / app eval
* api name
* rename
* wip tests
* wip
* datasetio test
* delete unused
* fixture
* scoring resolve
* fix scoring register
* scoring test pass
* score batch
* scoring fix
* fix eval
* test eval works
* remove type ignore
* api refactor
* add default task_eval_id for routing
* add eval_id for jobs
* remove type ignore
* only keep 1 run_eval
* fix optional
* register task required
* register task required
* delete old tests
* delete old tests
* fixture return impl
2024-11-07 21:24:12 -08:00
Xi Yan
6525b43906
refactor
2024-11-07 18:41:33 -08:00
Xi Yan
edeb6dcf04
mmlu loose
2024-11-07 18:36:41 -08:00
Xi Yan
6ee02ca23b
fix
2024-11-07 18:25:39 -08:00
Xi Yan
33b6d9b7b7
merge
2024-11-07 18:15:13 -08:00
Xi Yan
3c17853d79
register task required
2024-11-07 16:42:44 -08:00
Xi Yan
94a56cc3f3
register task required
2024-11-07 16:41:23 -08:00
Xi Yan
7ca479f400
fix optional
2024-11-07 16:22:33 -08:00
Xi Yan
fd581c3d88
only keep 1 run_eval
2024-11-07 16:17:49 -08:00
Xi Yan
37d87c585a
wip huggingface register
2024-11-07 15:59:55 -08:00
Xi Yan
d1633dc412
huggingface provider
2024-11-07 15:20:22 -08:00
Xi Yan
cc6edf6287
Merge branch 'eval_task_register' into mmlu_benchmark
2024-11-07 14:41:50 -08:00
Xi Yan
f05db9a25c
add eval_id for jobs
2024-11-07 14:30:46 -08:00
Xi Yan
ea80f623fb
add default task_eval_id for routing
2024-11-07 14:19:33 -08:00
Xi Yan
51c20f9c29
api refactor
2024-11-07 13:54:26 -08:00
Xi Yan
97dcd5704c
Merge branch 'main' into eval_task_register
2024-11-07 13:08:58 -08:00
Ashwin Bharambe
694c142b89
Add provider deprecation support; change directory structure ( #397 )
...
* Add provider deprecation support; change directory structure
* fix a couple dangling imports
* move the meta_reference safety dir also
2024-11-07 13:04:53 -08:00
Xi Yan
93995ecc4c
test wip
2024-11-07 11:11:27 -08:00
Xi Yan
3322aa9ee4
mmlu scoring fn
2024-11-07 10:54:00 -08:00
Xi Yan
b946afddc0
datasetdef files
2024-11-07 10:28:51 -08:00
Xi Yan
d75095033d
huggingface provider
2024-11-07 10:21:25 -08:00
Xi Yan
3f1ac29d57
test eval works
2024-11-06 21:40:38 -08:00
Xi Yan
413a1b6d8b
fix eval
2024-11-06 21:10:54 -08:00
Xi Yan
56239fce90
scoring fix
2024-11-06 18:07:16 -08:00
Xi Yan
0bce74402f
scoring test pass
2024-11-06 17:27:55 -08:00
Xi Yan
def6d5d8ad
scoring resolve
2024-11-06 17:04:25 -08:00
Xi Yan
8fc2d212a2
fix safety signature mismatch ( #388 )
...
* fix safety sig
* shield_type->identifier
2024-11-06 16:30:47 -08:00
Ashwin Bharambe
994732e2e0
impls -> inline, adapters -> remote ( #381 )
2024-11-06 14:54:05 -08:00