Commit graph

491 commits

Author SHA1 Message Date
Xi Yan
d42774c41b msg 2024-11-07 21:36:49 -08:00
Xi Yan
989f070bc0 move benchmark task def to file 2024-11-07 21:35:02 -08:00
Xi Yan
f429e75b3e fix tests 2024-11-07 21:31:05 -08:00
Xi Yan
0443b36cc1 merge 2024-11-07 21:27:08 -08:00
Xi Yan
6192bf43a4 [Evals API][10/n] API updates for EvalTaskDef + new test migration (#379) 2024-11-07 21:24:12 -08:00
* wip
* scoring fn api
* eval api
* eval task
* evaluate api update
* pre commit
* unwrap context -> config
* config field doc
* typo
* naming fix
* separate benchmark / app eval
* api name
* rename
* wip tests
* wip
* datasetio test
* delete unused
* fixture
* scoring resolve
* fix scoring register
* scoring test pass
* score batch
* scoring fix
* fix eval
* test eval works
* remove type ignore
* api refactor
* add default task_eval_id for routing
* add eval_id for jobs
* remove type ignore
* only keep 1 run_eval
* fix optional
* register task required
* register task required
* delete old tests
* delete old tests
* fixture return impl
Xi Yan
8350f2df4c [docs] refactor remote-hosted distro (#402) 2024-11-07 19:16:38 -08:00
* move docs
* docs
Xi Yan
4ae1d37c2f msg 2024-11-07 18:43:21 -08:00
Xi Yan
6525b43906 refactor 2024-11-07 18:41:33 -08:00
Xi Yan
edeb6dcf04 mmlu loose 2024-11-07 18:36:41 -08:00
Xi Yan
6ee02ca23b fix 2024-11-07 18:25:39 -08:00
Xi Yan
33b6d9b7b7 merge 2024-11-07 18:15:13 -08:00
Xi Yan
027ee2335c delete old tests 2024-11-07 18:06:21 -08:00
Xi Yan
3c17853d79 register task required 2024-11-07 16:42:44 -08:00
Xi Yan
94a56cc3f3 register task required 2024-11-07 16:41:23 -08:00
Xi Yan
7ca479f400 fix optional 2024-11-07 16:22:33 -08:00
Xi Yan
fd581c3d88 only keep 1 run_eval 2024-11-07 16:17:49 -08:00
Xi Yan
37d87c585a wip huggingface register 2024-11-07 15:59:55 -08:00
Xi Yan
d1633dc412 huggingface provider 2024-11-07 15:20:22 -08:00
Xi Yan
cc6edf6287 Merge branch 'eval_task_register' into mmlu_benchmark 2024-11-07 14:41:50 -08:00
Xi Yan
6b889651d6 Merge branch 'main' into eval_task_register 2024-11-07 14:41:29 -08:00
Xi Yan
6da74262ef remove type ignore 2024-11-07 14:37:50 -08:00
Xi Yan
f05db9a25c add eval_id for jobs 2024-11-07 14:30:46 -08:00
Xi Yan
ea80f623fb add default task_eval_id for routing 2024-11-07 14:19:33 -08:00
Xi Yan
51c20f9c29 api refactor 2024-11-07 13:54:26 -08:00
Dalton Flanagan
345ae07317 Factor out create_dist_registry (#398) 2024-11-07 16:13:19 -05:00
Xi Yan
97dcd5704c Merge branch 'main' into eval_task_register 2024-11-07 13:08:58 -08:00
Ashwin Bharambe
694c142b89 Add provider deprecation support; change directory structure (#397) 2024-11-07 13:04:53 -08:00
* Add provider deprecation support; change directory structure
* fix a couple dangling imports
* move the meta_reference safety dir also
Xi Yan
36e2538eb0 fix together inference validator (#393) 2024-11-07 11:31:53 -08:00
Xi Yan
eaa6a29cef Merge branch 'main' into eval_task_register 2024-11-07 11:18:37 -08:00
Xi Yan
3acd37bcb3 remove type ignore 2024-11-07 11:15:36 -08:00
Xi Yan
93995ecc4c test wip 2024-11-07 11:11:27 -08:00
Xi Yan
3322aa9ee4 mmlu scoring fn 2024-11-07 10:54:00 -08:00
Xi Yan
b946afddc0 datasetdef files 2024-11-07 10:28:51 -08:00
Xi Yan
d75095033d huggingface provider 2024-11-07 10:21:25 -08:00
Yufei (Benny) Chen
31c5fbda5e [LlamaStack][Fireworks] Update client and add unittest (#390) 2024-11-07 10:11:28 -08:00
Ashwin Bharambe
cfcc0a871c Slightly update PR template 2024-11-06 22:49:01 -08:00
Xi Yan
283b5c1def Merge branch 'main' into eval_task_register 2024-11-06 21:50:09 -08:00
Xi Yan
3f1ac29d57 test eval works 2024-11-06 21:40:38 -08:00
Xi Yan
413a1b6d8b fix eval 2024-11-06 21:10:54 -08:00
Ashwin Bharambe
489f74a70b Allow simpler initialization of RemoteProviderConfig; fix issue in httpx client 2024-11-06 19:19:26 -08:00
Xi Yan
56239fce90 scoring fix 2024-11-06 18:07:16 -08:00
Ashwin Bharambe
064d2a5287 Remove the safety adapter for Together; we can just use "meta-reference" (#387) 2024-11-06 17:36:57 -08:00
Xi Yan
c5cf9c30be score batch 2024-11-06 17:30:46 -08:00
Xi Yan
0bce74402f scoring test pass 2024-11-06 17:27:55 -08:00
Xi Yan
0351072531 fix scoring register 2024-11-06 17:18:16 -08:00
Xi Yan
def6d5d8ad scoring resolve 2024-11-06 17:04:25 -08:00
Xi Yan
c53733d1a3 fixture 2024-11-06 16:41:17 -08:00
Xi Yan
00869799a1 Merge branch 'main' into eval_task_register 2024-11-06 16:34:22 -08:00
Xi Yan
8fc2d212a2 fix safety signature mismatch (#388) 2024-11-06 16:30:47 -08:00
* fix safety sig
* shield_type->identifier
Ashwin Bharambe
7c340f0236 rename test_inference -> test_text_inference 2024-11-06 16:12:50 -08:00