llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-21 06:42:25 +00:00

Author	SHA1	Message	Date
Xi Yan	edeb6dcf04	mmlu loose	2024-11-07 18:36:41 -08:00
Xi Yan	6ee02ca23b	fix	2024-11-07 18:25:39 -08:00
Xi Yan	33b6d9b7b7	merge	2024-11-07 18:15:13 -08:00
Xi Yan	94a56cc3f3	register task required	2024-11-07 16:41:23 -08:00
Xi Yan	fd581c3d88	only keep 1 run_eval	2024-11-07 16:17:49 -08:00
Xi Yan	37d87c585a	wip huggingface register	2024-11-07 15:59:55 -08:00
Xi Yan	d1633dc412	huggingface provider	2024-11-07 15:20:22 -08:00
Xi Yan	cc6edf6287	Merge branch 'eval_task_register' into mmlu_benchmark	2024-11-07 14:41:50 -08:00
Xi Yan	f05db9a25c	add eval_id for jobs	2024-11-07 14:30:46 -08:00
Xi Yan	51c20f9c29	api refactor	2024-11-07 13:54:26 -08:00
Xi Yan	93995ecc4c	test wip	2024-11-07 11:11:27 -08:00
Xi Yan	3f1ac29d57	test eval works	2024-11-06 21:40:38 -08:00
Xi Yan	413a1b6d8b	fix eval	2024-11-06 21:10:54 -08:00
Xi Yan	def6d5d8ad	scoring resolve	2024-11-06 17:04:25 -08:00
Xi Yan	7b8748c53e	[Evals API][6/n] meta-reference llm as judge, registration for ScoringFnDefs (#330) * wip scoring refactor * llm as judge, move folders * test full generation + eval * extract score regex to llm context * remove prints, cleanup braintrust in this branch * change json -> class * remove initialize * address nits * check identifier prefix * udpate MANIFEST	2024-10-28 14:08:42 -07:00
Xi Yan	abdf7cddf3	[Evals API][4/n] evals with generation meta-reference impl (#303) * wip * dataset validation * test_scoring * cleanup * clean up test * comments * error checking * dataset client * test client: * datasetio client * clean up * basic scoring function works * scorer wip * equality scorer * score batch impl * score batch * update scoring test * refactor * validate scorer input * address comments * evals with generation * add all rows scores to ScoringResult * minor typing * bugfix * scoring function def rename * rebase name * refactor * address comments * Update iOS inference instructions for new quantization * Small updates to quantization config * Fix score threshold in faiss * Bump version to 0.0.45 * Handle both ipv6 and ipv4 interfaces together * update manifest for build templates * Update getting_started.md * chatcompletion & completion input type validation * inclusion->subsetof * error checking * scoring_function -> scoring_fn rename, scorer -> scoring_fn rename * address comments * [Evals API][5/n] fixes to generate openapi spec (#323) * generate openapi * typing comment, dataset -> dataset_id * remove custom type * sample eval run.yaml --------- Co-authored-by: Dalton Flanagan <6599399+dltn@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2024-10-25 13:12:39 -07:00

16 commits