llama-stack-mirror/llama_stack/templates
Botao Chen f369871083
feat: [New Eval Benchmark] IfEval (#1708)
# What does this PR do?
This PR adds a new open eval benchmark, IfEval, based on the paper
https://arxiv.org/abs/2311.07911, to measure a model's
instruction-following capability.


## Test Plan
Spin up a Llama Stack server with the open-benchmark template.

Run `llama-stack-client --endpoint xxx eval run-benchmark
"meta-reference-ifeval" --model-id "meta-llama/Llama-3.3-70B-Instruct"
--output-dir "/home/markchen1015/" --num-examples 20` on the client side
to get the aggregated eval results.
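
The client-side step of the test plan can be scripted. The following is a minimal sketch, not part of the PR: it only assembles the command line quoted in the test plan, using the subcommand and flags exactly as they appear there. The helper name `build_eval_command` and the output directory `./ifeval-results` are hypothetical, and the endpoint is left as the `xxx` placeholder from the test plan.

```python
import shlex


def build_eval_command(endpoint, benchmark_id, model_id, output_dir, num_examples):
    """Assemble the argv list for the eval run described in the test plan.

    Hypothetical helper: it mirrors the flags quoted in the PR description
    and does not call any llama-stack-client Python API.
    """
    return [
        "llama-stack-client",
        "--endpoint", endpoint,
        "eval", "run-benchmark", benchmark_id,
        "--model-id", model_id,
        "--output-dir", output_dir,
        "--num-examples", str(num_examples),
    ]


cmd = build_eval_command(
    endpoint="xxx",  # placeholder kept from the test plan; use your server URL
    benchmark_id="meta-reference-ifeval",
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    output_dir="./ifeval-results",  # hypothetical path, chosen for illustration
    num_examples=20,
)

# Print a shell-safe version of the command for review before running it.
print(shlex.join(cmd))
```

Once the endpoint is filled in, the list can be passed to `subprocess.run(cmd, check=True)`. Passing an argv list rather than a shell string avoids quoting problems with the model ID and paths.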
2025-03-19 16:39:59 -07:00
..
bedrock test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
cerebras test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
ci-tests test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
dell test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
dev test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
experimental-post-training feat: [post training] support save hf safetensor format checkpoint (#845) 2025-02-25 23:29:08 -08:00
fireworks test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
groq test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
hf-endpoint test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
hf-serverless test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
meta-reference-gpu test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
meta-reference-quantized-gpu test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
nvidia feat: added nvidia as safety provider (#1248) 2025-03-17 14:39:23 -07:00
ollama test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
open-benchmark feat: [New Eval Benchmark] IfEval (#1708) 2025-03-19 16:39:59 -07:00
passthrough fix: passthrough provider template + fix (#1612) 2025-03-13 09:44:26 -07:00
remote-vllm fix: Add the option to not verify SSL at remote-vllm provider (#1585) 2025-03-18 09:33:35 -04:00
sambanova test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
tgi test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
together test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
vllm-gpu test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
__init__.py Auto-generate distro yamls + docs (#468) 2024-11-18 14:57:06 -08:00
template.py feat(api): (1/n) datasets api clean up (#1573) 2025-03-17 16:55:45 -07:00