llama-stack/.github/workflows
Botao Chen f369871083
feat: [New Eval Benchmark] IfEval (#1708)
# What does this PR do?
This PR adds IfEval, a new open evaluation benchmark based on the paper
https://arxiv.org/abs/2311.07911, to measure a model's instruction-following
capability.
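For context, the cited paper scores each response against a set of verifiable instructions and reports prompt-level accuracy (a prompt counts only if every instruction in it was followed) and instruction-level accuracy (the fraction of individual instructions followed), each in strict and loose variants. The sketch below computes those two aggregates from per-instruction pass/fail flags. `PromptResult` and `ifeval_accuracy` are hypothetical names, and this is not necessarily how the `meta-reference-ifeval` benchmark in llama-stack computes its aggregate results.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    # Hypothetical container: one boolean per verifiable instruction in the
    # prompt, True if the model's response satisfied that instruction.
    followed: list[bool]


def ifeval_accuracy(results: list[PromptResult]) -> dict[str, float]:
    """Compute prompt-level and instruction-level accuracy (hypothetical helper)."""
    if not results:
        raise ValueError("results must be non-empty")
    # Prompt-level: a prompt is a hit only if all of its instructions were followed.
    prompt_hits = sum(all(r.followed) for r in results)
    # Instruction-level: pool every instruction across all prompts.
    total_instructions = sum(len(r.followed) for r in results)
    instruction_hits = sum(sum(r.followed) for r in results)
    return {
        "prompt_level_accuracy": prompt_hits / len(results),
        "instruction_level_accuracy": (
            instruction_hits / total_instructions if total_instructions else 0.0
        ),
    }


if __name__ == "__main__":
    scores = ifeval_accuracy([
        PromptResult([True, True]),   # both instructions followed
        PromptResult([True, False]),  # one of two followed
        PromptResult([False]),        # none followed
    ])
    print(scores)
```

Prompt-level accuracy is the stricter of the two: in the example, 3 of 5 instructions are followed, but only 1 of 3 prompts has every instruction satisfied.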


## Test Plan
Spin up a Llama Stack server with the open-benchmark template.

Run `llama-stack-client --endpoint xxx eval run-benchmark
"meta-reference-ifeval" --model-id "meta-llama/Llama-3.3-70B-Instruct"
--output-dir "/home/markchen1015/" --num-examples 20` on the client side and
check the aggregated eval results.
2025-03-19 16:39:59 -07:00
| File | Last commit | Date |
|---|---|---|
| changelog.yml | ci: Add scheduled workflow to update changelog (#1503) | 2025-03-18 14:39:22 -07:00 |
| gha_workflow_llama_stack_tests.yml | build(deps): bump thollander/actions-comment-pull-request from 2 to 3 (#1485) | 2025-03-07 17:31:53 -05:00 |
| integration-tests.yml | feat: [New Eval Benchmark] IfEval (#1708) | 2025-03-19 16:39:59 -07:00 |
| pre-commit.yml | build: update uv lock to sync package versions (#1026) | 2025-02-10 11:42:30 -05:00 |
| providers-build.yml | refactor: simplify command execution and remove PTY handling (#1641) | 2025-03-17 15:03:14 -07:00 |
| semantic-pr.yml | ci: Add semantic PR title check (#979) | 2025-02-06 12:22:34 -08:00 |
| stale_bot.yml | ci: add GitHub Action to close stale issues and PRs (#1613) | 2025-03-13 12:09:04 -07:00 |
| tests.yml | test: Split inference tests to text and vision (#1008) | 2025-02-07 09:35:49 -08:00 |
| unit-tests.yml | ci: limit PR testing based on modified files (#1644) | 2025-03-17 15:20:29 -07:00 |
| update-readthedocs.yml | fix: make sure readthedocs is triggered if pyproject.toml is updated | 2025-03-08 23:05:10 -08:00 |