llama-stack

forked from phoenix-oss/llama-stack-mirror

Botao Chen f369871083 feat: [New Eval Benchamark] IfEval (#1708)	2025-03-19 16:39:59 -07:00

# What does this PR do?
In this PR, we added a new open eval benchmark, IfEval, based on the paper https://arxiv.org/abs/2311.07911, to measure a model's instruction-following capability.

## Test Plan
Spin up a llama stack server with the open-benchmark template, then run `llama-stack-client --endpoint xxx eval run-benchmark "meta-reference-ifeval" --model-id "meta-llama/Llama-3.3-70B-Instruct" --output-dir "/home/markchen1015/" --num-examples 20` on the client side and get the aggregate eval results.

..
changelog.yml	ci: Add scheduled workflow to update changelog (#1503)	2025-03-18 14:39:22 -07:00
gha_workflow_llama_stack_tests.yml	build(deps): bump thollander/actions-comment-pull-request from 2 to 3 (#1485)	2025-03-07 17:31:53 -05:00
integration-tests.yml	feat: [New Eval Benchamark] IfEval (#1708)	2025-03-19 16:39:59 -07:00
pre-commit.yml	build: update uv lock to sync package versions (#1026)	2025-02-10 11:42:30 -05:00
providers-build.yml	refactor: simplify command execution and remove PTY handling (#1641)	2025-03-17 15:03:14 -07:00
semantic-pr.yml	ci: Add semantic PR title check (#979)	2025-02-06 12:22:34 -08:00
stale_bot.yml	ci: add GitHub Action to close stale issues and PRs (#1613)	2025-03-13 12:09:04 -07:00
tests.yml	test: Split inference tests to text and vision (#1008)	2025-02-07 09:35:49 -08:00
unit-tests.yml	ci: limit PR testing based on modified files (#1644)	2025-03-17 15:20:29 -07:00
update-readthedocs.yml	fix: make sure readthedocs is triggered if pyproject.toml is updated	2025-03-08 23:05:10 -08:00