llama-stack

forked from phoenix-oss/llama-stack-mirror

Botao Chen f369871083 feat: [New Eval Benchamark] IfEval (#1708)	2025-03-19 16:39:59 -07:00

# What does this PR do?
This PR adds a new open eval benchmark, IfEval, based on the paper https://arxiv.org/abs/2311.07911, to measure a model's instruction-following capability.

## Test Plan
Spin up a Llama Stack server with the open-benchmark template, then run `llama-stack-client --endpoint xxx eval run-benchmark "meta-reference-ifeval" --model-id "meta-llama/Llama-3.3-70B-Instruct" --output-dir "/home/markchen1015/" --num-examples 20` on the client side to get the aggregate eval results.

agents	chore: deprecate ToolResponseMessage in agent.resume API (#1566)	2025-03-12 12:10:21 -07:00
batch_inference	fix: solve ruff B008 warnings (#1444)	2025-03-06 16:48:35 -08:00
benchmarks	fix: return 4xx for non-existent resources in GET requests (#1635)	2025-03-18 14:06:53 -07:00
common	ci: add mypy for static type checking (#1101)	2025-02-21 13:15:40 -08:00
datasetio	feat(api): (1/n) datasets api clean up (#1573)	2025-03-17 16:55:45 -07:00
datasets	fix: fix open-benchmark template (#1695)	2025-03-19 11:27:11 -07:00
eval	fix: return 4xx for non-existent resources in GET requests (#1635)	2025-03-18 14:06:53 -07:00
files	fix: return 4xx for non-existent resources in GET requests (#1635)	2025-03-18 14:06:53 -07:00
inference	feat(api): remove tool_name from ToolResponseMessage (#1599)	2025-03-12 19:41:48 -07:00
inspect	feat: add provider API for listing and inspecting provider info (#1429)	2025-03-13 15:07:21 -07:00
models	fix: return 4xx for non-existent resources in GET requests (#1635)	2025-03-18 14:06:53 -07:00
post_training	chore: fix mypy violations in post_training modules (#1548)	2025-03-18 14:58:16 -07:00
providers	fix: OpenAPI with provider get (#1627)	2025-03-13 19:56:32 -07:00
safety	chore: move all Llama Stack types from llama-models to llama-stack (#1098)	2025-02-14 09:10:59 -08:00
scoring	docs: api documentation for agents/eval/scoring/datasets (#1400)	2025-03-05 09:40:24 -08:00
scoring_functions	feat: [New Eval Benchamark] IfEval (#1708)	2025-03-19 16:39:59 -07:00
shields	fix: return 4xx for non-existent resources in GET requests (#1635)	2025-03-18 14:06:53 -07:00
synthetic_data_generation	chore: move all Llama Stack types from llama-models to llama-stack (#1098)	2025-02-14 09:10:59 -08:00
telemetry	feat: Add new compact MetricInResponse type (#1593)	2025-03-12 15:45:44 -07:00
tools	docs: add documentation for RAGDocument (#1693)	2025-03-19 10:16:00 -07:00
vector_dbs	fix: return 4xx for non-existent resources in GET requests (#1635)	2025-03-18 14:06:53 -07:00
vector_io	chore: move all Llama Stack types from llama-models to llama-stack (#1098)	2025-02-14 09:10:59 -08:00
__init__.py	API Updates (#73)	2024-09-17 19:51:35 -07:00
datatypes.py	feat: add provider API for listing and inspecting provider info (#1429)	2025-03-13 15:07:21 -07:00
resource.py	fix!: update eval-tasks -> benchmarks (#1032)	2025-02-13 16:40:58 -08:00
version.py	llama-stack version alpha -> v1	2025-01-15 05:58:09 -08:00