From ade76e4a69e679c88742f25d1dd0e99636e48ede Mon Sep 17 00:00:00 2001
From: Botao Chen
Date: Fri, 7 Mar 2025 15:05:27 -0800
Subject: [PATCH] fix: update the open benchmark eval doc (#1497)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## What does this PR do?
add proper links to the doc

## test
preview the doc

Screenshot 2025-03-07 at 3 03 22 PM
Screenshot 2025-03-07 at 3 03 32 PM
---
 docs/source/concepts/evaluation_concepts.md     | 2 +-
 docs/source/references/evals_reference/index.md | 2 +-
 llama_stack/templates/open-benchmark/run.yaml   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/concepts/evaluation_concepts.md b/docs/source/concepts/evaluation_concepts.md
index 61a695d9f..abe5898b6 100644
--- a/docs/source/concepts/evaluation_concepts.md
+++ b/docs/source/concepts/evaluation_concepts.md
@@ -37,7 +37,7 @@ The list of open-benchmarks we currently support:
 - [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI)]: Benchmark designed to evaluate multimodal models.
 
-You can follow this contributing guidance to add more open-benchmarks to Llama Stack
+You can follow this [contributing guide](https://llama-stack.readthedocs.io/en/latest/references/evals_reference/index.html#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack
 
 ### Run evaluation on open-benchmarks via CLI
 
diff --git a/docs/source/references/evals_reference/index.md b/docs/source/references/evals_reference/index.md
index d55537c47..c10becc7d 100644
--- a/docs/source/references/evals_reference/index.md
+++ b/docs/source/references/evals_reference/index.md
@@ -372,7 +372,7 @@ The purpose of scoring function is to calculate the score for each example based
 Firstly, you can see if the existing [llama stack scoring functions](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline/scoring) can fulfill your need. If not, you need to write a new scoring function based on what benchmark author / other open source repo describe.
 
 ### Add new benchmark into template
-Firstly, you need to add the evaluation dataset associated with your benchmark under `datasets` resource in templates/open-benchmark/run.yaml
+Firstly, you need to add the evaluation dataset associated with your benchmark under `datasets` resource in the [open-benchmark](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/templates/open-benchmark/run.yaml)
 
 Secondly, you need to add the new benchmark you just created under the `benchmarks` resource in the same template. To add the new benchmark, you need to have
 - `benchmark_id`: identifier of the benchmark
diff --git a/llama_stack/templates/open-benchmark/run.yaml b/llama_stack/templates/open-benchmark/run.yaml
index ba495923c..47a2f2eb5 100644
--- a/llama_stack/templates/open-benchmark/run.yaml
+++ b/llama_stack/templates/open-benchmark/run.yaml
@@ -1,5 +1,5 @@
 version: '2'
-image_name: dev
+image_name: open-benchmark
 apis:
 - agents
 - datasetio
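
For context on the documentation this patch links to: the contributing guide tells benchmark authors to register an evaluation dataset under the `datasets` resource and a new entry under the `benchmarks` resource of the open-benchmark template's `run.yaml`. The sketch below is not part of this patch and is only an illustration of what such entries might look like; the dataset id, the Hugging Face URI, the metadata fields, and the scoring function name are all assumptions, so check the actual template for the schema your Llama Stack version expects.

```yaml
# Illustrative sketch only -- not part of this patch.
# All identifiers (my_benchmark_dataset, example-org/my-benchmark, the scoring
# function id) are hypothetical; consult
# llama_stack/templates/open-benchmark/run.yaml for the exact schema.
datasets:
- dataset_id: my_benchmark_dataset           # evaluation dataset for the new benchmark
  provider_id: huggingface
  url:
    uri: https://huggingface.co/datasets/example-org/my-benchmark
  metadata:
    path: example-org/my-benchmark
    split: test
benchmarks:
- benchmark_id: meta-reference-my-benchmark  # identifier of the benchmark
  dataset_id: my_benchmark_dataset           # must match the dataset registered above
  scoring_functions:
  - basic::regex_parser_multiple_choice_answer
```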