fix: update the open benchmark eval doc (#1497)
## What does this PR do?
add proper links to the doc

## test
preview the doc

<img width="1304" alt="Screenshot 2025-03-07 at 3 03 22 PM" src="https://github.com/user-attachments/assets/0a0e2a3d-2420-4af0-99c3-a4786855fae0" />
<img width="1303" alt="Screenshot 2025-03-07 at 3 03 32 PM" src="https://github.com/user-attachments/assets/e11844e7-ee8a-4a64-8617-abafa02b2868" />
This commit is contained in:
parent 89e449c2cb
commit ade76e4a69

3 changed files with 3 additions and 3 deletions
@@ -37,7 +37,7 @@ The list of open-benchmarks we currently support:
 - [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI)]: Benchmark designed to evaluate multimodal models.
 
-You can follow this contributing guidance to add more open-benchmarks to Llama Stack
+You can follow this [contributing guide](https://llama-stack.readthedocs.io/en/latest/references/evals_reference/index.html#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack
 
 ### Run evaluation on open-benchmarks via CLI
 
@@ -372,7 +372,7 @@ The purpose of scoring function is to calculate the score for each example based
 Firstly, you can see if the existing [llama stack scoring functions](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline/scoring) can fulfill your need. If not, you need to write a new scoring function based on what benchmark author / other open source repo describe.
 
 ### Add new benchmark into template
-Firstly, you need to add the evaluation dataset associated with your benchmark under `datasets` resource in templates/open-benchmark/run.yaml
+Firstly, you need to add the evaluation dataset associated with your benchmark under `datasets` resource in the [open-benchmark](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/templates/open-benchmark/run.yaml)
 
 Secondly, you need to add the new benchmark you just created under the `benchmarks` resource in the same template. To add the new benchmark, you need to have
 - `benchmark_id`: identifier of the benchmark
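For context on the doc line changed above: it sends contributors to the `datasets` and `benchmarks` resources of the open-benchmark template. Below is a minimal sketch of the two registrations that paragraph describes, assuming a Hugging Face-hosted eval set; the ids, the `metadata` fields, and the scoring function choice are illustrative placeholders, so verify the exact schema against the linked run.yaml.

```yaml
# Illustrative sketch only; ids, provider, and field names are assumptions to be
# checked against llama_stack/templates/open-benchmark/run.yaml.
datasets:
- dataset_id: my_new_benchmark_dataset     # hypothetical dataset id
  provider_id: huggingface                 # assumed provider hosting the eval set
  metadata:
    path: my-org/my-new-benchmark          # hypothetical Hugging Face path
    split: test

benchmarks:
- benchmark_id: meta-reference-my-new-benchmark    # the `benchmark_id` the doc asks for
  dataset_id: my_new_benchmark_dataset             # must match the dataset registered above
  scoring_functions:
  - basic::regex_parser_multiple_choice_answer     # reuse an inline scoring function if one fits
```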
@@ -1,5 +1,5 @@
 version: '2'
-image_name: dev
+image_name: open-benchmark
 apis:
 - agents
 - datasetio
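The rename above lands in the same open-benchmark run.yaml that the doc hunk points at. Purely as an assumed sketch of where the pieces sit relative to each other (the section order and the elided provider configuration are guesses, not taken from the actual file):

```yaml
# Assumed top-level layout of the open-benchmark run.yaml; only a skeleton.
version: '2'
image_name: open-benchmark
apis:
- agents
- datasetio
# ... remaining apis, providers, models, etc. ...
datasets:
- dataset_id: my_new_benchmark_dataset            # registered dataset (see the sketch above)
benchmarks:
- benchmark_id: meta-reference-my-new-benchmark   # the new benchmark entry
```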