llama-stack/llama_stack
yyymeta a626b7bce3
feat: [new open benchmark] BFCL_v3 (#1578)
# What does this PR do?
Creates a new dataset, BFCL_v3, from
https://gorilla.cs.berkeley.edu/blogs/13_bfcl_v3_multi_turn.html

Each question asks the model to perform a task described in natural
language and provides a set of available functions, with their schemas,
for the model to choose from. The model must write the function call,
including the function name and parameters, that achieves the stated
purpose. The result is validated against the provided ground truth by
comparing ASTs, to ensure that the generated function call and the
ground-truth function call are syntactically and semantically
equivalent.



## Test Plan

Start the server:

```
llama stack run ./llama_stack/templates/ollama/run.yaml
```

Then send traffic:
```
llama-stack-client eval run-benchmark "bfcl" --model-id meta-llama/Llama-3.2-3B-Instruct --output-dir /tmp/gpqa --num-examples 2
```




2025-03-14 12:50:49 -07:00
| Name | Last commit | Date |
|---|---|---|
| apis | fix: OpenAPI with provider get (#1627) | 2025-03-13 19:56:32 -07:00 |
| cli | fix: Fix pre-commit check (#1628) | 2025-03-13 18:57:42 -07:00 |
| distribution | feat: add support for logging config in the run.yaml (#1408) | 2025-03-14 12:36:25 -07:00 |
| models/llama | refactor: move all datetime.now() calls to UTC (#1589) | 2025-03-13 15:34:53 -07:00 |
| providers | feat: [new open benchmark] BFCL_v3 (#1578) | 2025-03-14 12:50:49 -07:00 |
| scripts | refactor(test): introduce --stack-config and simplify options (#1404) | 2025-03-05 17:02:02 -08:00 |
| strong_typing | Ensure that deprecations for fields follow through to OpenAPI | 2025-02-19 13:54:04 -08:00 |
| templates | feat: [new open benchmark] BFCL_v3 (#1578) | 2025-03-14 12:50:49 -07:00 |
| __init__.py | export LibraryClient | 2024-12-13 12:08:00 -08:00 |
| env.py | refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401) | 2025-03-04 14:53:47 -08:00 |
| log.py | feat: add support for logging config in the run.yaml (#1408) | 2025-03-14 12:36:25 -07:00 |
| schema_utils.py | ci: add mypy for static type checking (#1101) | 2025-02-21 13:15:40 -08:00 |