llama-stack

forked from phoenix-oss/llama-stack-mirror

yyymeta a626b7bce3 feat: [new open benchmark] BFCL_v3 (#1578)	2025-03-14 12:50:49 -07:00

# What does this PR do?

Creates a new dataset, BFCL_v3, from https://gorilla.cs.berkeley.edu/blogs/13_bfcl_v3_multi_turn.html

Each question asks the model to perform a task described in natural language and supplies a set of available functions with their schemas for the model to choose from. The model must write the function call, including the function name and parameters, that achieves the stated purpose. Results are validated against the provided ground truth by comparing ASTs, which checks that the generated function call and the ground-truth function call are syntactically and semantically equivalent.

## Test Plan

Start the server:

```
llama stack run ./llama_stack/templates/ollama/run.yaml
```

Then send traffic:

```
llama-stack-client eval run-benchmark "bfcl" --model-id meta-llama/Llama-3.2-3B-Instruct --output-dir /tmp/gpqa --num-examples 2
```
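
The AST check mentioned in the PR description can be illustrated with a minimal sketch. This is not the BFCL or llama-stack checker; `parse_call` and `calls_equivalent` are hypothetical helpers, and the sketch assumes both calls are written as Python-style call expressions with literal arguments.

```python
import ast


def parse_call(source: str):
    """Parse one call expression into (function name, positional args, keyword args)."""
    node = ast.parse(source.strip(), mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError(f"not a function call: {source!r}")
    name = ast.unparse(node.func)  # also handles dotted names such as math.sqrt
    # literal_eval raises ValueError for anything that is not a plain literal
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, args, kwargs


def calls_equivalent(generated: str, ground_truth: str) -> bool:
    """Return True when both strings parse to the same call.

    Keyword order and quoting style are ignored because the comparison is
    done on parsed values rather than on the raw text.
    """
    try:
        return parse_call(generated) == parse_call(ground_truth)
    except (SyntaxError, ValueError):
        # Unparseable or non-call output counts as a mismatch.
        return False
```

Comparing parsed structures instead of strings means that `get_weather(city='Paris', unit='celsius')` and `get_weather(unit="celsius", city="Paris")` are treated as the same call, while a different argument value or a malformed call is rejected.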
apis	fix: OpenAPI with provider get (#1627)	2025-03-13 19:56:32 -07:00
cli	fix: Fix pre-commit check (#1628)	2025-03-13 18:57:42 -07:00
distribution	feat: add support for logging config in the run.yaml (#1408)	2025-03-14 12:36:25 -07:00
models/llama	refactor: move all datetime.now() calls to UTC (#1589)	2025-03-13 15:34:53 -07:00
providers	feat: [new open benchmark] BFCL_v3 (#1578)	2025-03-14 12:50:49 -07:00
scripts	refactor(test): introduce --stack-config and simplify options (#1404)	2025-03-05 17:02:02 -08:00
strong_typing	Ensure that deprecations for fields follow through to OpenAPI	2025-02-19 13:54:04 -08:00
templates	feat: [new open benchmark] BFCL_v3 (#1578)	2025-03-14 12:50:49 -07:00
__init__.py	export LibraryClient	2024-12-13 12:08:00 -08:00
env.py	refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401)	2025-03-04 14:53:47 -08:00
log.py	feat: add support for logging config in the run.yaml (#1408)	2025-03-14 12:36:25 -07:00
schema_utils.py	ci: add mypy for static type checking (#1101)	2025-02-21 13:15:40 -08:00