llama-stack-mirror

Mirror of https://github.com/meta-llama/llama-stack.git, last synced 2025-12-03 09:53:45 +00:00.

ehhuang caffafd101 feat: update the default system prompt for 3.2/3.3 models (#1310)		2025-02-27 23:05:42 -08:00

# Summary:
The current prompt doesn't work well and tends to over-index on tool calling. This PR is not perfect, but it should be an improvement over the current prompt. We can keep iterating.

# Test Plan:
Ran a (small) eval on 20 HotpotQA examples.

With the current prompt: https://gist.github.com/ehhuang/9f967e62751907165eb13781ea968f5c
{
    'basic::equality': {'accuracy': {'accuracy': 0.2, 'num_correct': 4.0, 'num_total': 20}},
    'F1ScoringFn': {
        'f1_average': 0.25333333333333335,
        'precision_average': 0.23301767676767676,
        'recall_average': 0.375
    }
}
num_tool_calls=[5, 5, 5, 5, 5, 5, 2, 5, 5, 5, 5, 5, 2, 2, 1, 1, 2, 1, 2, 2]
num_examples_with_tool_call=20
num_examples_with_pythontag=0

#########################################################

With the new prompt: https://gist.github.com/ehhuang/6e4a8ecf54db68922c2be8700056f962
{
    'basic::equality': {'accuracy': {'accuracy': 0.25, 'num_correct': 5.0, 'num_total': 20}},
    'F1ScoringFn': {
        'f1_average': 0.35579260478321006,
        'precision_average': 0.32030238933180105,
        'recall_average': 0.6091666666666666
    }
}
num_tool_calls=[2, 1, 1, 5, 5, 5, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 3, 2]
num_examples_with_tool_call=20
num_examples_with_pythontag=0

With the new prompt, the answers have higher recall and the model makes fewer tool calls. Note that these runs used max_infer_iter=5; the current prompt hits this limit more often and, without the limit, sometimes goes into an infinite tool-calling loop. The data above is from 3.3-70B. With 3.2-3B, results are equally poor with either prompt (~30% recall).
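The "fewer tool calls" and "hits this limit more often" claims can be checked directly from the two num_tool_calls lists in the test plan. The sketch below is a minimal illustration, not part of llama-stack: the lists are copied from the eval output, `MAX_INFER_ITERS = 5` mirrors the max_infer_iter=5 setting, and the `summarize` helper is hypothetical. It assumes that an example with 5 tool calls reached the cap, which is an approximation.

```python
from statistics import mean

# Per-example tool-call counts copied from the eval output (20 HotpotQA examples each).
CURRENT_PROMPT = [5, 5, 5, 5, 5, 5, 2, 5, 5, 5, 5, 5, 2, 2, 1, 1, 2, 1, 2, 2]
NEW_PROMPT = [2, 1, 1, 5, 5, 5, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 3, 2]

# Mirrors max_infer_iter=5 from the test plan. Treating a count at the cap
# as "hit the limit" is an assumption made for this sketch.
MAX_INFER_ITERS = 5


def summarize(counts: list[int], cap: int = MAX_INFER_ITERS) -> dict[str, float]:
    """Hypothetical helper: total calls, mean calls per example, and cap hits."""
    return {
        "total_calls": sum(counts),
        "mean_calls": mean(counts),
        "hit_cap": sum(1 for c in counts if c >= cap),
    }


if __name__ == "__main__":
    for name, counts in [("current", CURRENT_PROMPT), ("new", NEW_PROMPT)]:
        print(name, summarize(counts))
```

On these lists, the current prompt makes 70 tool calls in total (mean 3.5, with 11 of 20 examples at the cap), while the new prompt makes 39 (mean 1.95, with 3 at the cap).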
apis	ci: add mypy for static type checking (#1101)	2025-02-21 13:15:40 -08:00
cli	fix: Incorrect import path for print_subcommand_description() (#1315)	2025-02-27 18:50:41 -08:00
distribution	fix: ensure ollama embedding model is registered properly in the template	2025-02-27 22:49:06 -08:00
models/llama	feat: update the default system prompt for 3.2/3.3 models (#1310)	2025-02-27 23:05:42 -08:00
providers	fix: [Litellm] Do not swallow first token (#1316)	2025-02-27 20:53:47 -08:00
scripts	ci: add mypy for static type checking (#1101)	2025-02-21 13:15:40 -08:00
strong_typing	Ensure that deprecations for fields follow through to OpenAPI	2025-02-19 13:54:04 -08:00
templates	fix: ensure ollama embedding model is registered properly in the template	2025-02-27 22:49:06 -08:00
__init__.py	export LibraryClient	2024-12-13 12:08:00 -08:00
schema_utils.py	ci: add mypy for static type checking (#1101)	2025-02-21 13:15:40 -08:00