llama-stack

forked from phoenix-oss/llama-stack-mirror

Xi Yan d3508c4c76 feat(1/n): scoring function registration for llm-as-judge (#1405)	2025-03-05 10:00:34 -08:00

# What does this PR do?

- add ability to register a llm-as-judge scoring function with custom judge prompts / params.
- Closes https://github.com/meta-llama/llama-stack/issues/1395

## Test Plan

Via CLI

```
llama-stack-client scoring_functions register \
  --scoring-fn-id "llm-as-judge::my-prompt" \
  --description "my custom judge" \
  --return-type '{"type": "string"}' \
  --provider-id "llm-as-judge" \
  --provider-scoring-fn-id "my-prompt" \
  --params '{"type": "llm_as_judge", "judge_model": "meta-llama/Llama-3.2-3B-Instruct", "prompt_template": "always output 1.0"}'
```

- Unit test will be addressed with https://github.com/meta-llama/llama-stack/issues/1396
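
The Test Plan command passes two JSON values (`--return-type` and `--params`) as shell arguments, which is easy to get wrong when quoting by hand. The following is a minimal sketch, not part of the PR, that assembles the same argument list in Python and lets `json.dumps` and `shlex.join` handle serialization and quoting. The helper name `build_register_command` is hypothetical; the flags and values are copied from the command in the commit message.

```python
import json
import shlex


def build_register_command(scoring_fn_id, description, provider_id,
                           provider_scoring_fn_id, judge_model, prompt_template):
    """Return the argv list for the CLI call shown in the Test Plan.

    Hypothetical helper for illustration; only the flag names come from
    the commit message.
    """
    # JSON payload for --params, mirroring the keys used in the commit message.
    params = {
        "type": "llm_as_judge",
        "judge_model": judge_model,
        "prompt_template": prompt_template,
    }
    return [
        "llama-stack-client", "scoring_functions", "register",
        "--scoring-fn-id", scoring_fn_id,
        "--description", description,
        "--return-type", json.dumps({"type": "string"}),
        "--provider-id", provider_id,
        "--provider-scoring-fn-id", provider_scoring_fn_id,
        "--params", json.dumps(params),
    ]


args = build_register_command(
    scoring_fn_id="llm-as-judge::my-prompt",
    description="my custom judge",
    provider_id="llm-as-judge",
    provider_scoring_fn_id="my-prompt",
    judge_model="meta-llama/Llama-3.2-3B-Instruct",
    prompt_template="always output 1.0",
)

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(args))
```

The returned list can also be passed directly to `subprocess.run(args)`, which avoids shell quoting altogether; that assumes `llama-stack-client` is installed and a Llama Stack server is reachable.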
apis	docs: api documentation for agents/eval/scoring/datasets (#1400)	2025-03-05 09:40:24 -08:00
cli	refactor(test): unify vector_io tests and make them configurable (#1398)	2025-03-04 13:37:45 -08:00
distribution	chore: Make README code blocks more easily copy pastable (#1420)	2025-03-05 09:11:01 -08:00
models/llama	refactor: move a few tests to top-level tests/ directory	2025-03-03 17:33:39 -08:00
providers	feat(1/n): scoring function registration for llm-as-judge (#1405)	2025-03-05 10:00:34 -08:00
scripts	refactor: move tests/client-sdk to tests/api (#1376)	2025-03-03 17:28:12 -08:00
strong_typing	Ensure that deprecations for fields follow through to OpenAPI	2025-02-19 13:54:04 -08:00
templates	refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401)	2025-03-04 14:53:47 -08:00
__init__.py	export LibraryClient	2024-12-13 12:08:00 -08:00
env.py	refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401)	2025-03-04 14:53:47 -08:00
logcat.py	feat: add a configurable category-based logger (#1352)	2025-03-02 18:51:14 -08:00
schema_utils.py	ci: add mypy for static type checking (#1101)	2025-02-21 13:15:40 -08:00