Commit graph

262 commits

Author SHA1 Message Date
Xi Yan
b4416b72fd
Folder restructure for evals/datasets/scoring (#419)
* rename evals related stuff

* fix datasetio

* fix scoring test

* localfs -> LocalFS

* refactor scoring

* refactor scoring

* remove 8b_correctness scoring_fn from tests

* tests w/ eval params

* scoring fn braintrust fixture

* import
2024-11-11 17:35:40 -05:00
Xi Yan
2b7d70ba86
[Evals API][11/n] huggingface dataset provider + mmlu scoring fn (#392)
* wip

* scoring fn api

* eval api

* eval task

* evaluate api update

* pre commit

* unwrap context -> config

* config field doc

* typo

* naming fix

* separate benchmark / app eval

* api name

* rename

* wip tests

* wip

* datasetio test

* delete unused

* fixture

* scoring resolve

* fix scoring register

* scoring test pass

* score batch

* scoring fix

* fix eval

* test eval works

* huggingface provider

* datasetdef files

* mmlu scoring fn

* test wip

* remove type ignore

* api refactor

* add default task_eval_id for routing

* add eval_id for jobs

* remove type ignore

* huggingface provider

* wip huggingface register

* only keep 1 run_eval

* fix optional

* register task required

* register task required

* delete old tests

* fix

* mmlu loose

* refactor

* msg

* fix tests

* move benchmark task def to file

* msg

* gen openapi

* openapi gen

* move dataset to hf llamastack repo

* remove todo

* refactor

* add register model to unit test

* rename

* register to client

* delete preregistered dataset/eval task

* comments

* huggingface -> remote adapter

* openapi gen
2024-11-11 14:49:50 -05:00
Ashwin Bharambe
c1f7ba3aed
Split safety into (llama-guard, prompt-guard, code-scanner) (#400)
Splits the meta-reference safety implementation into three distinct providers:

- inline::llama-guard
- inline::prompt-guard
- inline::code-scanner

Note that this PR is a backward incompatible change to the llama stack server. I have added deprecation_error field to ProviderSpec -- the server reads it and immediately barfs. This is used to direct the user with a specific message on what action to perform. An automagical "config upgrade" is a bit too much work to implement right now :/

(Note that we will be gradually prefixing all inline providers with inline:: -- I am only doing this for this set of new providers because otherwise existing configuration files will break even more badly.)
2024-11-11 09:29:18 -08:00
Xi Yan
ba82021d4b precommit 2024-11-08 17:58:58 -08:00
Xi Yan
1ebf6447c5 add missing inits 2024-11-08 17:54:24 -08:00
Xi Yan
89c3129f0b add missing inits 2024-11-08 17:49:29 -08:00
Dinesh Yeduguru
ec644d3418
migrate model to Resource and new registration signature (#410)
* resource oriented object design for models

* add back llama_model field

* working tests

* register singature fix

* address feedback

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-08 16:12:57 -08:00
Dinesh Yeduguru
d800a16acd
Resource oriented design for shields (#399)
* init

* working bedrock tests

* bedrock test for inference fixes

* use env vars for bedrock guardrail vars

* add register in meta reference

* use correct shield impl in meta ref

* dont add together fixture

* right naming

* minor updates

* improved registration flow

* address feedback

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-08 12:16:11 -08:00
Xi Yan
6192bf43a4
[Evals API][10/n] API updates for EvalTaskDef + new test migration (#379)
* wip

* scoring fn api

* eval api

* eval task

* evaluate api update

* pre commit

* unwrap context -> config

* config field doc

* typo

* naming fix

* separate benchmark / app eval

* api name

* rename

* wip tests

* wip

* datasetio test

* delete unused

* fixture

* scoring resolve

* fix scoring register

* scoring test pass

* score batch

* scoring fix

* fix eval

* test eval works

* remove type ignore

* api refactor

* add default task_eval_id for routing

* add eval_id for jobs

* remove type ignore

* only keep 1 run_eval

* fix optional

* register task required

* register task required

* delete old tests

* delete old tests

* fixture return impl
2024-11-07 21:24:12 -08:00
Ashwin Bharambe
694c142b89
Add provider deprecation support; change directory structure (#397)
* Add provider deprecation support; change directory structure

* fix a couple dangling imports

* move the meta_reference safety dir also
2024-11-07 13:04:53 -08:00
Xi Yan
8fc2d212a2
fix safety signature mismatch (#388)
* fix safety sig

* shield_type->identifier
2024-11-06 16:30:47 -08:00
Ashwin Bharambe
994732e2e0
impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00