Ashwin Bharambe
12947ac19e
Kill "remote" providers and fix testing with a remote stack properly ( #435 )
# What does this PR do?
This PR kills the notion of "pure passthrough" remote providers. You
cannot specify a single provider as remote; you must specify a whole
distribution (stack) as remote.
This PR also significantly fixes / upgrades the testing infrastructure so
you can now test against a remotely hosted stack server simply by doing:
```bash
pytest -s -v -m remote test_agents.py \
--inference-model=Llama3.1-8B-Instruct --safety-shield=Llama-Guard-3-1B \
--env REMOTE_STACK_URL=http://localhost:5001
```
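The same remotely hosted stack can also be exercised straight from client code. A minimal sketch, assuming the `llama_stack_client` Python package and that the server is reachable at the URL passed via `REMOTE_STACK_URL`; the model-listing call is an assumption about the client surface, not something this PR adds:
```python
from llama_stack_client import LlamaStackClient

# Point the client at the remotely hosted stack that the remote tests target.
client = LlamaStackClient(base_url="http://localhost:5001")

# Sanity-check the connection by listing the models the distribution serves
# (assumes a `models.list()` call as in the Python SDK).
for model in client.models.list():
    print(model.identifier)
```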
Also fixed `test_agents_persistence.py` (which was broken) and killed
some deprecated testing functions.
## Test Plan
All the tests.
2024-11-12 21:51:29 -08:00
Xi Yan
b4416b72fd
Folder restructure for evals/datasets/scoring (#419)
* rename evals related stuff
* fix datasetio
* fix scoring test
* localfs -> LocalFS
* refactor scoring
* refactor scoring
* remove 8b_correctness scoring_fn from tests
* tests w/ eval params
* scoring fn braintrust fixture
* import
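For orientation, the scoring-fixture commits above boil down to calls of roughly this shape. A rough sketch only, assuming the Python client's `scoring.score` surface; the row fields, the shape of `scoring_functions`, and the braintrust scoring-function identifier are assumptions, not the exact fixtures touched here:
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Rows in the generic scoring format; the field names here are assumptions.
rows = [
    {
        "input_query": "What is the capital of France?",
        "generated_answer": "Paris",
        "expected_answer": "Paris",
    }
]

# Score the rows with a braintrust-backed scoring function; the mapping from
# function id to (optional) params reflects the "eval params" change and is an
# assumption about the exact signature.
response = client.scoring.score(
    input_rows=rows,
    scoring_functions={"braintrust::answer-correctness": None},
)
print(response.results)
```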
2024-11-11 17:35:40 -05:00
Xi Yan
2b7d70ba86
[Evals API][11/n] huggingface dataset provider + mmlu scoring fn (#392)
* wip
* scoring fn api
* eval api
* eval task
* evaluate api update
* pre commit
* unwrap context -> config
* config field doc
* typo
* naming fix
* separate benchmark / app eval
* api name
* rename
* wip tests
* wip
* datasetio test
* delete unused
* fixture
* scoring resolve
* fix scoring register
* scoring test pass
* score batch
* scoring fix
* fix eval
* test eval works
* huggingface provider
* datasetdef files
* mmlu scoring fn
* test wip
* remove type ignore
* api refactor
* add default task_eval_id for routing
* add eval_id for jobs
* remove type ignore
* huggingface provider
* wip huggingface register
* only keep 1 run_eval
* fix optional
* register task required
* register task required
* delete old tests
* fix
* mmlu loose
* refactor
* msg
* fix tests
* move benchmark task def to file
* msg
* gen openapi
* openapi gen
* move dataset to hf llamastack repo
* remove todo
* refactor
* add register model to unit test
* rename
* register to client
* delete preregistered dataset/eval task
* comments
* huggingface -> remote adapter
* openapi gen
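A sketch of what the new remote huggingface dataset adapter enables, under stated assumptions: the method and parameter names below (`datasets.register`, `url`, `dataset_schema`, `provider_id`) and the dataset identifier and URI are guesses for illustration, not confirmed by this log:
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Register a dataset backed by the remote huggingface adapter so it can be
# pulled through datasetio and scored with the new MMLU-style scoring fn.
# Every identifier and field below is an assumption for illustration only.
client.datasets.register(
    dataset_id="mmlu",
    provider_id="huggingface",
    url={"uri": "https://huggingface.co/datasets/llamastack/evals"},
    dataset_schema={
        "input_query": {"type": "string"},
        "expected_answer": {"type": "string"},
    },
)
```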
2024-11-11 14:49:50 -05:00
Xi Yan
6192bf43a4
[Evals API][10/n] API updates for EvalTaskDef + new test migration (#379)
* wip
* scoring fn api
* eval api
* eval task
* evaluate api update
* pre commit
* unwrap context -> config
* config field doc
* typo
* naming fix
* separate benchmark / app eval
* api name
* rename
* wip tests
* wip
* datasetio test
* delete unused
* fixture
* scoring resolve
* fix scoring register
* scoring test pass
* score batch
* scoring fix
* fix eval
* test eval works
* remove type ignore
* api refactor
* add default task_eval_id for routing
* add eval_id for jobs
* remove type ignore
* only keep 1 run_eval
* fix optional
* register task required
* register task required
* delete old tests
* delete old tests
* fixture return impl
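As with the entry above, a rough sketch of the EvalTaskDef registration shape these API updates move toward; the method name `eval_tasks.register` and every identifier below are assumptions, not taken from this log:
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Register an eval task tying a dataset to its scoring functions
# (all identifiers and parameter names are assumptions).
client.eval_tasks.register(
    eval_task_id="meta-reference::mmlu",
    dataset_id="mmlu",
    scoring_functions=["meta-reference::mmlu"],
)
```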
2024-11-07 21:24:12 -08:00