Commit graph

5 commits

Author SHA1 Message Date
Xi Yan
50cc165077
fixes tests & move braintrust api_keys to request headers (#535)
# What does this PR do?

- the braintrust scoring provider requires the OPENAI_API_KEY env variable
to be set
- move this so the key can instead be passed in request headers (e.g. like
the together / fireworks api keys)
- fixes pytest with agents dependency
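
The header-based flow described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the header name `X-LlamaStack-ProviderData` and the helper names `build_provider_headers` and `resolve_openai_api_key` are assumptions made for the example, and only the `openai_api_key` field name comes from the client snippet in the test plan.

```python
import json
from typing import Mapping, Optional

# Assumed header name, for illustration only; check the server for the real one.
PROVIDER_DATA_HEADER = "X-LlamaStack-ProviderData"


def build_provider_headers(openai_api_key: str) -> dict:
    """Client side (hypothetical helper): pack provider data into one JSON header."""
    return {PROVIDER_DATA_HEADER: json.dumps({"openai_api_key": openai_api_key})}


def resolve_openai_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Server side (hypothetical helper): read the key from the request headers."""
    raw = headers.get(PROVIDER_DATA_HEADER)
    if raw is None:
        return None
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # A malformed header is treated the same as a missing one.
        return None
    if not isinstance(data, dict):
        return None
    key = data.get("openai_api_key")
    # Reject empty or non-string values so callers get a clear "no key" signal.
    return key if isinstance(key, str) and key else None


if __name__ == "__main__":
    headers = build_provider_headers("sk-example")
    print(resolve_openai_api_key(headers))
    print(resolve_openai_api_key({}))
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing key is an error or a cue to fall back to another source of configuration.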

## Test Plan

**E2E**
```
llama stack run 
```
```yaml
scoring:
  - provider_id: braintrust-0
    provider_type: inline::braintrust
    config: {}
```

**Client**
```python
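import os

from llama_stack_client import LlamaStackClient

# Pass the OpenAI key as per-request provider data rather than a server-side env var.
client = LlamaStackClient(
    base_url=os.environ.get("LLAMA_STACK_ENDPOINT", "http://localhost:5000"),
    provider_data={
        "openai_api_key": os.environ.get("OPENAI_API_KEY", ""),
    },
)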
```
- run `llama-stack-client eval run_scoring`

**Unit Test**
```
pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py
```

```
pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py --env OPENAI_API_KEY=$OPENAI_API_KEY
```

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-26 13:11:21 -08:00
Xi Yan
50d539e6d7
update tests --inference-model to hf id
2024-11-18 17:36:58 -08:00
Xi Yan
ec4fcad5ca
fix eval task registration (#426)
* fix eval tasks

* fix eval tasks

* fix eval tests
2024-11-12 11:51:34 -05:00
Xi Yan
2b7d70ba86
[Evals API][11/n] huggingface dataset provider + mmlu scoring fn (#392)
* wip

* scoring fn api

* eval api

* eval task

* evaluate api update

* pre commit

* unwrap context -> config

* config field doc

* typo

* naming fix

* separate benchmark / app eval

* api name

* rename

* wip tests

* wip

* datasetio test

* delete unused

* fixture

* scoring resolve

* fix scoring register

* scoring test pass

* score batch

* scoring fix

* fix eval

* test eval works

* huggingface provider

* datasetdef files

* mmlu scoring fn

* test wip

* remove type ignore

* api refactor

* add default task_eval_id for routing

* add eval_id for jobs

* remove type ignore

* huggingface provider

* wip huggingface register

* only keep 1 run_eval

* fix optional

* register task required

* register task required

* delete old tests

* fix

* mmlu loose

* refactor

* msg

* fix tests

* move benchmark task def to file

* msg

* gen openapi

* openapi gen

* move dataset to hf llamastack repo

* remove todo

* refactor

* add register model to unit test

* rename

* register to client

* delete preregistered dataset/eval task

* comments

* huggingface -> remote adapter

* openapi gen
2024-11-11 14:49:50 -05:00
Xi Yan
6192bf43a4
[Evals API][10/n] API updates for EvalTaskDef + new test migration (#379)
* wip

* scoring fn api

* eval api

* eval task

* evaluate api update

* pre commit

* unwrap context -> config

* config field doc

* typo

* naming fix

* separate benchmark / app eval

* api name

* rename

* wip tests

* wip

* datasetio test

* delete unused

* fixture

* scoring resolve

* fix scoring register

* scoring test pass

* score batch

* scoring fix

* fix eval

* test eval works

* remove type ignore

* api refactor

* add default task_eval_id for routing

* add eval_id for jobs

* remove type ignore

* only keep 1 run_eval

* fix optional

* register task required

* register task required

* delete old tests

* delete old tests

* fixture return impl
2024-11-07 21:24:12 -08:00