Commit graph

5 commits

Author SHA1 Message Date
Xi Yan
63f1525165 precommit 2025-03-16 16:27:29 -07:00
Xi Yan
5cf7779b8f fix integeration 2025-03-15 17:36:39 -07:00
Xi Yan
a6fa3aa5a2
feat(dataset api): (1.6/n) fix all iterrows callsites (#1660)
# What does this PR do?
- as title

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
CI

```
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```
<img width="587" alt="image"
src="https://github.com/user-attachments/assets/4a25f493-501e-43f4-9836-d9802223a93a"
/>


[//]: # (## Documentation)
2025-03-15 17:24:16 -07:00
Xi Yan
bcb13c492f
test: revamp eval related integration tests (#1433)
# What does this PR do?
- revamp and clean up datasets/scoring/eval integration tests
- closes https://github.com/meta-llama/llama-stack/issues/1396

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
**dataset**
```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/integration/datasetio/
```
<img width="842" alt="image"
src="https://github.com/user-attachments/assets/88fc2b6a-b496-47bf-bc0c-8fea48ba36ff"
/>

**scoring**
```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/scoring --text-model meta-llama/Llama-3.1-8B-Instruct --judge-model meta-llama/Llama-3.1-8B-Instruct
```
<img width="851" alt="image"
src="https://github.com/user-attachments/assets/50f46415-b44c-4c37-a6c3-076f2767adb3"
/>


**eval**
```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/eval --text-model meta-llama/Llama-3.1-8B-Instruct --judge-model meta-llama/Llama-3.1-8B-Instruct
```
<img width="841" alt="image"
src="https://github.com/user-attachments/assets/8eb1c65c-3b39-4d66-8ff4-f471ca783e49"
/>


[//]: # (## Documentation)
2025-03-06 10:51:35 -08:00
Ashwin Bharambe
abfbaf3c1b
refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401)
All of the tests from `llama_stack/providers/tests/` are now moved to
`tests/integration`.

I converted the `tools`, `scoring` and `datasetio` tests to use API.
However, `eval` and `post_training` proved to be a bit challenging to
leaving those. I think `post_training` should be relatively
straightforward also.

As part of this, I noticed that `wolfram_alpha` tool wasn't added to
some of our commonly used distros so I added it. I am going to remove a
lot of code duplication from distros next so while this looks like a
one-off right now, it will go away and be there uniformly for all
distros.
2025-03-04 14:53:47 -08:00
Renamed from llama_stack/providers/tests/eval/test_eval.py (Browse further)