llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
Xi Yan	c1d18283d2	feat(eval api): (2.2/n) delete eval / scoring / scoring_fn apis (#1700) # What does this PR do? - To make it easier, delete existing `eval/scoring/scoring_function` apis. There will be a bunch of broken impls here. The sequence is: 1. migrate benchmark graders 2. clean up existing scoring functions - Add a skeleton evaluation impl to make tests pass. ## Test Plan tested in following PRs	2025-03-19 11:04:23 -07:00
Xi Yan	f2d93324e9	pre	2025-03-15 17:08:32 -07:00
Xi Yan	28b8c1c815	scoring fix	2025-03-15 17:06:53 -07:00
Xi Yan	98811cc034	fix: clean up test imports (#1600) # What does this PR do? - Clean up dead SDK code in https://github.com/meta-llama/llama-stack-client-python/pull/198 - Regen for local cache key issue ## Test Plan `pytest -v -s --nbval-lax ./docs/getting_started.ipynb` `LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/ --text-model meta-llama/Llama-3.3-70B-Instruct` - CI: `1382351211`	2025-03-13 11:01:52 -07:00
Xi Yan	a55aab5958	fix: fix scoring tests (#1487) # What does this PR do? - fix scoring test ## Test Plan `LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/scoring/test_scoring.py --text-model meta-llama/Llama-3.3-70B-Instruct --judge-model meta-llama/Llama-3.3-70B-Instruct`	2025-03-07 13:13:41 -08:00
Xi Yan	bcb13c492f	test: revamp eval related integration tests (#1433) # What does this PR do? - revamp and clean up datasets/scoring/eval integration tests - closes https://github.com/meta-llama/llama-stack/issues/1396 ## Test Plan dataset: `LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/integration/datasetio/` scoring: `LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/scoring --text-model meta-llama/Llama-3.1-8B-Instruct --judge-model meta-llama/Llama-3.1-8B-Instruct` eval: `LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/eval --text-model meta-llama/Llama-3.1-8B-Instruct --judge-model meta-llama/Llama-3.1-8B-Instruct`	2025-03-06 10:51:35 -08:00
Ashwin Bharambe	abfbaf3c1b	refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401) All of the tests from `llama_stack/providers/tests/` are now moved to `tests/integration`. I converted the `tools`, `scoring` and `datasetio` tests to use the API. However, `eval` and `post_training` proved to be a bit challenging, so I am leaving those. I think `post_training` should be relatively straightforward also. As part of this, I noticed that the `wolfram_alpha` tool wasn't added to some of our commonly used distros, so I added it. I am going to remove a lot of code duplication from distros next, so while this looks like a one-off right now, it will go away and be there uniformly for all distros.	2025-03-04 14:53:47 -08:00

7 commits