mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-06-28 02:53:30 +00:00
test: revamp eval related integration tests (#1433)
# What does this PR do? - revamp and clean up datasets/scoring/eval integration tests - closes https://github.com/meta-llama/llama-stack/issues/1396 [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan **dataset** ``` LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/integration/datasetio/ ``` <img width="842" alt="image" src="https://github.com/user-attachments/assets/88fc2b6a-b496-47bf-bc0c-8fea48ba36ff" /> **scoring** ``` LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/scoring --text-model meta-llama/Llama-3.1-8B-Instruct --judge-model meta-llama/Llama-3.1-8B-Instruct ``` <img width="851" alt="image" src="https://github.com/user-attachments/assets/50f46415-b44c-4c37-a6c3-076f2767adb3" /> **eval** ``` LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/eval --text-model meta-llama/Llama-3.1-8B-Instruct --judge-model meta-llama/Llama-3.1-8B-Instruct ``` <img width="841" alt="image" src="https://github.com/user-attachments/assets/8eb1c65c-3b39-4d66-8ff4-f471ca783e49" /> [//]: # (## Documentation)
This commit is contained in:
parent
82e94fe22f
commit
bcb13c492f
7 changed files with 184 additions and 222 deletions
|
@ -59,7 +59,7 @@ def pytest_addoption(parser):
|
|||
)
|
||||
parser.addoption(
|
||||
"--judge-model",
|
||||
help="comma-separated list of judge models. Fixture name: judge_model_id",
|
||||
help="Specify the judge model to use for testing",
|
||||
)
|
||||
parser.addoption(
|
||||
"--embedding-dimension",
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue