forked from phoenix-oss/llama-stack-mirror
		
# What does this PR do?

- Revamp and clean up the datasets/scoring/eval integration tests
- Closes https://github.com/meta-llama/llama-stack/issues/1396

## Test Plan

**dataset**

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/integration/datasetio/
```

**scoring**

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/scoring --text-model meta-llama/Llama-3.1-8B-Instruct --judge-model meta-llama/Llama-3.1-8B-Instruct
```

**eval**

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/eval --text-model meta-llama/Llama-3.1-8B-Instruct --judge-model meta-llama/Llama-3.1-8B-Instruct
```
		
| input_query | generated_answer | expected_answer | chat_completion_input |
|---|---|---|---|
| What is the capital of France? | London | Paris | [{"role": "user", "content": "What is the capital of France?"}] |
| Who is the CEO of Meta? | Mark Zuckerberg | Mark Zuckerberg | [{"role": "user", "content": "Who is the CEO of Meta?"}] |
| What is the largest planet in our solar system? | Jupiter | Jupiter | [{"role": "user", "content": "What is the largest planet in our solar system?"}] |
| What is the smallest country in the world? | China | Vatican City | [{"role": "user", "content": "What is the smallest country in the world?"}] |
| What is the currency of Japan? | Yen | Yen | [{"role": "user", "content": "What is the currency of Japan?"}] |
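Each row pairs a `generated_answer` with an `expected_answer`, and two of them (France, smallest country) disagree. That mix of matching and non-matching answers is useful input for a scoring test. As a rough illustration only, the sketch below computes a plain exact-match score over these rows in Python. `exact_match`, `accuracy`, and `chat_completion_input` are hypothetical helpers and are not llama-stack's scoring functions, which this page does not show. The sketch also assumes that the `chat_completion_input` column holds a JSON-encoded, single-message user chat.

```python
import json

# (input_query, generated_answer, expected_answer), copied from the table above.
ROWS = [
    ("What is the capital of France?", "London", "Paris"),
    ("Who is the CEO of Meta?", "Mark Zuckerberg", "Mark Zuckerberg"),
    ("What is the largest planet in our solar system?", "Jupiter", "Jupiter"),
    ("What is the smallest country in the world?", "China", "Vatican City"),
    ("What is the currency of Japan?", "Yen", "Yen"),
]


def chat_completion_input(query: str) -> str:
    """Hypothetical helper: rebuild the chat_completion_input column as a JSON string."""
    return json.dumps([{"role": "user", "content": query}])


def exact_match(generated: str, expected: str) -> float:
    """Hypothetical scorer: 1.0 if the answers match after trimming whitespace, else 0.0."""
    return 1.0 if generated.strip() == expected.strip() else 0.0


def accuracy(rows) -> float:
    """Mean exact-match score over all rows."""
    scores = [exact_match(gen, exp) for _, gen, exp in rows]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for query, gen, exp in ROWS:
        print(f"{exact_match(gen, exp):.1f}  {query}  ->  {chat_completion_input(query)}")
    print(f"accuracy = {accuracy(ROWS):.2f}")  # 3 of 5 rows match
```

A strict string comparison is the simplest choice here. An LLM-as-judge scorer, like the one the `--judge-model` flag in the test plan suggests, would instead be needed for answers that are correct but worded differently.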