fix: test_datasets HF scenario in CI (#2090)


# What does this PR do?
**Fixes** #1959 

The Hugging Face `datasets` library can load a dataset through several
paths. My theory is that the test failed intermittently because
`load_dataset(...)` may try several of them, such as the local cache,
the Hugging Face Hub, or a dataset script, and one of these paths seems
to work inconsistently in CI.

The Hugging Face `datasets` library relies on the `transformers` package
to load certain datasets such as `llamastack/simpleqa`. With the package
added, the dataset loads consistently via the Hugging Face Hub.
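
As a rough illustration of the dependency described above, the sketch below checks that both packages are importable before calling `load_dataset`, so a missing package fails fast with a clear message rather than surfacing as an intermittent Hub lookup error. The helper names `missing_packages` and `load_simpleqa` are hypothetical and not part of this PR, and the `"train"` split is an assumption about the dataset's layout.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level packages from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def load_simpleqa(split="train"):
    """Load llamastack/simpleqa, failing early if a required package is absent.

    Hypothetical helper: the split name is an assumption, not taken from the PR.
    """
    missing = missing_packages(["datasets", "transformers"])
    if missing:
        raise RuntimeError(f"Install missing packages first: {', '.join(missing)}")

    # Imported lazily so the availability check above runs first.
    from datasets import load_dataset

    return load_dataset("llamastack/simpleqa", split=split)
```

Calling `missing_packages(["datasets", "transformers"])` in a CI setup step would show whether the environment matches what the test expects before any network access happens.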

Please see the PR in my fork, which demonstrates more than 7 consecutive passes:
https://github.com/ChristianZaccaria/llama-stack/pull/1 

**Some References:**
- https://github.com/huggingface/transformers/issues/8690
- https://huggingface.co/docs/datasets/en/loading 

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Christian Zaccaria, 2025-05-06 13:09:15 +01:00 (committed by GitHub)

parent 2e807b38cc

commit 18d2312690

GPG key ID: B5690EEEBB952194 (no known key found for this signature in database)

3 changed files with 71 additions and 1 deletions

									
tests/integration/datasets/test_datasets.py (1 change)

@@ -31,7 +31,6 @@ def data_url_from_file(file_path: str) -> str:
     return data_url


-@pytest.mark.skip(reason="flaky. Couldn't find 'llamastack/simpleqa' on the Hugging Face Hub")
 @pytest.mark.parametrize(
     "purpose, source, provider_id, limit",
     [
