llama-stack/tests/integration/datasets
Christian Zaccaria 18d2312690
fix: test_datasets HF scenario in CI (#2090)
# What does this PR do?
**Fixes** #1959 

HuggingFace provides several loading paths that the datasets library can
use. My theory on why the test would previously fail intermittently is
because when calling `load_dataset(...)`, it may be trying several
options such as local cache, Hugging Face Hub, or a dataset script, or
other. There's one of these options that seem to work inconsistently in
the CI.

The HuggingFace datasets library relies on the `transformers` package to
load certain datasets such as `llamastack/simpleqa`, and by adding the
package, we can see the dataset is loaded consistently via the Hugging
Face Hub.

Please see PR in my fork demonstrating over 7 consecutive passes:
https://github.com/ChristianZaccaria/llama-stack/pull/1 

**Some References:**
- https://github.com/huggingface/transformers/issues/8690
- https://huggingface.co/docs/datasets/en/loading 

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)
2025-05-06 14:09:15 +02:00
..
__init__.py feat(api): (1/n) datasets api clean up (#1573) 2025-03-17 16:55:45 -07:00
test_dataset.csv feat(api): (1/n) datasets api clean up (#1573) 2025-03-17 16:55:45 -07:00
test_datasets.py fix: test_datasets HF scenario in CI (#2090) 2025-05-06 14:09:15 +02:00
test_rag_dataset.csv feat(api): (1/n) datasets api clean up (#1573) 2025-03-17 16:55:45 -07:00