fix: skip filename-based split detection for media files in get_data_…#8257

Open
Work4itnow wants to merge 1 commit into huggingface:main from Work4itnow:fix-imagefolder-split-detection

@Work4itnow

Fixes #7201

Problem

When a flat folder of images contains a file named train.png,
the library interprets the filename as a split name rather than as an ordinary image.
As a result, only that one file is loaded and the other images in the folder are ignored.

Fix

Modified _get_data_files_patterns in data_files.py to skip
filename-based split detection when all matched files are media files
(images, audio, video). Pattern resolution then falls through to
DEFAULT_PATTERNS_ALL, which loads all files in the folder.
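
To illustrate the idea, here is a minimal, standalone sketch of the decision described above. It is not the code in this PR or in data_files.py: the names MEDIA_EXTENSIONS, SPLIT_NAMES, split_from_filename and assign_splits are hypothetical, the extension list is an illustrative subset, and the fallback of placing every file in a single "train" split is an assumption about what the default pattern does.

```python
from pathlib import PurePosixPath

# Hypothetical, illustrative subset of media extensions (not the library's list).
MEDIA_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".wav", ".mp3", ".flac", ".mp4"}
# Hypothetical split names recognised from filenames.
SPLIT_NAMES = ("train", "test", "validation")


def is_media_file(name: str) -> bool:
    """Return True if the file extension marks an image, audio or video file."""
    return PurePosixPath(name).suffix.lower() in MEDIA_EXTENSIONS


def split_from_filename(name: str):
    """Return the split a filename appears to name, or None."""
    stem = PurePosixPath(name).stem.lower()
    for split in SPLIT_NAMES:
        if stem == split or stem.startswith((split + "-", split + "_")):
            return split
    return None


def assign_splits(filenames, skip_media: bool = True):
    """Group files by split, ignoring filename matches that are all media files."""
    matched = {}
    for name in filenames:
        split = split_from_filename(name)
        if split is not None:
            matched.setdefault(split, []).append(name)

    matched_files = [n for files in matched.values() for n in files]
    only_media = all(is_media_file(n) for n in matched_files)

    # Trust filename-based detection only if at least one match is a
    # non-media file (for example train.csv).
    if matched and not (skip_media and only_media):
        return matched

    # Assumed fallback: every file goes into a single default split.
    return {"train": list(filenames)}
```

With files = ["cat.png", "dog.png", "train.png"], calling assign_splits(files) keeps all three images together, while assign_splits(files, skip_media=False) reproduces the reported behaviour and returns only train.png. Data files such as train.csv and test.csv are still split by name in both modes.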

Test

Added test_data_files_with_image_named_after_split to verify
that all images are loaded when one is named after a split.

Development

Successfully merging this pull request may close these issues.

load_dataset() of images from a single directory where train.png image exists