Lightning-Universe/lightning-transformers

Shuffling support

juliusfrost opened this issue · 0 comments

🚀 Feature

Add an option for data shuffling in core/data.py.
Data shuffling is crucial for removing bias caused by the order of examples in a dataset.

Motivation

I noticed that my model was performing poorly on a custom dataset, with spikes in performance across the epoch.
I then realized this was because the data was ordered by class and no shuffling is performed by default.
I looked into the code but couldn't find any option to enable shuffling: core/data.py
To get this functionality, I had to override three methods: train_dataloader, val_dataloader, and test_dataloader.
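
To illustrate the ordering problem described above, here is a minimal, library-free sketch. The helper names (`make_batches`, `classes_per_batch`) and the toy labels are made up for this illustration and are not part of lightning-transformers. It shows that when data is sorted by class and not shuffled, every batch contains a single class.

```python
import random


def make_batches(labels, batch_size, shuffle=False, seed=0):
    """Split labels into consecutive batches, optionally shuffling first."""
    order = list(labels)
    if shuffle:
        # Seeded only so the illustration is reproducible.
        random.Random(seed).shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def classes_per_batch(batches):
    """Count how many distinct classes appear in each batch."""
    return [len(set(batch)) for batch in batches]


# Toy dataset with the class data in sequence: 8 examples each of classes 0, 1, 2.
labels = [0] * 8 + [1] * 8 + [2] * 8

sequential = make_batches(labels, batch_size=8)
shuffled = make_batches(labels, batch_size=8, shuffle=True)

print(classes_per_batch(sequential))  # [1, 1, 1]: each batch holds one class
print(classes_per_batch(shuffled))    # typically more than one class per batch
```

With sequential batches the model sees only one class at a time, which would be consistent with the performance spikes across the epoch.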

Pitch

Add a boolean shuffling argument to the constructor that enables shuffling.
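
A rough sketch of what this could look like, using plain `torch.utils.data.DataLoader`. The class `ShuffleDataModule`, its constructor signature, and the argument name `shuffle` are hypothetical stand-ins, not the actual API in core/data.py. The sketch also assumes that only the training loader should be shuffled, which is a design choice rather than something this issue specifies.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


class ShuffleDataModule:
    """Hypothetical stand-in for the data module in core/data.py."""

    def __init__(self, train_dataset, val_dataset, test_dataset,
                 batch_size=4, shuffle=False):
        self.train_dataset = train_dataset
        self.val_dataset = val_dataset
        self.test_dataset = test_dataset
        self.batch_size = batch_size
        self.shuffle = shuffle  # proposed boolean flag (name is an assumption)

    def train_dataloader(self):
        # The flag is forwarded to DataLoader's existing `shuffle` parameter.
        return DataLoader(self.train_dataset, batch_size=self.batch_size,
                          shuffle=self.shuffle)

    def val_dataloader(self):
        # Assumption: evaluation order is kept fixed.
        return DataLoader(self.val_dataset, batch_size=self.batch_size,
                          shuffle=False)

    def test_dataloader(self):
        return DataLoader(self.test_dataset, batch_size=self.batch_size,
                          shuffle=False)


# Toy class-ordered labels: 0,0,0,0,1,1,1,1,2,2,2,2
labels = torch.arange(12) // 4
dataset = TensorDataset(labels)

dm = ShuffleDataModule(dataset, dataset, dataset, batch_size=4, shuffle=True)
for (batch,) in dm.train_dataloader():
    print(batch.tolist())  # batch contents vary from run to run
```

If shuffling the validation and test loaders is also wanted, the same flag (or separate flags) could be forwarded there as well.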

Alternatives

Additional context