NVIDIA-Merlin/dataloader

[Question] OOM: Is there a way not to load the whole dataset in the dataloader?

gaceladri opened this issue · 1 comment

Hello,
I have a very large Parquet file, and the Loader tries to load all of it onto a 24 GB GPU, which runs out of memory. Is there any way to avoid loading the whole dataset into the dataloader?

Solved by following the best practices in the NVTabular documentation.

I added row_group_size=10000 when writing the Parquet data, and now I can load the dataset:

train.to_parquet("../../data/processed/merlin_train", engine="pyarrow", row_group_size=10000)

Thanks