Lightning-Universe/lightning-flash

MisconfigurationException("The `batch_size` should be provided to the DataModule on instantiation.")

JohannesK14 opened this issue · 4 comments

๐Ÿ› Bug

SemanticSegmentationData.from_datasets() throws a MisconfigurationException claiming that batch_size is missing, even though it is explicitly passed.

To Reproduce

Code sample

from flash.image import SemanticSegmentationData
from torch.utils.data import Dataset

# Even a bare placeholder dataset is enough to trigger the exception.
train_dataset = Dataset()

# batch_size is passed explicitly, yet the MisconfigurationException is raised.
dm = SemanticSegmentationData.from_datasets(train_dataset=train_dataset, batch_size=4)

Expected behavior

I expect SemanticSegmentationData.from_datasets to consume the specified batch_size argument and to return successfully (or to throw an exception because of an empty dataset).

Environment

  • OS: Ubuntu 20.04
  • Python version: 3.10.6
  • PyTorch/Lightning/Flash Version (e.g., 1.10/1.5/0.7): 1.12.0 / 1.7.6 / 0.8
  • GPU models and configuration: NVIDIA GeForce RTX 3090
  • Any other relevant information:

Additional context

I stumbled upon this bug while trying to fine-tune a semantic segmentation model on the Cityscapes dataset. For convenience, I wanted to use the torchvision.datasets.Cityscapes module.
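For reference, a rough sketch of what I was attempting: wrapping torchvision's Cityscapes dataset and handing it to from_datasets. The root path is a placeholder, and it assumes from_datasets can consume the (image, mask) tuples that Cityscapes yields:

from flash.image import SemanticSegmentationData
from torchvision.datasets import Cityscapes

# Placeholder path to the extracted Cityscapes archives.
train_dataset = Cityscapes(
    root="data/cityscapes",
    split="train",
    mode="fine",
    target_type="semantic",
)

# This call raises the MisconfigurationException even though batch_size is set.
dm = SemanticSegmentationData.from_datasets(train_dataset=train_dataset, batch_size=4)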

Hi, @JohannesK14 - Thank you for reporting this issue. I was able to reproduce it. The problem comes from PyTorch Lightning; it looks like it was fixed here, but the fix will only be included in the 1.8 release.

Sorry for the inconvenience; in the meantime, you can use our other helper methods (from_files, from_folders). This should be fixed with the PL 1.8 release, and I'll get back to you if I find a workaround before then. :) Thanks!
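For example, a rough sketch of the from_folders workaround; the folder paths, image size, and num_classes below are placeholders, and it assumes the mask files share the same file names as the corresponding images:

from flash.image import SemanticSegmentationData

# Placeholder folders: each mask file must match the name of its image file.
dm = SemanticSegmentationData.from_folders(
    train_folder="data/images",
    train_target_folder="data/masks",
    num_classes=19,  # e.g. the 19 Cityscapes evaluation classes
    transform_kwargs={"image_size": (256, 512)},
    batch_size=4,
)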

Hi @krshrimali, thanks for verifying! I switched to from_folders for now and look forward to PL 1.8.

Borda commented

@JohannesK14 PL 1.8 is out, so could you verify it works fine? 🦦

Hi @Borda, I just tested it and it works fine for me with:

lightning-flash 0.8.1
pytorch-lightning 1.8.3
PyTorch 1.13.0