rasbt/stat453-deep-learning-ss21

Unable to load CelebA dataset. File is not zip file error.

Hackathorn opened this issue · 6 comments

More of an FYI: I tried to reproduce the L17 4_VAE_celeba-inspect notebook. Loading the dataset failed with "BadZipFile: File is not a zip file". I found TorchVision issue #2262, which identified the problem as exceeding the daily maximum quota on Google Drive; the maintainers punted the issue back to the dataset authors and closed it. A future version of TorchVision should give a more descriptive error message.
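For anyone who wants to confirm this diagnosis on their own machine, here is a minimal sketch using only the standard library. It assumes that the quota failure leaves an HTML page (or an empty file) saved under the archive's name, which is consistent with the BadZipFile error but is not something I have checked against the TorchVision source. The helper name `describe_download` is made up for this example.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Return a short diagnosis of a file that is supposed to be a zip archive."""
    path = Path(path)
    if not path.exists():
        return "missing"
    if path.stat().st_size == 0:
        return "empty"
    if zipfile.is_zipfile(path):
        return "ok"
    # Not a zip: peek at the first bytes to see whether it looks like HTML,
    # which is what a saved quota or error page would be (assumption).
    with path.open("rb") as f:
        head = f.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html page, not a zip"
    return "not a zip"
```

Calling `describe_download` on the archive inside the dataset folder should tell you whether the file is a real zip or just a saved error page.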

So, FYI to your students. Work-around is to...

rasbt commented

Thanks for the note, Richard, and I agree, this is definitely frustrating. I was recently teaching a GAN tutorial and had similar issues. Downloading the dataset from the original website can be a bit tedious because it involves several steps. So, for this tutorial, I gathered the relevant files and uploaded them as a zip file to my Google Drive.

In case it's useful, it's 1.7 GB, and you only need to unzip it in the current notebook directory (or rather the directory the dataset/dataloader points to): https://drive.google.com/file/d/1m8-EBPgi5MRubrm6iQjafK2QMHDBMSfJ/view?usp=sharing
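If you prefer to do the unzipping from Python rather than by hand, something like the following should work. This is only a sketch: the function name `extract_archive` is hypothetical, and the archive file name and target directory depend on where you saved the download and on the root your dataset/dataloader uses.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a zip archive into target_dir and return its top-level entry names."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)
        # Zip member names always use "/" as the separator.
        top_level = sorted({name.split("/")[0] for name in zf.namelist()})
    return top_level
```

The returned top-level names are a quick way to check that the extracted folder ended up where the dataset class expects it.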

Downloading from your Google Drive and extracting/replacing into the L17/data folder was simple and worked great.

I have the same issue, but even after downloading from your link, I get an error from the _check_integrity() function saying "Dataset not found or corrupted. You can use download=True to download it".

rasbt commented

Have you checked that none of the files are 0 KB? With download=True, it may try to overwrite existing files so that they become empty files. If I have the files as shown in the screenshot below, it seems to work (I tried it the other day, see https://github.com/rasbt/machine-learning-book/blob/main/ch12/ch12_part1.ipynb).

[Screenshot: listing of the dataset files]
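To check for empty files without clicking through the folder, a small sketch like this lists every zero-byte file under a directory. The function name `find_empty_files` is made up for this example, and the directory you pass in is whatever root your dataset points to.

```python
from pathlib import Path


def find_empty_files(root):
    """Return sorted relative paths of all zero-byte files under root."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```

An empty list means no file was truncated; any path it returns is a file worth downloading again.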

I did set download=False after downloading the files manually, and I checked their sizes as well. I figured the problem was with the _check_integrity() function, which returns False.

So, I wrote a simple workaround to resolve it:

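from torchvision.datasets import CelebA


class MyCelebA(CelebA):
    """
    A workaround for issues with torchvision's CelebA dataset class.

    Download and extract the files manually from:
    https://drive.google.com/file/d/1m8-EBPgi5MRubrm6iQjafK2QMHDBMSfJ/view?usp=sharing
    """

    def _check_integrity(self) -> bool:
        # Skip the built-in integrity check and trust the manually
        # downloaded files; intended for use with download=False.
        return True

rasbt commented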

Thanks for sharing!