MIT-LCP/mimic-code

How to improve download of MIMIC-CXR Dataset

Closed this issue · 3 comments

For my study, I need to download a subset of MIMIC-CXR. I have the paths of all the DICOM files I need, so I'm trying to use wget to download only those images directly to my hard disk. The process is very slow: wget averages about 700k, even though I'm connected via Ethernet to a gigabit connection.

Any advice to improve this process?

@danielemolino You can find answers to this in the discussion at: https://github.com/MIT-LCP/mimic-code/discussions. Short answer: it is best to download from the cloud!


I managed to use the free credit on Google Cloud Storage and the process is much faster now, but it still seems to need several days. I'm using gsutil in my Python env. Is that the best option? I need to download the files to a local hard disk.

Are you using gsutil -m to multithread the download? Otherwise, you might just be bandwidth limited.
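For reference, here is a rough sketch of what a multithreaded download of a fixed list of files could look like from Python. It only assumes the documented gsutil options `-m` (parallel operations), `-u` (billing project for requester-pays buckets) and `cp -I` (read the object URLs from stdin). The helper names, the bucket name, the project ID and the destination directory are all placeholders I made up for illustration, so swap in your own values.

```python
import subprocess
from typing import List, Optional


def to_gs_urls(paths: List[str], bucket: str) -> List[str]:
    """Turn relative object paths into gs:// URLs for the given bucket."""
    return [f"gs://{bucket}/{p.lstrip('/')}" for p in paths]


def build_gsutil_command(dest_dir: str,
                         billing_project: Optional[str] = None) -> List[str]:
    """Build a `gsutil -m cp -I` command that reads URLs from stdin."""
    cmd = ["gsutil"]
    if billing_project:
        # -u sets the project to bill (needed for requester-pays buckets).
        cmd += ["-u", billing_project]
    # -m runs the copy in parallel; cp -I reads the source URLs from stdin.
    cmd += ["-m", "cp", "-I", dest_dir]
    return cmd


def download(paths: List[str], bucket: str, dest_dir: str,
             billing_project: Optional[str] = None) -> None:
    """Feed all URLs to a single gsutil process instead of one call per file."""
    urls = to_gs_urls(paths, bucket)
    cmd = build_gsutil_command(dest_dir, billing_project)
    subprocess.run(cmd, input="\n".join(urls) + "\n", text=True, check=True)


# Hypothetical usage (bucket, project and paths are placeholders):
# download(my_dicom_paths, "your-bucket-name", "/mnt/disk/cxr",
#          billing_project="your-project-id")
```

The main point is to start one gsutil process for the whole list rather than one per file, since per-file startup overhead adds up quickly with many small objects. Note that copying a list of objects into a single destination directory this way will most likely not recreate the bucket's folder structure locally, so check that before relying on it.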