hkchengrex/XMem

Issues downloading BL30K dataset

bermudezarii opened this issue · 4 comments

Hello, we are having trouble downloading the dataset from Google Drive and OneDrive, especially with wget. Would it be possible to update the Google Drive links to make downloading easier?

Thanks!

wget most likely will not work. You can try gdown https://github.com/wkentaro/gdown or rclone https://rclone.org/. I wish I could provide a direct link that is downloadable via wget, but I simply do not have the resources.
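
For reference, here is a minimal sketch of the gdown route, not a tested recipe for this dataset. It assumes gdown is installed (`pip install gdown`) and that you have the Drive file ID from the share link. The helper names `fetch_from_drive` and `download_with_retries`, the retry count, and the wait time are illustrative assumptions, and retrying only helps if the failure is temporary.

```python
import time
from typing import Callable, Optional


def fetch_from_drive(file_id: str, output: str) -> Optional[str]:
    """Download one Google Drive file with gdown, resuming a partial file if present."""
    import gdown  # third-party package, imported lazily: pip install gdown

    # file_id is the part of the share link after "/d/" or "id=".
    return gdown.download(id=file_id, output=output, quiet=False, resume=True)


def download_with_retries(
    fetch: Callable[[], Optional[str]],
    attempts: int = 5,
    wait_seconds: float = 600.0,
) -> Optional[str]:
    """Call fetch() until it returns a path, waiting between failed attempts."""
    for attempt in range(1, attempts + 1):
        try:
            path = fetch()
        except Exception as exc:  # gdown may raise, e.g. when the quota is exceeded
            print(f"attempt {attempt} failed: {exc}")
            path = None
        if path:
            return path
        if attempt < attempts:
            time.sleep(wait_seconds)
    return None


# Usage (FILE_ID and the output name are placeholders, not real values):
# download_with_retries(lambda: fetch_from_drive("FILE_ID", "part.zip"))
```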

Basically, the issues we encountered are as follows:

  • If we try to download from the browser, we get a network error a few hours in.
  • If we try to use gdown, we get an error saying that the file access quota has been exceeded and that we need to come back later. This happens even if I copy the file to my own Drive and try to download from there.

We found a slow mirror here, and it works, but it takes days to download everything with wget.

What would you suggest? Perhaps for files of this size it would be possible to provide a torrent file / magnet link?

I personally used rclone and have not had issues downloading to a few different servers. You would have to use "create shortcut" to add the dataset to your own Drive, and then use that link for downloading.
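
A rough sketch of what that rclone call could look like, wrapped in Python so it can be scripted. It assumes rclone is installed and that `rclone config` has already been used to set up a Google Drive remote. The remote name `gdrive`, the shortcut name `BL30K`, and the helper names are placeholders I made up for illustration. As far as I know, recent rclone versions follow Drive shortcuts by default, but please check that against your version.

```python
import shlex
import subprocess
from typing import List


def build_rclone_copy(remote: str, remote_path: str, dest: str, transfers: int = 4) -> List[str]:
    """Build an `rclone copy` command from a configured remote to a local folder."""
    return [
        "rclone", "copy",
        f"{remote}:{remote_path}",  # e.g. the shortcut created in your own Drive
        dest,
        "--progress",               # show live transfer statistics
        "--transfers", str(transfers),  # number of files copied in parallel
    ]


def run_rclone(cmd: List[str]) -> int:
    """Print the command, run it, and return rclone's exit code."""
    print(shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


# Usage (remote and folder names are hypothetical):
# run_rclone(build_rclone_copy("gdrive", "BL30K", "./BL30K"))
```

Since `rclone copy` skips files that already exist unchanged at the destination, re-running the same command after an interruption should continue with the remaining files.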

I am sorry for the inconvenience. I understand this dataset is not trivial to access or download, but I don't have better alternatives. A torrent requires a host (I think), and I don't have a server that can serve as one.

Fair enough. We'll take a look at rclone, and thanks for the quick reply!