yuxng/YCB_Video_toolbox

Dataset too big!

Opened this issue · 9 comments

This dataset is too big to download on my system. Is there any way you can provide a smaller version of the dataset for test purposes?

Thanks.

I was able to download it nonetheless. Thanks.

Somebody uploaded the data to Baidu drive; you can spend 3 dollars on a premium membership and download it in 5 hours.

No, you do not have to (only if you do it on a computer). What you need to do is download the Baidu app from Google Play and register there; then you do not need to provide a Chinese phone number.

If anyone is looking to download the full YCB-V dataset, we have a hosted version on S3 and can give you access. Just reach out to info@sbxrobotics.com with the subject: YCB-V download

Hi @iandewancker is the S3 bucket still active?

For anyone still struggling with access to the dataset, here's how I managed to get it working with the accessible Google Drive link.

  1. Have/get any Google Drive storage subscription so you won't get blocked from accessing the file by the "high file traffic" warning
  2. Create a shortcut to the dataset zip file in your Drive by clicking the icon in the top-right corner of the shared file page
  3. Create a Google Colab instance and mount your Google Drive into its file system:
from google.colab import drive
drive.mount('/content/gdrive')

With a console in front of you and the zip file visible in the file system, a bit of googling should get you the rest of the way.
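
For example, here is a minimal check that the shortcut is actually visible after mounting (the path below is an assumption; it depends on where you placed the shortcut in step 2):

import os

# Assumed shortcut location; adjust to wherever you put it in your Drive
zipfile = '/content/gdrive/MyDrive/YCB_Video_Dataset.zip'
assert os.path.exists(zipfile), 'Zip not visible - check the Drive mount and the shortcut location'
print(f'Found {zipfile}, {os.path.getsize(zipfile) / 1e9:.1f} GB')

This also sets the zipfile variable used in the cells below.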

In case you want to process the dataset in Colab anyway, here are some more pitfalls to avoid.

  • The dataset zip seems to contain the same data twice: unsorted files in /data_syn and organized ones in /data/{video_num}.
  • Don't unzip the dataset into your Google Drive. Drive is extremely slow at accessing many small files. Either store it as uncompressed few-GB zip chunks and retrieve/unzip them into the Colab instance as needed, or do it the proper way with memory-mapped files such as HDF5 (see the sketch after the code below).
  • A full extract will run out of RAM. Consider unzipping in chunks, as in the following cell.
%%capture
import math

# zipfile and unzip_path must hold the paths to the dataset zip and the extraction directory
# Get the list of all files in the zip; `unzip -l` prints names from column 30,
# with a 3-line header and a 2-line footer
all_files = !unzip -l {zipfile}
all_files = [line[30:] for line in all_files[3:][:-2]]

# Extract the archive in chunks of 100 files per unzip call
CHUNK = 100
n_chunks = math.ceil(len(all_files) / CHUNK)
for i in range(n_chunks):
    chunk = all_files[i*CHUNK:(i+1)*CHUNK]
    chunk = ' '.join(chunk)  # assumes no spaces in member names
    !unzip -n {zipfile} {chunk} -d {unzip_path} 1>/dev/null
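
And if you want to go the HDF5 route mentioned above, here is a rough sketch of packing one video sequence's color frames into a single memory-mapped file. The /data/{video_num}/*-color.png layout follows the dataset convention; the sequence id 0000, the output path, and the chunking/compression settings are just illustrative assumptions.

import glob
import h5py
import numpy as np
from PIL import Image

# Pack all color frames of sequence 0000 into one HDF5 file, so Drive/Colab
# deals with a single large file instead of thousands of small PNGs
frame_paths = sorted(glob.glob(f'{unzip_path}/data/0000/*-color.png'))
sample = np.asarray(Image.open(frame_paths[0]))
with h5py.File('/content/ycbv_0000_color.h5', 'w') as f:
    dset = f.create_dataset('color', shape=(len(frame_paths),) + sample.shape,
                            dtype=sample.dtype, chunks=(1,) + sample.shape,
                            compression='gzip')
    for i, path in enumerate(frame_paths):
        dset[i] = np.asarray(Image.open(path))

Reading individual frames back with h5py then only touches the chunks you actually index, which is what makes this workable on Drive.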