yuxng/YCB_Video_toolbox

Dataset too big!

Opened this issue · 9 comments

This dataset is too big to download on my system. Is there any way you can provide a smaller version of the dataset for test purposes?

Thanks.

I was able to download it nonetheless. Thanks.

Somebody uploaded the data to Baidu drive; you can spend 3 dollars on a premium membership and download it in 5 hours.

No, you do not have to (only if you do it on a computer). What you need to do is download the Baidu app from Google Play and register there; then you do not need to provide a Chinese phone number.

If anyone is looking to download the full YCB-V dataset, we have a hosted version on S3 and can give you access. Just reach out to info@sbxrobotics.com with the subject: YCB-V download

Hi @iandewancker is the S3 bucket still active?

For anyone still struggling with access to the dataset, here's how I managed to get it working with the accessible Google Drive link.

  1. Have/get any Google Drive storage subscription so you won't get blocked from accessing the file by the "high file traffic" warning
  2. Create a shortcut to the dataset zip file in your Drive by clicking the icon in the top-right corner of the shared file page
  3. Create a Google Colab instance and mount your Google Drive into its file system:
from google.colab import drive
drive.mount('/content/gdrive')

With a console in front of you and the zip file visible in the file system, a bit of googling should get you the rest of the way.
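
For example, here is a minimal check that the shortcut is actually visible after mounting (the path below is an assumption; it depends on where you placed the shortcut in step 2):

import os

# Assumed shortcut location; adjust to wherever you put it in your Drive
zipfile = '/content/gdrive/MyDrive/YCB_Video_Dataset.zip'
assert os.path.exists(zipfile), 'Zip not visible - check the Drive mount and the shortcut location'
print(f'Found {zipfile}, {os.path.getsize(zipfile) / 1e9:.1f} GB')

This also sets the zipfile variable used in the cells below.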

In case you want to process the dataset in Colab anyway, here are some more pitfalls to avoid.

  • The dataset zip seems to contain the same data twice: unsorted files in /data_syn and organized ones in /data/{video_num}.
  • Don't unzip the dataset into your Google Drive. Drive is extremely slow at accessing many small files. Either store it as uncompressed few-GB zip chunks and retrieve/unzip them into the Colab instance as needed, or do it the proper way with memory-mapped files such as HDF5 (see the sketch after the code below).
  • A full extract will run out of RAM. Consider unzipping in chunks, as in the following cell.
%%capture
import math

# zipfile and unzip_path must hold the paths to the dataset zip and the extraction directory
# Get the list of all files in the zip; `unzip -l` prints names from column 30,
# with a 3-line header and a 2-line footer
all_files = !unzip -l {zipfile}
all_files = [line[30:] for line in all_files[3:][:-2]]

# Extract the archive in chunks of 100 files per unzip call
CHUNK = 100
n_chunks = math.ceil(len(all_files) / CHUNK)
for i in range(n_chunks):
    chunk = all_files[i*CHUNK:(i+1)*CHUNK]
    chunk = ' '.join(chunk)  # assumes no spaces in member names
    !unzip -n {zipfile} {chunk} -d {unzip_path} 1>/dev/null
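
And if you want to go the HDF5 route mentioned above, here is a rough sketch of packing one video sequence's color frames into a single memory-mapped file. The /data/{video_num}/*-color.png layout follows the dataset convention; the sequence id 0000, the output path, and the chunking/compression settings are just illustrative assumptions.

import glob
import h5py
import numpy as np
from PIL import Image

# Pack all color frames of sequence 0000 into one HDF5 file, so Drive/Colab
# deals with a single large file instead of thousands of small PNGs
frame_paths = sorted(glob.glob(f'{unzip_path}/data/0000/*-color.png'))
sample = np.asarray(Image.open(frame_paths[0]))
with h5py.File('/content/ycbv_0000_color.h5', 'w') as f:
    dset = f.create_dataset('color', shape=(len(frame_paths),) + sample.shape,
                            dtype=sample.dtype, chunks=(1,) + sample.shape,
                            compression='gzip')
    for i, path in enumerate(frame_paths):
        dset[i] = np.asarray(Image.open(path))

Reading individual frames back with h5py then only touches the chunks you actually index, which is what makes this workable on Drive.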