
Code related to creating and using datasets for machine learning

Primary language: Python


My datasets tend to be tens to hundreds of GB: too big to fit in memory, but small enough to fit on a single hard drive.

I store my data as flat files in /srv/data, for example: /srv/data/shape_completion_data

I put my datasets in /srv/datasets. Each dataset has the following structure:

  - split0.txt
     x0.pcd, y0.pcd
     x1.pcd, y1.pcd
  - split1.txt
     x0.pcd, y0.pcd
     x1.pcd, y1.pcd
  - split2.txt
     x0.pcd, y0.pcd
     x1.pcd, y1.pcd
  - info.yaml
    patch_size: 40,
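
A minimal sketch of how one split file from the layout above might be read. It assumes each line of a split file holds one comma-separated input/target pair of .pcd file names, as in the listing. The function name `iter_split_pairs` is hypothetical and is not part of this repository.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_split_pairs(split_file: Path) -> Iterator[Tuple[str, str]]:
    """Lazily yield (x, y) file-name pairs listed in one split file.

    Assumption: each non-empty line looks like "x0.pcd, y0.pcd".
    """
    with split_file.open() as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the comma and drop surrounding whitespace.
            x_name, y_name = (part.strip() for part in line.split(","))
            yield x_name, y_name
```

Because this is a generator, only one line of file names is held in memory at a time. The point clouds themselves can be loaded on demand, which suits datasets that are too big to fit in memory.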