Code related to creating and using datasets for machine learning. My datasets tend to be on the order of 10s to 100s of GB: too big to fit in memory, but small enough to fit on a single hard drive.
I store my data as flat files under /srv/data, e.g. /srv/data/shape_completion_data. Each dataset has the following structure, where each splitN.txt lists the (x, y) file pairs belonging to that split and info.yaml holds dataset-level metadata:
DATASET_NAME:
- split0.txt
    x0.pcd, y0.pcd
    x1.pcd, y1.pcd
    ...
- split1.txt
    x0.pcd, y0.pcd
    x1.pcd, y1.pcd
    ...
- split2.txt
    x0.pcd, y0.pcd
    x1.pcd, y1.pcd
    ...
- info.yaml
    patch_size: 40
    ...
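
Below is a minimal sketch of how this layout might be consumed, assuming PyYAML is installed and that each splitN.txt lists one comma-separated `x.pcd, y.pcd` pair per line. Streaming the pairs with a generator keeps memory usage flat, which matters at the 10s-to-100s-of-GB scale. The `load_info` and `iter_pairs` names here are illustrative, not part of an existing API.

```python
import os
import yaml  # PyYAML


DATA_ROOT = "/srv/data"  # flat-file storage root described above


def load_info(dataset_name, root=DATA_ROOT):
    """Read a dataset's info.yaml metadata (e.g. patch_size)."""
    with open(os.path.join(root, dataset_name, "info.yaml")) as f:
        return yaml.safe_load(f)


def iter_pairs(dataset_name, split="split0", root=DATA_ROOT):
    """Lazily yield (x, y) .pcd path pairs from one split file.

    Assumes each line of splitN.txt looks like "x0.pcd, y0.pcd".
    Yielding one pair at a time means the full dataset never has
    to fit in memory.
    """
    dataset_dir = os.path.join(root, dataset_name)
    with open(os.path.join(dataset_dir, split + ".txt")) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            x_file, y_file = (part.strip() for part in line.split(","))
            yield (os.path.join(dataset_dir, x_file),
                   os.path.join(dataset_dir, y_file))


if __name__ == "__main__":
    info = load_info("shape_completion_data")
    print("patch_size:", info["patch_size"])
    for x_path, y_path in iter_pairs("shape_completion_data", split="split0"):
        pass  # load the point clouds here and feed them to training
```

Because the split files hold only filenames, shuffling or re-splitting the dataset is just a matter of rewriting a few small text files; the bulky .pcd data never moves.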