ayushkarnawat/profit

Load LMDB/TFRecords file into pytorch datasets

Closed this issue · 1 comments

For efficient loading when training PyTorch models, we should use the torch.utils.data.DataLoader class to load batched data on the fly. To do so, we need to convert the saved dataset into a data loader class that the load_dataset() method can use (see below).

def load_dataset(method, mutator_fmt, labels, rootdir='data/3gb1/processed/',
                 num_data=-1, filetype='h5', as_numpy=False) -> Union[np.ndarray, List[np.ndarray]]:

Related to #35 and #40.
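
As a rough starting point, arrays like the ones load_dataset() is annotated to return could be wrapped in a DataLoader as sketched below. The helper name arrays_to_loader and the random stand-in data are hypothetical and not part of the repository; the sketch also assumes all arrays share the same first (sample) dimension.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def arrays_to_loader(arrays, batch_size=32, shuffle=True):
    """Wrap one or more equally long NumPy arrays in a DataLoader.

    Hypothetical helper: accepts either a single array or a list of arrays,
    mirroring the Union[np.ndarray, List[np.ndarray]] return annotation.
    """
    if isinstance(arrays, np.ndarray):
        arrays = [arrays]
    # from_numpy shares memory with the source array, so no copy is made
    # unless the array had to be made contiguous first.
    tensors = [torch.from_numpy(np.ascontiguousarray(a)) for a in arrays]
    return DataLoader(TensorDataset(*tensors), batch_size=batch_size, shuffle=shuffle)


# Stand-in data; in practice these would be the arrays loaded from disk.
X = np.random.rand(10, 4).astype(np.float32)
y = np.random.rand(10).astype(np.float32)

loader = arrays_to_loader([X, y], batch_size=4, shuffle=False)
batches = list(loader)  # each batch is a list: [features, labels]
```

This only helps when the whole dataset fits in memory; for larger datasets a Dataset that reads samples lazily from disk is probably the better fit.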

For now, TFRecords files will not be loaded by PyTorch, since reading them requires TensorFlow, which is (a) cumbersome and (b) defeats the purpose of using a single backend (i.e., either PyTorch or TensorFlow).