ayushkarnawat/profit

Unable to read tfrecords file

Reading a large tfrecords file (in this case, the preprocessed 3D embedded graph structure of a protein) fails: the dataset cannot be read and parsed.

from torch.utils.data import DataLoader
from profit.utils.data_utils.datasets import TorchTFRecordsDataset

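# Load examples from the preprocessed tfrecords file.
data = TorchTFRecordsDataset("data/3gb1/processed/egcn_fitness/tertiary5.tfrecords")
loader = DataLoader(data, batch_size=2)
for batch in loader:
    # Each batch is a dict of tensors; print the shape of each one.
    print([arr.shape for arr in batch.values()])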

Current Behavior

Traceback (most recent call last):
  File "profit/utils/data_utils/datasets.py", line 312, in <module>
    for batch in loader:
  File "/Users/ayushkarnawat/miniconda3/envs/chem/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/Users/ayushkarnawat/miniconda3/envs/chem/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/Users/ayushkarnawat/miniconda3/envs/chem/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 28, in fetch
    data.append(next(self.dataset_iter))
  File "profit/utils/data_utils/datasets.py", line 299, in __iter__
    for record in records:
  File "/Users/ayushkarnawat/Documents/dev/python_workspace/profit/profit/utils/data_utils/tfreader.py", line 123, in tfrecord_loader
    for record in record_iterator:
  File "/Users/ayushkarnawat/Documents/dev/python_workspace/profit/profit/utils/data_utils/tfreader.py", line 80, in tfrecord_iterator
    yield from read_records(offset)
  File "/Users/ayushkarnawat/Documents/dev/python_workspace/profit/profit/utils/data_utils/tfreader.py", line 65, in read_records
    raise RuntimeError("Failed to read the record.")
RuntimeError: Failed to read the record.
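
The traceback only says that read_records in tfreader.py gave up, not why. One way to narrow this down is to check the file's framing independently of the reader. The sketch below assumes the file follows the standard TFRecord layout (an 8-byte little-endian payload length, a 4-byte CRC of the length, the payload, then a 4-byte CRC of the payload). It only checks that every record fits inside the file and does not verify the CRCs. scan_tfrecords is a hypothetical helper written for this report, not part of profit.

```python
import os
import struct


def scan_tfrecords(path):
    """Walk the TFRecord framing of `path` without parsing the payloads.

    Returns (payload_lengths, error), where error is None if every record
    fits inside the file, or a message describing the first bad record.
    Hypothetical diagnostic helper; CRCs are skipped, not verified.
    """
    size = os.path.getsize(path)
    lengths = []
    offset = 0
    with open(path, "rb") as f:
        while offset < size:
            f.seek(offset)
            header = f.read(8)
            if len(header) < 8:
                return lengths, f"truncated length header at byte {offset}"
            # Payload length is stored as an unsigned 64-bit little-endian int.
            (length,) = struct.unpack("<Q", header)
            # 8 (length) + 4 (length CRC) + payload + 4 (payload CRC)
            end = offset + 8 + 4 + length + 4
            if end > size:
                return lengths, (
                    f"record at byte {offset} claims {length} payload bytes "
                    f"but the file ends at byte {size}"
                )
            lengths.append(length)
            offset = end
    return lengths, None
```

Calling scan_tfrecords("data/3gb1/processed/egcn_fitness/tertiary5.tfrecords") and printing the result would show whether the file itself is truncated or has a corrupt length field. If it instead reports intact records with large payloads, that would suggest the failure lies in how the reader handles large records.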

Expected Behavior

The loop should print the shapes of the four tensors in each batch. With batch_size=2, the last batch holds the single remaining example:

[torch.Size([2, 876, 63]), torch.Size([2, 876, 876]), torch.Size([2, 876, 876, 3]), torch.Size([2, 1])]
[torch.Size([2, 876, 63]), torch.Size([2, 876, 876]), torch.Size([2, 876, 876, 3]), torch.Size([2, 1])]
[torch.Size([1, 876, 63]), torch.Size([1, 876, 876]), torch.Size([1, 876, 876, 3]), torch.Size([1, 1])]