Unable to read tfrecords file
Closed this issue · 0 comments
ayushkarnawat commented
Reading a large tfrecords file (here, the preprocessed 3D embedded graph structure of a protein) fails: the dataset cannot be read and parsed, and iterating over it raises a RuntimeError. The following snippet reproduces the problem:
from torch.utils.data import DataLoader
from profit.utils.data_utils.datasets import TorchTFRecordsDataset
data = TorchTFRecordsDataset("data/3gb1/processed/egcn_fitness/tertiary5.tfrecords")
loader = DataLoader(data, batch_size=2)
for batch in loader:
    print([arr.shape for arr in batch.values()])
Current Behavior
Traceback (most recent call last):
  File "profit/utils/data_utils/datasets.py", line 312, in <module>
    for batch in loader:
  File "/Users/ayushkarnawat/miniconda3/envs/chem/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/Users/ayushkarnawat/miniconda3/envs/chem/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/Users/ayushkarnawat/miniconda3/envs/chem/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 28, in fetch
    data.append(next(self.dataset_iter))
  File "profit/utils/data_utils/datasets.py", line 299, in __iter__
    for record in records:
  File "/Users/ayushkarnawat/Documents/dev/python_workspace/profit/profit/utils/data_utils/tfreader.py", line 123, in tfrecord_loader
    for record in record_iterator:
  File "/Users/ayushkarnawat/Documents/dev/python_workspace/profit/profit/utils/data_utils/tfreader.py", line 80, in tfrecord_iterator
    yield from read_records(offset)
  File "/Users/ayushkarnawat/Documents/dev/python_workspace/profit/profit/utils/data_utils/tfreader.py", line 65, in read_records
    raise RuntimeError("Failed to read the record.")
RuntimeError: Failed to read the record.
Expected Behavior
[torch.Size([2, 876, 63]), torch.Size([2, 876, 876]), torch.Size([2, 876, 876, 3]), torch.Size([2, 1])]
[torch.Size([2, 876, 63]), torch.Size([2, 876, 876]), torch.Size([2, 876, 876, 3]), torch.Size([2, 1])]
[torch.Size([1, 876, 63]), torch.Size([1, 876, 876]), torch.Size([1, 876, 876, 3]), torch.Size([1, 1])]
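One way to narrow this down might be to check whether the file itself is intact, independently of the reader in tfreader.py. The sketch below assumes the standard TFRecord framing (an 8-byte little-endian payload length, a 4-byte CRC of the length, the payload, then a 4-byte CRC of the payload) and only walks the length fields; it does not verify the CRCs. `scan_tfrecords` is a hypothetical helper written for this issue, not part of profit.

```python
import os
import struct


def scan_tfrecords(path):
    """Walk the record framing of a TFRecord file without parsing payloads.

    Returns (payload_lengths, error), where error is None if every record
    fits inside the file, or a message describing the first record that
    does not. CRC fields are skipped, not checked.
    """
    size = os.path.getsize(path)
    lengths = []
    offset = 0
    with open(path, "rb") as f:
        while offset < size:
            f.seek(offset)
            header = f.read(12)  # uint64 length + uint32 CRC of the length
            if len(header) < 12:
                return lengths, f"truncated header at byte {offset}"
            (length,) = struct.unpack("<Q", header[:8])
            end = offset + 12 + length + 4  # header + payload + payload CRC
            if end > size:
                return lengths, (
                    f"record at byte {offset} declares {length} payload bytes, "
                    f"but the file ends at byte {size}"
                )
            lengths.append(length)
            offset = end
    return lengths, None


if __name__ == "__main__":
    # Path taken from the snippet above.
    lengths, error = scan_tfrecords(
        "data/3gb1/processed/egcn_fitness/tertiary5.tfrecords"
    )
    print(f"{len(lengths)} complete records, largest payload: {max(lengths, default=0)} bytes")
    print("error:", error)
```

If this reports no error, the file framing is probably fine and the failure is more likely in how the reader handles large records; if it reports a truncated record, the file may need to be regenerated.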