sophos/SOREL-20M

lmdb issues


When I train the neural network, the error below occurs around the end of epoch 8. I have tried several times and hit the same error at roughly the same point in training.

File "train.py", line 209, in <module>
    baker.run()
  File "/data/anaconda3/envs/sorel/lib/python3.6/site-packages/baker.py", line 888, in run
    value = self.apply(*self.parse(argv), instance=instance)
  File "/data/anaconda3/envs/sorel/lib/python3.6/site-packages/baker.py", line 866, in apply
    return cmd.fn(*newargs, **newkwargs)
  File "train.py", line 128, in train_network
    for i, (features, labels) in enumerate(generator):
  File "/data/anaconda3/envs/sorel/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/data/anaconda3/envs/sorel/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/data/anaconda3/envs/sorel/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/data/anaconda3/envs/sorel/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
lmdb.ReadersFullError: Caught ReadersFullError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/data/anaconda3/envs/sorel/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/data/anaconda3/envs/sorel/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/anaconda3/envs/sorel/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/bao/code/talos/SOREL-20M/dataset.py", line 145, in __getitem__
    features = self.features_lmdb_reader(key)
  File "/home/bao/code/talos/SOREL-20M/dataset.py", line 30, in __call__
    with self.env.begin() as txn:
lmdb.ReadersFullError: mdb_txn_begin: MDB_READERS_FULL: Environment maxreaders limit reached

Tip: passing lock=False to lmdb.open(...) fixes the error MDB_READERS_FULL: Environment maxreaders limit reached.

I just found this tip on the site linked below, and it fixed the issue for me. Hopefully you can update the source code accordingly:

https://codeslake.github.io/research/pytorch/How-to-use-LMDB-with-PyTorch-DataLoader/
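
For reference, here is a minimal sketch of what the fix could look like in the LMDB reader pattern that dataset.py follows; the class name, constructor arguments, and key handling below are illustrative assumptions, not the repo's exact code. lock=False disables LMDB's reader lock table so that read transactions from many DataLoader workers no longer consume maxreaders slots; raising max_readers is an alternative if locking should stay on.

import lmdb


class FeaturesLMDBReader:
    """Illustrative reader mirroring the pattern in dataset.py (not its exact code)."""

    def __init__(self, lmdb_path):
        # lock=False disables LMDB's reader lock table, so read-only transactions
        # opened by many DataLoader workers no longer hit the maxreaders limit.
        # If locking must stay on, raising the slot count is an alternative, e.g.
        # lmdb.open(lmdb_path, readonly=True, max_readers=1024)  # default is 126
        self.env = lmdb.open(lmdb_path, readonly=True, lock=False)

    def __call__(self, key):
        # Begin a read transaction and fetch the serialized record for this key.
        with self.env.begin() as txn:
            value = txn.get(key if isinstance(key, bytes) else key.encode("ascii"))
        return value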