victorca25/traiNNer

Correct usage of lmdb

N0manDemo opened this issue · 11 comments

I used create_lmdb.py to create both my LR and HR datasets, and I was wondering how I should configure my options file.
Do the settings differ from using HR/LR image folders?

Hello! Technically you only need to point the dataroot_HR and dataroot_LR to the correct directories ending in '.lmdb' and they should be loaded correctly. I haven't used lmdb in a while, so let me know how it goes!

The directories should look like:

train_HR.lmdb
├── data.mdb
├── lock.mdb
└── meta_info.txt
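For reference, a minimal sketch of how the dataset section of the options file might point at the lmdb roots. Only the dataroot_HR/dataroot_LR keys and the hr.lmdb path are confirmed in this thread; the surrounding layout (datasets/train, name) and the lr.lmdb path are assumptions based on typical traiNNer option files:

```yaml
# Sketch only: structure assumed from typical traiNNer option files.
datasets:
  train:
    name: DIV2K
    dataroot_HR: ../../datasets/main/hr.lmdb  # directory containing data.mdb
    dataroot_LR: ../../datasets/main/lr.lmdb  # assumed LR counterpart
```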

Hi victorca25,

I receive this error when loading images from my lmdb directory.

My log file: error_lmdb.log

My config file: train_esrgan.txt

21-02-12 17:31:47.474 - INFO: Random seed: 0
21-02-12 17:31:47.479 - INFO: Read lmdb keys from cache: ../../datasets/main/hr.lmdb/_keys_cache.p
21-02-12 17:31:47.479 - INFO: Dataset [LRHRDataset - DIV2K] is created.
21-02-12 17:31:47.479 - INFO: Number of train images: 44, iters: 6
21-02-12 17:31:47.479 - INFO: Total epochs needed: 83334 for iters 500,000
21-02-12 17:31:47.479 - INFO: Read lmdb keys from cache: ../../datasets/main/val/hr.lmdb/_keys_cache.p
21-02-12 17:31:47.479 - INFO: Dataset [LRHRDataset - val_set14_part] is created.
21-02-12 17:31:47.479 - INFO: Number of val images in [val_set14_part]: 44
21-02-12 17:31:47.631 - INFO: AMP library available
21-02-12 17:31:48.803 - INFO: Initialization method [kaiming]
21-02-12 17:31:49.020 - INFO: Initialization method [kaiming]
21-02-12 17:31:49.931 - INFO: AMP enabled
21-02-12 17:31:49.939 - INFO: Network G structure: DataParallel - RRDBNet, with parameters: 16,697,987
21-02-12 17:31:49.939 - INFO: Network D structure: DataParallel - Discriminator_VGG, with parameters: 14,502,281
21-02-12 17:31:49.939 - INFO: Model [SRRaGANModel] is created.
21-02-12 17:31:49.939 - INFO: Start training from epoch: 0, iter: 0
Traceback (most recent call last):
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 416, in <module>
    main()
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 412, in main
    fit(model, opt, dataloaders, steps_states, data_params, loggers)
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 215, in fit
    for n, train_data in enumerate(dataloaders['train'], start=1):
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/ext4-storage/Training/BasicSR/codes/data/LRHRC_dataset.py", line 224, in __getitem__
    img_HR = util.read_img(self.HR_env, HR_path, out_nc=image_channels)
  File "/mnt/ext4-storage/Training/BasicSR/codes/dataops/common.py", line 129, in read_img
    img = fix_img_channels(img, out_nc)
  File "/mnt/ext4-storage/Training/BasicSR/codes/dataops/common.py", line 139, in fix_img_channels
    if img.ndim == 2:
AttributeError: 'NoneType' object has no attribute 'ndim'

Can you try adding the following at line 100 here: https://github.com/victorca25/BasicSR/blob/master/codes/dataops/common.py

print("env: ", env)

And let me know what it prints in the console?

21-02-13 13:08:32.573 - INFO: Random seed: 0
21-02-13 13:08:32.594 - INFO: Read lmdb keys from cache: ../../datasets/main/hr.lmdb/_keys_cache.p
21-02-13 13:08:32.595 - INFO: Dataset [LRHRDataset - DIV2K] is created.
21-02-13 13:08:32.595 - INFO: Number of train images: 44, iters: 6
21-02-13 13:08:32.595 - INFO: Total epochs needed: 83334 for iters 500,000
21-02-13 13:08:32.596 - INFO: Read lmdb keys from cache: ../../datasets/main/val/hr.lmdb/_keys_cache.p
21-02-13 13:08:32.597 - INFO: Dataset [LRHRDataset - val_set14_part] is created.
21-02-13 13:08:32.597 - INFO: Number of val images in [val_set14_part]: 44
21-02-13 13:08:33.009 - INFO: AMP library available
21-02-13 13:08:36.369 - INFO: Initialization method [kaiming]
21-02-13 13:08:36.587 - INFO: Initialization method [kaiming]
21-02-13 13:08:38.641 - INFO: AMP enabled
21-02-13 13:08:38.648 - INFO: Network G structure: DataParallel - RRDBNet, with parameters: 16,697,987
21-02-13 13:08:38.649 - INFO: Network D structure: DataParallel - Discriminator_VGG, with parameters: 14,502,281
21-02-13 13:08:38.649 - INFO: Model [SRRaGANModel] is created.
21-02-13 13:08:38.649 - INFO: Start training from epoch: 0, iter: 0
env: None
env: None
env: None
env: None
env: None
Traceback (most recent call last):
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 416, in <module>
    main()
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 412, in main
    fit(model, opt, dataloaders, steps_states, data_params, loggers)
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 215, in fit
    for n, train_data in enumerate(dataloaders['train'], start=1):
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/n0man/Envs/main/lib64/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/ext4-storage/Training/BasicSR/codes/data/LRHRC_dataset.py", line 224, in __getitem__
    img_HR = util.read_img(self.HR_env, HR_path, out_nc=image_channels)
  File "/mnt/ext4-storage/Training/BasicSR/codes/dataops/common.py", line 129, in read_img
    img = fix_img_channels(img, out_nc)
  File "/mnt/ext4-storage/Training/BasicSR/codes/dataops/common.py", line 139, in fix_img_channels
    if img.ndim == 2:
AttributeError: 'NoneType' object has no attribute 'ndim'

(main) [n0man@fedora-desktop-n0man codes]$

Ok, so the problem is that for some reason the lmdb environment handles are not being correctly passed to the read function:

env: None
env: None
env: None
env: None
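For context, the chain that produces the error can be sketched like this (a hypothetical simplification of read_img and fix_img_channels in dataops/common.py, not the repo's actual code): with env=None the loader falls back to reading the lmdb key as if it were a file path on disk, that read yields None, and the channel fix then dereferences None.ndim.

```python
# Hypothetical simplification of the loader's read path.

def read_from_disk(path):
    # cv2.imread returns None when the path is not a readable image file;
    # an lmdb record key is not a file on disk, so we model that here.
    return None

def read_img(env, path):
    if env is None:  # the lmdb environment was never passed down
        return read_from_disk(path)
    raise NotImplementedError("lmdb branch omitted in this sketch")

def fix_img_channels(img, out_nc):
    # this is where the reported AttributeError fires
    return img.ndim == 2

img = read_img(None, "hr.lmdb/0001")
try:
    fix_img_channels(img, 3)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'ndim'
```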

I'm working on something else at the moment, but I'll try to take a look to see if I find where the issue is.

I may have found a solution, but it will take a while to commit, since I have been modifying the dataloaders and they are not in a state I can commit at the moment.
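In the meantime, a quick sanity check on the configured roots can rule out simple path mistakes (a sketch only; check_lmdb_root is illustrative and not a function in the repo):

```python
import os

def check_lmdb_root(dataroot):
    # Illustrative sanity check: verify that a dataroot ending in '.lmdb'
    # actually contains an lmdb database before training starts, so a
    # misconfigured path fails fast with a clear message.
    if dataroot.endswith(".lmdb"):
        data_file = os.path.join(dataroot, "data.mdb")
        if not os.path.isfile(data_file):
            raise FileNotFoundError(
                f"{dataroot} looks like an lmdb root but {data_file} is missing"
            )
        return "lmdb"
    return "img_folder"

print(check_lmdb_root("train_HR"))  # img_folder
```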

Awesome! It won't be on hold for long; I just need one or two more days to finish testing and to get the dataloaders into a state that can be committed. I'll let you know when it's up.

@N0manDemo the updated datasets and lmdb codes have now been committed. Please refer to the wiki for more details about the updated lmdb format. You will have to recreate the database with the script, but it should work much better now.

Thank you.

lmdb is working now.