Image Deblur - Custom dataset Error
kimtaehyeong opened this issue · 6 comments
Thanks for writing a good paper.
I have my own dataset of input/target image pairs.
I arranged the folders as follows:
./datasets/
./datasets/GoPro/
./datasets/GoPro/train/
./datasets/GoPro/train/input/
./datasets/GoPro/train/target/
./datasets/GoPro/test/
./datasets/GoPro/test/input/
./datasets/GoPro/test/target/
After creating the folders as above, I ran the preprocessing script
python scripts/data_preparation/gopro.py
which produced the blur_crops, blur_crops.lmdb, sharp_crops, and sharp_crops.lmdb datasets.
Finally, I tried to train with the following command:
python -m torch.distributed.launch --nproc_per_node=8 --master_port=4321 basicsr/train.py -opt options/train/GoPro/HINet.yml --launcher pytorch
But I got the following error:
ValueError: Keys in lq_folder and gt_folder are different.
...
...
subprocess.CalledProcessError: Command '['/home/ubuntu/anaconda3/envs/HINet/bin/python', '-u', 'basicsr/train.py', '--local_rank=7', '-opt', 'options/train/GoPro/HINet.yml', '--launcher', 'pytorch']' returned non-zero exit status 1.
How can I fix the error?
Thanks.
Hi, kimtaehyeong,
Thanks for your attention to HINet!
The error indicates an inconsistency between the input and target data; see https://github.com/megvii-model/HINet/blob/main/basicsr/data/data_util.py#L151-L153 .
Make sure the number of original images in ./datasets/GoPro/train/input/ and ./datasets/GoPro/train/target/ is the same,
and likewise for ./datasets/GoPro/test/input/ and ./datasets/GoPro/test/target/.
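A quick way to verify that consistency is to diff the filenames in the two folders; a minimal sketch (the script name `check_pairs.py` is just illustrative):

```python
import sys
from pathlib import Path

def unpaired_files(lq_dir, gt_dir):
    """Return filenames that are present in one folder but not the other."""
    lq = {p.name for p in Path(lq_dir).iterdir() if p.is_file()}
    gt = {p.name for p in Path(gt_dir).iterdir() if p.is_file()}
    return sorted(lq ^ gt)  # symmetric difference = unpaired files

if __name__ == '__main__':
    # e.g.: python check_pairs.py datasets/GoPro/train/input datasets/GoPro/train/target
    diff = unpaired_files(sys.argv[1], sys.argv[2])
    print('unpaired files:', diff if diff else 'none -- folders are consistent')
```

An empty result means every input image has a matching target image.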
You can also check the meta_info.txt files in the input and target .lmdb folders to see whether they are identical.
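For example, a small sketch to diff the keys of the two meta_info.txt files (the .lmdb paths below assume the default BasicSR on-disk layout; adjust them to your setup):

```python
def meta_keys(meta_path):
    """Collect image keys from a BasicSR-style meta_info.txt.

    Each line looks like: <key>.png (<h>,<w>,<c>) <compress_level>
    """
    with open(meta_path) as f:
        return {line.split(' ')[0] for line in f if line.strip()}

if __name__ == '__main__':
    lq = meta_keys('./datasets/GoPro/train/blur_crops.lmdb/meta_info.txt')
    gt = meta_keys('./datasets/GoPro/train/sharp_crops.lmdb/meta_info.txt')
    print('only in lq:', sorted(lq - gt))
    print('only in gt:', sorted(gt - lq))
```

Both lists should be empty; any key printed here is what triggers the "Keys in lq_folder and gt_folder are different" error.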
Here are our meta_info.txt files for the training and testing data; hope they help:
meta_info.txt for the cropped training data:
https://drive.google.com/file/d/1G2lI_QX9iSKDQ7Ub2-frDxUe9IJaZczA/view?usp=sharing
meta_info.txt for the testing data:
https://drive.google.com/file/d/1Oj1wB8dAhxy-Cawymy2uhPDeSqypW8EI/view?usp=sharing
Thanks.
Thank you so much.
With your help, training now succeeds.
I have a single 24 GB GPU; what would the ideal training settings be in that case?
Thank you.
Hi, kimtaehyeong,
Glad to help!
It's hard to say what the optimal setting is for your environment, but common practice would be:
- Make sure GPU utilization (not GPU memory) is fully exploited (near 100%) to speed up training.
- Choose the batch_size, crop_size, and number of iterations for your model.
- I recommend keeping the total number of pixels the model "sees" close to the baseline, which is 8 (gpus) x 8 (batch_size) x 256 x 256 (crop_size) x 400000 (iters).
- Tune a stable learning rate. You could test the model at, e.g., 1000 or 2000 iterations to see whether it behaves as you expect.
- I recommend setting the testing crop size to be the same as the training crop size you chose.
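To illustrate the pixel-budget rule above: given a single-GPU setting (the batch size of 16 and crop size of 256 below are assumed examples, not recommendations), the matching iteration count can be computed as:

```python
# Baseline budget from the HINet GoPro config:
# 8 GPUs x batch 8 x 256x256 crops x 400k iterations
baseline_pixels = 8 * 8 * 256 * 256 * 400_000

# Hypothetical single 24 GB GPU setting: batch 16, crop 256
gpus, batch_size, crop = 1, 16, 256
iters = baseline_pixels // (gpus * batch_size * crop * crop)
print(iters)  # 1600000 iterations to match the baseline pixel budget
```

With a quarter of the per-step pixels, you need 4x the iterations (1.6M) to see the same total amount of data as the baseline.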
Thanks.
Thank you very much.
With your help, training has succeeded, and now I want to run inference on my own images.
Here is my test command:
python basicsr/demo.py -opt options/demo/demo.yml
Since I trained with 3 GPUs, I set the GPU count to 3 in the .yml file and disabled distributed mode. The error is:
Traceback (most recent call last):
File "basicsr/demo.py", line 46, in <module>
main()
File "basicsr/demo.py", line 40, in main
model = create_model(opt)
File "/home/ubuntu/project/HINet/basicsr/models/__init__.py", line 44, in create_model
model = model_cls(opt)
File "/home/ubuntu/project/HINet/basicsr/models/image_restoration_model.py", line 37, in __init__
self.opt['path'].get('strict_load_g', True), param_key=self.opt['path'].get('param_key', 'params'))
File "/home/ubuntu/project/HINet/basicsr/models/base_model.py", line 277, in load_network
load_path, map_location=lambda storage, loc: storage)
File "/home/ubuntu/anaconda3/envs/hinet/lib/python3.6/site-packages/torch/serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ubuntu/anaconda3/envs/hinet/lib/python3.6/site-packages/torch/serialization.py", line 780, in _legacy_load
deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 1607442 more bytes. The file might be corrupted.
How can I solve this?
Hi, kimtaehyeong,
demo.py is designed to run inference on one image with one GPU, even if you trained with 3 GPUs.
Thanks.
Thank you very much, it worked out well.