AssertionError: Caught AssertionError in DataLoader worker process 0.

Question

Marieology opened this issue 2 years ago · 0 comments

Hello, and thanks for the code!
I am trying to run the training code on my custom dataset, but I keep hitting the error below.
My dataset has 2 classes (water and background). The JPG images are 3-band images and the label images are 2-band images (0, 1).
I modified the YAML file to fit my dataset:
```
DATASET:
  root_dataset: ".data/"
  list_train: "./data/training.odgt"
  list_val: "./data/validation.odgt"
  num_class: 2
  imgSizes: (256,)
  imgMaxSize: 256
  padding_constant: 32
  segm_downsampling_rate: 4
  random_flip: True

MODEL:
  arch_encoder: "hrnetv2"
  arch_decoder: "c1"
  fc_dim: 720

TRAIN:
  batch_size_per_gpu: 2 #2
  num_epoch: 30
  start_epoch: 0
  epoch_iters: 5000
  optim: "SGD"
  lr_encoder: 0.02
  lr_decoder: 0.02
  lr_pow: 0.9
  beta1: 0.9
  weight_decay: 1e-4
  deep_sup_scale: 0.4
  fix_bn: False
  workers: 8
  disp_iter: 20
  seed: 304

VAL:
  visualize: False
  checkpoint: "epoch_30.pth"

TEST:
  checkpoint: "epoch_30.pth"
  result: "./"

DIR: "ckpt/floods2-hrnetv2-c1"
```

And defaults.py:

```
from yacs.config import CfgNode as CN

# -----------------------------------------------------------------------------
# Config definition
# -----------------------------------------------------------------------------

_C = CN()
_C.DIR = "ckpt/floods-hrnetv2-c1"

# -----------------------------------------------------------------------------
# Dataset
# -----------------------------------------------------------------------------
_C.DATASET = CN()
_C.DATASET.root_dataset = "./data"
_C.DATASET.list_train = "./data/training.odgt"
_C.DATASET.list_val = "./data/validation.odgt"
_C.DATASET.num_class = 2
# multiscale train/test, size of short edge (int or tuple)
_C.DATASET.imgSizes = (256,)
# maximum input image size of long edge
_C.DATASET.imgMaxSize = 256
# maximum downsampling rate of the network
_C.DATASET.padding_constant = 32
# downsampling rate of the segmentation label
_C.DATASET.segm_downsampling_rate = 4
# randomly horizontally flip images when train/test
_C.DATASET.random_flip = True

# -----------------------------------------------------------------------------
# Model
# -----------------------------------------------------------------------------
_C.MODEL = CN()
# architecture of net_encoder
_C.MODEL.arch_encoder = "hrnetv2"
# architecture of net_decoder
_C.MODEL.arch_decoder = "c1"
# weights to finetune net_encoder
_C.MODEL.weights_encoder = ""
# weights to finetune net_decoder
_C.MODEL.weights_decoder = ""
# number of feature channels between encoder and decoder
_C.MODEL.fc_dim = 720

# -----------------------------------------------------------------------------
# Training
# -----------------------------------------------------------------------------
_C.TRAIN = CN()
_C.TRAIN.batch_size_per_gpu = 2
# epochs to train for
_C.TRAIN.num_epoch = 8
# epoch to start training. useful if continue from a checkpoint
_C.TRAIN.start_epoch = 0
# iterations of each epoch (irrelevant to batch size)
_C.TRAIN.epoch_iters = 5000

_C.TRAIN.optim = "SGD"
_C.TRAIN.lr_encoder = 0.02
_C.TRAIN.lr_decoder = 0.02
# power in poly to drop LR
_C.TRAIN.lr_pow = 0.9
# momentum for sgd, beta1 for adam
_C.TRAIN.beta1 = 0.9
# weights regularizer
_C.TRAIN.weight_decay = 1e-4
# the weighting of deep supervision loss
_C.TRAIN.deep_sup_scale = 0.4
# fix bn params, only under finetuning
_C.TRAIN.fix_bn = False
# number of data loading workers
_C.TRAIN.workers = 16

# frequency to display
_C.TRAIN.disp_iter = 20
# manual seed
_C.TRAIN.seed = 304

# -----------------------------------------------------------------------------
# Validation
# -----------------------------------------------------------------------------
_C.VAL = CN()
# currently only supports 1
_C.VAL.batch_size = 1
# output visualization during validation
_C.VAL.visualize = False
# the checkpoint to evaluate on
_C.VAL.checkpoint = "epoch_20.pth"

# -----------------------------------------------------------------------------
# Testing
# -----------------------------------------------------------------------------
_C.TEST = CN()
# currently only supports 1
_C.TEST.batch_size = 1
# the checkpoint to test on
_C.TEST.checkpoint = "epoch_20.pth"
# folder to output visualization results
_C.TEST.result = "./"
```
Then I also modified:
1) model.py -> class SegmentationModule -> def forward,
following [https://hackmd.io/wNGlmMq2RC-lY3l8JhO4SA?view]

2) dataset.py
```
def segm_transform(self, segm):
    # to tensor, -1 to 149 for the default dataset
    # for ours, -1 background, 0 and 1 for classes
    # the segm input format is (0, 38, 75)
    # we need to change it to (-1, 0, 1)
    segm = np.array(segm)
    segm = np.where(segm==0, 1, segm)
    segm = np.where(segm==1, 2, segm)
    # print(np.unique(segm))
    # Original
    # segm = torch.from_numpy(segm).long() - 1
    segm = torch.from_numpy(segm).long()
    return segm
```

following [https://hackmd.io/wNGlmMq2RC-lY3l8JhO4SA?view]
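
One thing that may be worth double-checking in the remapping above: because the second `np.where` runs on the output of the first, pixels that started as 0 become 1 and are then turned into 2 together with the original 1s. The sketch below is only an illustration, assuming the labels really contain just 0 and 1 and that the intent was 0 -> 1 and 1 -> 2; the array `segm` and the helper `remap_labels` are made up for the example and are not part of the repository.

```python
import numpy as np

# Hypothetical 2x2 label patch containing only the two raw values 0 and 1.
segm = np.array([[0, 1], [1, 0]], dtype=np.uint8)

# Chained remap as in segm_transform: the second call also sees the
# pixels that the first call just changed from 0 to 1.
chained = np.where(segm == 0, 1, segm)
chained = np.where(chained == 1, 2, chained)


def remap_labels(raw):
    """Map raw value 0 -> 1 and 1 -> 2 in one pass (assumed intent)."""
    lut = np.array([1, 2], dtype=np.int64)  # index = raw label value
    return lut[raw]


remapped = remap_labels(segm)
print(np.unique(chained))   # unique values left after the chained remap
print(np.unique(remapped))  # unique values left after the lookup-table remap
```

If the two printed sets differ, the chained version has merged the classes, and a single-pass remap like the lookup table is the safer choice.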


Training on ADE20K with the original code was successful, so I'm totally lost here...

```
Traceback (most recent call last):
  File "C:\Users\USER\anaconda3\envs\pytorch18_py39\lib\site-packages\torch\utils\data\dataloader.py", line 517, in __next__
    data = self._next_data()
  File "C:\Users\USER\anaconda3\envs\pytorch18_py39\lib\site-packages\torch\utils\data\dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "C:\Users\USER\anaconda3\envs\pytorch18_py39\lib\site-packages\torch\utils\data\dataloader.py", line 1225, in _process_data
    data.reraise()
  File "C:\Users\USER\anaconda3\envs\pytorch18_py39\lib\site-packages\torch\_utils.py", line 429, in reraise
    raise self.exc_type(msg)
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\USER\anaconda3\envs\pytorch18_py39\lib\site-packages\torch\utils\data\_utils\worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\USER\anaconda3\envs\pytorch18_py39\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\USER\anaconda3\envs\pytorch18_py39\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "E:\e\code\python\11.hrnet\semantic-segmentation-pytorch-master\semantic-segmentation-pytorch-master\mit_semseg\dataset.py", line 173, in __getitem__
    assert(segm.mode == "L")
AssertionError
```
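
The assertion that fails is `segm.mode == "L"`, which looks like a Pillow image-mode check: "L" is single-channel 8-bit grayscale, so a label file with two bands (for example grayscale plus alpha, mode "LA") would not pass. Below is a minimal sketch of checking and converting a label, assuming Pillow is what loads the images. The in-memory image is a stand-in for a real label file, and `to_single_band` is a hypothetical helper, not part of mit_semseg.

```python
from PIL import Image


def to_single_band(label):
    """Return the label as a mode "L" image (hypothetical helper)."""
    if label.mode == "L":
        return label
    # For "LA" input, convert("L") keeps the luminance band and drops alpha.
    return label.convert("L")


# Stand-in for a 2-band label file: grayscale value 1 plus an opaque alpha band.
label = Image.new("LA", (4, 4), (1, 255))
print(label.mode, label.getbands())

fixed = to_single_band(label)
print(fixed.mode, fixed.getpixel((0, 0)))
```

Whether converting is the right fix depends on which band actually holds the class ids, so it may be worth inspecting each band of a real label file first.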

I would really appreciate any advice.

Thank you in advance.