MIC-DKFZ/nnUNet

RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

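Training fails right at startup, while the data loaders are being created in get_dataloaders, before the first epoch begins. This is the command I ran and its full output: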
(brats) (base) user1@5e374944978b:~/BraTS$ OMP_NUM_THREADS=1 nnUNetv2_train 1 3d_fullres 0 --npz

############################
INFO: You are using the old nnU-Net default plans. We have updated our recommendations. Please consider using those instead! Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_presets.md
############################

Using device: cuda:0
/home/user1/BraTS/graphs/models/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:164: FutureWarning: torch.cuda.amp.GradScaler(args...) is deprecated. Please use torch.amp.GradScaler('cuda', args...) instead.
self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None

#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################

2024-09-21 02:47:26.391587: do_dummy_2d_data_aug: False
2024-09-21 02:47:26.395703: Using splits from existing split file: /home/user1/BraTS/graphs/models/nnUNet/nnUNet_preprocessed/Dataset001_BraTS/splits_final.json
2024-09-21 02:47:26.396373: The split file contains 5 splits.
2024-09-21 02:47:26.396436: Desired fold for training: 0
2024-09-21 02:47:26.396486: This split has 433 training and 109 validation cases.
using pin_memory on device 0
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/user1/miniconda3/envs/macau/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/user1/miniconda3/envs/macau/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/user1/miniconda3/envs/macau/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
raise e
File "/home/user1/miniconda3/envs/macau/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

Traceback (most recent call last):
File "/home/user1/miniconda3/envs/macau/bin/nnUNetv2_train", line 9, in <module>
sys.exit(run_training_entry())
File "/home/user1/BraTS/graphs/models/nnUNet/nnunetv2/run/run_training.py", line 275, in run_training_entry
run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
File "/home/user1/BraTS/graphs/models/nnUNet/nnunetv2/run/run_training.py", line 211, in run_training
nnunet_trainer.run_training()
File "/home/user1/BraTS/graphs/models/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1362, in run_training
self.on_train_start()
File "/home/user1/BraTS/graphs/models/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 903, in on_train_start
self.dataloader_train, self.dataloader_val = self.get_dataloaders()
File "/home/user1/BraTS/graphs/models/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 696, in get_dataloaders
_ = next(mt_gen_train)
File "/home/user1/miniconda3/envs/macau/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__
item = self.__get_next_item()
File "/home/user1/miniconda3/envs/macau/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item
raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

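I tried setting "OMP_NUM_THREADS=1" to work around this, but it made no difference. No other error message appears above the traceback, so I cannot tell why the background workers died. Can anyone give me some advice? Please!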
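
For anyone trying to narrow this down, here is a minimal diagnostic sketch, not a confirmed fix. It rests on two assumptions that are not established by the log above: that the workers may be dying from a lack of shared memory (the shell prompt looks like a Docker container, where /dev/shm is often small), and that the installed nnU-Net version reads the `nnUNet_n_proc_DA` environment variable, with a value of 0 making the data augmentation run in the main process so the underlying exception is printed directly. The helper names `shm_free_gib` and `single_process_env` are hypothetical and exist only for this example.

```python
import os
import shutil


def shm_free_gib(path="/dev/shm"):
    """Return the free space at `path` in GiB, or None if the path does not exist."""
    if not os.path.exists(path):
        return None
    return shutil.disk_usage(path).free / 2**30


def single_process_env(base_env=None):
    """Return a copy of the environment that requests zero augmentation workers.

    Assumption: the installed nnU-Net honours nnUNet_n_proc_DA.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["nnUNet_n_proc_DA"] = "0"
    return env


if __name__ == "__main__":
    # A very small value here would be consistent with workers being killed
    # for lack of shared memory, but it does not prove that this is the cause.
    print("free /dev/shm (GiB):", shm_free_gib())

    env = single_process_env()
    print("nnUNet_n_proc_DA =", env["nnUNet_n_proc_DA"])
    print("then rerun: nnUNet_n_proc_DA=0 nnUNetv2_train 1 3d_fullres 0 --npz")
```

If the single-process run prints a concrete exception (for example a missing or corrupt preprocessed file), that message is the one to report; if it simply runs, memory pressure from the worker processes becomes the more likely suspect.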