RuntimeError: DataLoader worker (pid 2616) is killed by signal: Killed.
chan-kh commented
Hi @jakelawcheukwun ,
I ran into a RuntimeError. The error message is the following:
```
No CUDA devices found, falling back to CPU
No CUDA devices found, falling back to CPU
No CUDA devices found, falling back to CPU
load checkpoint from models/model_weights.pth.tar, epoch:1
output dir exists: examples/utterance_1. Video processing skipped.
0%| | 0/5 [00:00<?, ?it/s]/opt/conda/conda-bld/pytorch_1587428091666/work/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of add is deprecated:
        add(Tensor input, Number alpha, Tensor other, *, Tensor out)
Consider using one of the following signatures instead:
        add(Tensor input, Tensor other, *, Number alpha, Tensor out)
0%| | 0/5 [00:49<?, ?it/s]
Traceback (most recent call last):
  File "run_example.py", line 9, in <module>
    results = tester.test(example_video)
  File "/home/ubuntu/MIMAMO-Net/api/tester.py", line 62, in test
    self.resnet50_extractor.run(opface_output_dir, feature_dir)
  File "/home/ubuntu/MIMAMO-Net/api/resnet50_extractor.py", line 68, in run
    output = self.get_vec(ims)
  File "/home/ubuntu/MIMAMO-Net/api/resnet50_extractor.py", line 80, in get_vec
    h_x = self.model(image)
  File "/tmp/yes/envs/myenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/MIMAMO-Net/api/pytorch-benchmarks/ferplus/resnet50_ferplus_dag.py", line 245, in forward
    conv3_3x = self.conv3_3_relu(conv3_3)
  File "/tmp/yes/envs/myenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/tmp/yes/envs/myenv/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 94, in forward
    return F.relu(input, inplace=self.inplace)
  File "/tmp/yes/envs/myenv/lib/python3.6/site-packages/torch/nn/functional.py", line 1063, in relu
    result = torch.relu(input)
  File "/tmp/yes/envs/myenv/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 2616) is killed by signal: Killed.
```
It seems to be related to the server running out of memory.
pytorch/pytorch#4507
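For context, a worker "killed by signal: Killed" usually means the Linux OOM killer terminated the process. A quick way to check available memory before running is to read `/proc/meminfo`; the helper below is a hypothetical sketch (the function `available_mib` is not part of MIMAMO-Net):

```python
# Hypothetical helper: parse the MemAvailable field out of Linux
# /proc/meminfo-style text and report it in MiB.
def available_mib(meminfo_text):
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # line format: "MemAvailable:    8192000 kB"
            return int(line.split()[1]) / 1024
    return None  # field not present

if __name__ == "__main__":
    # On a real machine you would pass open("/proc/meminfo").read()
    sample = "MemTotal:       16384000 kB\nMemAvailable:    8192000 kB"
    print(available_mib(sample))  # 8000.0
```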
wtomin commented
In `resnet50_extractor.py`, lines 42-59:
```python
def run(self, input_dir, output_dir, batch_size=64):
    '''
    input_dir: string
        The input_dir should have one subdir containing all cropped and aligned face
        images for a video (extracted by OpenFace). The input_dir should be named
        after the video.
    output_dir: string
        All extracted feature vectors will be stored in the output directory.
    '''
    assert os.path.exists(input_dir), 'input dir must exist!'
    assert len(os.listdir(input_dir)) != 0, 'input dir must not be empty!'
    video_name = os.path.basename(input_dir)
    dataset = Image_Sampler(video_name, input_dir, test_mode=True, transform=self.transform)
    data_loader = torch.utils.data.DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=False, drop_last=False,
        num_workers=8, pin_memory=False)
```
I have two suggestions:
(1) The default batch size is 64; you can change it to a smaller number that fits into memory.
(2) Sometimes the multiprocessing in the DataLoader causes this problem. You can try setting num_workers=0.
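Both suggestions can be sketched together in a minimal, self-contained example. The random tensors below are a stand-in for the real OpenFace face crops, not the MIMAMO-Net dataset:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 20 fake RGB "face crops" standing in for the real dataset
images = torch.randn(20, 3, 224, 224)
dataset = TensorDataset(images)

# Suggestion (1): batch_size smaller than the default 64.
# Suggestion (2): num_workers=0 loads data in the main process,
#                 so no worker process can be OOM-killed.
loader = DataLoader(dataset, batch_size=8, shuffle=False,
                    drop_last=False, num_workers=0, pin_memory=False)

batch_sizes = [batch[0].shape[0] for batch in loader]
print(batch_sizes)  # [8, 8, 4]
```

With drop_last=False the final, smaller batch (4 images here) is still returned, so no frames are skipped during feature extraction.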