DongSky/few-shot-vit

RuntimeError: DataLoader worker (pid 7572) is killed by signal: Killed.

lgx12345678 opened this issue · 0 comments

(py38) wl_ligexian@9f9da370a0bd:~/few-shot-vit-main/meta_tuning_sun_d$ python train_meta.py -deepemd grid -patch_list 2,3 -shot 1 -way 5 -solver opencv -gpu 0 -save_all
{'backbone': 'visformer',
'bs': 1,
'data_dir': '/public/home/wl_ligexian/few-shot-vit-main/test_phase/materials/',
'dataset': 'miniimagenet',
'deepemd': 'grid',
'extra_dir': None,
'feature_pyramid': None,
'form': 'L2',
'gamma': 0.5,
'gpu': '0',
'l2_strength': 1e-06,
'lr': 0.0005,
'max_epoch': 100,
'metric': 'cosine',
'norm': 'center',
'num_patch': 9,
'patch_list': '2,3',
'patch_ratio': 2,
'pretrain_dir': 'visformer_mini_1shot_ckpt.pth',
'query': 15,
'random_val_task': False,
'save_all': True,
'seed': 12345,
'set': 'val',
'sfc_bs': 4,
'sfc_lr': 0.1,
'sfc_update_step': 100,
'sfc_wd': 0,
'shot': 1,
'solver': 'opencv',
'step_size': 10,
'temperature': 12.5,
'test_episode': 2000,
'val_episode': 2000,
'val_frequency': 50,
'way': 5}
manual seed: 12345
use gpu: [0]
odict_keys(['encoder.pos_embed1', 'encoder.pos_embed2', 'encoder.pos_embed3', 'encoder.stem.conv1.weight', 'encoder.stem.bn1.weight', 'encoder.stem.bn1.bias', 'encoder.stem.bn1.running_mean', 'encoder.stem.bn1.running_var', 'encoder.stem.bn1.num_batches_tracked', 'encoder.stem.conv2.weight', 'encoder.stem.bn2.weight', 'encoder.stem.bn2.bias', 'encoder.stem.bn2.running_mean', 'encoder.stem.bn2.running_var', 'encoder.stem.bn2.num_batches_tracked', 'encoder.stem.conv3.weight', 'encoder.stem.bn3.weight', 'encoder.stem.bn3.bias', 'encoder.stem.bn3.running_mean', 'encoder.stem.bn3.running_var', 'encoder.stem.bn3.num_batches_tracked', 'encoder.stem.downsample.0.weight', 'encoder.stem.downsample.1.weight', 'encoder.stem.downsample.1.bias', 'encoder.stem.downsample.1.running_mean', 'encoder.stem.downsample.1.running_var', 'encoder.stem.downsample.1.num_batches_tracked', 'encoder.stage1.0.norm2.bn.weight', 'encoder.stage1.0.norm2.bn.bias', 'encoder.stage1.0.norm2.bn.running_mean', 'encoder.stage1.0.norm2.bn.running_var', 'encoder.stage1.0.norm2.bn.num_batches_tracked', 'encoder.stage1.0.mlp.conv1.weight', 'encoder.stage1.0.mlp.conv2.weight', 'encoder.stage1.0.mlp.conv3.weight', 'encoder.stage1.1.norm2.bn.weight', 'encoder.stage1.1.norm2.bn.bias', 'encoder.stage1.1.norm2.bn.running_mean', 'encoder.stage1.1.norm2.bn.running_var', 'encoder.stage1.1.norm2.bn.num_batches_tracked', 'encoder.stage1.1.mlp.conv1.weight', 'encoder.stage1.1.mlp.conv2.weight', 'encoder.stage1.1.mlp.conv3.weight', 'encoder.stage1.2.norm2.bn.weight', 'encoder.stage1.2.norm2.bn.bias', 'encoder.stage1.2.norm2.bn.running_mean', 'encoder.stage1.2.norm2.bn.running_var', 'encoder.stage1.2.norm2.bn.num_batches_tracked', 'encoder.stage1.2.mlp.conv1.weight', 'encoder.stage1.2.mlp.conv2.weight', 'encoder.stage1.2.mlp.conv3.weight', 'encoder.stage1.3.norm2.bn.weight', 'encoder.stage1.3.norm2.bn.bias', 'encoder.stage1.3.norm2.bn.running_mean', 'encoder.stage1.3.norm2.bn.running_var', 
'encoder.stage1.3.norm2.bn.num_batches_tracked', 'encoder.stage1.3.mlp.conv1.weight', 'encoder.stage1.3.mlp.conv2.weight', 'encoder.stage1.3.mlp.conv3.weight', 'encoder.patch_embed2.proj.weight', 'encoder.patch_embed2.proj.bias', 'encoder.patch_embed2.norm.bn.weight', 'encoder.patch_embed2.norm.bn.bias', 'encoder.patch_embed2.norm.bn.running_mean', 'encoder.patch_embed2.norm.bn.running_var', 'encoder.patch_embed2.norm.bn.num_batches_tracked', 'encoder.stage2.0.norm1.bn.weight', 'encoder.stage2.0.norm1.bn.bias', 'encoder.stage2.0.norm1.bn.running_mean', 'encoder.stage2.0.norm1.bn.running_var', 'encoder.stage2.0.norm1.bn.num_batches_tracked', 'encoder.stage2.0.attn.qkv.weight', 'encoder.stage2.0.attn.proj.weight', 'encoder.stage2.0.norm2.bn.weight', 'encoder.stage2.0.norm2.bn.bias', 'encoder.stage2.0.norm2.bn.running_mean', 'encoder.stage2.0.norm2.bn.running_var', 'encoder.stage2.0.norm2.bn.num_batches_tracked', 'encoder.stage2.0.mlp.conv1.weight', 'encoder.stage2.0.mlp.conv3.weight', 'encoder.stage2.1.norm1.bn.weight', 'encoder.stage2.1.norm1.bn.bias', 'encoder.stage2.1.norm1.bn.running_mean', 'encoder.stage2.1.norm1.bn.running_var', 'encoder.stage2.1.norm1.bn.num_batches_tracked', 'encoder.stage2.1.attn.qkv.weight', 'encoder.stage2.1.attn.proj.weight', 'encoder.stage2.1.norm2.bn.weight', 'encoder.stage2.1.norm2.bn.bias', 'encoder.stage2.1.norm2.bn.running_mean', 'encoder.stage2.1.norm2.bn.running_var', 'encoder.stage2.1.norm2.bn.num_batches_tracked', 'encoder.stage2.1.mlp.conv1.weight', 'encoder.stage2.1.mlp.conv3.weight', 'encoder.patch_embed3.proj.weight', 'encoder.patch_embed3.proj.bias', 'encoder.patch_embed3.norm.bn.weight', 'encoder.patch_embed3.norm.bn.bias', 'encoder.patch_embed3.norm.bn.running_mean', 'encoder.patch_embed3.norm.bn.running_var', 'encoder.patch_embed3.norm.bn.num_batches_tracked', 'encoder.stage3.0.norm1.bn.weight', 'encoder.stage3.0.norm1.bn.bias', 'encoder.stage3.0.norm1.bn.running_mean', 'encoder.stage3.0.norm1.bn.running_var', 
'encoder.stage3.0.norm1.bn.num_batches_tracked', 'encoder.stage3.0.attn.qkv.weight', 'encoder.stage3.0.attn.proj.weight', 'encoder.stage3.0.norm2.bn.weight', 'encoder.stage3.0.norm2.bn.bias', 'encoder.stage3.0.norm2.bn.running_mean', 'encoder.stage3.0.norm2.bn.running_var', 'encoder.stage3.0.norm2.bn.num_batches_tracked', 'encoder.stage3.0.mlp.conv1.weight', 'encoder.stage3.0.mlp.conv3.weight', 'encoder.stage3.1.norm1.bn.weight', 'encoder.stage3.1.norm1.bn.bias', 'encoder.stage3.1.norm1.bn.running_mean', 'encoder.stage3.1.norm1.bn.running_var', 'encoder.stage3.1.norm1.bn.num_batches_tracked', 'encoder.stage3.1.attn.qkv.weight', 'encoder.stage3.1.attn.proj.weight', 'encoder.stage3.1.norm2.bn.weight', 'encoder.stage3.1.norm2.bn.bias', 'encoder.stage3.1.norm2.bn.running_mean', 'encoder.stage3.1.norm2.bn.running_var', 'encoder.stage3.1.norm2.bn.num_batches_tracked', 'encoder.stage3.1.mlp.conv1.weight', 'encoder.stage3.1.mlp.conv3.weight', 'encoder.stage3.2.norm1.bn.weight', 'encoder.stage3.2.norm1.bn.bias', 'encoder.stage3.2.norm1.bn.running_mean', 'encoder.stage3.2.norm1.bn.running_var', 'encoder.stage3.2.norm1.bn.num_batches_tracked', 'encoder.stage3.2.attn.qkv.weight', 'encoder.stage3.2.attn.proj.weight', 'encoder.stage3.2.norm2.bn.weight', 'encoder.stage3.2.norm2.bn.bias', 'encoder.stage3.2.norm2.bn.running_mean', 'encoder.stage3.2.norm2.bn.running_var', 'encoder.stage3.2.norm2.bn.num_batches_tracked', 'encoder.stage3.2.mlp.conv1.weight', 'encoder.stage3.2.mlp.conv3.weight', 'encoder.norm.bn.weight', 'encoder.norm.bn.bias', 'encoder.norm.bn.running_mean', 'encoder.norm.bn.running_var', 'encoder.norm.bn.num_batches_tracked'])
loading model from : visformer_mini_1shot_ckpt.pth
detect temp variable, delete it
odict_keys(['encoder.pos_embed1', 'encoder.pos_embed2', 'encoder.pos_embed3', 'encoder.stem.conv1.weight', 'encoder.stem.bn1.weight', 'encoder.stem.bn1.bias', 'encoder.stem.bn1.running_mean', 'encoder.stem.bn1.running_var', 'encoder.stem.bn1.num_batches_tracked', 'encoder.stem.conv2.weight', 'encoder.stem.bn2.weight', 'encoder.stem.bn2.bias', 'encoder.stem.bn2.running_mean', 'encoder.stem.bn2.running_var', 'encoder.stem.bn2.num_batches_tracked', 'encoder.stem.conv3.weight', 'encoder.stem.bn3.weight', 'encoder.stem.bn3.bias', 'encoder.stem.bn3.running_mean', 'encoder.stem.bn3.running_var', 'encoder.stem.bn3.num_batches_tracked', 'encoder.stem.downsample.0.weight', 'encoder.stem.downsample.1.weight', 'encoder.stem.downsample.1.bias', 'encoder.stem.downsample.1.running_mean', 'encoder.stem.downsample.1.running_var', 'encoder.stem.downsample.1.num_batches_tracked', 'encoder.stage1.0.norm2.bn.weight', 'encoder.stage1.0.norm2.bn.bias', 'encoder.stage1.0.norm2.bn.running_mean', 'encoder.stage1.0.norm2.bn.running_var', 'encoder.stage1.0.norm2.bn.num_batches_tracked', 'encoder.stage1.0.mlp.conv1.weight', 'encoder.stage1.0.mlp.conv2.weight', 'encoder.stage1.0.mlp.conv3.weight', 'encoder.stage1.1.norm2.bn.weight', 'encoder.stage1.1.norm2.bn.bias', 'encoder.stage1.1.norm2.bn.running_mean', 'encoder.stage1.1.norm2.bn.running_var', 'encoder.stage1.1.norm2.bn.num_batches_tracked', 'encoder.stage1.1.mlp.conv1.weight', 'encoder.stage1.1.mlp.conv2.weight', 'encoder.stage1.1.mlp.conv3.weight', 'encoder.stage1.2.norm2.bn.weight', 'encoder.stage1.2.norm2.bn.bias', 'encoder.stage1.2.norm2.bn.running_mean', 'encoder.stage1.2.norm2.bn.running_var', 'encoder.stage1.2.norm2.bn.num_batches_tracked', 'encoder.stage1.2.mlp.conv1.weight', 'encoder.stage1.2.mlp.conv2.weight', 'encoder.stage1.2.mlp.conv3.weight', 'encoder.stage1.3.norm2.bn.weight', 'encoder.stage1.3.norm2.bn.bias', 'encoder.stage1.3.norm2.bn.running_mean', 'encoder.stage1.3.norm2.bn.running_var', 
'encoder.stage1.3.norm2.bn.num_batches_tracked', 'encoder.stage1.3.mlp.conv1.weight', 'encoder.stage1.3.mlp.conv2.weight', 'encoder.stage1.3.mlp.conv3.weight', 'encoder.patch_embed2.proj.weight', 'encoder.patch_embed2.proj.bias', 'encoder.patch_embed2.norm.bn.weight', 'encoder.patch_embed2.norm.bn.bias', 'encoder.patch_embed2.norm.bn.running_mean', 'encoder.patch_embed2.norm.bn.running_var', 'encoder.patch_embed2.norm.bn.num_batches_tracked', 'encoder.stage2.0.norm1.bn.weight', 'encoder.stage2.0.norm1.bn.bias', 'encoder.stage2.0.norm1.bn.running_mean', 'encoder.stage2.0.norm1.bn.running_var', 'encoder.stage2.0.norm1.bn.num_batches_tracked', 'encoder.stage2.0.attn.qkv.weight', 'encoder.stage2.0.attn.proj.weight', 'encoder.stage2.0.norm2.bn.weight', 'encoder.stage2.0.norm2.bn.bias', 'encoder.stage2.0.norm2.bn.running_mean', 'encoder.stage2.0.norm2.bn.running_var', 'encoder.stage2.0.norm2.bn.num_batches_tracked', 'encoder.stage2.0.mlp.conv1.weight', 'encoder.stage2.0.mlp.conv3.weight', 'encoder.stage2.1.norm1.bn.weight', 'encoder.stage2.1.norm1.bn.bias', 'encoder.stage2.1.norm1.bn.running_mean', 'encoder.stage2.1.norm1.bn.running_var', 'encoder.stage2.1.norm1.bn.num_batches_tracked', 'encoder.stage2.1.attn.qkv.weight', 'encoder.stage2.1.attn.proj.weight', 'encoder.stage2.1.norm2.bn.weight', 'encoder.stage2.1.norm2.bn.bias', 'encoder.stage2.1.norm2.bn.running_mean', 'encoder.stage2.1.norm2.bn.running_var', 'encoder.stage2.1.norm2.bn.num_batches_tracked', 'encoder.stage2.1.mlp.conv1.weight', 'encoder.stage2.1.mlp.conv3.weight', 'encoder.patch_embed3.proj.weight', 'encoder.patch_embed3.proj.bias', 'encoder.patch_embed3.norm.bn.weight', 'encoder.patch_embed3.norm.bn.bias', 'encoder.patch_embed3.norm.bn.running_mean', 'encoder.patch_embed3.norm.bn.running_var', 'encoder.patch_embed3.norm.bn.num_batches_tracked', 'encoder.stage3.0.norm1.bn.weight', 'encoder.stage3.0.norm1.bn.bias', 'encoder.stage3.0.norm1.bn.running_mean', 'encoder.stage3.0.norm1.bn.running_var', 
'encoder.stage3.0.norm1.bn.num_batches_tracked', 'encoder.stage3.0.attn.qkv.weight', 'encoder.stage3.0.attn.proj.weight', 'encoder.stage3.0.norm2.bn.weight', 'encoder.stage3.0.norm2.bn.bias', 'encoder.stage3.0.norm2.bn.running_mean', 'encoder.stage3.0.norm2.bn.running_var', 'encoder.stage3.0.norm2.bn.num_batches_tracked', 'encoder.stage3.0.mlp.conv1.weight', 'encoder.stage3.0.mlp.conv3.weight', 'encoder.stage3.1.norm1.bn.weight', 'encoder.stage3.1.norm1.bn.bias', 'encoder.stage3.1.norm1.bn.running_mean', 'encoder.stage3.1.norm1.bn.running_var', 'encoder.stage3.1.norm1.bn.num_batches_tracked', 'encoder.stage3.1.attn.qkv.weight', 'encoder.stage3.1.attn.proj.weight', 'encoder.stage3.1.norm2.bn.weight', 'encoder.stage3.1.norm2.bn.bias', 'encoder.stage3.1.norm2.bn.running_mean', 'encoder.stage3.1.norm2.bn.running_var', 'encoder.stage3.1.norm2.bn.num_batches_tracked', 'encoder.stage3.1.mlp.conv1.weight', 'encoder.stage3.1.mlp.conv3.weight', 'encoder.stage3.2.norm1.bn.weight', 'encoder.stage3.2.norm1.bn.bias', 'encoder.stage3.2.norm1.bn.running_mean', 'encoder.stage3.2.norm1.bn.running_var', 'encoder.stage3.2.norm1.bn.num_batches_tracked', 'encoder.stage3.2.attn.qkv.weight', 'encoder.stage3.2.attn.proj.weight', 'encoder.stage3.2.norm2.bn.weight', 'encoder.stage3.2.norm2.bn.bias', 'encoder.stage3.2.norm2.bn.running_mean', 'encoder.stage3.2.norm2.bn.running_var', 'encoder.stage3.2.norm2.bn.num_batches_tracked', 'encoder.stage3.2.mlp.conv1.weight', 'encoder.stage3.2.mlp.conv3.weight', 'encoder.norm.bn.weight', 'encoder.norm.bn.bias', 'encoder.norm.bn.running_mean', 'encoder.norm.bn.running_var', 'encoder.norm.bn.num_batches_tracked'])
/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/dataloader.py:561: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 3, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
fix val set for all epochs

Traceback (most recent call last):
File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1133, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/queue.py", line 179, in get
self.not_empty.wait(remaining)
File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/threading.py", line 306, in wait
gotit = waiter.acquire(True, timeout)
File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 7572) is killed by signal: Killed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "train_meta.py", line 119, in <module>
val_loader=[x for x in val_loader]
File "train_meta.py", line 119, in <listcomp>
val_loader=[x for x in val_loader]
File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
data = self._next_data()
File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1329, in _next_data
idx, data = self._get_data()
File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1285, in _get_data
success, data = self._try_get_data()
File "/public/home/wl_ligexian/anaconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1146, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 7572, 7629) exited unexpectedly
terminate called without an active exception
Aborted
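
A possible direction for narrowing this down, not a confirmed diagnosis: "killed by signal: Killed" means a worker process received SIGKILL. On Linux this often comes from the out-of-memory killer or a container memory limit, but the log alone does not show which. Two things in the output above may matter. The UserWarning says the DataLoader creates 4 workers while the suggested maximum on this machine is 3, and line 119 of train_meta.py builds a list from the whole `val_loader`, so every validation batch is held in memory at once. The sketch below clamps a requested worker count to the CPUs the process can use. The helper names `available_cpus` and `capped_num_workers` are hypothetical and are not part of few-shot-vit or PyTorch; only `os.sched_getaffinity` and `os.cpu_count` are real standard-library calls.

```python
import os


def available_cpus() -> int:
    """Return the number of CPUs this process is allowed to run on."""
    try:
        # Linux: honours the CPU affinity mask of the current process.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity (e.g. macOS, Windows).
        return os.cpu_count() or 1


def capped_num_workers(requested: int, reserve: int = 0) -> int:
    """Clamp a requested worker count to the usable CPUs.

    `reserve` CPUs are kept free for the main process (hypothetical knob).
    A result of 0 means "load data in the main process".
    """
    if requested < 0:
        raise ValueError("requested must be non-negative")
    usable = max(available_cpus() - reserve, 0)
    return min(requested, usable)


if __name__ == "__main__":
    # The script in this issue ends up with 4 workers; see what fits here.
    print("num_workers to try:", capped_num_workers(4))
```

The returned value would be passed as `num_workers=` wherever train_meta.py constructs its DataLoader. Running once with `num_workers=0` is also a useful check: loading then happens in the main process, so if memory is the problem the failure should show up there directly instead of as a killed worker.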