[BUG] Loss suddenly spikes during training
Oratacth opened this issue · 0 comments
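Has anyone else run into this problem during training? The loss stays around 2.4, then jumps at iteration 31 of epoch 2, where loss_kp_2d goes from 0.82 to 433.47 in a single step: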
Epoch 2/50 |# | (25/500) | Total: 0:00:14 | ETA: 0:04:09 | loss: 2.4039 | loss_kp_2d: 1.47 | loss_kp_3d: 0.98 | e_m_disc_loss: 0.03 | d_m_disc_real: 0.04 | d_m_disc_fake: 0.27 | d_m_disc_loss: 0.31 | data:
Epoch 2/50 |# | (26/500) | Total: 0:00:14 | ETA: 0:04:08 | loss: 2.4214 | loss_kp_2d: 1.76 | loss_kp_3d: 0.76 | e_m_disc_loss: 0.03 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.28 | d_m_disc_loss: 0.31 | data:
Epoch 2/50 |# | (27/500) | Total: 0:00:15 | ETA: 0:04:07 | loss: 2.4028 | loss_kp_2d: 0.75 | loss_kp_3d: 0.83 | e_m_disc_loss: 0.03 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.28 | d_m_disc_loss: 0.32 | data:
Epoch 2/50 |# | (28/500) | Total: 0:00:15 | ETA: 0:04:07 | loss: 2.3830 | loss_kp_2d: 0.72 | loss_kp_3d: 0.79 | e_m_disc_loss: 0.04 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.27 | d_m_disc_loss: 0.30 | data:
Epoch 2/50 |# | (29/500) | Total: 0:00:16 | ETA: 0:04:05 | loss: 2.3815 | loss_kp_2d: 0.97 | loss_kp_3d: 1.05 | e_m_disc_loss: 0.05 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.25 | d_m_disc_loss: 0.28 | data:
Epoch 2/50 |# | (30/500) | Total: 0:00:16 | ETA: 0:04:05 | loss: 2.3664 | loss_kp_2d: 0.82 | loss_kp_3d: 0.66 | e_m_disc_loss: 0.08 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.20 | d_m_disc_loss: 0.23 | data:
Epoch 2/50 |# | (31/500) | Total: 0:00:17 | ETA: 0:04:04 | loss: 16.4265 | loss_kp_2d: 433.47 | loss_kp_3d: 0.84 | e_m_disc_loss: 0.32 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.09 | da
Epoch 2/50 |## | (32/500) | Total: 0:00:17 | ETA: 0:03:58 | loss: 26.1461 | loss_kp_2d: 323.38 | loss_kp_3d: 1.10 | e_m_disc_loss: 0.49 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.01 | d_m_disc_loss: 0.05 | da
Epoch 2/50 |## | (33/500) | Total: 0:00:18 | ETA: 0:03:57 | loss: 41.6130 | loss_kp_2d: 530.82 | loss_kp_3d: 1.06 | e_m_disc_loss: 0.71 | d_m_disc_real: 0.06 | d_m_disc_fake: 0.04 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (34/500) | Total: 0:00:18 | ETA: 0:03:56 | loss: 52.5822 | loss_kp_2d: 409.64 | loss_kp_3d: 1.20 | e_m_disc_loss: 0.78 | d_m_disc_real: 0.14 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.20 | da
Epoch 2/50 |## | (35/500) | Total: 0:00:19 | ETA: 0:03:56 | loss: 64.3204 | loss_kp_2d: 457.43 | loss_kp_3d: 2.06 | e_m_disc_loss: 0.80 | d_m_disc_real: 0.13 | d_m_disc_fake: 0.07 | d_m_disc_loss: 0.19 | da
Epoch 2/50 |## | (36/500) | Total: 0:00:19 | ETA: 0:03:55 | loss: 70.2876 | loss_kp_2d: 273.15 | loss_kp_3d: 3.72 | e_m_disc_loss: 0.64 | d_m_disc_real: 0.08 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.14 | da
Epoch 2/50 |## | (37/500) | Total: 0:00:20 | ETA: 0:03:55 | loss: 75.5481 | loss_kp_2d: 255.48 | loss_kp_3d: 7.11 | e_m_disc_loss: 0.92 | d_m_disc_real: 0.07 | d_m_disc_fake: 0.03 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (38/500) | Total: 0:00:20 | ETA: 0:03:54 | loss: 80.9875 | loss_kp_2d: 269.64 | loss_kp_3d: 10.41 | e_m_disc_loss: 0.77 | d_m_disc_real: 0.04 | d_m_disc_fake: 0.02 | d_m_disc_loss: 0.06 | d
Epoch 2/50 |## | (39/500) | Total: 0:00:21 | ETA: 0:03:51 | loss: 83.9370 | loss_kp_2d: 185.57 | loss_kp_3d: 9.21 | e_m_disc_loss: 0.45 | d_m_disc_real: 0.04 | d_m_disc_fake: 0.01 | d_m_disc_loss: 0.05 | da
Epoch 2/50 |## | (40/500) | Total: 0:00:21 | ETA: 0:03:50 | loss: 85.7856 | loss_kp_2d: 150.03 | loss_kp_3d: 7.09 | e_m_disc_loss: 0.20 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (41/500) | Total: 0:00:22 | ETA: 0:03:55 | loss: 90.0160 | loss_kp_2d: 251.98 | loss_kp_3d: 5.89 | e_m_disc_loss: 0.16 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.11 | d_m_disc_loss: 0.14 | da
Epoch 2/50 |## | (42/500) | Total: 0:00:22 | ETA: 0:03:54 | loss: 93.1862 | loss_kp_2d: 216.82 | loss_kp_3d: 5.28 | e_m_disc_loss: 0.11 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.14 | d_m_disc_loss: 0.17 | da
Epoch 2/50 |## | (43/500) | Total: 0:00:23 | ETA: 0:03:54 | loss: 95.2027 | loss_kp_2d: 172.50 | loss_kp_3d: 6.58 | e_m_disc_loss: 0.16 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.12 | d_m_disc_loss: 0.15 | da
Epoch 2/50 |## | (44/500) | Total: 0:00:23 | ETA: 0:03:51 | loss: 96.1961 | loss_kp_2d: 130.74 | loss_kp_3d: 7.57 | e_m_disc_loss: 0.25 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (45/500) | Total: 0:00:24 | ETA: 0:03:51 | loss: 96.5522 | loss_kp_2d: 104.14 | loss_kp_3d: 7.58 | e_m_disc_loss: 0.32 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (46/500) | Total: 0:00:24 | ETA: 0:03:50 | loss: 98.0207 | loss_kp_2d: 156.55 | loss_kp_3d: 6.67 | e_m_disc_loss: 0.43 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.05 | d_m_disc_loss: 0.08 | da
Epoch 2/50 |### | (47/500) | Total: 0:00:25 | ETA: 0:03:50 | loss: 97.9087 | loss_kp_2d: 85.54 | loss_kp_3d: 6.79 | e_m_disc_loss: 0.37 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.09 | dat
Epoch 2/50 |### | (48/500) | Total: 0:00:25 | ETA: 0:03:49 | loss: 97.6625 | loss_kp_2d: 78.66 | loss_kp_3d: 6.91 | e_m_disc_loss: 0.47 | d_m_disc_real: 0.05 | d_m_disc_fake: 0.07 | d_m_disc_loss: 0.12 | dat
Epoch 2/50 |### | (49/500) | Total: 0:00:26 | ETA: 0:03:48 | loss: 98.7095 | loss_kp_2d: 142.36 | loss_kp_3d: 5.67 | e_m_disc_loss: 0.55 | d_m_disc_real: 0.08 | d_m_disc_fake: 0.04 | d_m_disc_loss: 0.12 | da
Epoch 2/50 |### | (50/500) | Total: 0:00:26 | ETA: 0:03:47 | loss: 98.3281 | loss_kp_2d: 72.09 | loss_kp_3d: 6.76 | e_m_disc_loss: 0.75 | d_m_disc_real: 0.11 | d_m_disc_fake: 0.03 | d_m_disc_loss: 0.14 | dat
Epoch 2/50 |### | (51/500) | Total: 0:00:27 | ETA: 0:03:47 | loss: 98.9621 | loss_kp_2d: 122.80 | loss_kp_3d: 6.97 | e_m_disc_loss: 0.59 | d_m_disc_real: 0.11 | d_m_disc_fake: 0.03 | d_m_disc_loss: 0.14 | da
Epoch 2/50 |### | (52/500) | Total: 0:00:27 | ETA: 0:03:45 | loss: 98.5644 | loss_kp_2d: 71.65 | loss_kp_3d: 5.90 | e_m_disc_loss: 0.67 | d_m_disc_real: 0.12 | d_m_disc_fake: 0.04 | d_m_disc_loss: 0.16 | dat
Epoch 2/50 |### | (53/500) | Total: 0:00:28 | ETA: 0:03:44 | loss: 98.9029 | loss_kp_2d: 109.82 | loss_kp_3d: 5.81 | e_m_disc_loss: 0.65 | d_m_disc_real: 0.09 | d_m_disc_fake: 0.05 | d_m_disc_loss: 0.14 | da
Epoch 2/50 |### | (54/500) | Total: 0:00:28 | ETA: 0:03:44 | loss: 98.7054 | loss_kp_2d: 81.66 | loss_kp_3d: 5.94 | e_m_disc_loss: 0.56 | d_m_disc_real: 0.07 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.13 | dat
Epoch 2/50 |### | (55/500) | Total: 0:00:29 | ETA: 0:03:43 | loss: 98.1078 | loss_kp_2d: 58.58 | loss_kp_3d: 6.82 | e_m_disc_loss: 0.48 | d_m_disc_real: 0.06 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.12 | dat
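To pin down where the blow-up starts, the log above can be scanned automatically. The following is only a rough sketch: `parse_line`, `first_spike` and `per_iter_from_running_mean` are hypothetical helper names, not part of the training code, and the 10x threshold is an arbitrary choice. It also assumes that the `loss` column is a running average over the epoch while the sub-losses are per-iteration values; the log is consistent with that, but it is not confirmed. Under that assumption, the total loss at iteration 31 alone would be roughly 438, which is in line with the loss_kp_2d value of 433.47 on that line.

```python
import re

# Two lines copied from the log above (truncated trailing fields dropped).
LOG = """\
Epoch 2/50 |# | (30/500) | Total: 0:00:16 | ETA: 0:04:05 | loss: 2.3664 | loss_kp_2d: 0.82 | loss_kp_3d: 0.66 | e_m_disc_loss: 0.08
Epoch 2/50 |# | (31/500) | Total: 0:00:17 | ETA: 0:04:04 | loss: 16.4265 | loss_kp_2d: 433.47 | loss_kp_3d: 0.84 | e_m_disc_loss: 0.32
"""

ITER_RE = re.compile(r"\((\d+)/\d+\)")  # matches the "(31/500)" counter


def parse_line(line):
    """Return (iteration, {field: value}) for one progress line, or None."""
    match = ITER_RE.search(line)
    if match is None:
        return None
    fields = {}
    for chunk in line.split("|"):
        key, sep, value = chunk.strip().partition(": ")
        if not sep:
            continue  # no "key: value" pair in this chunk
        try:
            fields[key] = float(value)
        except ValueError:
            pass  # e.g. "Total: 0:00:16" is a duration, not a number
    return int(match.group(1)), fields


def first_spike(rows, field, factor=10.0):
    """First iteration where `field` exceeds `factor` times its previous value."""
    prev = None
    for iteration, fields in rows:
        cur = fields.get(field)
        if cur is None:
            continue
        if prev is not None and prev > 0 and cur > factor * prev:
            return iteration
        prev = cur
    return None


def per_iter_from_running_mean(prev_mean, cur_mean, n):
    """Recover the n-th sample, ASSUMING cur_mean is the mean of n samples."""
    return n * cur_mean - (n - 1) * prev_mean


rows = [r for r in (parse_line(line) for line in LOG.splitlines()) if r]
spike_at = first_spike(rows, "loss_kp_2d")
print("first loss_kp_2d spike at iteration:", spike_at)
print("estimated total loss at that iteration:",
      per_iter_from_running_mean(rows[0][1]["loss"], rows[1][1]["loss"], 31))
```

Running the same helpers over the full log would show whether loss_kp_3d and the discriminator losses react only after loss_kp_2d has already jumped, which is what the lines above suggest.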
This is my config, as logged at startup:
2023-01-05 19:37:06,989 GPU name -> NVIDIA GeForce RTX 3060
2023-01-05 19:37:06,990 GPU feat -> _CudaDeviceProperties(name='NVIDIA GeForce RTX 3060', major=8, minor=6, total_memory=12287MB, multi_processor_count=28)
2023-01-05 19:37:06,990 {'CUDNN': CfgNode({'BENCHMARK': True, 'DETERMINISTIC': False, 'ENABLED': True}),
 'DATASET': CfgNode({'SEQLEN': 16, 'OVERLAP': 0.5}),
 'DEBUG': False,
 'DEBUG_FREQ': 5,
 'DEVICE': 'cuda',
 'EXP_NAME': 'vibe',
 'LOGDIR': 'results/vibe_tests\05-01-2023_19-37-06_vibe',
 'LOSS': {'D_MOTION_LOSS_W': 0.5,
          'KP_2D_W': 300.0,
          'KP_3D_W': 300.0,
          'POSE_W': 60.0,
          'SHAPE_W': 0.06},
 'MODEL': {'TEMPORAL_TYPE': 'gru',
           'TGRU': {'ADD_LINEAR': True,
                    'BIDIRECTIONAL': False,
                    'HIDDEN_SIZE': 1024,
                    'NUM_LAYERS': 2,
                    'RESIDUAL': True}},
 'NUM_WORKERS': 0,
 'OUTPUT_DIR': 'results/vibe_tests',
 'SEED_VALUE': -1,
 'TRAIN': {'BATCH_SIZE': 64,
           'DATASETS_2D': ['Insta'],
           'DATASETS_3D': ['MPII3D'],
           'DATASET_EVAL': 'ThreeDPW',
           'DATA_2D_RATIO': 0.6,
           'END_EPOCH': 50,
           'GEN_LR': 5e-05,
           'GEN_MOMENTUM': 0.9,
           'GEN_OPTIM': 'Adam',
           'GEN_WD': 0.0,
           'LR_PATIENCE': 5,
           'MOT_DISCR': {'ATT': {'DROPOUT': 0.2,
                                 'LAYERS': 3,
                                 'SIZE': 1024},
                         'DIM': 1024,
                         'FEATURE_POOL': 'attention',
                         'HIDDEN_SIZE': 1024,
                         'LR': 0.0001,
                         'MOMENTUM': 0.9,
                         'NUM_LAYERS': 2,
                         'OPTIM': 'Adam',
                         'UPDATE_STEPS': 1,
                         'WD': 0.0001},
           'NUM_ITERS_PER_EPOCH': 500,
           'PRETRAINED': '',
           'PRETRAINED_REGRESSOR': 'data/vibe_data/spin_model_checkpoint.pth.tar',
           'RESUME': '',
           'START_EPOCH': 0}}