tianzhi0549/FCOS

Why is the loss nan when I train FCOS on my own dataset?

funny000 opened this issue · 2 comments

My dataset is the DIOR remote sensing dataset from Northwestern Polytechnical University. When I use it to train the FCOS network, the loss becomes nan.

Can anyone tell me the reason?

Below is the log output:

2022-06-17 07:54:53.057 | INFO | main:train:229 - cls loss : 5515.9775390625, reg loss : 7.476679801940918, ness loss : 0.7106261253356934, sum loss : 279.5187072753906
2022-06-17 07:54:54.175 | INFO | main:train:229 - cls loss : nan, reg loss : nan, ness loss : nan, sum loss : nan
2022-06-17 07:54:55.242 | INFO | main:train:229 - cls loss : nan, reg loss : nan, ness loss : nan, sum loss : nan
2022-06-17 07:54:56.419 | INFO | main:train:229 - cls loss : nan, reg loss : nan, ness loss : nan, sum loss : nan
(all subsequent iterations report nan for every loss term)

I saw #322 and tried the solution suggested there in my training.

Using the torch.nn.utils.clip_grad_norm_ function, together with a smaller learning rate, solved the problem.
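For anyone hitting the same issue, here is a minimal sketch of how the two changes fit into a PyTorch training loop. The tiny model, random data, learning rate, and max_norm value below are placeholders chosen for illustration, not values from this repo; swap in the real FCOS model, the DIOR data loader, and whatever lr/max_norm work for your setup.

```python
import torch
import torch.nn as nn

# Minimal sketch, not the actual FCOS trainer: it only shows where
# clip_grad_norm_ and a reduced learning rate go in a training loop.
# The tiny model and random tensors are stand-ins for the real detector
# and the DIOR data loader.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 4),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)  # smaller lr (placeholder value)
criterion = nn.SmoothL1Loss()

for step in range(10):
    images = torch.randn(2, 3, 32, 32)   # stand-in batch
    targets = torch.randn(2, 4)          # stand-in regression targets

    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()

    # Rescale gradients so their global norm never exceeds max_norm,
    # before the optimizer applies them.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    optimizer.step()

    if not torch.isfinite(loss):
        print(f"step {step}: loss became non-finite")
        break
```

clip_grad_norm_ rescales all gradients together when their combined norm exceeds max_norm, so a single bad batch cannot blow up the weights, which is why it helps against nan losses here.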