About NAN during the training stage...
JulioCeFer opened this issue · 5 comments
Dear Zhysora,
As you can see below, the variable discrim_loss is "nan" at epoch 50:
progress epoch 50 step 602 image/sec 35.7 remaining 0m
discrim_loss nan
gen_loss_GAN 1.16629475e-36
gen_loss_L1 380.16928
In fact, the variable becomes "nan" at approximately step 400 of epoch 1.
Here are my parameters:
batch_size = 8
beta1 = 0.5
blk = 64
checkpoint = None
display_freq = 0
gan_weight = 1.0
gpus = 0
l1_weight = 100.0
lr = 0.0001
max_epochs = 50
max_steps = None
mode = train
ndf = 32
output_dir = /content/drive/MyDrive/Research/Psgan/Datasets/PSOutput/QB_train_64_psgan
progress_freq = 200
save_freq = 1000
summary_freq = 0
test_count = 81
test_tfrecord = /content/my_data/QB_test_64.tfrecords
trace_freq = 0
train_count = 4821
train_tfrecord = /content/my_data/QB_train_64.tfrecords
Would you be able to guide me with this issue, please?
Best Regards,
This is a common problem when training GANs. If there are no problems with the code, try training again with a different random seed. I hope this helps.
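For anyone hitting the same thing, here is a minimal sketch of two generic safeguards. It is not taken from the PSGAN code: the function names, the `EPS` value, and the assumption that the discriminator outputs probabilities in [0, 1] are all hypothetical. The first function clamps the discriminator outputs before taking the logarithm, since log(0) is one common source of non-finite GAN losses. The second finds the first non-finite loss in a run so that training can be stopped early and restarted with a different seed.

```python
import math

EPS = 1e-12  # assumed small constant, not read from the PSGAN code


def discriminator_loss(predict_real, predict_fake, eps=EPS):
    """Mean of -(log(D(real)) + log(1 - D(fake))) with clamped inputs.

    Clamping to [eps, 1 - eps] keeps both logarithms finite even when
    the discriminator saturates at exactly 0 or 1.
    """
    total = 0.0
    for real, fake in zip(predict_real, predict_fake):
        real = min(max(real, eps), 1.0 - eps)
        fake = min(max(fake, eps), 1.0 - eps)
        total += -(math.log(real) + math.log(1.0 - fake))
    return total / len(predict_real)


def first_non_finite_step(losses):
    """Return the index of the first nan/inf loss, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None
```

Logging the step returned by `first_non_finite_step` makes it easier to tell whether a rerun with a new seed actually trains further or fails at the same point, which would suggest a problem in the data or the code rather than bad luck.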
Thank you very much.
With the above parameters (dataset=QB and blk=64), how do I set the "col" and "row" parameters in eval.py, please?
According to the code (data/gen_dataset.py, lines 137 and 193), these two parameters can be found in XXX/record.txt.
Thank you very much.