Question about train_pcl2pcl_gan_3D-EPN.py
Condor-G opened this issue · 4 comments
Hi, Chen~
Sorry to bother you. I have a problem when I run train_pcl2pcl_gan_3D-EPN.py.
train GAN: python train_pcl2pcl_gan_3D-EPN.py
After training, I used MeshLab to view the results in pc2pc/run_3D-EPN/run_car/pcl2pcl/log_car_pcl2pcl_gan_3D-EPN_default_hausdorff/fake_cleans. However, the ply point clouds in reconstr_x all look like balls of wool. The GAN doesn't seem to be working.
I don't know what the problem is. Maybe the dataset.
Here is what I did:
I used the dataset "shape_net_core_uniform_samples_2048" (from other projects), and I used MATLAB to make the incomplete point sets.
Then I used pc2pc/data_processing to make the pickle files.
python train_ae_ShapeNet-v1.py
python train_ae_3D-EPN.py
(The ply point clouds (ShapeNet-v1 and 3D-EPN) in reconstr look correct.)
But after I run "python train_pcl2pcl_gan_3D-EPN.py", the ply clouds in reconstr are a mess.
I also reduced batch_size to run it. Does that affect the outcome?
If my question isn't clear, tell me what else I should show.
I'm a novice. I really hope to get your help.
Hi @Condor-G, thanks for trying out the code.
Can you please show the training log from the tensorboard?
The way you prepared the data should not cause this GAN failure; I have tried other types of data too.
It looks like the GAN has not started to work yet during training.
Thank you very much for your reply!
Here is the gan's log in "log_train.txt":
{'kk': 1, 'd_fc_sizes': [256, 512], 'random_seed': None, 'batch_size': 12, 'latent_dim': 128, 'save_interval': 10, 'point_cloud_shape': [2048, 3], 'beta1': 0.5, 'd_activation_fn': <function leaky_relu at 0x7f9cf94ff840>, 'lambda': 1.0, '3D-EPN_train_point_cloud_dir': '/home/condor/gyf/GAN/pcl/pc2pc/data/3D-EPN_dataset/shapenet_dim32_sdf_pc/02958343/point_cloud', 'clean_ae_ckpt': '/home/condor/gyf/GAN/pcl/pc2pc/run_synthetic/run_car/ae/log_ae_car_ShapeNet-V1_c2c/ckpts/model_0.ckpt', '3D-EPN_test_point_cloud_dir': '/home/condor/gyf/GAN/pcl/pc2pc/data/3D-EPN_dataset/test-images_dim32_sdf_pc/02958343/point_cloud', 'g_bn': False, 'k': 1, 'epoch': 2001, 'noisy_ae_ckpt': '/home/condor/gyf/GAN/pcl/pc2pc/run_3D-EPN/run_car/ae/log_3DEPN_ae_car/ckpts/model_0.ckpt', 'recover_ckpt': None, 'output_interval': 1, 'eval_loss': 'hausdorff', 'g_activation_fn': <function relu at 0x7f9cf99bdf28>, 'loss': 'hausdorff', 'd_bn': False, 'lr': 0.0001, 'point_cloud_dir': '/home/condor/gyf/GAN/pcl/pc2pc/data/ShapeNet_v1_point_cloud/02958343/point_cloud_clean', 'g_fc_sizes': [128], 'exp_name': 'car_pcl2pcl_gan_3D-EPN_default'}
{'activation_fn': <function relu at 0x7f9cf99bdf28>, 'n_filters': [64, 128, 128, 256], 'fc_sizes': [256, 256], 'latent_code_dim': 128, 'point_cloud_shape': [2048, 3], 'encoder_bn': True, 'filter_size': 1, 'stride': 1, 'decoder_bn': False}
pid: 3235
Net layers:
G/fc_0/kernel:0
G/fc_0/bias:0
G/fc_output/kernel:0
G/fc_output/bias:
D/fc_0/kernel:0
D/fc_0/bias:0
D/fc_1/kernel:0
D/fc_1/bias:0
D/output/kernel:0
D/output/bias:
2020-03-10-17-45-05 training 0 snapshot:
G loss: 1.387982 = (g)0.954996, (r)0.432987
D loss: 0.264273 = (f)0.000979, (r)0.527566
Eval loss (hausdorff) on test set: 0.432021
Model saved in file: run_3D-EPN/run_car/pcl2pcl/log_car_pcl2pcl_gan_3D-EPN_default_hausdorff/ckpts/model_0.ckpt
2020-03-10-17-45-16 training 1 snapshot:
G loss: 1.232149 = (g)0.802055, (r)0.430094
D loss: 0.035887 = (f)0.011880, (r)0.059893
2020-03-10-17-45-25 training 2 snapshot:
G loss: 1.171390 = (g)0.744821, (r)0.426569
D loss: 0.011561 = (f)0.021100, (r)0.002023
......
2020-03-10-23-01-38 training 1999 snapshot:
G loss: 0.643860 = (g)0.259829, (r)0.384031
D loss: 0.249035 = (f)0.245988, (r)0.252083
2020-03-10-23-01-47 training 2000 snapshot:
G loss: 0.645519 = (g)0.258738, (r)0.386781
D loss: 0.244853 = (f)0.247335, (r)0.242371
Eval loss (hausdorff) on test set: 0.387690
Model saved in file: run_3D-EPN/run_car/pcl2pcl/log_car_pcl2pcl_gan_3D-EPN_default_hausdorff/ckpts/model_2000.ckpt
Is that what you want? I don't understand what "the training log from the tensorboard" means.
If not, do you mean the file called "events.out.tfevents...." under "summary"? Should I zip the file and send it to you?
BTW, I found I made a mistake when processing the data. I had only about 1000 3D-EPN samples (the total is about 7000): I split the car dataset myself first, and then gen_point_cloud_split.py split it again. So I am trying again with the 7k samples and am now waiting for the results. It doesn't seem to be working so far, though, so maybe that wasn't the mistake.
Thanks a ton for your help~
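For readers who want to look at the loss curves without TensorBoard, the snapshot lines in log_train.txt quoted above can be pulled out with a short script. This is only a sketch: the regular expressions are written against the line format shown in this thread, and the names parse_losses, LOSS_RE and SNAPSHOT_RE are made up for illustration, not part of the pc2pc code.

```python
import re

# Patterns written against the log lines quoted in this thread, e.g.
# "G loss: 1.387982 = (g)0.954996, (r)0.432987" (an assumption about the format).
LOSS_RE = re.compile(r"^(G|D) loss: ([\d.]+) = \((\w)\)([\d.]+), \((\w)\)([\d.]+)")
SNAPSHOT_RE = re.compile(r"training (\d+) snapshot:")


def parse_losses(lines):
    """Return {epoch: {"G": {...}, "D": {...}}} from log_train.txt-style lines."""
    losses = {}
    epoch = None
    for line in lines:
        line = line.strip()
        m = SNAPSHOT_RE.search(line)
        if m:
            # A new "training N snapshot:" header starts a new epoch entry.
            epoch = int(m.group(1))
            losses[epoch] = {}
            continue
        m = LOSS_RE.match(line)
        if m and epoch is not None:
            net, total, k1, v1, k2, v2 = m.groups()
            losses[epoch][net] = {"total": float(total), k1: float(v1), k2: float(v2)}
    return losses


# A few lines copied from the log in this thread.
sample = """2020-03-10-17-45-05 training 0 snapshot:
G loss: 1.387982 = (g)0.954996, (r)0.432987
D loss: 0.264273 = (f)0.000979, (r)0.527566
2020-03-10-23-01-47 training 2000 snapshot:
G loss: 0.645519 = (g)0.258738, (r)0.386781
D loss: 0.244853 = (f)0.247335, (r)0.242371
""".splitlines()

losses = parse_losses(sample)
for epoch in sorted(losses):
    print(epoch, losses[epoch]["G"]["total"], losses[epoch]["D"]["total"])
```

To run it on a real log, pass the lines of the file instead of the sample, for example parse_losses(open("log_train.txt")).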
Yes, this log file should contain the information needed for debugging,
though it would also be better to show the losses from the tensorboard.
From the log file you provided,
I guess the problem is that the two AEs are somehow set to the wrong checkpoints in config.py.
See "'clean_ae_ckpt': '/home/condor/gyf/GAN/pcl/pc2pc/run_synthetic/run_car/ae/log_ae_car_ShapeNet-V1_c2c/ckpts/model_0.ckpt'". The clean AE is set to model_0.ckpt (and noisy_ae_ckpt points to model_0.ckpt as well); simply changing it to model_2000.ckpt in config.py should work. ;-)
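One way to avoid pointing at an early checkpoint by accident is to look up the highest-numbered checkpoint in a ckpts folder before editing config.py. The sketch below is a hypothetical helper, not part of the pc2pc code: latest_ckpt and CKPT_RE are made-up names, and it assumes the checkpoint files on disk start with model_<N>.ckpt (followed by suffixes such as .index, which is an assumption about how the checkpoints were written).

```python
import os
import re
import tempfile

# Assumes checkpoint files are named model_<N>.ckpt plus an optional suffix.
CKPT_RE = re.compile(r"^model_(\d+)\.ckpt")


def latest_ckpt(ckpt_dir):
    """Return the path prefix of the highest-numbered model_<N>.ckpt, or None."""
    epochs = set()
    for name in os.listdir(ckpt_dir):
        m = CKPT_RE.match(name)
        if m:
            epochs.add(int(m.group(1)))
    if not epochs:
        return None
    return os.path.join(ckpt_dir, "model_%d.ckpt" % max(epochs))


# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as d:
    for n in (0, 10, 2000):
        open(os.path.join(d, "model_%d.ckpt.index" % n), "w").close()
    demo_latest = os.path.basename(latest_ckpt(d))
    print(demo_latest)
```

Calling latest_ckpt on each AE's ckpts folder and pasting the returned paths into clean_ae_ckpt and noisy_ae_ckpt would be one way to apply the fix; the latest checkpoint is not guaranteed to be the best one, so the reconstructions are still worth a look.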
Yes!!! You are right!
I changed the checkpoint number in config.py, and now everything is working like a charm!
I will close this issue.
Thank you! :)
Best wishes to you!