Model not converging.

Question

yiyuzhuang opened this issue a year ago · 1 comment

Dear Hong,
Thank you for making your work open source. I am currently attempting to reproduce the results. I trained the model using the recommended command, but even after 1,000,000 iterations, the results remained unsatisfactory. Could you please suggest some possible reasons for this issue?
The results appear as follows:

My command is:
python -m torch.distributed.launch --nproc_per_node ${NUM_GPU} --master_port=${MASTER_PORT} train_deepfashion.py \
    --batch 1 --chunk 1 --expname train_deepfashion_512x256_2 --dataset_path ./DeepFashion/ \
    --depth 5 --width 128 --style_dim 128 --renderer_spatial_output_dim 512 256 \
    --input_ch_views 3 --white_bg --r1 300 --voxhuman_name eva3d_deepfashion --random_flip \
    --eikonal_lambda 0.5 --small_aug --iter 1000000 --adjust_gamma --gamma_lb 20 \
    --min_surf_lambda 1.5 --deltasdf --gaussian_weighted_sampler --sampler_std 15 --N_samples 28

Answer 1 · 2023-07-04T03:03:06.000Z

Thanks! I discovered that the issue was caused by the PyTorch version. Downgrading PyTorch from 1.12 to 1.9 resolved the problem.
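
For anyone hitting the same thing, here is a minimal sketch of a version check you could run before starting a long training job. It is not part of the repository: the helper names `parse_major_minor` and `matches_known_good` are made up for illustration, and the only "known good" version it assumes is the 1.9 reported above.

```python
import re

# PyTorch major.minor version reported to work in this thread (an assumption
# based on one report, not an official requirement of the repository).
KNOWN_GOOD = (1, 9)


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '1.12.1+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def matches_known_good(version: str, known_good: tuple = KNOWN_GOOD) -> bool:
    """Return True if the major.minor part of `version` equals `known_good`."""
    return parse_major_minor(version) == known_good


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        if matches_known_good(torch.__version__):
            print(f"PyTorch {torch.__version__} matches the version reported to work.")
        else:
            print(
                f"PyTorch {torch.__version__} differs from {KNOWN_GOOD[0]}.{KNOWN_GOOD[1]}; "
                "training behaviour may differ from what is described in this thread."
            )
```

Running this once in the training environment tells you which PyTorch version is actually active there, which may not be the one you expect.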