train results
tang-y-q opened this issue · 13 comments
D:\anaconda\anaconda\envs\score\python.exe D:/code/score-denoise-main/score-denoise-main/test.py
[2022-07-16 20:04:07,207::test::INFO] [ARGS::ckpt] './pretrained/ckpt.pt'
[2022-07-16 20:04:07,207::test::INFO] [ARGS::input_root] './data/examples'
[2022-07-16 20:04:07,208::test::INFO] [ARGS::output_root] './data/results'
[2022-07-16 20:04:07,208::test::INFO] [ARGS::dataset_root] './data'
[2022-07-16 20:04:07,208::test::INFO] [ARGS::dataset] 'PCNet'
[2022-07-16 20:04:07,208::test::INFO] [ARGS::tag] ''
[2022-07-16 20:04:07,208::test::INFO] [ARGS::resolution] '10000_poisson'
[2022-07-16 20:04:07,208::test::INFO] [ARGS::noise] '0.01'
[2022-07-16 20:04:07,208::test::INFO] [ARGS::device] 'cuda'
[2022-07-16 20:04:07,208::test::INFO] [ARGS::seed] 2020
[2022-07-16 20:04:07,208::test::INFO] [ARGS::ld_step_size] None
[2022-07-16 20:04:07,208::test::INFO] [ARGS::ld_step_decay] 0.95
[2022-07-16 20:04:07,209::test::INFO] [ARGS::ld_num_steps] 30
[2022-07-16 20:04:07,209::test::INFO] [ARGS::seed_k] 3
[2022-07-16 20:04:07,209::test::INFO] [ARGS::niters] 1
[2022-07-16 20:04:07,209::test::INFO] [ARGS::denoise_knn] 4
[2022-07-16 20:04:10,298::test::INFO] ld_step_size = 0.20000000
[2022-07-16 20:04:10,345::test::INFO] boxunion
[2022-07-16 20:04:18,952::test::INFO] box_push
[2022-07-16 20:04:24,497::test::INFO] column_head
[2022-07-16 20:04:30,082::test::INFO] cylinder
[2022-07-16 20:04:35,609::test::INFO] dragon
[2022-07-16 20:04:41,134::test::INFO] galera
[2022-07-16 20:04:46,660::test::INFO] happy
[2022-07-16 20:04:52,189::test::INFO] icosahedron
[2022-07-16 20:04:57,717::test::INFO] netsuke
[2022-07-16 20:05:03,248::test::INFO] star_smooth
Loading: 100%|██████████| 10/10 [00:00<00:00, 33.74it/s]
Loading: 100%|██████████| 10/10 [00:00<00:00, 34.59it/s]
Loading: 100%|██████████| 10/10 [00:06<00:00, 1.48it/s]
Evaluate: 100%|██████████| 10/10 [00:11<00:00, 1.18s/it]
[2022-07-16 20:05:27,919::test::INFO]
cd_sph p2f
boxunion 0.000311 0.000019
box_push 0.000357 0.000229
column_head 0.000530 0.000344
cylinder 0.000321 0.000067
dragon 0.000317 0.000208
galera 0.000383 0.000300
happy 0.000302 0.000217
icosahedron 0.000331 0.000065
netsuke 0.000242 0.000148
star_smooth 0.000276 0.000110
[2022-07-16 20:05:27,919::test::INFO]
Mean
cd_sph 0.000336946515
p2f 0.000170847624
Process finished with exit code 0
I used the hyperparameters you saved, but the P2M accuracy is abnormal. The P2M values in the output are quite different from those given in your paper. May I ask why this is?
What’s the version of your PyTorch3D?
When I used the P2M function provided by PyTorch3D, it was still under development. The P2M implementation today has changed a lot and is very different from the one I used, so the values are different.
I tried different versions of PyTorch3D and didn't get the desired results. The result on the PU dataset is correct, but when I test on the PC dataset, the P2M result is far from the expected values.
Hi!
This is also weird to me. I have dug deeper and have some findings. The conclusion is: there seems to be a bias caused by the PCNet meshes at the time I did the evaluation (perhaps due to a difference in implementation or hardware; I am still investigating). However, the bias appears for ALL the methods, so it does not affect the comparison.
As you have noticed, the P2M scores on the PC dataset output by the program are approximately 2x the numbers reported in the paper. To see why, I drew a scatter plot and did a linear regression between CD and P2M according to the numbers in Table 1 of the paper (the aggregated data is attached as paper_results.csv). I distinguish the points by colors and shapes (blue stands for the PU test set and orange stands for the PC test set):
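A rough sketch of how such a plot can be produced (the column names of paper_results.csv are assumed here for illustration and may differ from the attached file):

```python
# Hypothetical sketch of the CD-vs-P2M regression; assumes paper_results.csv
# has columns: dataset ("PU"/"PC"), method, cd, p2m.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("paper_results.csv")

fig, ax = plt.subplots()
for dataset, color in [("PU", "tab:blue"), ("PC", "tab:orange")]:
    sub = df[df["dataset"] == dataset]
    ax.scatter(sub["cd"], sub["p2m"], c=color, label=dataset)
    # Least-squares fit of P2M against CD for this test set.
    slope, intercept = np.polyfit(sub["cd"], sub["p2m"], deg=1)
    xs = np.linspace(sub["cd"].min(), sub["cd"].max(), 50)
    ax.plot(xs, slope * xs + intercept, c=color)

ax.set_xlabel("CD")
ax.set_ylabel("P2M")
ax.legend()
plt.show()
```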
From the above figure, we can see that given a fixed CD score, the P2M score on PU is consistently ~2 times the P2M score on PC. This applies to all the methods. I conclude that there was some strange bias caused by the PCNet meshes at the time I did the evaluation reported in the paper. The bias led to lower P2M scores.
For some reason, the bias disappears today and the P2M scores calculated on PC go back to the normal level, resulting in higher values (this is what you see from your own evaluation: 2x higher than the numbers reported in the paper).
Though the numbers you get now are higher, I think this does not affect the conclusions of the paper, as CD has a strong linear correlation with P2M. The reproducible CD scores are sufficient as a metric.
I am still investigating the reason for the bias. As I have left the institute where I did the project, it might take some time for me to find the raw experiment results. I will get back to you when there is an update.
Attachment: paper_results.csv
Thank you very much for your reply. Now I have a rough idea of the reason for this result.
When testing on the PC dataset, you can use the pytorch-0.4.0 version; that way you get a closer value that better matches the original paper.
I guess it may be related to the GPU. Other people in my group recently tested on a 2080 Ti and the results matched, but when I test on a newer GPU model there is a systematic bias in the results.
Hi @luost26,
Great work on troubleshooting this issue. It really is valuable, and I think using PyTorch3D's implementation of P2M is quite useful for good/fair comparisons. I also spent some time on this issue and thought I would share my insights. The P2M results are biased for two reasons:
1. If the GPU used to create the pre-built binary is different from the GPU used to run inference, then unfortunately we get different results. So, I think if someone uses an RTX 3090, they need to rebuild PyTorch3D (whichever version; I am using 0.7.x) using that GPU.
2. From PyTorch3D 0.6.2, pytorch3d.loss.point_mesh_face_distance() admits a new parameter min_triangle_area. By default, this is set to 0.005. To reproduce the results of ScoreDenoise, we need to set min_triangle_area=0 for any PyTorch3D version >= 0.6.2 (see the sketch below). This assumes that this version of PyTorch3D was compiled using the same GPU that is being used to run the inference.
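For example, a minimal sketch of evaluating P2M with that flag (the file names are placeholders, and the metric wrapper in the repo may differ):

```python
# Minimal sketch: P2M via PyTorch3D >= 0.6.2 with the pre-0.6.2 behaviour
# restored through min_triangle_area=0. File names are placeholders.
import torch
from pytorch3d.io import load_objs_as_meshes, load_ply
from pytorch3d.structures import Pointclouds
from pytorch3d.loss import point_mesh_face_distance

device = torch.device("cuda")

meshes = load_objs_as_meshes(["gt_mesh.obj"], device=device)  # ground-truth mesh
verts, _ = load_ply("denoised.ply")                           # denoised point cloud
pcls = Pointclouds(points=[verts.to(device)])

p2m = point_mesh_face_distance(meshes, pcls, min_triangle_area=0.0)
print(p2m.item())
```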
These two steps can help reproduce the results from the original paper. For PCNet, it would be best to either release the original Poisson-disk-sampled ground-truth test point clouds or create a new test set after sampling the ground-truth point clouds from the meshes. The reason I say this is that the CD metric for PCNet depends on the GT point cloud, which is not part of the data in the .zip file.
I hope this helps and thanks again Shitong for the great work!
Best,
D.
Hi @ddsediri
I really appreciate your in-depth and insightful analysis. I am impressed by the great effort you put into locating the root of the issue. It helps a lot and I learned a lot from it! Thanks so much!
By the way, for others who need the PCNet meshes, they can be obtained here: mrakotosaon/pointcleannet#8
Best,
Shitong
Hi @luost26,
No worries at all, I'm glad I could help. Thank you for keeping this repo up to date; it is an excellent resource!
I have a question I hope you could answer: do you have the original PCNet 10K and 50K Poisson disk sampled point clouds? When I resample the PCNet meshes (using both Open3D and Point Cloud Utils) and add 1%, 2% and 3% noise, the CD and P2M results are a bit different from the paper results, especially at 10K resolution. I think this is due to either a small inconsistency in the sampling settings or the noise scale.
If I use the original PCNet meshes with the noisy point clouds you provided, I can get consistent P2M results (as the PCNet meshes have not changed), but the CD results differ because the ground-truth point clouds are different (I have to sample the meshes again, and this is not deterministic).
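For reference, my resampling procedure is roughly the following sketch (Open3D variant; the noise convention, std = noise level x bounding-sphere radius after centering, is my assumption and may not match your data preparation exactly):

```python
# Rough sketch of resampling a PCNet mesh and adding 1% noise.
# Mesh/output paths are placeholders.
import numpy as np
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("pcnet_mesh.obj")
pcd = mesh.sample_points_poisson_disk(number_of_points=10000)
pts = np.asarray(pcd.points)

# Normalize to a unit bounding sphere, then add 1% Gaussian noise.
center = pts.mean(axis=0)
scale = np.linalg.norm(pts - center, axis=1).max()
pts = (pts - center) / scale
noisy = pts + 0.01 * np.random.randn(*pts.shape)
np.savetxt("pcnet_10000_poisson_0.01.xyz", noisy)
```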
Thanks so much!
Dasith.
Hi @ddsediri
Original noise-free PCNet point clouds are here: https://drive.google.com/file/d/1RCmwC401IZWgXsUE_DiMG7_HjaI-mGHQ/view?usp=drive_link