problems about the test results of 207.ckpt on luna16 subset 9

Question

Senyh opened this issue 3 years ago · 7 comments

Dear Dr. Li,

I tested the 207.ckpt you provided on LUNA16 subset 9 but got poor CPM scores. The results are as follows:

CAD Analysis: predanno0.3

Candidate detection results:
True positives: 98
False positives: 1316
False negatives: 7
True negatives: 0
Total number of candidates: 1702
Total number of nodules: 105
Ignored candidates on excluded nodules: 271
Ignored candidates which were double detections on a nodule: 17
Sensitivity: 0.933333333
Average number of candidates per scan: 19.340909091

So strange! The sensitivity is about 93.3% and the average number of candidates per scan is only about 19, but the CPM is very poor (0.18???).
I also retrained the method on LUNA16 subsets 0-8 and tested it on subset 9. However, the evaluation results are also odd:

CAD Analysis: predanno0.3

Candidate detection results:
True positives: 100
False positives: 1792
False negatives: 5
True negatives: 0
Total number of candidates: 2342
Total number of nodules: 105
Ignored candidates on excluded nodules: 415
Ignored candidates which were double detections on a nodule: 35
Sensitivity: 0.952380952
Average number of candidates per scan: 26.613636364

I wonder if you could give me some advice. I cannot find your email address in the paper. Could you share it with me at your convenience?

Thank you in advance!

xxszqyy@gmail.com

Answer 1 · 2021-12-06T17:20:59.000Z

I think something is wrong with your nodule evaluation code.

As you can see from your reported results, there are 98 true positives out of 105 nodules, so the sensitivity shouldn't be that low.

If you are using the script provided by the LUNA website, make sure the code runs correctly on their example dataset. They used to provide example data for testing the evaluation script three years ago.
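The arithmetic behind the reported numbers can be checked with a short sketch. The counts below are copied from the first report in this thread; the variable names are illustrative, and the number of scans is inferred from the reported average rather than stated anywhere in the log.

```python
# Counts copied from the first "Candidate detection results" report.
true_positives = 98
false_positives = 1316
false_negatives = 7
ignored_excluded = 271      # candidates on excluded nodules
ignored_double = 17         # double detections on a nodule
total_candidates = 1702
candidates_per_scan = 19.340909091

# Sensitivity is the fraction of annotated nodules that were detected.
total_nodules = true_positives + false_negatives
sensitivity = true_positives / total_nodules

# Every candidate should fall into exactly one of these four buckets.
accounted = true_positives + false_positives + ignored_excluded + ignored_double

# Inferred, not reported: scans = total candidates / candidates per scan.
n_scans = round(total_candidates / candidates_per_scan)
fps_per_scan = false_positives / n_scans

print(total_nodules, round(sensitivity, 4), accounted, n_scans, round(fps_per_scan, 2))
```

Under these numbers the overall sensitivity of about 0.933 is reached at roughly 15 false positives per scan, which is above the highest rate (8 per scan) that enters the CPM average, so the overall figure alone says little about the CPM.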

Answer 2 · 2021-12-07T07:26:54.000Z

Thank you.
I used the official evaluation code. Besides, I obtained a normal CPM when I evaluated the CSV detection file provided by DeepLung (Zhu et al., WACV 2018).
The sensitivity is normal (about 93%) as shown above, but the CPM score is low. Specifically, the sensitivity is very low at 0.125, 0.25, and 1 false positives per scan.
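For reference, the CPM is the mean sensitivity at seven false-positive rates (1/8, 1/4, 1/2, 1, 2, 4, and 8 per scan). The sketch below is not the official evaluation script: it uses plain linear interpolation with clamping at the ends, and the FROC points are hypothetical values chosen only to show how a high overall sensitivity can coexist with a CPM near 0.18.

```python
from bisect import bisect_left

# The seven operating points (false positives per scan) averaged by the CPM.
CPM_FP_RATES = [0.125, 0.25, 0.5, 1, 2, 4, 8]


def sensitivity_at(fp_rate, froc_fps, froc_sens):
    """Linearly interpolate sensitivity at fp_rate (simplification: clamp outside the curve)."""
    if fp_rate <= froc_fps[0]:
        return froc_sens[0]
    if fp_rate >= froc_fps[-1]:
        return froc_sens[-1]
    i = bisect_left(froc_fps, fp_rate)
    x0, x1 = froc_fps[i - 1], froc_fps[i]
    y0, y1 = froc_sens[i - 1], froc_sens[i]
    return y0 + (y1 - y0) * (fp_rate - x0) / (x1 - x0)


def cpm(froc_fps, froc_sens):
    """Average the interpolated sensitivity over the seven CPM operating points."""
    values = [sensitivity_at(r, froc_fps, froc_sens) for r in CPM_FP_RATES]
    return sum(values) / len(values)


# Hypothetical FROC curve (NOT taken from this thread): sensitivity stays low
# until many false positives per scan are allowed, then reaches 0.93.
froc_fps = [0.125, 0.25, 0.5, 1, 2, 4, 8, 15]
froc_sens = [0.02, 0.03, 0.05, 0.08, 0.20, 0.40, 0.50, 0.93]

score = cpm(froc_fps, froc_sens)
print(round(score, 3))
```

The last point (15 false positives per scan) does not contribute to the average, which is why a detector can report 93% overall sensitivity and still score poorly.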

Answer 3 · 2021-12-07T15:41:02.000Z

Have you tested the results on my provided CSV?

One possible reason could be an indexing mismatch between the prediction script and the evaluation script. I am not sure whether you ran the evaluation using my provided evaluation script. The only difference is this: if the index is 001, the official script treats it as 1, whereas for the ground truth it reads the index as 001. My script corrects this issue.
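A quick way to check for this kind of mismatch is to compare the case identifiers in the two files as raw strings. This is a minimal diagnostic sketch; the function name and the example identifiers are hypothetical and do not come from either evaluation script.

```python
def unmatched_ids(pred_ids, gt_ids):
    """Return prediction IDs with no exact string match in the ground truth."""
    return sorted(set(pred_ids) - set(gt_ids))


# Hypothetical identifiers: predictions lost their leading zeros.
pred_ids = ["1", "56", "123"]
gt_ids = ["001", "056", "123"]

missing = unmatched_ids(pred_ids, gt_ids)
print(missing)  # ['1', '56']
```

A non-empty result means some predictions can never be matched to a ground-truth case, so their detections would not be counted as hits for that case.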

Answer 4 · 2021-12-08T04:00:41.000Z

I tried testing the CSV you provided.
But the file predanno0.3 does not contain results for all 1186 LUNA16 nodules, only for part of the test set, and I cannot find the annotation and series UID CSV files for this set.

Answer 5 · 2023-01-16T02:28:05.000Z

I also ran a test with the provided 177.ckpt, and the result was similar.
I trained for 150 epochs, and the result was again similar.
I think this code is wrong.

Answer 6 · 2023-01-16T02:29:14.000Z

In this code, false positives cannot be distinguished.

Answer 7 · 2023-02-17T23:27:10.000Z

Hi all, I updated the evaluation code if you want to check it out. The major issue was a label mismatch between the predictions and the ground truth. For example, ID 56 is '56' in the prediction file but '056' in the GT file, so the script didn't match the predicted '56' to case '056'. This issue has now been fixed.
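One way such a mismatch can be avoided is to normalize identifiers on both sides before matching. The sketch below is only an illustration and is not the fix committed to the repository; `normalize_id`, `group_by_case`, and the padding width of 3 are assumptions.

```python
from collections import defaultdict


def normalize_id(case_id, width=3):
    """Render an ID as a zero-padded string (width is an assumed convention)."""
    return str(case_id).strip().zfill(width)


def group_by_case(rows):
    """Group (case_id, payload) rows under their normalized case ID."""
    grouped = defaultdict(list)
    for case_id, payload in rows:
        grouped[normalize_id(case_id)].append(payload)
    return grouped


# Hypothetical rows: the prediction file lost the leading zero, the GT kept it.
predictions = group_by_case([("56", "pred_a"), (56, "pred_b")])
ground_truth = group_by_case([("056", "nodule_1")])

# After normalization both files refer to the same case key.
shared = sorted(set(predictions) & set(ground_truth))
print(shared)  # ['056']
```

Padding to a fixed width keeps the identifiers as strings; converting both sides to integers would work equally well if every identifier is purely numeric.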