M3DV/FracNet

Any tricks to reproduce the performance?

mssjtxwd opened this issue · 4 comments

Hello, I am trying to reproduce your work with the RibFrac data. Following your README, I put the data in the specified directory, ran train.py directly for training, and then ran predict.py + eval.py to measure the performance. However, we can only reach about 50% recall @ 8 FA on the validation set, which is significantly lower than the figure claimed by the 3rd-place team in the competition (the performance table in their slides shows they achieved 90% recall @ 8 FA on the validation set). Of course, the performance can be affected by many factors, so I would like to ask what performance your method achieves when trained only on the RibFrac training set (results with post-processing and TTA are fine).

Hi @mssjtxwd,

This project is meant as a prototype baseline method for the RibFrac Challenge. However, as we are the data providers for the challenge, we would like to avoid unintended data leakage, so we did not release the full details of the models, for both the training and inference stages.

Basically, the FracNet in our EBioMedicine paper is a one-stage model without a false-positive reduction stage. The performance reported in the main text is from a model trained on the RibFrac training set plus 300 in-house cases, but we also report the performance of a model trained only on the public training set in the supplementary materials (please refer to the paper). You may find that it still works very well and could be a top-ranking solution.

As for the reproducibility issue, I can think of two possible factors:

  1. In the training stage, the setting in main.py was cleaned up for the open-source release; it is not the full training procedure of our actual implementation. We trained the model in multiple stages, i.e., training with one group of hyper-parameters and then fine-tuning with another. You should take care of the batch size, data sampling strategy, learning rate, etc. (a rough sketch of what such a schedule could look like is given after this list).
  2. In the inference stage, we use extra post-processing steps. The public release only removes the spine regions; in our actual implementation, we also remove false positives that lie far from the rib cage / lung parenchyma. This is not difficult to implement, but it relies on our in-house lung analysis code, and we are still considering whether to release that code separately (a rough sketch of the idea is also given after this list).
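For item 1, here is a minimal sketch of what such a multi-stage schedule could look like, assuming a PyTorch setup similar to train.py. All hyper-parameter values, as well as the `make_loader` and `criterion` hooks, are illustrative assumptions, not the actual in-house configuration:

```python
from torch.optim import SGD

# Hypothetical multi-stage schedule: each stage is one group of hyper-parameters.
# The concrete values below are placeholders, NOT the authors' in-house settings.
STAGES = [
    {"epochs": 50, "lr": 1e-1, "batch_size": 16},  # stage 1: train from scratch
    {"epochs": 30, "lr": 1e-2, "batch_size": 16},  # stage 2: fine-tune at a lower lr
    {"epochs": 20, "lr": 1e-3, "batch_size": 8},   # stage 3: final fine-tune
]

def run_stage(model, make_loader, criterion, stage, device="cuda"):
    """Train `model` for one stage. `make_loader(batch_size)` is assumed to
    rebuild the DataLoader, so the data sampling strategy can change per stage."""
    loader = make_loader(stage["batch_size"])
    optimizer = SGD(model.parameters(), lr=stage["lr"], momentum=0.9)
    model.train()
    for _ in range(stage["epochs"]):
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()

# usage sketch:
# for stage in STAGES:
#     run_stage(model, make_loader, criterion, stage)
```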
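For item 2, a rough sketch of the lung-distance filtering idea, assuming a binary lung mask is available from some lung segmentation tool. The function name, the distance threshold, and the way the lung mask is obtained are all hypothetical; this is not the in-house implementation:

```python
import numpy as np
from scipy import ndimage

def remove_fps_far_from_lung(pred_mask, lung_mask, max_dist_mm=30.0,
                             spacing=(1.0, 1.0, 1.0)):
    """Drop predicted components that lie far from the lung parenchyma.
    `pred_mask`: binary fracture prediction; `lung_mask`: binary lung segmentation;
    `max_dist_mm`: illustrative threshold, not a value from the paper."""
    # Distance (in mm) from every voxel to the nearest lung voxel.
    dist_to_lung = ndimage.distance_transform_edt(
        ~lung_mask.astype(bool), sampling=spacing)

    labeled, n_components = ndimage.label(pred_mask)
    keep = np.zeros(pred_mask.shape, dtype=bool)
    for i in range(1, n_components + 1):
        component = labeled == i
        # Keep the component only if it comes close enough to the lung region.
        if dist_to_lung[component].min() <= max_dist_mm:
            keep |= component
    return keep.astype(pred_mask.dtype)
```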

Sorry to disappoint you, but if you want to reproduce the performance in the paper, you will have to tune your model a little. However, the performance can certainly be reproduced with this one-stage model using a 3D UNet as the backbone.

Good luck!
Jiancheng

Hi @mssjtxwd

As per @duducheng, the training configuration in our actual implementation is different from the one in the open-sourced code. There are a few details you may try:

  • Larger batch size. This should be very important, since we trained with a larger batch size on multiple GPUs. In this repo the batch size is set to 4 to fit on a single 11 GB GPU;
  • Fine-tuning. We use a multi-stage training strategy to take the performance up a notch;
  • Better patch assembling strategy. This repo uses a simple average when handling overlaps across patches. Keeping only the center region of each patch should give better performance (see the sketch below).
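A minimal sketch of the center-crop assembling idea from the last bullet, assuming the sliding-window stride equals `patch_size - 2 * margin` so the cropped centers tile the volume; border handling at the volume edges is simplified and all values are illustrative, not the settings used in the paper:

```python
import numpy as np

def assemble_center_crop(patch_preds, corners, volume_shape,
                         patch_size=64, margin=16):
    """Stitch patch-wise predictions into a full volume, keeping only the
    central (patch_size - 2 * margin)^3 region of each patch.
    `patch_preds`: iterable of (D, H, W) probability patches;
    `corners`: their top-left-front corner coordinates in the volume."""
    out = np.zeros(volume_shape, dtype=np.float32)
    for pred, (z, y, x) in zip(patch_preds, corners):
        # Drop the patch border, which tends to be less reliable than the center.
        core = pred[margin:patch_size - margin,
                    margin:patch_size - margin,
                    margin:patch_size - margin]
        out[z + margin:z + patch_size - margin,
            y + margin:y + patch_size - margin,
            x + margin:x + patch_size - margin] = core
    return out
```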
JXQI commented


My result is the same as yours. Did you manage to resolve it?


Excuse me,
How do you use the multi-stage training strategy?