TIO-IKIM/CellViT

Can't reproduce the results of your paper

ziwei-cui opened this issue · 4 comments

I attempted to reproduce the results of your paper on an NVIDIA RTX 3090, following the configs exactly as described in the paper. The paper reported results of 0.6464 and 0.5 for bPQ and mPQ, respectively, on the PanNuke dataset using the CellViT256 model. However, when I reproduced the experiment, I obtained 0.6157 bPQ and 0.4805 mPQ. What factors do you think might explain this discrepancy?

Could you be more specific about the configuration?

For the CellViT256 Model, we reported the following results: 0.4846 mPQ, 0.6696 bPQ. Have you used the provided config files? Have you used the right split (provided here: https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke)?

One reason could be non-deterministic training behaviour and the seeding of each dataloader worker. We tried to make the code as reproducible as possible. However, some operations only have non-deterministic implementations, so torch.use_deterministic_algorithms could not be applied.
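For context, here is a minimal sketch of the kind of seeding this refers to (illustration only, not the repository's exact code): global seeds plus a DataLoader worker seed, with the fully deterministic mode left disabled because some CUDA kernels have no deterministic variant.

```python
# Minimal seeding sketch (illustration only, not the repository's exact code).
import random
import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_everything(seed: int = 42) -> None:
    """Seed the Python, NumPy and PyTorch RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

def seed_worker(worker_id: int) -> None:
    """Re-seed NumPy/random inside each DataLoader worker process."""
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

seed_everything(42)
g = torch.Generator()
g.manual_seed(42)

# loader = DataLoader(dataset, num_workers=4, worker_init_fn=seed_worker, generator=g)

# Full determinism would additionally require:
#   torch.use_deterministic_algorithms(True)
# but this raises errors for ops that only ship non-deterministic CUDA kernels,
# which is why it could not be enabled here.
```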

For transparency, we published all training logs and config files. You can find the respective training logs here:
logs_paper/PanNuke/CellViTHV/ViT256/Best-Setting/Fold-1/inference_results.json



logs.log
I have checked my configuration based on your suggestions and trained the model following your configuration file exactly. This is my training log, and it looks very similar to yours. However, what troubles me is that the only place I could find the '0.4846 mPQ, 0.6696 bPQ' numbers in your paper is Table 2 (screenshot below).
[Screenshot 2023-12-19 171613: Table 2 from the paper]

We published a revised version, as the first version had some minor inconsistencies regarding the official split and metrics. Please take a look here: https://arxiv.org/abs/2306.15350

[Screenshot 2023-12-19 at 10:42:37: results table from the revised paper]

To get the results from the paper, you need to run inference on the test dataset of each fold and then average the results over the folds.
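As an illustration of that averaging step, here is a short sketch; the fold folder names and the JSON keys ("mPQ", "bPQ") are assumptions for illustration and should be adjusted to whatever your inference script actually writes into inference_results.json.

```python
# Sketch: average per-fold PQ scores from the inference_results.json files.
# Folder layout beyond Fold-1 and the JSON keys are assumed, not taken from the repo.
import json
from pathlib import Path

base = Path("logs_paper/PanNuke/CellViTHV/ViT256/Best-Setting")
folds = ["Fold-1", "Fold-2", "Fold-3"]  # PanNuke uses three folds

mpq_scores, bpq_scores = [], []
for fold in folds:
    with open(base / fold / "inference_results.json") as f:
        results = json.load(f)
    mpq_scores.append(results["mPQ"])  # hypothetical key
    bpq_scores.append(results["bPQ"])  # hypothetical key

print(f"mean mPQ: {sum(mpq_scores) / len(mpq_scores):.4f}")
print(f"mean bPQ: {sum(bpq_scores) / len(bpq_scores):.4f}")
```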


OK, I got it! Thanks!