vaquerizaslab/chess

Should the users be concerned about the problem raised in the new Contradictory Results bioRxiv preprint?

Hi CHESS,

First of all, thanks so much for keeping users updated on this beautiful software! Like many others, I have recently been trying to use CHESS to analyze Hi-C data that our lab generated from mutagenized zebrafish embryos, but today I came across a bioRxiv preprint that seems to raise a major concern about the software (Lee, H., Blumberg, B., Lawrence, M. S., and Shoida, T. "Revisiting the Use of Structural Similarity Index in Hi-C." bioRxiv (2021)).

"...here we show that the primary outputs of CHESS–namely, the structural similarity index (SSIM) profiles–are nearly identical regardless of the input matrices, even when query and reference reads were shuffled to destroy any significant differences. This issue stems from the dominance of the regional counting noise arising from stochastic sampling in chromatin-contact maps, reflecting a fundamentally incorrect assumption of the CHESS algorithm. Therefore, biological interpretation of SSIM profiles generated by CHESS requires considerable caution."

I am not a bioinformatician and therefore do not fully understand the technical details presented in their preprint...

Should users be concerned about this problem? Issues #34 and #48 seem closely related to the concerns raised in the preprint, but my impression is that its authors argue that SSIM cannot measure similarity between Hi-C matrices from the same genomic locus, and that it actually worsens the differential contact analysis, which is in effect carried out by the signal-to-noise ratio instead.

Is there any way users can use this software without running into the concern raised by H. Lee et al.? Or might we have to wait for further major updates to either the software or the manuscript?

Thanks in advance,

Hi, and thanks for the question. We are aware of the preprint and are drafting a detailed response to it. With appropriate parameters and thresholds, the CHESS output can be used to identify real differences between Hi-C datasets. We're also in the process of drafting more in-depth guidance on how to choose these parameters and thresholds, which we'll add to the FAQ when it's ready and let you know!
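
As a rough illustration of what applying thresholds to the output could look like, here is a minimal sketch. It assumes a tab-separated results table with columns named `ID`, `SN`, `ssim`, and `z_ssim`; the column names, the helper names `read_table` and `filter_regions`, and the cutoff values are all assumptions made for this example, not recommended settings. Please check the CHESS docs and FAQ for the actual output format and for guidance on choosing thresholds.

```python
import csv
import io
import math


def read_table(handle):
    """Read a tab-separated table into a list of dicts (hypothetical helper)."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_regions(rows, z_max=-1.0, sn_min=0.5):
    """Keep regions with a low z-scored SSIM and a high signal-to-noise value.

    The column names and default cutoffs are illustrative assumptions only.
    """
    kept = []
    for row in rows:
        try:
            z = float(row["z_ssim"])
            sn = float(row["SN"])
        except (KeyError, ValueError):
            continue  # skip rows with missing or non-numeric values
        if math.isnan(z) or math.isnan(sn):
            continue  # skip regions where no score could be computed
        if z <= z_max and sn >= sn_min:
            kept.append(row)
    return kept


# Made-up rows, only to show the filtering logic.
example = (
    "ID\tSN\tssim\tz_ssim\n"
    "0\t0.9\t0.20\t-2.1\n"
    "1\t0.2\t0.15\t-2.5\n"
    "2\t0.8\t0.70\t0.3\n"
    "3\tnan\tnan\tnan\n"
)
rows = read_table(io.StringIO(example))
candidates = filter_regions(rows)
print([row["ID"] for row in candidates])
```

The idea is that a low similarity score alone is not enough: a region is kept only if the difference also clears a signal-to-noise cutoff, so that noisy, low-coverage regions are not reported as changes.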

Hello,
Our preprint is now available online: https://www.biorxiv.org/content/10.1101/2021.10.18.464422v1. We've also updated the CHESS docs and FAQ with additional guidance on best practices for using CHESS. I hope that clears up any concerns. We'd be happy to answer any follow-up questions you have!