seqcode/miniMDS

QC of HiC data

Hi @Lila14,

I have a general question about 3D model generation. Are there any QC parameters I can check beforehand to decide whether a given dataset is suitable for miniMDS?

I guess one of them is the distance decay plot you showed me. For example, the dataset GSE104814_WT_N3_40000 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE104814) seems OK based on the heatmap and the detected TADs:
chr1_gse104814_tads
But miniMDS didn't get a good result for it (-m 0.01 -p 0.01).
chr1_gse104814_minimds
However, the decay plot looks OK (except for the highest bins):
chr1_gse104814_decay

Should I decrease the resolution for this dataset?

Thanks for your help.
Best,
Attila

It's always worth trying a lower resolution. 40kb is very high resolution for structural inference. With this dataset, I was able to get OK results at 160kb. (I used -m 0.01 -p 0.01 for the larger chromosomes.) Decreasing the resolution further could improve the results more.
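
If you only have the 40kb file, one way to get a coarser matrix is to rebin it yourself. The following is a minimal sketch, not part of miniMDS: `rebin_contacts` is a hypothetical helper, and it assumes the values are raw counts that can simply be summed (normalized values would need to be re-normalized after binning).

```python
from collections import defaultdict

def rebin_contacts(rows, new_res):
    """Sum contact counts into coarser bins of width new_res (in bp).

    rows: iterable of (chrom, start1, start2, count) tuples, where start1
    and start2 are bin start coordinates at the original resolution.
    Returns a dict mapping (chrom, new_start1, new_start2) to the summed count.
    Assumption: counts are additive (raw, not normalized).
    """
    coarse = defaultdict(float)
    for chrom, start1, start2, count in rows:
        # Snap each coordinate down to the start of its coarse bin.
        b1 = (start1 // new_res) * new_res
        b2 = (start2 // new_res) * new_res
        # Store each pair once, with the smaller coordinate first.
        lo, hi = min(b1, b2), max(b1, b2)
        coarse[(chrom, lo, hi)] += count
    return dict(coarse)
```

For example, `rebin_contacts(rows, 160000)` merges every 4x4 block of 40kb bins into one 160kb bin, so the coarser matrix is much less sparse.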

In general, structural inference is difficult on high-sparsity datasets. A quick heuristic is to check the number of lines in the BED file (which shouldn't include zero-count entries). The chr1 40kb file from this dataset has 2.8 million lines. I like to look at "gold standard" data for comparison: Rao GM12878 chr3 (similar genomic size) at 50kb has 6.9 million lines.
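
The line-count check can be scripted. This is a rough sketch, assuming a whitespace-delimited file with seven columns per line (two bins as chrom/start/end, then the contact value in the last column); `count_nonzero_contacts` is a hypothetical helper, not a miniMDS function.

```python
def count_nonzero_contacts(lines):
    """Count lines whose contact value (7th column) is nonzero.

    lines: iterable of text lines, e.g. an open file handle.
    Assumption: seven whitespace-separated columns per record.
    Blank, short, or non-numeric lines (e.g. a header) are skipped.
    """
    n = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # blank or malformed line
        try:
            value = float(fields[6])
        except ValueError:
            continue  # header or non-numeric value
        if value != 0:
            n += 1
    return n

# Ratio of the two line counts quoted in this thread.
ratio = 2.8e6 / 6.9e6
```

To use it on a file, pass an open handle, e.g. `with open(path) as f: count_nonzero_contacts(f)`. With the numbers above, the chr1 file has roughly 0.4 times as many nonzero entries as the reference.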

Thanks, @Lila14! That's exactly what I wanted to know.