Duplicated rows in the TAD boundary file given by fanc boundaries
ziyin96 opened this issue · 4 comments
Hi,
I'm trying to call TAD boundaries using fanc insulation followed by fanc boundaries. The results look fine, but I found that several lines are duplicated in the output TAD boundary BED file, like this:
chr12 133430001 133440000 . 0.498882532119751 +
chr12 133430001 133440000 . 0.498882532119751 +
Details:
I used fan-c 0.9.25 and started with a published hic file downloaded from GSE116862.
I first calculated insulation scores at 10-kb resolution, trying several window sizes.
fanc insulation data/hESC_D05_Rep1.hic@10kb \
tmp/hESC_D05_Rep1.insulation \
-w 100000 200000 500000 1000000 2000000
After visually comparing the insulation scores with the contact frequency map, I decided to identify the TAD boundaries using a window size of 500 kb.
fanc boundaries tmp/hESC_D05_Rep1.insulation \
results/hESC_D05_Rep1.TAD_boundaries.bed \
-w 500kb
The boundaries in the BED file fit the contact frequency heatmap well, but 80 lines are duplicated, as shown above.
boundary_file=results/hESC_D05_Rep1.TAD_boundaries.bed
wc -l ${boundary_file} # 8901
uniq ${boundary_file} | wc -l # 8821
cut -f 1-3 ${boundary_file} | uniq | wc -l # 8821
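To see which rows are repeated rather than only how many, a small helper along these lines may be useful. It is a minimal sketch: `list_duplicate_rows` is a hypothetical name, not part of FAN-C. Note that `uniq` on its own only collapses adjacent identical lines, so the sketch sorts first to also catch repeats that are not next to each other.

```shell
# Hypothetical helper (not part of FAN-C): print every row that occurs
# more than once in a BED file, one copy per duplicated row.
list_duplicate_rows() {
    # Sort first so identical rows become adjacent; uniq -d then prints
    # one copy of each row that appears at least twice.
    sort "$1" | uniq -d
}

# Example usage with the path from this report (left commented out so
# the snippet can be sourced without the file being present):
# list_duplicate_rows results/hESC_D05_Rep1.TAD_boundaries.bed
# list_duplicate_rows results/hESC_D05_Rep1.TAD_boundaries.bed | wc -l
```

The output is in sorted order, not in the original genomic order of the file.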
hESC_D05_Rep1.TAD_boundaries.bed.zip
By the way, I also checked the corresponding 500 kb insulation score file and all the rows in this file are unique.
wc -l results/hESC_D05_Rep1.insulation_500kb.bed # 309581
uniq results/hESC_D05_Rep1.insulation_500kb.bed | wc -l # 309581
cut -f 1-3 results/hESC_D05_Rep1.insulation_500kb.bed | uniq | wc -l # 309581
I'm wondering how this happened. Did I use FAN-C correctly?
Thanks,
Ziyin
Hi, thanks for reporting this! It looks like a bug.
A large portion of the boundary calling code was not written by me, and I am currently on holiday, so I'm afraid it will take me a while to reproduce and fix this.
Hey, can you quickly confirm that it is this file you have been using? https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3262960
Okay, I think I have a fix. This seems to be related to "shallow" insulation signal, and I'm pretty sure I found the piece of code that led to the duplication. Can you try the fixed version here?
fanc-0.9.26 indeed resolved my issue! Thank you very much.
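For anyone who cannot upgrade right away, an existing boundary file can probably be cleaned up after the fact. The sketch below assumes the repeated rows are exact copies of each other, as in the example at the top of this issue; `dedupe_bed` is a hypothetical helper name, not a FAN-C command.

```shell
# Hypothetical helper (not part of FAN-C): drop exact duplicate rows
# from a BED file while keeping the original row order.
dedupe_bed() {
    # seen[$0]++ is 0 (false) the first time a row appears, so the
    # negation is true and awk prints the row; later copies are skipped.
    awk '!seen[$0]++' "$1"
}

# Example usage (paths are illustrative):
# dedupe_bed results/hESC_D05_Rep1.TAD_boundaries.bed > results/hESC_D05_Rep1.TAD_boundaries.dedup.bed
```

Unlike `sort -u`, this keeps the first occurrence of each row in place, so the file stays in its original order.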