hasindu2008/squigulator

Support for DNA methylation signal

Opened this issue · 6 comments

Thank you for developing this amazing tool. I am wondering if there is a way to simulate reads with DNA methylation signals. If not, do you have any plan to include the function in the near future? Thanks again.

Hello,

This is something that is straightforward to implement given we already have methylation current level tables in f5c, yet, haven't found time to really implement it. Given that you asked now, I can get into implementing this sometime in the coming months.

Great! Looking forward to it! Thanks.

Hi, I am looking forward to it as well!

Implementing support for this new RNA004 kit to squigulator. Will do the methylation as soon as that is finished.

@HLHsieh and @xies4
Thanks for your patience. A very preliminary methylation simulation feature has been now implemented under the "meth" branch at https://github.com/hasindu2008/squigulator/tree/meth. Note that I have yet tested only using a tiny corona-virus genome and there could be bugs/issues. I am planning to do some testing in the next few weeks, however, if you are interested you can give it a try.

You need to create a 3-column tsv file to have the chr, 0-based position and methylation frequency. For example:

MN908947.3	99	1.0
MN908947.3	1184	0.0
MN908947.3	1456	0.5
MN908947.3	20031 	0.7

Note that they must be tab separated. This example file is in test/methfreq.tsv

If you have a f5c/nanopolish methylation frequency TSV, you can use the following command to generate the necessary file for squigulator:

tail -n+2 methfreq.tsv | cut -f 1,2,7 > sqmeth.tsv

Then you can provide this file to squigulator like below:

./squigulator -x dna-r9-min -n 4000 -o a.blow5 test/nCoV-2019.reference.fasta --meth-freq test/methfreq.tsv

I tested this generated BLOW5 by basecalling (using buttery-eel) and running f5c:

eel -x cuda:all -i a.blow5 -o a.fastq --config dna_r9.4.1_450bps_sup.cfg
minimap2 -ax map-ont test/nCoV-2019.reference.fasta a.fastq  --secondary=no | samtools sort - -o a.bam && samtools index a.bam
f5c index --slow5 a.blow5 a.fastq
f5c call-methylation --slow5 a.blow5 -r a.fastq -b a.bam -g test/nCoV-2019.reference.fasta -o a.tsv
f5c meth-freq -i a.tsv -s -o afreq.tsv

The frequencies in afreq.tsv quite matched the ones in test/methfreq.tsv.

Thank you @hasindu2008