Methylpy in plant
zzh4399 opened this issue · 7 comments
Dear yupenghe,
I would like to know whether methylpy can be used for methylation analysis of plant genomes, because the results I get from methylpy are quite different from those I get from Bismark.
Thanks.
Yes, methylpy works on plant genomes. Would you mind describing the difference you are referring to?
I am very glad to receive your reply. Methylpy reports a lower methylation rate than Bismark for all three contexts (about half as much). I tried adjusting the parameters, but it did not seem to help.
That is interesting. It would be helpful to have some examples (e.g. methylated and unmethylated counts of a few Cs from methylpy and Bismark). Also, is the library a typical directional bisulfite sequencing library, or is it PBAT?
It is really interesting. The methylation rates of the CpG, CHG and CHH contexts calculated with Bismark are 40%, 20% and 2% respectively, while those calculated with methylpy are 20%, 10% and 0.5% respectively. We picked a single site at random and found that both the methylation rate and the number of reads covering the site differed between the two tools.
Methylpy seems to be more stringent when calling methylation at individual sites, which may be the reason for its lower methylation rate. For example, for the same site methylpy reported a methylation rate of 0.8 and Bismark reported a methylation rate of 1.
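For reference, a minimal sketch of how per-context rates like the ones quoted above could be derived from per-site counts. It assumes the rate is the pooled fraction (methylated reads divided by total reads, summed over all sites of a context); neither tool's exact summary definition is stated in this thread. The function names `classify_context` and `pooled_levels` and the toy counts are hypothetical.

```python
from collections import defaultdict


def classify_context(trinuc):
    """Map a 3-base context starting at the C (e.g. 'CGA') to CG, CHG or CHH."""
    t = trinuc.upper()
    if len(t) != 3 or t[0] != "C" or "N" in t:
        return None  # incomplete or ambiguous context, skipped here
    if t[1] == "G":
        return "CG"
    return "CHG" if t[2] == "G" else "CHH"


def pooled_levels(sites):
    """sites: iterable of (trinucleotide, methylated_reads, total_reads).

    Returns {context: sum(methylated) / sum(total)} (assumed definition).
    """
    mc_sum = defaultdict(int)
    cov_sum = defaultdict(int)
    for trinuc, mc, cov in sites:
        ctx = classify_context(trinuc)
        if ctx is None or cov == 0:
            continue
        mc_sum[ctx] += mc
        cov_sum[ctx] += cov
    return {ctx: mc_sum[ctx] / cov_sum[ctx] for ctx in cov_sum}


# Made-up counts, for illustration only.
toy_sites = [
    ("CGA", 8, 10),
    ("CGT", 10, 10),
    ("CAG", 2, 10),
    ("CTT", 0, 20),
    ("CAA", 1, 20),
]
levels = pooled_levels(toy_sites)
for ctx in ("CG", "CHG", "CHH"):
    print(ctx, round(levels[ctx], 3))
```

Because the pooled rate depends on both the methylated count and the coverage at every site, a difference in the number of reads counted per site (as reported above) would shift the genome-wide figures even when the same alignments are used.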
That is interesting. What parameters did you use to run Bismark and methylpy? I did a comparison a while back and the results from the two methods were very close, so I am wondering whether a specific setting is responsible.
Methylpy has a significant advantage in running speed and is easy to understand. In addition to comparing our own sequencing files, we also used methylpy to analyze previously published data (all from a plant, Oryza sativa). We found that when methylpy was used on plant genomes, the methylation rate was roughly halved. These are the parameters we used for the two tools:
methylpy paired-end-pipeline --read1-files M1-D_FDLM220001805-1a_1.clean.fq.gz --read2-files M1-D_FDLM220001805-1a_2.clean.fq.gz --forward-ref ~/bq/methy/db/rice_f --reverse-ref ~/bq/methy/db/rice_r --ref-fasta ~/bq/methy/db/IRGSP-1.0_genome.fasta --path-to-output zzh --num-procs 40 --sample M1_D
bismark --bowtie2 -N 0 -L 20 --quiet --un --ambiguous --sam -o output ~/bq/methy/db/ -1 M1-D_FDLM220001805-1a_1.clean.fq.gz -2 M1-D_FDLM220001805-1a_2.clean.fq.gz # Aligning reads
deduplicate_bismark M1-D_FDLM220001805-1a_1.clean_bismark_bt2_pe.bam # Removing duplicate reads
bismark_methylation_extractor --no_overlap --paired-end --bedGraph --comprehensive --counts --remove_spaces --cytosine_report --genome_folder ~/bq/methy/db/ --buffer_size 10G --CX ../output/M1-D_FDLM220001805-1a_1.clean_bismark_bt2_pe.deduplicated.bam # Calling methylation
Nothing stands out. If you manually check a few CG/CHG/CHH sites, what total and methylated read counts do you get from methylpy and from Bismark? To understand this, I would need some example data that reproduces the difference you found.
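One way to collect such counts is to join the two per-site outputs on position, sketched below. This assumes a tab-separated methylpy allc file (chromosome, position, strand, context, methylated count, total count, call) and a Bismark cytosine report (chromosome, position, strand, methylated count, unmethylated count, context, trinucleotide), both with 1-based positions. The helper names and the toy lines are hypothetical, and chromosome names may need normalising if the two tools label them differently.

```python
def parse_allc_line(line):
    """Assumed allc columns: chrom, pos, strand, context, mc, total, call."""
    chrom, pos, strand, _ctx, mc, cov = line.rstrip("\n").split("\t")[:6]
    return (chrom, int(pos), strand), (int(mc), int(cov))


def parse_cx_line(line):
    """Assumed report columns: chrom, pos, strand, meth, unmeth, ..."""
    chrom, pos, strand, meth, unmeth = line.rstrip("\n").split("\t")[:5]
    return (chrom, int(pos), strand), (int(meth), int(meth) + int(unmeth))


def load_sites(lines, parser):
    """Build {(chrom, pos, strand): (methylated, total)}, skipping bad lines."""
    sites = {}
    for line in lines:
        try:
            key, counts = parser(line)
        except ValueError:
            continue  # header, blank or malformed line
        sites[key] = counts
    return sites


def compare_sites(allc_lines, cx_lines):
    """Return (site, methylpy_counts, bismark_counts) for shared sites."""
    allc = load_sites(allc_lines, parse_allc_line)
    cx = load_sites(cx_lines, parse_cx_line)
    return [(key, allc[key], cx[key]) for key in sorted(allc.keys() & cx.keys())]


# Made-up lines, for illustration only; real input would be read from files.
allc_lines = [
    "chr\tpos\tstrand\tmc_class\tmc_count\ttotal\tmethylated",
    "1\t1001\t+\tCGA\t8\t10\t1",
    "1\t2050\t-\tCTT\t0\t12\t0",
]
cx_lines = [
    "1\t1001\t+\t10\t0\tCG\tCGA",
    "1\t2050\t-\t0\t12\tCHH\tCTT",
]
rows = compare_sites(allc_lines, cx_lines)
for (chrom, pos, strand), (mc_a, cov_a), (mc_b, cov_b) in rows:
    print(f"{chrom}:{pos}{strand}\tmethylpy {mc_a}/{cov_a}\tbismark {mc_b}/{cov_b}")
```

Listing the sites where either the methylated count or the coverage differs gives concrete cases that can be traced back to individual reads in the two alignments.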