all context methylation visualization
PaulaRomeroLozano opened this issue · 21 comments
Hello,
I am using megalodon to identify methylations in the whole genome, not only in the CpG islands.
I was wondering if it is possible to use Methplotlib to visualise also those regions outside the CpG context with the megalodon outputs.
Do you think it is compatible?
Thank you,
Paula
Hi Paula,
Yes, that shouldn't be a problem. methplotlib doesn't really care about the type of modification it shows. Which file format did you get from megalodon?
It is possible that I need to add an additional file format as input type, but that shouldn't be too much trouble.
Cheers,
Wouter
Hi Wouter,
Thank you very much for the quick response.
That would be incredibly useful!
I am still getting familiar with all the megalodon outputs and which ones are required and "compatible" for downstream analysis.
Please find below the file formats I got when running megalodon:
- mod_mapping.bam: I was thinking in sort it and convert it to a bedgraph using bedtools?
2.modified_bases.5mc.bed: (0-based)
3.per_read_modified_bases: as a .txt and .bd file (natural log)
I was thinking in trying to make those files "nanopolish like" so I could use your scripts, but if you could add an extra input type that would be very helpful.
Thank you so much in advance,
Best,
Paula
Please keep me posted how things go for any of these formats. It may not be necessary to make them too similar to nanopolish files.
-
methplotlib can take cram files with nucleotide modifications in the Mm tag, so your bam file should not need too many changes. Converting to cram should be enough. That said, I could also add an option to use bam directly.
-
That looks like a bedmethyl format. You can quite easily make a bedgraph out of that, which is currently one of the compatible input formats. I could in the future also add bedmethyl input directly.
-
Could you share a larger example of that format? I'll add it as a compatible input too. You can find my email address in my profile.
I'm afraid I won't have time to add the code for these formats this week.
Thank you very much for your help and feedback.
-
I will go with this option first then.
-
I will send you a small subset of the data by email.
Thank you again and I will keep you posted for sure :)
I found an R script in the METEORE repo that takes the megalodon .bed file and computes methylation frequencies like the nanopolish calculate_methylation_frequency.py
script does. In its original form the output doesn't work with methplotlib yet, but it shouldn't take much to get the output close enough to nanopolish format. I don't know enough R though to do it.
I can help on some of the megalodon options here. Megalodon has a lot of outputs in order to facilitate porting into tools such as methplotlib. Megalodon is designed with two main modified base output types 1) per-read and 2) aggregated.
Per-read outputs include the mod_mappings.bam
above. Megalodon can also output cram directly (via --mappings-format cram
) and sort the mappings (via --sort-mappings
). The mod_mappings output takes the per-read modified base calls and anchors them to the reference sequence (so these reads will artificially contain exactly the reference sequence for each read). mod_basecalls
are anchored only to the basecalls and not mapped, so not directly applicable to methplotlib (I think).
The per-read modified base text format is meant as a dump of the per-read scores database mostly for debugging purposes. I would not recommend this as a standard format that should used as input and is subject to change without notice.
Aggregated formats returns results aggregated across reads at each applicable reference position. For the aggregated formats, Megalodon provides 3 options 1) bedmethyl 2) modvcf and 3) wiggle (mostly like a bedgraph but more compact). The modVCF is not really a standard format, but there is no standard format (that I'm aware of) to store multiple modifications in a single file. VCF was the closest, but required adding some fields for strandedness.
@JWDebler that METEORE script does not compute the methylation frequencies, but instead just converts from the format from bedmethyl to a consistent text format for comparison to other algorithms within the METEORE framework. I believe this could essentially be accomplished with a simple awk script to move columns around.
I'm happy to help mediate the porting of megalodon outputs into methplotlib. Let me know if any adjustments or recommendations would be helpful.
Hello Marcus,
Thank you so much for your help.
It would be super useful to be able to port megalodon outputs into methplotlib! :)
Since I got mod_mapping.bam as output, could it be possible to simply convert that bam to cram?
I tried:
samtools sort mod_mappings.bam -o mod_mappings.sorted.bam
samtools view -C -T genome.fa mod_mappings.bam > mod_mappings.cram
samtools index -c mod_mappings.sorted.cram
I did so, and then I tried to run methplotlib but I got an error:
methplotlib -m /mnt/fs01/RUNS/minion_run/DNA_meth/megalodon_results/subset_1_5G_norm/mod_mappings.cram -n calls -w chr17:17,722,291-17,725,186 -g /media/data/reference/Homo_sapiens.GRCh38.90.chrplus.gtf --simplify
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/a8499ca51d6fb77332c2d242923994eb": Protocol not supported
Traceback (most recent call last):
File "/home/paula/.local/bin/methplotlib", line 8, in
sys.exit(main())
File "/home/paula/.local/lib/python3.8/site-packages/methplotlib/methplotlib.py", line 26, in main
meth_browser(meth_data=meth_data,
File "/home/paula/.local/lib/python3.8/site-packages/methplotlib/methplotlib.py", line 52, in meth_browser
meth_traces = plots.methylation(meth_data, dotsize=dotsize, binary=binary)
File "/home/paula/.local/lib/python3.8/site-packages/methplotlib/plots.py", line 120, in methylation
make_per_read_meth_traces_phred(table=meth.table,
File "/home/paula/.local/lib/python3.8/site-packages/methplotlib/plots.py", line 148, in make_per_read_meth_traces_phred
table = table.join(df_heights, on="read_name")
File "/home/paula/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 7874, in join
return self._join_compat(
File "/home/paula/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 7890, in _join_compat
return merge(
File "/home/paula/.local/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 74, in merge
op = _MergeOperation(
File "/home/paula/.local/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 656, in init
self._maybe_coerce_merge_keys()
File "/home/paula/.local/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 1165, in _maybe_coerce_merge_keys
raise ValueError(msg)
ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat
The methplotlib logs reported:
2021-04-08 14:51:00,334 methplotlib 0.17.0 started.
2021-04-08 14:51:00,334 Python version is: 3.8.5 (default, Jan 27 2021, 15:41:15) [GCC 9.3.0]
2021-04-08 14:51:00,334 Arguments are: Namespace(bed=None, binary=False, dotsize=4, example=False, fasta=None, gtf='/media/data/reference/Homo_sapiens.GRCh38.90.chrplus.gtf', methylation=['/mnt/fs01/RUNS/minion_run/DNA_meth/megalodon_results/subset_1_5G_norm/mod_mappings.cram'], names=['calls'], outfile=None, qcfile=None, simplify=True, smooth=5, split=False, static=None, store=False, window='chr17:17,722,291-17,725,186')
2021-04-08 14:51:00,334 Processing chr17_17722291_17725186
2021-04-08 14:51:00,335 File /mnt/fs01/RUNS/minion_run/DNA_meth/megalodon_results/subset_1_5G_norm/mod_mappings.cram is of type ont-cram
2021-04-08 14:51:00,557 Collected methylation data for 1 datasets
2021-04-08 14:51:00,658 Created QC plots
I am not sure why is not working and how I can fix it...
Another option would be to convert the bedmethyl file into bedgraph but I haven't been able to succeed on this either.
Any feedback or recommendation would be very much appreciated.
Kind regards,
Paula
I think the error is the same bug @wdecoster noted in #31. Hopefully a resolution can be found there.
The bedgraph input to methplotlib is slightly different than the mod_mappings. The mod_mappings contain per-read methylation calls, so can provide more information (phasing/linked methylation) while the bedgraph/bedmethyl is aggregated over reads (so links between calls between each read are lost). I'm not sure what outputs methplotlib has in terms of per-read modified base calls, so this may be a moot point here, but thought the distinction was important.
For converting from bedmethyl to bedgraph the following one-liner should work: awk 'BEGIN{OFS="\t"} {$4=$11; NF=4; print}' sample.bedmethyl > sample.bedgraph
. You may need to add a header to load this file in some genome browsers.
hi all,
I am working on methylation outputs to visualize that. I have megalodon(files: modified_bases.5mc.bed, per_read_modified_base_calls.db and per_read_modified_base_calls.txt) and ı am wondering to use these file in Methplotlib. Could you say me the steps basically because ı am a freshman in this field? Many thanks. Kind Regards.
The megalodon input is in BAM/CRAM format, but you don't have those?
Hi @wdecoster
ı am using comman below to run megalodon and these (fmodified_bases.5mc.bed, per_read_modified_base_calls.db and per_read_modified_base_calls.txt) files were generated. thanks
code: megalodon debarcodedmultifast5/barcode06/ --outputs mods --reference wholegenome.fasta --mod-motif m CG 0 --write-mods-text --processes 40 --guppy-server-path /cluster/lrcfs/2397405/bin/ont-guppy-cpu-6.0.1/bin/guppy_basecall_server --guppy-params "-d /cluster/lrcfs/2397405/bin/rerio/basecall_models/ --num_callers 40" --guppy-timeout 400 --guppy-config res_dna_r941_min_modbases_5mC_CpG_v001.cfg --overwrite --output-directory megalodonresultbarcode06
I think --outputs with mod_mappings
or mod_basecalls
is what you need, but I don't use megalodon myself
okey, @wdecoster ı will configure my code and run with a small data set to generate the required files. thanks
Hi @wdecoster
Earlier you wrote this:
- methplotlib can take cram files with nucleotide modifications in the Mm tag, so your bam file should not need too many changes. Converting to cram should be enough. That said, I could also add an option to use bam directly.
- That looks like a bedmethyl format. You can quite easily make a bedgraph out of that, which is currently one of the compatible input formats. I could in the future also add bedmethyl input directly.
- Could you share a larger example of that format? I'll add it as a compatible input too. You can find my email address in my profile.
As for 1. i have a cram file with modifications in the Mm tag and would really like to use methplotlib. However, i get an error regarding the -n argument.
code:
methplotlib -m merged_bam_sup_modbases_sed.cram -w chr15:23000000-25500000
usage: methplotlib [-h] [-v] -m METHYLATION [METHYLATION ...] -n NAMES [NAMES ...] -w WINDOW [-g GTF] [-b BED] [-f FASTA] [--simplify] [--split]
[--static STATIC] [--binary] [--smooth SMOOTH] [--dotsize DOTSIZE] [--minqual MINQUAL] [--example] [-o OUTFILE] [-q QCFILE]
methplotlib: error: the following arguments are required: -n/--names
How to deal with this?
Have you tried setting the -n argument?
methplotlib -m merged_bam_sup_modbases_sed.cram -w chr15:23000000-25500000 -n test
Sorry for my ignorance.
Yes i have but get an error. Thought it might be something with wrong arguments but now i see that my error is similar to what other people have reported earlier.
methplotlib -m merged_bam_sup_modbases_sed.cram -n test -w chr15:23000000-25500000
[E::easy_errno] Libcurl reported error 77 (Problem with the SSL CA cert (path? access rights?))
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/f036bd11158407596ca6bf3581454706": Input/output error
Traceback (most recent call last):
File "/home/prom/anaconda3/envs/methplotlib/bin/methplotlib", line 8, in
sys.exit(main())
File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/methplotlib/methplotlib.py", line 26, in main
meth_browser(meth_data=meth_data,
File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/methplotlib/methplotlib.py", line 60, in meth_browser
fig = utils.create_subplots(num_methrows,
File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/methplotlib/utils.py", line 250, in create_subplots
return plotly.subplots.make_subplots(
File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/_plotly_utils/importers.py", line 39, in getattr
raise AttributeError(
AttributeError: module 'plotly' has no attribute 'subplots'
2022-05-18 06:24:15,208 methplotlib 0.20.1 started.
2022-05-18 06:24:15,208 Python version is: 3.10.2 | packaged by conda-forge | (main, Jan 14 2022, 08:02:09) [GCC 9.4.0]
2022-05-18 06:24:15,208 Arguments are: Namespace(methylation=['merged_bam_sup_modbases_sed.cram'], names=['test'], window='chr15:23000000-25500000', gtf=None, bed=None, fasta=None, simplify=False, split=False, static=None, binary=False, smooth=5, dotsize=4, minqual=20, example=False, outfile=None, qcfile=None, store=False)
2022-05-18 06:24:15,208 Processing chr15_23000000_23833334
2022-05-18 06:24:15,350 File merged_bam_sup_modbases_sed.cram is of type cram
2022-05-18 06:24:22,103 Collected methylation data for 1 datasets
2022-05-18 06:24:22,628 Created QC plots
2022-05-18 06:24:36,421 Prepared methylation traces.
2022-05-18 06:24:36,421 Making browser in split mode, with 1 modification rows.
When i try to use the bam file it looks like this:
methplotlib -m merged_bam_sup_modbases_sed.bam -n test -w chr15:23000000-25500000
Traceback (most recent call last):
File "/home/prom/anaconda3/envs/methplotlib/bin/methplotlib", line 8, in
sys.exit(main())
File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/methplotlib/methplotlib.py", line 26, in main
meth_browser(meth_data=meth_data,
File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/methplotlib/methplotlib.py", line 60, in meth_browser
fig = utils.create_subplots(num_methrows,
File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/methplotlib/utils.py", line 250, in create_subplots
return plotly.subplots.make_subplots(
File "/home/prom/anaconda3/envs/methplotlib/lib/python3.10/site-packages/_plotly_utils/importers.py", line 39, in getattr
raise AttributeError(
AttributeError: module 'plotly' has no attribute 'subplots'
Is my input file the problem or something else?
AttributeError: module 'plotly' has no attribute 'subplots'
is not related to your input file. Could you check the version of your plotly module? How did you install methplotlib?
I used pip install methplotlib
My plotly module is 5.8.0
I just installed plotly 5.8.0, and it works here. Please try in an interactive python session if it works, and print the version as well, to make sure that not an older version is imported.
Hi, I have get the same error when I run methplotlib, but it output an .html results that I don´t understand. I don´t have gff ad gtf file for this organism
methplotlib -m modified_bases.5mC.bedgraph -n calls -f Paspalum_notatum_genome.fasta -w CM039683.1:231000-234000
Reading modified_bases.5mC.bedgraph would be faster with bgzip and 'tabix -p bed'.
Traceback (most recent call last):
File "/home/diego/Documents/INTA/Nanopore/env/bin/methplotlib", line 8, in <module>
sys.exit(main())
File "/home/diego/Documents/INTA/Nanopore/env/lib/python3.8/site-packages/methplotlib/methplotlib.py", line 26, in main
meth_browser(meth_data=meth_data,
File "/home/diego/Documents/INTA/Nanopore/env/lib/python3.8/site-packages/methplotlib/methplotlib.py", line 89, in meth_browser
fig = utils.create_subplots(num_methrows, split=False, annotation=bool(bed or gtf))
File "/home/diego/Documents/INTA/Nanopore/env/lib/python3.8/site-packages/methplotlib/utils.py", line 261, in create_subplots
return plotly.subplots.make_subplots(
File "/home/diego/Documents/INTA/Nanopore/env/lib/python3.8/site-packages/_plotly_utils/importers.py", line 39, in __getattr__
raise AttributeError(
AttributeError: module 'plotly' has no attribute 'subplots'
My plotly version is also 5.8.0.
And I run it on an env/ in which I installed methplotlib
Do you have any clue?
I think the error is the same bug @wdecoster noted in #31. Hopefully a resolution can be found there.
The bedgraph input to methplotlib is slightly different than the mod_mappings. The mod_mappings contain per-read methylation calls, so can provide more information (phasing/linked methylation) while the bedgraph/bedmethyl is aggregated over reads (so links between calls between each read are lost). I'm not sure what outputs methplotlib has in terms of per-read modified base calls, so this may be a moot point here, but thought the distinction was important.
For converting from bedmethyl to bedgraph the following one-liner should work:
awk 'BEGIN{OFS="\t"} {$4=$11; NF=4; print}' sample.bedmethyl > sample.bedgraph
. You may need to add a header to load this file in some genome browsers.
I think the error is the same bug @wdecoster noted in #31. Hopefully a resolution can be found there.
The bedgraph input to methplotlib is slightly different than the mod_mappings. The mod_mappings contain per-read methylation calls, so can provide more information (phasing/linked methylation) while the bedgraph/bedmethyl is aggregated over reads (so links between calls between each read are lost). I'm not sure what outputs methplotlib has in terms of per-read modified base calls, so this may be a moot point here, but thought the distinction was important.
For converting from bedmethyl to bedgraph the following one-liner should work:
awk 'BEGIN{OFS="\t"} {$4=$11; NF=4; print}' sample.bedmethyl > sample.bedgraph
. You may need to add a header to load this file in some genome browsers.
Hi, I want to convert my bedmethyl file to bedgraph file. I used the one-liner and got an empty output. Would you please help check my problem? Many thanks!
My bedmethyl format is like this:
chr15 17170182 17170183 5mC 1000 - 17170182 17170183 0,0,0 1 0.00 1 0 0
chr15 17170225 17170226 5mC 0 - 17170225 17170226 0,0,0 1 0.00 0 0 0
chr15 17170391 17170392 5mC 0 - 17170391 17170392 0,0,0 1 0.00 0 0 0
chr15 17170477 17170478 5mC 0 - 17170477 17170478 0,0,0 1 0.00 0 0 0
chr15 17170495 17170496 5mC 1000 - 17170495 17170496 0,0,0 1 0.00 1 0 0