yupenghe/methylpy

add-methylation-level not working

knoedlerj opened this issue · 12 comments

When I run add-methylation-level with an input TSV file (genes that are differentially expressed, for which I'm interested in getting the total methylation level), methylpy generates a blank output file.

Would you mind sharing the top few lines of your input file?

Sure thing! The input TSV file starts with:

chr1 59764278 59878081 NM_007561 0 +
chr1 193301993 193343878 NM_008484 0 +
chr1 42695767 42700209 NM_008900 0 +
chr1 95587681 95667594 NM_009183 0 -
chr1 84036292 84284645 NM_001003948 0 -
chr1 93478992 93509732 NM_010891 0 +

The format looks fine. There are a few possible causes.

  • I notice that you are using "chr1". Please make sure the chromosomes in the allc file are named the same way.
  • Please double-check that the input file is tab-separated.
  • You will need to add a header to the file; otherwise the first line of the file will be treated as the header.
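The first two checks can be scripted. Below is a minimal sketch of two hypothetical shell helpers (`check_tabs` and `missing_chroms` are illustrative names, not part of methylpy). It assumes a tab-separated interval file with a header line and an uncompressed allc file whose first column is the chromosome name.

```shell
# Hypothetical helper: flag data lines (after the header) that do not
# split into at least three tab-separated fields. Exits 1 if any are found.
check_tabs() {
  awk -F '\t' '
    NR > 1 && NF < 3 { print "line " NR ": fewer than 3 tab-separated fields"; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Hypothetical helper: print chromosome names that appear in column 1 of
# the interval file (header skipped) but never in column 1 of the allc file.
# Pass "-" as the second argument to read the allc file from stdin.
missing_chroms() {
  awk -F '\t' '
    NR == FNR { if (FNR > 1) want[$1] = 1; next }
    { seen[$1] = 1 }
    END { for (c in want) if (!(c in seen)) print c }
  ' "$1" "$2"
}
```

For example, `check_tabs POA_allpairwisegenes.tsv` prints nothing and succeeds when every data line is tab-separated, and `gzip -dc allc_sample.tsv.gz | missing_chroms POA_allpairwisegenes.tsv -` (where `allc_sample.tsv.gz` is a placeholder name) lists names such as `chr1` that the allc file never uses, for instance because it uses `1` instead.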

Thanks, that seems to have worked! However, only some of the entries now get their methylation levels calculated; I'm currently trying to figure out why.

Update: it's only calculating levels for about 10% of the listed intervals, and nothing obvious distinguishes those intervals from the rest (these samples have about 25x coverage, so there should be data for most of them). Has this behavior been reported before?

I don't think so. It would be a great help for debugging if you could share a subset of the data that reproduces this issue.

Can do. I can supply a subset of one of the allc files and the TSV. Even the reduced allc file is pretty big (414 MB compressed); how would you like me to send it? Thank you very much for your assistance!

If you can set up a download link for the data, that will be fine. FTP, Google Drive, etc. will all work for me.

Thanks. I think the problem is that the input TSV file is not sorted. You can use the commands below to sort it while keeping the header on the first line.

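# Save the header line
head -n 1 POA_allpairwisegenes.tsv > header
# Sort the data lines by chromosome, then start, then end, and put the header back on top
tail -n +2 POA_allpairwisegenes.tsv | sort -k 1,1 -k 2,2g -k 3,3g | cat header - > POA_allpairwisegenes.reformatted.tsv
rm header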

The problem should be solved with the sorted file. Please let me know if it works.
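To confirm that a file is already in this order before rerunning add-methylation-level, `sort -c` can check the data lines with the same keys. This is a minimal sketch; the helper name `is_sorted` is hypothetical, and it assumes the ordering produced by the sort command above is the one methylpy expects.

```shell
# Hypothetical helper: succeed if the data lines (everything after the
# header) are already ordered by chromosome, then start, then end, using
# the same keys as the maintainer's sort command. -s limits the comparison
# to those keys; on failure, sort -c reports the first out-of-order line
# on stderr and exits non-zero.
is_sorted() {
  tail -n +2 "$1" | sort -c -s -k 1,1 -k 2,2g -k 3,3g
}
```

For example, `is_sorted POA_allpairwisegenes.tsv && echo sorted` prints `sorted` only when no reordering is needed.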

Looks like it worked, thank you!! Now to figure out what it all means...

I have a similar issue, but my output TSV file contains only the contents of the BED file: the methylation level columns are added to the header but left empty.
This is the test BED file:
chromosome start end
2 1 40001
2 40001 80001
2 80001 120001
2 120001 160001
This is the output I get:
chromosome start end methylation_level_ACA methylation_level_ACB methylation_level_pCMT3-RNAiA methylation_level_pCMT3-RNAiB
2 1 40001
2 40001 80001
2 80001 120001
2 120001 160001

I checked that my files use tabs as delimiters and that the allc files are not empty. For example, one allc file starts with:
2 1647 - CAT 2 20 1
2 1649 + CGT 10 12 1
2 1650 - CGT 16 21 1
2 1653 + CCT 0 12 0
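When the chromosome names match but the methylation columns are still empty, one rough sanity check is whether any allc sites fall inside the intervals at all. The sketch below is hypothetical (`count_sites` is an illustrative name, not a methylpy command). It assumes BED-style 0-based, half-open intervals and 1-based allc positions, which may not match methylpy's own overlap rule, and it compares every site with every interval, so it is only suitable for small, uncompressed test subsets like the ones above.

```shell
# Hypothetical helper: for each interval in a headered, tab-separated
# interval file ($1), count allc sites ($2; columns chr, pos, ...) with
# start < pos <= end on the same chromosome. The coordinate convention
# is an assumption for a rough check only.
count_sites() {
  awk -F '\t' '
    NR == FNR {
      if (FNR > 1) { n++; c[n] = $1 ""; s[n] = $2 + 0; e[n] = $3 + 0; k[n] = 0 }
      next
    }
    {
      for (i = 1; i <= n; i++)
        if (($1 "") == c[i] && $2 + 0 > s[i] && $2 + 0 <= e[i]) k[i]++
    }
    END { for (i = 1; i <= n; i++) print c[i] "\t" s[i] "\t" e[i] "\t" k[i] }
  ' "$1" "$2"
}
```

Restricted to the four allc lines shown above, it reports 4 sites for the interval `2 1 40001` and 0 for `2 40001 80001`. Non-zero counts alongside empty methylation columns would suggest the problem lies somewhere other than the overlap itself.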