yupenghe/methylpy

DMRfind use two samples is ok but a lot of samples is wrong

Closed this issue · 24 comments

Hi yupenghe:
I have a problem whose cause I cannot find and which I cannot solve. When I use only two samples, DMRfind runs smoothly and I get the expected results (the command is below and the result files are shown in picture 1).

methylpy DMRfind --allc-files /share/data/zhouy/Methylpy/methylpy4c8c/g1/allc_GSM2481637.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g2/allc_GSM2986363.tsv --sample-category g1 g2 --min-cluster 2 --mc-type CGN --num-procs 2 --keep-temp-files FALSE --output-prefix /share/data/zhouy/Methylpy/methylpy4c8c/result/g1_g2
image

But when I use many samples (73 in total: 48 in category g1 and the other 25 in category g2), the DMRfind result is strange (the command is below and the result is shown in picture 2).

methylpy DMRfind --allc-files allc_GSM2481637.tsv allc_GSM2986371.tsv allc_GSM2481635.tsv allc_GSM2481641.tsv allc_GSM2986389.tsv allc_GSM2986380.tsv allc_GSM2481638.tsv allc_GSM2986381.tsv allc_GSM2481633.tsv allc_GSM2481634.tsv allc_GSM2481629.tsv allc_GSM2986382.tsv allc_GSM2481626.tsv allc_GSM2481640.tsv allc_GSM2986368.tsv allc_GSM2986379.tsv allc_GSM2481643.tsv allc_GSM2481618.tsv allc_GSM2986386.tsv allc_GSM2481636.tsv allc_GSM2481631.tsv allc_GSM2481625.tsv allc_GSM2986378.tsv allc_GSM2986383.tsv allc_GSM2481622.tsv allc_GSM2986376.tsv allc_GSM2481623.tsv allc_GSM2986387.tsv allc_GSM2986373.tsv allc_GSM2481624.tsv allc_GSM2986384.tsv allc_GSM2481639.tsv allc_GSM2986377.tsv allc_GSM2481619.tsv allc_GSM2986388.tsv allc_GSM2986375.tsv allc_GSM2481628.tsv allc_GSM2986374.tsv allc_GSM2986369.tsv allc_GSM2481627.tsv allc_GSM2481620.tsv allc_GSM2481621.tsv allc_GSM2481642.tsv allc_GSM2986370.tsv allc_GSM2481630.tsv allc_GSM2986372.tsv allc_GSM2481632.tsv allc_GSM2986385.tsv allc_GSM2986360.tsv allc_GSM2481549.tsv allc_GSM2986362.tsv allc_GSM2481538.tsv allc_GSM2481550.tsv allc_GSM2986365.tsv allc_GSM2481548.tsv allc_GSM2481551.tsv allc_GSM2986366.tsv allc_GSM2986363.tsv allc_GSM2481539.tsv allc_GSM2481544.tsv allc_GSM2481545.tsv allc_GSM2481552.tsv allc_GSM2986361.tsv allc_GSM2481541.tsv allc_GSM2481536.tsv allc_GSM2481543.tsv allc_GSM2986367.tsv allc_GSM2481537.tsv allc_GSM2481547.tsv allc_GSM2481542.tsv allc_GSM2481540.tsv allc_GSM2481546.tsv allc_GSM2986364.tsv --sample-category g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 --min-cluster 2 --mc-type CGN --num-procs 2 --keep-temp-files FALSE --output-prefix /share/data/zhouy/Methylpy/methylpy4c8c/result/g1_g2
image
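A command this long is easy to get wrong by hand, since the commands in this thread pass one --sample-category entry per allc file, in the same order. The following is only a minimal sketch of assembling such a command from per-group file lists. The helper name build_dmrfind_command is hypothetical, and it uses only flags that already appear in this thread.

```python
def build_dmrfind_command(groups, output_prefix, num_procs=2):
    """Assemble a methylpy DMRfind argument list from {category: [allc files]}.

    Hypothetical helper: files and categories are emitted in the same
    order so that each file lines up with its category.
    """
    allc_files, categories = [], []
    for category, files in groups.items():
        for path in sorted(str(f) for f in files):
            allc_files.append(path)
            categories.append(category)
    if len(set(allc_files)) != len(allc_files):
        raise ValueError("an allc file is listed more than once")
    return (
        ["methylpy", "DMRfind", "--allc-files"] + allc_files
        + ["--sample-category"] + categories
        + ["--min-cluster", "2", "--mc-type", "CGN",
           "--num-procs", str(num_procs),
           "--output-prefix", output_prefix]
    )


if __name__ == "__main__":
    # The two files from the two-sample command above.
    base = "/share/data/zhouy/Methylpy/methylpy4c8c"
    groups = {
        "g1": [base + "/g1/allc_GSM2481637.tsv"],
        "g2": [base + "/g2/allc_GSM2986363.tsv"],
    }
    print(" ".join(build_dmrfind_command(groups, base + "/result/g1_g2")))
```

The returned list can be printed for inspection or passed to subprocess.run, which avoids retyping 73 file names and 73 categories.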

Could you help me find the error? 
Many thanks!

Hey, did you get any error message? It is weird that the two BED files were not generated, but the collapsed results TSV file should be sufficient for follow-up analysis.

Yupeng

I opened the file g1_g2_rms_results_collapsed.tsv; it looks like this:
image

and the test_output looks like this:
image

Does the collapsed results file contain only the header?

Yes, only the header.
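A header-only result like this can be detected without opening the file by hand. Below is a minimal sketch, assuming the results file is tab-separated with a single header line. The function name count_data_rows is hypothetical and is not part of methylpy.

```python
import csv
import gzip


def count_data_rows(path):
    """Return the number of non-empty rows after the header of a TSV file.

    Files whose name ends in ".gz" are read through gzip.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header line, if there is one
        return sum(1 for row in reader if row)
```

Calling count_data_rows on g1_g2_rms_results_collapsed.tsv would return 0 for a header-only file like the one described here. The same call works on g1_g2_rms_results.tsv.gz.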

Can you send me the g1_g2_rms_results.tsv.gz file?

Would you mind also sharing the g1_g2_rms_results.tsv.gz you got with just two files?

resullt.zip
I just got these files.

Sorry, I was not clear. Could you send me the g1_g2_rms_results.tsv.gz produced from the two-sample run?

Thanks. Can you confirm that this is the command you ran? If you run this again, do you still get empty output (i.e. header only) files?

methylpy DMRfind --allc-files allc_GSM2481637.tsv allc_GSM2986371.tsv allc_GSM2481635.tsv allc_GSM2481641.tsv allc_GSM2986389.tsv allc_GSM2986380.tsv allc_GSM2481638.tsv allc_GSM2986381.tsv allc_GSM2481633.tsv allc_GSM2481634.tsv allc_GSM2481629.tsv allc_GSM2986382.tsv allc_GSM2481626.tsv allc_GSM2481640.tsv allc_GSM2986368.tsv allc_GSM2986379.tsv allc_GSM2481643.tsv allc_GSM2481618.tsv allc_GSM2986386.tsv allc_GSM2481636.tsv allc_GSM2481631.tsv allc_GSM2481625.tsv allc_GSM2986378.tsv allc_GSM2986383.tsv allc_GSM2481622.tsv allc_GSM2986376.tsv allc_GSM2481623.tsv allc_GSM2986387.tsv allc_GSM2986373.tsv allc_GSM2481624.tsv allc_GSM2986384.tsv allc_GSM2481639.tsv allc_GSM2986377.tsv allc_GSM2481619.tsv allc_GSM2986388.tsv allc_GSM2986375.tsv allc_GSM2481628.tsv allc_GSM2986374.tsv allc_GSM2986369.tsv allc_GSM2481627.tsv allc_GSM2481620.tsv allc_GSM2481621.tsv allc_GSM2481642.tsv allc_GSM2986370.tsv allc_GSM2481630.tsv allc_GSM2986372.tsv allc_GSM2481632.tsv allc_GSM2986385.tsv allc_GSM2986360.tsv allc_GSM2481549.tsv allc_GSM2986362.tsv allc_GSM2481538.tsv allc_GSM2481550.tsv allc_GSM2986365.tsv allc_GSM2481548.tsv allc_GSM2481551.tsv allc_GSM2986366.tsv allc_GSM2986363.tsv allc_GSM2481539.tsv allc_GSM2481544.tsv allc_GSM2481545.tsv allc_GSM2481552.tsv allc_GSM2986361.tsv allc_GSM2481541.tsv allc_GSM2481536.tsv allc_GSM2481543.tsv allc_GSM2986367.tsv allc_GSM2481537.tsv allc_GSM2481547.tsv allc_GSM2481542.tsv allc_GSM2481540.tsv allc_GSM2481546.tsv allc_GSM2986364.tsv --sample-category g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 --min-cluster 2 --mc-type CGN --num-procs 2 --keep-temp-files FALSE --output-prefix /share/data/zhouy/Methylpy/methylpy4c8c/result/g1_g2

Yes, I still get empty output.

I find that when I use two samples, the output is this:
image

and when I use more samples, the output is this, and it does not look right:
image

Something is not right. Would you mind sharing all of the allc files you are using? That would help me reproduce the error and debug it.

Could you give me your email address so that I can send you some of the files? The full set is too big to upload directly here. I found that as long as there are more than two samples, no results are produced.

Sure. You can send the files to yupeng.he.bioinfo@gmail.com. Alternatively, the files can be hosted on Google Drive, Dropbox, or an FTP server and I can download them from there. You mentioned that the job fails whenever more than two samples are used, so let's start with the files for 3 samples.

Thanks, I sent my files by email. The subject of the email is "DMRfind use two samples is ok but a lot of samples is wrong".
Thank you very much.

I tried the allc files you sent, and they work fine for me. Here is the command I ran after putting the allc files in the same directory.

methylpy DMRfind --allc-files allc_GSM29863*.tsv --output-prefix test

Does this work for you?
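Note that the shell expands allc_GSM29863*.tsv before methylpy sees it, so this command only covers files whose names start with allc_GSM29863. The sketch below illustrates which names such a pattern selects, using Python's fnmatch module on a few file names taken from this thread. The helper name select_allc is hypothetical.

```python
from fnmatch import fnmatchcase


def select_allc(names, pattern="allc_GSM29863*.tsv"):
    """Return the names that a shell-style pattern would match, in input order."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    names = [
        "allc_GSM2481637.tsv",
        "allc_GSM2986371.tsv",
        "allc_GSM2986360.tsv",
        "allc_GSM2481549.tsv",
    ]
    # Only the two GSM29863xx files are selected.
    print(select_allc(names))
```

The GSM2481xxx files are left out by this pattern, so a run with the full file list exercises more inputs than the test command does.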

Don't I need to use --sample-category?

It is used to specify replicates (sample group) and is not required. What was the command you ran on the files you shared?

Here is the command I ran; I still get empty output:
methylpy DMRfind --allc-files /share/data/zhouy/Methylpy/methylpy4c8c/g1/allc_GSM2481637.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g1/allc_GSM2986371.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g1/allc_GSM2481635.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g1/allc_GSM2481641.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g1/allc_GSM2986389.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g2/allc_GSM2986360.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g2/allc_GSM2481549.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g2/allc_GSM2986362.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g2/allc_GSM2481538.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g2/allc_GSM2481550.tsv --output-prefix /share/data/zhouy/Methylpy/methylpy4c8c/result5/g1_g2
image

Is it possible that my workspace is not large enough to get the correct output?

That is weird. Maybe I already asked this, but did you get any error message?

The cause was indeed the lack of disk space. After I freed up space, it ran normally and produced the results. Thank you very much for your help!
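Since the failure here turned out to be a full disk rather than a methylpy bug, a quick free-space check on the output location before a long run can save time. This is a minimal sketch using Python's standard shutil.disk_usage. The function name has_free_space and the 10 GiB threshold are assumptions for illustration, as the thread does not say how much space DMRfind needs.

```python
import shutil


def has_free_space(directory, required_bytes):
    """Return True if the filesystem holding `directory` has at least
    `required_bytes` of free space."""
    return shutil.disk_usage(directory).free >= required_bytes


if __name__ == "__main__":
    # Assumed threshold: 10 GiB. Adjust to the size of your allc files.
    required = 10 * 1024 ** 3
    if has_free_space(".", required):
        print("enough free space to start")
    else:
        print("free up disk space before running DMRfind")
```

Pointing the check at the directory used in --output-prefix is the natural choice, since that is where the result files are written.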