DMRfind use two samples is ok but a lot of samples is wrong
Hi yupenghe:
I have a problem whose cause I cannot find or fix. When I use only two samples, DMRfind runs smoothly and I get the desired results (the command is below, and the result files are shown in picture 1).
methylpy DMRfind --allc-files /share/data/zhouy/Methylpy/methylpy4c8c/g1/allc_GSM2481637.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g2/allc_GSM2986363.tsv --sample-category g1 g2 --min-cluster 2 --mc-type CGN --num-procs 2 --keep-temp-files FALSE --output-prefix /share/data/zhouy/Methylpy/methylpy4c8c/result/g1_g2
But when I use many samples (73 in total: 48 in category g1 and the other 25 in category g2), the DMRfind result is strange (the command is below, and the result is shown in picture 2).
methylpy DMRfind --allc-files allc_GSM2481637.tsv allc_GSM2986371.tsv allc_GSM2481635.tsv allc_GSM2481641.tsv allc_GSM2986389.tsv allc_GSM2986380.tsv allc_GSM2481638.tsv allc_GSM2986381.tsv allc_GSM2481633.tsv allc_GSM2481634.tsv allc_GSM2481629.tsv allc_GSM2986382.tsv allc_GSM2481626.tsv allc_GSM2481640.tsv allc_GSM2986368.tsv allc_GSM2986379.tsv allc_GSM2481643.tsv allc_GSM2481618.tsv allc_GSM2986386.tsv allc_GSM2481636.tsv allc_GSM2481631.tsv allc_GSM2481625.tsv allc_GSM2986378.tsv allc_GSM2986383.tsv allc_GSM2481622.tsv allc_GSM2986376.tsv allc_GSM2481623.tsv allc_GSM2986387.tsv allc_GSM2986373.tsv allc_GSM2481624.tsv allc_GSM2986384.tsv allc_GSM2481639.tsv allc_GSM2986377.tsv allc_GSM2481619.tsv allc_GSM2986388.tsv allc_GSM2986375.tsv allc_GSM2481628.tsv allc_GSM2986374.tsv allc_GSM2986369.tsv allc_GSM2481627.tsv allc_GSM2481620.tsv allc_GSM2481621.tsv allc_GSM2481642.tsv allc_GSM2986370.tsv allc_GSM2481630.tsv allc_GSM2986372.tsv allc_GSM2481632.tsv allc_GSM2986385.tsv allc_GSM2986360.tsv allc_GSM2481549.tsv allc_GSM2986362.tsv allc_GSM2481538.tsv allc_GSM2481550.tsv allc_GSM2986365.tsv allc_GSM2481548.tsv allc_GSM2481551.tsv allc_GSM2986366.tsv allc_GSM2986363.tsv allc_GSM2481539.tsv allc_GSM2481544.tsv allc_GSM2481545.tsv allc_GSM2481552.tsv allc_GSM2986361.tsv allc_GSM2481541.tsv allc_GSM2481536.tsv allc_GSM2481543.tsv allc_GSM2986367.tsv allc_GSM2481537.tsv allc_GSM2481547.tsv allc_GSM2481542.tsv allc_GSM2481540.tsv allc_GSM2481546.tsv allc_GSM2986364.tsv --sample-category g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 --min-cluster 2 --mc-type CGN --num-procs 2 --keep-temp-files FALSE --output-prefix /share/data/zhouy/Methylpy/methylpy4c8c/result/g1_g2
Could you help me find the error?
Many thanks!
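A side note for anyone assembling a command this long by hand: the two lists are easy to get out of step. The sketch below builds them together, assuming the allc files sit in one directory per group (as in the g1/ and g2/ paths above) and that each `--sample-category` entry applies to the allc file in the same position. `build_dmrfind_args` is a hypothetical helper written for illustration, not part of methylpy, and it only uses options that already appear in the command above.

```python
from pathlib import Path


def build_dmrfind_args(group_dirs, output_prefix):
    """Return an argv list for `methylpy DMRfind` with aligned file/category lists.

    group_dirs maps a category label (e.g. "g1") to a directory holding
    allc_*.tsv files. Hypothetical helper, not part of methylpy.
    """
    allc_files, categories = [], []
    for label, directory in group_dirs.items():
        for path in sorted(Path(directory).glob("allc_*.tsv")):
            allc_files.append(str(path))
            categories.append(label)  # one label per file, same position
    if not allc_files:
        raise ValueError("no allc_*.tsv files found")
    return [
        "methylpy", "DMRfind",
        "--allc-files", *allc_files,
        "--sample-category", *categories,
        "--min-cluster", "2",
        "--mc-type", "CGN",
        "--num-procs", "2",
        "--output-prefix", str(output_prefix),
    ]
```

The returned list can be passed to `subprocess.run(args, check=True)`, or joined with spaces and printed to inspect the command before running it.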
Hey, did you get any error message? It is odd that the two BED files were not generated, but the collapsed results TSV file should be sufficient for follow-up analysis.
Yupeng
Does the collapsed results file contain only the header?
Yes, only the header.
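One way to confirm a header-only result without opening the file by hand is to count the rows after the first line. This is a minimal sketch; `count_data_rows` is an illustrative name, not a methylpy function, and it assumes the first line of the file is the header. It reads gzipped files when the name ends in `.gz` and plain text otherwise.

```python
import gzip


def count_data_rows(path):
    """Count non-empty lines after the header in a TSV file (gzipped or plain)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        next(handle, None)  # skip the header line, if there is one
        return sum(1 for line in handle if line.strip())
```

A return value of 0 means the file holds nothing beyond its header, which is the symptom described here.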
Can you send me the g1_g2_rms_results.tsv.gz file?
Would you mind also sharing the g1_g2_rms_results.tsv.gz file you got with just two files?
resullt.zip
I just got these files.
Sorry, I was not clear. Could you send me the g1_g2_rms_results.tsv.gz file produced from the two-sample run?
Thanks. Can you confirm that this is the command you ran? If you run this again, do you still get empty output (i.e. header only) files?
methylpy DMRfind --allc-files allc_GSM2481637.tsv allc_GSM2986371.tsv allc_GSM2481635.tsv allc_GSM2481641.tsv allc_GSM2986389.tsv allc_GSM2986380.tsv allc_GSM2481638.tsv allc_GSM2986381.tsv allc_GSM2481633.tsv allc_GSM2481634.tsv allc_GSM2481629.tsv allc_GSM2986382.tsv allc_GSM2481626.tsv allc_GSM2481640.tsv allc_GSM2986368.tsv allc_GSM2986379.tsv allc_GSM2481643.tsv allc_GSM2481618.tsv allc_GSM2986386.tsv allc_GSM2481636.tsv allc_GSM2481631.tsv allc_GSM2481625.tsv allc_GSM2986378.tsv allc_GSM2986383.tsv allc_GSM2481622.tsv allc_GSM2986376.tsv allc_GSM2481623.tsv allc_GSM2986387.tsv allc_GSM2986373.tsv allc_GSM2481624.tsv allc_GSM2986384.tsv allc_GSM2481639.tsv allc_GSM2986377.tsv allc_GSM2481619.tsv allc_GSM2986388.tsv allc_GSM2986375.tsv allc_GSM2481628.tsv allc_GSM2986374.tsv allc_GSM2986369.tsv allc_GSM2481627.tsv allc_GSM2481620.tsv allc_GSM2481621.tsv allc_GSM2481642.tsv allc_GSM2986370.tsv allc_GSM2481630.tsv allc_GSM2986372.tsv allc_GSM2481632.tsv allc_GSM2986385.tsv allc_GSM2986360.tsv allc_GSM2481549.tsv allc_GSM2986362.tsv allc_GSM2481538.tsv allc_GSM2481550.tsv allc_GSM2986365.tsv allc_GSM2481548.tsv allc_GSM2481551.tsv allc_GSM2986366.tsv allc_GSM2986363.tsv allc_GSM2481539.tsv allc_GSM2481544.tsv allc_GSM2481545.tsv allc_GSM2481552.tsv allc_GSM2986361.tsv allc_GSM2481541.tsv allc_GSM2481536.tsv allc_GSM2481543.tsv allc_GSM2986367.tsv allc_GSM2481537.tsv allc_GSM2481547.tsv allc_GSM2481542.tsv allc_GSM2481540.tsv allc_GSM2481546.tsv allc_GSM2986364.tsv --sample-category g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g1 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 g2 --min-cluster 2 --mc-type CGN --num-procs 2 --keep-temp-files FALSE --output-prefix /share/data/zhouy/Methylpy/methylpy4c8c/result/g1_g2
Yes, I still get empty output.
Something is not right. Would you mind sharing all of the allc files you are using? That would help me reproduce the error and debug it.
Could you give me your email address so I can send you some of the files? (The full set is too big.) I found that whenever there are more than two samples, no results are produced.
This file is too big and I cannot upload it directly here.
Sure. You can send the files to yupeng.he.bioinfo@gmail.com. Alternatively, the files can be hosted on Google Drive, Dropbox, or FTP and I can download them from there. You mentioned that the job fails whenever more than two samples are used, so let's start with the files for three samples.
Thanks, I sent my files by email. The subject of the email is "DMRfind use two samples is ok but a lot of samples is wrong".
Thank you very much.
I tried the allc files you sent, and it works for me. Here is the command I ran after putting the allc files in the same directory.
methylpy DMRfind --allc-files allc_GSM29863*.tsv --output-prefix test
Does this work for you?
Don't I need to use --sample-category?
It is used to specify replicates (sample group) and is not required. What was the command you ran on the files you shared?
Here is the command I ran. I still get empty output.
methylpy DMRfind --allc-files /share/data/zhouy/Methylpy/methylpy4c8c/g1/allc_GSM2481637.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g1/allc_GSM2986371.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g1/allc_GSM2481635.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g1/allc_GSM2481641.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g1/allc_GSM2986389.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g2/allc_GSM2986360.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g2/allc_GSM2481549.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g2/allc_GSM2986362.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g2/allc_GSM2481538.tsv /share/data/zhouy/Methylpy/methylpy4c8c/g2/allc_GSM2481550.tsv --output-prefix /share/data/zhouy/Methylpy/methylpy4c8c/result5/g1_g2
Is it possible that my workspace is not large enough to get the correct output?
That is weird. I may have asked this already, but did you get any error message?
The cause was indeed the lack of disk space. After I freed up space, it ran normally and produced the results. Thank you very much for your help!
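Since the symptom here was header-only output rather than an obvious failure, it may be worth checking free space on the output directory before starting a long run. The sketch below uses Python's standard `shutil.disk_usage`. The helper names and the 50 GiB default are assumptions chosen for illustration, and the space a run actually needs depends on the number and size of the allc files.

```python
import shutil


def free_gib(directory):
    """Return free space, in GiB, on the filesystem that holds `directory`."""
    return shutil.disk_usage(directory).free / 1024**3


def has_enough_space(directory, min_free_gib=50.0):
    """Return True if `directory` has at least `min_free_gib` GiB free.

    The default threshold is an arbitrary illustrative value.
    """
    return free_gib(directory) >= min_free_gib
```

For example, `has_enough_space("/share/data/zhouy/Methylpy/methylpy4c8c/result")` could be checked before launching DMRfind, with the threshold adjusted to the data set.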