mhalushka/miRge3.0

miRge3 killed

Opened this issue · 5 comments

When I tried to process 37 fastq files, the collapsing time kept increasing and miRge3 was killed after 20 files had been collapsed. Could I process the files separately? Or is there another way to process multiple files?
This is my command:
miRge3.0 -s $(ls 2.CleanFq/*gz | sed ':label;N;s/\n/,/;b label') -lib ~/paper/reference/annotation/miRge -on human -db miRBase -o 3.Results/tmp -nmir -ai -cpu 2 -ie -gff -tcf -spl -NX

In addition, what is the difference between processing multiple samples separately and processing them all at once?

Hi @voluptatis,

The process was killed because it ran out of memory. At each step of the collapsing process, miRge3.0 combines the collapsed reads and read counts from each sample in a single Pandas dataframe, so memory use grows with every sample added.
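To illustrate why memory grows as samples are collapsed, here is a minimal sketch in plain Python. It is not miRge3.0's actual code: the `collapse` and `combine` helpers and the toy reads are hypothetical, and a dict of lists stands in for the dataframe. The point is only that the combined table holds one row per unique sequence seen in any sample, and one column per sample.

```python
from collections import Counter


def collapse(reads):
    """Collapse identical reads into a {sequence: count} mapping."""
    return Counter(reads)


def combine(samples):
    """Combine per-sample collapsed reads into one table keyed by sequence.

    samples: dict mapping sample name -> iterable of read sequences.
    Returns (names, table), where table[seq] is a list of counts,
    one per sample, with 0 where a sample lacks that sequence.
    """
    names = list(samples)
    table = {}
    for i, name in enumerate(names):
        for seq, n in collapse(samples[name]).items():
            # A sequence seen for the first time adds a full-width row.
            row = table.setdefault(seq, [0] * len(names))
            row[i] = n
    return names, table


# Toy data (hypothetical reads, for illustration only).
samples = {
    "s1": ["AAAC", "AAAC", "GGTT"],
    "s2": ["AAAC", "CCCA"],
    "s3": ["TTTG", "TTTG", "GGTT"],
}
names, table = combine(samples)
print(f"{len(table)} unique sequences x {len(names)} samples")
```

Each extra sample adds a column to every existing row and may add new rows, so the table for 37 real samples can be far larger than any single sample's table.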

After a miRge3.0 run you will get the miRNA counts and RPM values (an expression matrix) for each sample. If you run samples individually, you will end up with individual expression matrices. If you run several samples together (let's say 10), you will get the expression matrix of all 10 samples in one file. That is the one advantage for secondary analysis; however, you can also combine the individual matrices later, for example in Excel.
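Combining per-sample matrices afterwards can also be scripted instead of done in Excel. The sketch below uses only the Python standard library and makes several assumptions that should be checked against the real output: each table is a CSV whose first column is named `miRNA`, every other column is one sample, and a miRNA missing from a table is filled with `0`. The `read_counts` and `merge_counts` helpers and the in-memory example tables are hypothetical.

```python
import csv
import io


def read_counts(handle, key="miRNA"):
    """Read one table; return (sample_columns, {mirna: {sample: value}})."""
    reader = csv.DictReader(handle)
    cols = [c for c in reader.fieldnames if c != key]
    rows = {r[key]: {c: r[c] for c in cols} for r in reader}
    return cols, rows


def merge_counts(handles, key="miRNA", fill="0"):
    """Merge several tables on the miRNA name, one column per sample."""
    all_cols = []
    merged = {}
    for handle in handles:
        cols, rows = read_counts(handle, key)
        all_cols.extend(cols)
        for mirna, values in rows.items():
            merged.setdefault(mirna, {}).update(values)
    header = [key] + all_cols
    # Assumption: a miRNA absent from a table gets the fill value.
    body = [[m] + [merged[m].get(c, fill) for c in all_cols]
            for m in sorted(merged)]
    return header, body


# In-memory stand-ins for two per-sample count files (made-up values).
a = io.StringIO("miRNA,sampleA\nhsa-let-7a-5p,120\nhsa-miR-21-5p,300\n")
b = io.StringIO("miRNA,sampleB\nhsa-miR-21-5p,250\nhsa-miR-16-5p,40\n")
header, body = merge_counts([a, b])

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(body)
print(out.getvalue())
```

With real files, the `io.StringIO` objects would be replaced by handles from `open(path, newline="")`, one per run's count table.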

Also, I don't follow the sed command; I hope it is not interrupting the run. Can you try one sample and let me know how it goes?
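For reference, the sed expression in the original command appears to join the file names printed by ls into a single comma-separated string for `-s`. A rough Python equivalent, which also splits the files into smaller batches so that each run holds fewer samples in memory, might look like this. The `batches` and `sample_args` helpers and the batch size of 10 are arbitrary choices for illustration, not miRge3.0 options.

```python
import glob


def batches(paths, size):
    """Split a list of paths into consecutive batches of at most `size`."""
    return [paths[i:i + size] for i in range(0, len(paths), size)]


def sample_args(paths, size):
    """Return one comma-separated -s value per batch."""
    return [",".join(batch) for batch in batches(paths, size)]


# Same glob pattern as in the original command.
files = sorted(glob.glob("2.CleanFq/*gz"))
for i, arg in enumerate(sample_args(files, 10), start=1):
    # Each printed value could be passed to a separate miRge3.0 run.
    print(f"batch {i}: -s {arg}")
```

Each batch would then need its own output directory, and the resulting matrices can be combined afterwards.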

Thank you,
Arun.

Hi @voluptatis,

How many samples you can run at once depends on the system's memory. If you run samples individually and merge them later, the results will still be consistent.

Thank you for the suggestion. Writing intermediate files is possible (for example, a pickle file of each dataframe), but combining them all later takes the same amount of time and may still fail by exceeding the capacity of the RAM. It would also reduce the overall speed of the software. However, I will keep this in mind and try to come up with an alternative in the future.

Thank you,
Arun.