mhalushka/miRge3.0

miRNA filtration question

Closed this issue · 2 comments

Hello again,

This is a question rather than an issue. Could you please explain the difference between the columns "All.miRNA.Reads" and "Filtered.miRNA.Reads"? I can't find any documentation on filtering procedures other than isomiR identification, and this run was performed without that flag, so I don't know which filtering procedures are being applied.

Thank you!

Hi @Glfrey,

The All.miRNA.Reads count refers to reads that map to both canonical miRNAs and isomiRs (mapped.csv). Two filters are applied to produce Filtered.miRNA.Reads, as explained below:

  1. If the canonical read count (first column, strict alignment) is fewer than two, the count is replaced with zero for both the canonical reads and the isomiRs (last column). This eliminates a very small fraction of noise.
  2. The second is the --crThreshold option, i.e., the threshold on the proportion of canonical reads a miRNA must have to be retained (range 0 to 0.5, default 0.1). The ratio of reads mapping to the canonical sequence over reads mapping to isomiRs should be > 0.1; by default, read counts with a ratio below the specified threshold are removed.

Together, these filters turn All.miRNA.Reads into the revised count reported as Filtered.miRNA.Reads. I hope I have not missed anything; @mhalushka can suggest or confirm.
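
For illustration only, here is a minimal Python sketch of the two filters as described above. It is not the miRge3.0 source code. The function name `filter_mirna_counts` and its arguments are hypothetical, and it assumes three things the description leaves open: that "removed" means the counts are set to zero, that the ratio is canonical reads divided by the isomiR column, and that a ratio exactly equal to the threshold is dropped.

```python
def filter_mirna_counts(canonical, isomir, cr_threshold=0.1):
    """Apply the two filters to one miRNA in one sample.

    canonical    -- reads from the strict-alignment (canonical) column
    isomir       -- reads from the isomiR column
    cr_threshold -- stands in for the --crThreshold value (default 0.1)

    Returns the (canonical, isomir) pair, or (0, 0) if the miRNA is filtered out.
    """
    # Filter 1: fewer than two canonical hits, so zero out both counts.
    if canonical < 2:
        return 0, 0

    # Filter 2: the canonical-to-isomiR ratio must be above the threshold.
    # Assumption: the denominator is the isomiR column, and a miRNA with
    # no isomiR reads cannot fail this filter.
    if isomir > 0 and canonical / isomir <= cr_threshold:
        return 0, 0

    return canonical, isomir


# Hypothetical counts: name -> (canonical reads, isomiR reads)
example = {
    "miR-A": (1, 40),     # fails filter 1
    "miR-B": (5, 100),    # ratio 0.05, fails filter 2
    "miR-C": (50, 100),   # ratio 0.5, passes both filters
}
filtered = {name: filter_mirna_counts(c, i) for name, (c, i) in example.items()}
print(filtered)
```

Summing the surviving counts per sample would then give a Filtered.miRNA.Reads-style total, under the same assumptions.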

Thank you,
Arun.

Hi @arunhpatil,

Thanks for your help with this, I'll close the issue now.

Best wishes,

Gill