mhalushka/miRge3.0

Trimming repeated nucleotides


Is there an option in miRge to trim off nucleotides that are repeated k or more times? For example, ACGT[A* >= k]TGCA gets trimmed to ACGTTGCA. I know this can be done before running the miRge pipeline, but it would be nice to include it as an argument in the miRge run script.
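To make the request concrete, here is a minimal sketch of the trimming rule using Python's standard `re` module. It assumes reads are uppercase strings over A/C/G/T/N, and `trim_homopolymers` is a hypothetical helper name. It is not part of miRge3.0 and not the code in the fork linked later in this thread.

```python
import re


def trim_homopolymers(read: str, k: int) -> str:
    """Remove every run of a single nucleotide repeated k or more times.

    Hypothetical helper for illustration only (not a miRge3.0 function).
    Runs are removed in one left-to-right pass, so bases that become
    adjacent after a deletion are not re-checked.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    # ([ACGTN]) captures one base; \1{k-1,} requires at least k-1 more copies of it.
    pattern = re.compile(r"([ACGTN])\1{" + str(k - 1) + r",}")
    return pattern.sub("", read)


if __name__ == "__main__":
    print(trim_homopolymers("ACGTAAAAATGCA", 4))  # ACGTTGCA
```

With k = 4, the five-A run is removed and the flanks are joined, reproducing the ACGTTGCA example above. The single-pass design means the new "TT" formed at the junction is left alone; a repeated pass would be needed if such joins should also be trimmed.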

Hi @adamcatto,

Thank you for your suggestion. I don't see how this would improve the current pipeline or what its overall benefits would be. We don't currently have an option for removing internal repeated nucleotides. (Please expect delays in my replies due to travel; I will be back on May 04 EST.)

Thank you,
Arun.

I think some reads may contain runs of identical nucleotides that are technical artifacts and should be removed. In any case, I have forked the repository and added an option to remove repeated nucleotides ≥ a given length. You can view the changes here if it sounds interesting: adamcatto@8709dfe