UMI output files
Hi — is there an output file that is equivalent to miR.Counts.csv
but contains UMI counts instead of read counts for each miRNA?
Yes and no. No output file links UMI counts directly to miRNA names, but if you know the miRNA sequence (from miRBase or MirGeneDB) you can link the two as described below.
The command below was run on an input file whose reads carry a 4 bp UMI at each end:
miRge3.0 -s SRR5233961.fastq -lib /mnt/d/Halushka_lab/Arun/GTF_Repeats_miRge2to3/miRge3_Lib/revised_hsa -on human -db mirgenedb -umi 4,4 -a illumina -udd -o output_dir
Note: the -udd option is important. It removes PCR duplicates and writes out each UMI sequence, the corresponding miRNA sequence, and the number of times that combination occurs.
For example, with a 4-base UMI at each end of the read, this is the entry for let-7a:
grep -w "TGAGGTAGTAGGTTGTATAGTT" output_dir/miRge.2021-12-12_17-20-41/mapped.csv
TGAGGTAGTAGGTTGTATAGTT,1,Hsa-Let-7-P2a1_5p,,,,,,,,,6741
and
grep ",TGAGGTAGTAGGTTGTATAGTT," output_dir/miRge.2021-12-12_17-20-41/SRR5233961_umiCounts.csv | head
AGTGCTAC,TGAGGTAGTAGGTTGTATAGTT,15
GTTCCTAC,TGAGGTAGTAGGTTGTATAGTT,8
CCATCTAC,TGAGGTAGTAGGTTGTATAGTT,15
TTAGTGGG,TGAGGTAGTAGGTTGTATAGTT,2
TACCCTAC,TGAGGTAGTAGGTTGTATAGTT,835
AAACCTGA,TGAGGTAGTAGGTTGTATAGTT,7
NACCCTAC,TGAGGTAGTAGGTTGTATAGTT,19
NTAGCTCA,TGAGGTAGTAGGTTGTATAGTT,1
GAGCCTAC,TGAGGTAGTAGGTTGTATAGTT,45
GGACCCTA,TGAGGTAGTAGGTTGTATAGTT,5
Details: in the first row, AGTGCTAC,TGAGGTAGTAGGTTGTATAGTT,15, AGTG is the first four bases of the UMI, CTAC is the last four bases, and this UMI and sequence combination occurs 15 times.
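To make the row layout concrete, here is a minimal Python sketch that splits one such row into its parts. It assumes the three-column layout shown above (UMI, sequence, count) and that the first half of the UMI has the length given first in -umi 4,4; the function name split_umi_record is made up for illustration and is not part of miRge3.0.

```python
def split_umi_record(line, five_prime_len=4):
    """Split one 'UMI,sequence,count' row into (5' UMI, 3' UMI, sequence, count).

    Assumption: the UMI column is the 5' UMI followed directly by the 3' UMI,
    as in the rows shown above.
    """
    umi, sequence, count = line.strip().split(",")
    return umi[:five_prime_len], umi[five_prime_len:], sequence, int(count)


# First row from the output above.
print(split_umi_record("AGTGCTAC,TGAGGTAGTAGGTTGTATAGTT,15"))
# ('AGTG', 'CTAC', 'TGAGGTAGTAGGTTGTATAGTT', 15)
```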
grep -c ",TGAGGTAGTAGGTTGTATAGTT," output_dir/miRge.2021-12-12_17-20-41/SRR5233961_umiCounts.csv
6741
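The number of matching lines (6741, the same value reported for this sequence in mapped.csv above) is the number of distinct UMIs observed for that sequence, so you could use this file to determine the UMI counts. Let me know if you need more clarification.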
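If you want this for every miRNA rather than one sequence at a time, the same idea can be sketched in Python. This is only a rough sketch under assumptions: it takes the umiCounts file to be UMI,sequence,count rows and reads the sequence from column 1 and the miRNA name from column 3 of mapped.csv, as in the single row shown above. The function names are hypothetical and not part of miRge3.0.

```python
import csv
import io
from collections import defaultdict


def umi_counts_per_sequence(umi_rows):
    """Return {sequence: (distinct_umis, total_occurrences)} from UMI,sequence,count rows."""
    distinct = defaultdict(int)
    occurrences = defaultdict(int)
    for row in umi_rows:
        # Skip a possible header or malformed row (assumption: count is a plain integer).
        if len(row) != 3 or not row[2].isdigit():
            continue
        _umi, sequence, count = row
        distinct[sequence] += 1              # one line per distinct UMI, like grep -c
        occurrences[sequence] += int(count)  # times each UMI/sequence pair was seen
    return {seq: (distinct[seq], occurrences[seq]) for seq in distinct}


def sequence_to_mirna(mapped_rows):
    """Map sequence -> miRNA name, assuming columns 0 and 2 as in the mapped.csv row above."""
    return {row[0]: row[2] for row in mapped_rows if len(row) > 2 and row[2]}


def umi_counts_per_mirna(umi_rows, mapped_rows):
    """Sum distinct-UMI counts over all sequences assigned to the same miRNA."""
    names = sequence_to_mirna(mapped_rows)
    totals = defaultdict(int)
    for sequence, (n_umis, _occ) in umi_counts_per_sequence(umi_rows).items():
        if sequence in names:
            totals[names[sequence]] += n_umis
    return dict(totals)


# Demo on three of the rows shown above; for real files, pass
# csv.reader(open(path, newline="")) for each of the two CSV files.
umi_text = """AGTGCTAC,TGAGGTAGTAGGTTGTATAGTT,15
GTTCCTAC,TGAGGTAGTAGGTTGTATAGTT,8
CCATCTAC,TGAGGTAGTAGGTTGTATAGTT,15
"""
mapped_text = "TGAGGTAGTAGGTTGTATAGTT,1,Hsa-Let-7-P2a1_5p,,,,,,,,,6741\n"

umi_rows = list(csv.reader(io.StringIO(umi_text)))
mapped_rows = list(csv.reader(io.StringIO(mapped_text)))
print(umi_counts_per_sequence(umi_rows))            # {'TGAGGTAGTAGGTTGTATAGTT': (3, 38)}
print(umi_counts_per_mirna(umi_rows, mapped_rows))  # {'Hsa-Let-7-P2a1_5p': 3}
```

Like grep -c, this counts every distinct UMI string once, so UMIs containing N (for example NACCCTAC next to TACCCTAC) are treated as separate UMIs; whether to merge those is a choice left to you.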
Thank you,
Arun.