Output: nanopore_pipeline test data produces 3 UMI consensus sequences
Running the test data according to the instructions produces 3 UMI consensus sequences instead of 4 (UMI2 is missing). I suspect it is filtered out in the final stage of the umi_binning.sh script.
The binning itself seems okay, because:
- ssUMI_test/umi_binning/umi_ref/umi_ref.fa contains UMIs for all 4 UMI bins.
- ssUMI_test/umi_binning/read_binning/umi_bin_map.txt and ssUMI_test/umi_binning/read_binning/umi_binning_stats.txt both include UMI2.
However, the output of the final binning-filtering step, umi_binning/read_binning/pass_bins.txt, does not contain UMI2.
The mapping folder in raconx3 also does not contain a subfolder "umi2bins".
Contents of the key files are below.
###################################
ssUMI_test/umi_binning/read_binning/pass_bins.txt:
umi1bins
umi4bins
umi3bins
#####################################
ssUMI_test/consensus_raconx3.fa:
>umi3bins;ubs=16
GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCGTATCGTGTAGAGACTGCGTAGGCTACA[...]
>umi1bins;ubs=23
GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCGTATCGTGTAGAGACTGCGTAGGGCTT[...]
>umi4bins;ubs=5
GTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCAGTGATCGAGTCAGTGCGAGTGCTTCGT[...]
Run on WSL Ubuntu 22.04
I traced the issue to the last line (765) of the umi_binning.sh script and replaced tail -n +2 with tail -n +1:
tail -n +1 > $BINNING_DIR/pass_bins.txt
After this change, the nanopore pipeline produces 4 UMI consensus sequences.
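To illustrate why the offset matters, here is a minimal sketch of the two tail variants. It assumes that the stream reaching the final step has no header line and that umi2bins happens to be its first line; the bin order and the variable names are made up for the demonstration and are not taken from umi_binning.sh.

```shell
# Hypothetical stand-in for the bin list piped into the final step (no header line).
# Assumption: umi2bins is the first line of that stream.
bins=$(printf 'umi2bins\numi1bins\numi4bins\numi3bins\n')

# tail -n +2 starts output at line 2, so the first bin is silently dropped.
skipped=$(printf '%s\n' "$bins" | tail -n +2)

# tail -n +1 starts output at line 1, so every bin is kept.
kept=$(printf '%s\n' "$bins" | tail -n +1)

# Report how many bins survive each variant.
printf 'with +2: %s bins\n' "$(printf '%s\n' "$skipped" | wc -l | tr -d ' ')"
printf 'with +1: %s bins\n' "$(printf '%s\n' "$kept" | wc -l | tr -d ' ')"
```

With these four made-up lines, the +2 variant reports 3 bins and the +1 variant reports 4, which matches what I see in pass_bins.txt before and after the change.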
Hey there, sorry for the delay (summer holidays). Thanks for catching this! The tail -n +2 was originally in place to skip a header line, but that was later changed, and as a result one UMI sequence was being lost. I have updated the umi_binning.sh script to fix this issue.