ZielsLab/ssUMI

nanopore_pipeline test data output produces 3 UMI consensus sequences

Closed this issue · 2 comments

Running the nanopore pipeline on the test data according to the instructions produces 3 UMI consensus sequences instead of 4 (UMI2 is missing). I suspect it is filtered out in the final stage of the umi_binning.sh script.

The binning itself seems okay, because:

  • ssUMI_test/umi_binning/umi_ref/umi_ref.fa contains UMIs for all 4 umi bins.
  • ssUMI_test/umi_binning/read_binning/umi_bin_map.txt and ssUMI_test/umi_binning/read_binning/umi_binning_stats.txt both include UMI2.

The output of the final bin-filtering step, umi_binning/read_binning/pass_bins.txt, does not contain UMI2.
The mapping folder in raconx3 also does not contain a subfolder "umi2bins".

Contents of key files below:
###################################
ssUMI_test/umi_binning/read_binning/pass_bins.txt:

umi1bins
umi4bins
umi3bins

#####################################
ssUMI_test/consensus_raconx3.fa:

>umi3bins;ubs=16
GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCGTATCGTGTAGAGACTGCGTAGGCTACA[...]
>umi1bins;ubs=23
GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCGTATCGTGTAGAGACTGCGTAGGGCTT[...]
>umi4bins;ubs=5
GTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCAGTGATCGAGTCAGTGCGAGTGCTTCGT[...]

Run on WSL Ubuntu 22.04

I traced the issue to the last line (765) of the umi_binning.sh script and replaced tail -n +2 with tail -n +1:

tail -n +1 > $BINNING_DIR/pass_bins.txt

After this change, the nanopore pipeline produces 4 UMI consensus sequences.
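
For anyone hitting the same thing, here is a minimal sketch of why the offset matters. It does not reproduce the actual pipeline: the bin list is hypothetical, and putting umi2bins first is an assumption made only to mirror the missing bin. tail -n +2 starts printing at line 2, so on a list with no header line it drops the first entry, while tail -n +1 prints everything.

```shell
# Hypothetical headerless bin list; the order is assumed for illustration,
# not taken from the real umi_binning.sh intermediate output.
bins=$(printf '%s\n' umi2bins umi1bins umi4bins umi3bins)

# tail -n +2 starts output at line 2, so the first bin is silently dropped.
with_skip=$(printf '%s\n' "$bins" | tail -n +2)

# tail -n +1 starts output at line 1, so every bin is kept.
without_skip=$(printf '%s\n' "$bins" | tail -n +1)

echo "tail -n +2 keeps:"
printf '%s\n' "$with_skip"
echo "tail -n +1 keeps:"
printf '%s\n' "$without_skip"
```

The +2 offset is only appropriate when the first line of the input is a header.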

ziels commented

Hey there, sorry for the delay (summer holidays). Thanks for catching this! The tail +2 was originally in place to skip a header line, but that was ultimately changed. The result of that was losing 1 UMI sequence. I have updated the umi_binning.sh script to fix this issue.