algbio/themisto

Output is not sorted with --sort-output

AntonBogun opened this issue · 2 comments

Reproduction:

  • Place the release themisto binary in the repository root
  • Build my_index as shown in the README:
    ./themisto_binary build -k 31 -i example_input/coli_file_list.txt --index-prefix my_index --temp-dir temp --mem-gigas 2 --n-threads 4 --file-colors
  • Pseudoalign a .fastq file with the following contents:
@Example
CTTTGTGCGCTTCACTCATGTTCCACGCCACCATCAACAGCAGGGCAGCCATGGCGGAAAGCGGCAGCCAGGAGAGCAGCGGTGCCAGTACCAGCAGGGC
+
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

(the last line must end with a newline, otherwise the file cannot be parsed)
with the following command:
./themisto_binary pseudoalign -i my_index -q "example.fastq" -o "out.txt" --temp-dir temp --threshold 0.7 --sort-output

  • The output file (out.txt) has the following contents:
    0 0 2 1
    The values are not sorted; they should be "0 1 2".

Hi,

Sorry for the slow response. The --sort-output option actually just sorts the lines in the output file so the reads are listed in the same order as in the input. We might want to add another option to also sort the color identifiers within a line.
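
To make the difference between the two kinds of sorting concrete, here is a minimal post-processing sketch in Python. It assumes each output line is a read id followed by space-separated color ids, which matches the "0 0 2 1" line reported above but is not confirmed elsewhere in this thread. The function names and the second sample line ("1 3 0") are hypothetical and are not part of Themisto.

```python
def sort_hits_in_line(line: str) -> str:
    """Keep the first field (assumed to be the read id) and sort the rest numerically."""
    fields = line.split()
    if not fields:
        return line
    read_id = fields[0]
    color_ids = sorted(fields[1:], key=int)  # numeric sort, so "10" comes after "2"
    return " ".join([read_id, *color_ids])


def sort_lines_by_read(lines: list[str]) -> list[str]:
    """Order whole lines by their leading read id (the line-level sort described above)."""
    non_empty = [line for line in lines if line.strip()]
    return sorted(non_empty, key=lambda line: int(line.split()[0]))


if __name__ == "__main__":
    # First line is hypothetical; second line is the one from the report.
    raw = ["1 3 0", "0 0 2 1"]
    by_read = sort_lines_by_read(raw)
    print(by_read)                                        # lines reordered, hits untouched
    print([sort_hits_in_line(line) for line in by_read])  # hits sorted within each line
```

Sorting the lines leaves "0 0 2 1" unchanged internally; only the per-line step turns it into "0 0 1 2".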

Added an option --sort-hits to sort the output color ids, in commit 09806d9.

For clarity, the option --sort-output has been renamed to --sort-output-lines. The old name still works to avoid breaking existing scripts.