GAF alignment evaluation tool.

Primary language: Rust. License: MIT.

peanut

peanut calculates alignment metrics for a GAF file produced by GraphAligner by evaluating its CIGAR strings. It outputs four metrics:

  1. qsc
  2. uniq
  3. multi
  4. nonaln

Optionally, it writes the nonaln query regions to a BED file.
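The README does not show how peanut reads the CIGAR strings. As a minimal sketch, assuming the extended CIGAR is stored in the GAF `cg:Z:` tag and is walked along the query's forward strand starting at the query start column, the hypothetical `match_intervals` function below collects the query intervals covered by sequence matches. Following the metric descriptions, it treats both `=` and `E` as match symbols; `X`, `I` and `M` advance along the query without being counted, and `D` consumes only the graph path.

```rust
/// Hypothetical helper, not part of peanut's API.
/// Returns (query name, query length, half-open query intervals of sequence matches)
/// for one GAF line, or None if the line is malformed or has no cg:Z: tag.
fn match_intervals(gaf_line: &str) -> Option<(String, u64, Vec<(u64, u64)>)> {
    let cols: Vec<&str> = gaf_line.trim_end().split('\t').collect();
    if cols.len() < 12 {
        return None; // GAF has 12 mandatory columns
    }
    let query_len: u64 = cols[1].parse().ok()?;
    let mut pos: u64 = cols[2].parse().ok()?; // 0-based query start
    // Optional tags start at column 13; assume the CIGAR is in cg:Z:.
    let cigar = cols[12..].iter().find_map(|t| t.strip_prefix("cg:Z:"))?;

    let mut intervals = Vec::new();
    let mut len: u64 = 0;
    for c in cigar.chars() {
        if let Some(d) = c.to_digit(10) {
            len = len * 10 + u64::from(d); // accumulate the run length
            continue;
        }
        match c {
            // Sequence match: record the covered query interval.
            '=' | 'E' => {
                intervals.push((pos, pos + len));
                pos += len;
            }
            // Other query-consuming operations advance without recording.
            'X' | 'I' | 'M' => pos += len,
            // Deletions consume only the graph path, not the query.
            'D' => {}
            _ => return None, // operation this sketch does not handle
        }
        len = 0;
    }
    Some((cols[0].to_string(), query_len, intervals))
}
```

For a record with query start 2 and `cg:Z:3=1X1I2=`, it returns the intervals (2, 5) and (7, 9).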

metrics

query sequence containment (qsc)

qsc = #E / query_lens, where

  • #E is the number of query nucleotides with a sequence match (= or E symbol) in the GAF file. Nucleotide positions with sequence matches in multiple alignments are only counted once.
  • query_lens is the total length of all queries in the GAF file, in nucleotides.
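qsc counts each matched nucleotide once even when several alignments cover it. One way to get that count, sketched here under the assumptions that each query's matches are available as half-open (start, end) intervals and that each query's length enters query_lens once, is to sort and merge the intervals. The function names are hypothetical, not peanut's API.

```rust
/// Hypothetical helper: counts distinct positions covered by half-open
/// intervals, so a position matched by several alignments is counted once.
fn distinct_covered(mut intervals: Vec<(u64, u64)>) -> u64 {
    intervals.sort_unstable();
    let mut total: u64 = 0;
    let mut cur: Option<(u64, u64)> = None;
    for (s, e) in intervals {
        if let Some((cs, ce)) = cur {
            if s <= ce {
                // Overlapping or adjacent: extend the current merged interval.
                cur = Some((cs, ce.max(e)));
                continue;
            }
            total += ce - cs; // close the previous merged interval
        }
        cur = Some((s, e));
    }
    if let Some((cs, ce)) = cur {
        total += ce - cs;
    }
    total
}

/// Hypothetical sketch of qsc = #E / query_lens.
/// Input: one entry per query, (match intervals from all its alignments, query length).
/// Returns NaN if the total query length is 0.
fn qsc(queries: &[(Vec<(u64, u64)>, u64)]) -> f64 {
    let matched: u64 = queries.iter().map(|(iv, _)| distinct_covered(iv.clone())).sum();
    let total: u64 = queries.iter().map(|(_, len)| *len).sum();
    matched as f64 / total as f64
}
```

Two 10 bp queries, one matched on [0, 5) and [3, 8) by two alignments and one unmatched, give 8 distinct matched positions out of 20, so qsc = 0.4.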

unique query sequence matches (uniq)

uniq = uniq_#E / query_lens, where

  • uniq_#E is the number of query nucleotides with a sequence match in exactly one alignment of the GAF file.
  • query_lens is the total length of all queries in the GAF file, in nucleotides.

multiple query sequence matches (multi)

multi = multi_#E / query_lens, where

  • multi_#E is the number of query nucleotides with sequence matches in more than one alignment of the GAF file. Each such position is counted only once, regardless of how many alignments match it.
  • query_lens is the total length of all queries in the GAF file, in nucleotides.
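To split matched positions into uniq and multi, a per-position depth (how many alignments place a sequence match on each query position) is one straightforward approach. This sketch assumes the match intervals of a single alignment never overlap each other, so depth equals the number of matching alignments; `match_depth` and `uniq_multi` are hypothetical names, not peanut's API.

```rust
/// Hypothetical helper: per-position match depth for one query, given the
/// half-open match intervals of all its alignments (assumed to lie within the query).
fn match_depth(query_len: u64, intervals: &[(u64, u64)]) -> Vec<u32> {
    let mut depth = vec![0u32; query_len as usize];
    for &(s, e) in intervals {
        for d in &mut depth[s as usize..e as usize] {
            *d += 1;
        }
    }
    depth
}

/// Hypothetical helper: returns (uniq_#E, multi_#E), i.e. positions matched
/// by exactly one alignment vs. by more than one (each position counted once).
fn uniq_multi(depth: &[u32]) -> (u64, u64) {
    let uniq = depth.iter().filter(|&&d| d == 1).count() as u64;
    let multi = depth.iter().filter(|&&d| d > 1).count() as u64;
    (uniq, multi)
}
```

For a 10 bp query matched on [0, 5) and [3, 8), it yields uniq_#E = 6 and multi_#E = 2, which together equal the 8 distinct matched positions that enter qsc.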

non-matched query sequence (nonaln)

nonaln = nonaln_#E / query_lens, where

  • nonaln_#E is the number of query nucleotides without any sequence match in the GAF file.
  • query_lens is the total length of all queries in the GAF file, in nucleotides.
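Positions with no sequence match in any alignment (depth 0) make up nonaln_#E, and runs of them are the regions peanut can optionally write to BED. The README does not describe peanut's BED layout, so this sketch assumes a minimal three-column BED (query name, 0-based start, exclusive end); `nonaln_bed` is a hypothetical name.

```rust
/// Hypothetical helper: given one query's per-position match depth, returns
/// (nonaln_#E, BED records for each maximal run of unmatched positions).
/// Assumed BED layout: query name, 0-based start, exclusive end.
fn nonaln_bed(query_name: &str, depth: &[u32]) -> (u64, Vec<String>) {
    let mut count = 0u64;
    let mut records = Vec::new();
    let mut run_start: Option<usize> = None;
    for (i, &d) in depth.iter().enumerate() {
        if d == 0 {
            count += 1;
            if run_start.is_none() {
                run_start = Some(i); // an unmatched run begins here
            }
        } else if let Some(s) = run_start.take() {
            // A matched position ends the current unmatched run.
            records.push(format!("{}\t{}\t{}", query_name, s, i));
        }
    }
    // Close a run that extends to the end of the query.
    if let Some(s) = run_start {
        records.push(format!("{}\t{}\t{}", query_name, s, depth.len()));
    }
    (count, records)
}
```

For the depths 1, 0, 0, 2, 0 on a query named read1, it counts 3 unmatched positions and emits the regions read1 1 3 and read1 4 5.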

usage

building

git clone https://github.com/pangenome/rs-peanut.git
cd rs-peanut
cargo build --release

example

peanut requires a GAF file as input, passed with -g.

./target/release/peanut -g aln.gaf

The output is written to stdout in a tab-delimited format.

0.992910744238371	0.9926967987671109	0.00021394547126006352	0.007089255761628998

The four numbers are, in order, qsc, uniq, multi, and nonaln.
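Going by the metric definitions, uniq and multi split the positions counted by qsc, and nonaln covers the remaining query positions, so one would expect qsc ≈ uniq + multi and qsc + nonaln ≈ 1. A small sketch for checking a line of peanut output against these identities; `parse_metrics` and `is_consistent` are hypothetical helpers, not part of peanut.

```rust
/// Hypothetical helper: parses one tab-delimited output line into
/// [qsc, uniq, multi, nonaln]; returns None unless there are exactly four numbers.
fn parse_metrics(line: &str) -> Option<[f64; 4]> {
    let values = line
        .trim()
        .split('\t')
        .map(|s| s.parse::<f64>().ok())
        .collect::<Option<Vec<f64>>>()?;
    if values.len() != 4 {
        return None;
    }
    Some([values[0], values[1], values[2], values[3]])
}

/// Hypothetical helper: checks qsc ≈ uniq + multi and qsc + nonaln ≈ 1
/// within an absolute tolerance, allowing for floating-point rounding.
fn is_consistent(m: [f64; 4], tol: f64) -> bool {
    let [qsc, uniq, multi, nonaln] = m;
    (qsc - (uniq + multi)).abs() < tol && (qsc + nonaln - 1.0).abs() < tol
}
```

On the example output above, `is_consistent` returns true with a tolerance of 1e-9.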

TODOs

  • Add query sequence alignment match mismatch (qsamm).
  • Describe qsc.
  • Remove the unhelpful metrics qsamm and qsm.
  • Add 3 new metrics: number of unique query base alignments, number of multiple query base alignments, and number of nonaln query bases.

limits

So far, peanut has not been tested on GAF files produced by tools other than GraphAligner.