sample_analysis_opts fraction.min is confusing

Question


Opened this issue 3 years ago · 0 comments

The output data tables, both per-file and per-sample, have FractionOfTotal and FractionOfLocus columns. There is also a configurable threshold, fraction.min, for the fraction of reads a peak must account for to be considered a candidate allele. However, this fraction corresponds to neither of those columns. Its denominator is instead the sum of the read counts in each processed-sample table, which covers a narrower, more stringently filtered set of reads than all reads assigned to the locus by primer(s).
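As a rough illustration of the threshold described above, here is a minimal Python sketch, not the package's actual implementation. It assumes each row of a processed-sample table is one sequence whose read count has already passed the locus's primer, repeat-motif, and length-range filters. The function name `candidate_flags` and the `fraction_min` value are hypothetical.

```python
def candidate_flags(filtered_counts, fraction_min):
    """Flag each sequence whose share of the *filtered* reads meets fraction_min.

    filtered_counts: read counts for sequences that already matched the locus
    primer(s), repeat motif, and length range (hypothetical input shape).
    """
    total = sum(filtered_counts)  # denominator: filtered reads only, not locus or file
    if total == 0:
        return []
    return [count / total >= fraction_min for count in filtered_counts]


# Three sequences at one locus: 60 + 30 + 10 = 100 filtered reads.
print(candidate_flags([60, 30, 10], fraction_min=0.15))  # [True, True, False]
```

Because the denominator excludes reads that matched the primer(s) but failed the motif or length checks, a peak can clear fraction.min even when its FractionOfLocus value is below the threshold.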

To summarize:

FractionOfTotal: denominator is the number of reads in the whole input file
FractionOfLocus: denominator is the number of reads across all entries sharing the same MatchingLocus value (determined by the forward primer and, optionally, the reverse primer)
fraction.min (applied by analyze_sample() when categorizing each row; no corresponding output column is currently defined): denominator is the number of reads matching the locus's primer(s), repeat motif, and length range
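To make the three denominators concrete, the following Python sketch computes all three fractions for one sequence in a toy input file. It is an illustration under stated assumptions, not the package's code: `SeqRow`, its `passes_filters` flag, and the `FractionOfFiltered` key are hypothetical names, the last standing in for the column that does not currently exist.

```python
from dataclasses import dataclass


@dataclass
class SeqRow:
    locus: str            # MatchingLocus assigned from primer(s); "" if none matched
    passes_filters: bool  # hypothetical: also matches repeat motif and length range
    count: int            # read count for this unique sequence


def fractions(rows, index):
    """Return the three fractions for rows[index], one per denominator."""
    row = rows[index]
    total_reads = sum(r.count for r in rows)                          # whole input file
    locus_reads = sum(r.count for r in rows if r.locus == row.locus)  # same MatchingLocus
    filtered_reads = sum(                                             # primer + motif + length
        r.count for r in rows if r.locus == row.locus and r.passes_filters
    )
    return {
        "FractionOfTotal": row.count / total_reads,
        "FractionOfLocus": row.count / locus_reads,
        # Hypothetical column: the basis fraction.min is compared against.
        # Undefined (None) for rows that never reach the filtered table.
        "FractionOfFiltered": row.count / filtered_reads if row.passes_filters else None,
    }


rows = [
    SeqRow("A", True, 60),
    SeqRow("A", True, 20),
    SeqRow("A", False, 20),  # primer matched, but wrong motif or length
    SeqRow("B", True, 50),
    SeqRow("", False, 50),   # no primer match at all
]
print(fractions(rows, 0))
# {'FractionOfTotal': 0.3, 'FractionOfLocus': 0.6, 'FractionOfFiltered': 0.75}
```

The same 60-read sequence is 30%, 60%, or 75% of "the reads" depending on which denominator is used, which is why a single fraction.min value is hard to interpret against the existing columns.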

This should be clarified in the documentation and outputs.