sample_analysis_opts fraction.min is confusing
ressy commented
The output data tables, both per-file and per-sample, have `FractionOfTotal` and `FractionOfLocus` columns, and we have a configurable threshold, `fraction.min`, for the fraction of reads required to consider a peak as a candidate allele. But this fraction matches neither of those columns. Its denominator is the sum of the read counts in each processed-samples table, which is a more stringent set of reads than those matching the locus by primer(s) alone.
To summarize:
- `FractionOfTotal`: denominator is the number of reads in the whole input file
- `FractionOfLocus`: denominator is the number of reads for all entries sharing a `MatchingLocus` value (determined by forward primer and optionally reverse primer)
- fraction applied by `analyze_sample()` when categorizing each row, which currently has no explicit column defined: denominator is the number of reads matching the per-locus primer(s), repeat motif, and length range
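To make the three denominators concrete, here is a minimal Python sketch with made-up read counts. It is not CHIIMP code: the `Entry` record, its fields, the `fractions()` helper, the `FractionOfProcessed` name, and the `FRACTION_MIN` value are all hypothetical illustrations. Only `FractionOfTotal`, `FractionOfLocus`, and the `MatchingLocus` concept come from the issue.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record for one unique sequence in an input file.
# Field names are illustrative, not CHIIMP's actual columns.
@dataclass
class Entry:
    seq: str
    count: int
    matching_locus: Optional[str]  # locus matched by primer(s), or None
    motif_match: bool              # contains the locus repeat motif
    length_ok: bool                # within the locus length range


def fractions(entries, locus):
    """Compute all three fractions for the processed entries of one locus."""
    # Denominator 1: every read in the input file.
    total = sum(e.count for e in entries)
    # Denominator 2: reads whose primer(s) matched this locus.
    locus_reads = [e for e in entries if e.matching_locus == locus]
    locus_total = sum(e.count for e in locus_reads)
    # Denominator 3 (stricter set): primer(s) + repeat motif + length range.
    processed = [e for e in locus_reads if e.motif_match and e.length_ok]
    processed_total = sum(e.count for e in processed)
    return [
        {
            "seq": e.seq,
            "FractionOfTotal": e.count / total,
            "FractionOfLocus": e.count / locus_total,
            # Hypothetical name for the currently unnamed fraction.
            "FractionOfProcessed": e.count / processed_total,
        }
        for e in processed
    ]


# Made-up example: sequence C matches the L1 primers but lacks the motif.
entries = [
    Entry("A", 600, "L1", True, True),
    Entry("B", 300, "L1", True, True),
    Entry("C", 100, "L1", False, True),
    Entry("D", 1000, "L2", True, True),
]

FRACTION_MIN = 0.32  # illustrative threshold only, not a CHIIMP default
rows = fractions(entries, "L1")
for r in rows:
    passes = r["FractionOfProcessed"] >= FRACTION_MIN
    print(r["seq"], round(r["FractionOfLocus"], 3),
          round(r["FractionOfProcessed"], 3), passes)
```

With these counts, sequence B is 30% of the reads for locus L1 but one third of the processed reads, so a threshold between those two values accepts or rejects B depending on which denominator is used. Because the stricter set is smaller, the fraction actually compared against the threshold is never lower than `FractionOfLocus`, which is why reading `fraction.min` as a cutoff on either listed column is misleading.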
This should be clarified in the documentation and outputs.