ShawHahnLab/chiimp

sample_analysis_opts fraction.min is confusing

Opened this issue · 0 comments

ressy commented

The output data tables, both per-file and per-sample, have FractionOfTotal and FractionOfLocus columns, and we have a configurable threshold for the fraction of reads required to consider a peak as a candidate allele, fraction.min. But this fraction isn't either of those two listed columns; instead the denominator is the sum of the read counts in each processed-samples table, which is a more stringent set than just the matching locus via primer(s).

To summarize:

  • FractionOfTotal: denominator is the number of reads in the whole input file
  • FractionOfLocus: denominator is the number of reads for all entries sharing a MatchingLocus column (determined by forward primer and optionally reverse primer)
  • fraction applied when categorizing each row via analyze_sample(), which currently has no explicit column defined: denominator is the number of reads matching per-locus primer(s), repeat motif, and length range

This should be clarified in the documentation and outputs.