maxibor/pydamage

Restrict output of damage frequencies to limited number of positions

alexhbnr opened this issue · 1 comment

At the moment, we output both the C>T and G>A frequencies for all positions in a read that were observed at least once. This causes two problems.

  1. The list of positions is very long when a sample has a long read length distribution, and most of these positions are not very informative.
  2. When merging the output of pyDamage obtained from multiple samples with different read length distributions, we have to take care of missing columns.

Therefore, I would propose to use the command-line parameter -w, which specifies the window length for modelling, and to output only that same number of sites.
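To illustrate the idea, here is a minimal sketch in plain Python. It is not pyDamage's actual code: the function name `restrict_to_window`, the sample data, the `CtoT-<pos>` column labels and the choice to fill unobserved positions with NaN are all assumptions made for the example. It shows that once every sample is cut to the same window length, all samples end up with identical columns and can be merged without handling missing ones.

```python
import math


def restrict_to_window(freqs, wlen, fill=math.nan):
    """Return frequencies for positions 0..wlen-1 only.

    freqs maps a 0-based read position to an observed frequency.
    Positions inside the window that were never observed get `fill`
    (NaN here, which is an assumption of this sketch).
    """
    return [freqs.get(pos, fill) for pos in range(wlen)]


# Hypothetical C>T frequencies for two samples with different read lengths
samples = {
    "sample_a": {0: 0.30, 1: 0.18, 2: 0.11, 3: 0.07, 4: 0.05, 5: 0.04, 40: 0.01},
    "sample_b": {0: 0.25, 1: 0.15, 2: 0.09},
}

wlen = 5  # stands in for the value passed with -w

# Every sample gets exactly the same columns, so rows line up when merged
header = [f"CtoT-{pos}" for pos in range(wlen)]
rows = {name: restrict_to_window(freqs, wlen) for name, freqs in samples.items()}

print("\t".join(["sample"] + header))
for name, values in rows.items():
    print("\t".join([name] + [f"{v:.2f}" for v in values]))
```

With this layout, positions beyond the window (such as position 40 in `sample_a`) are simply dropped, and a sample with short reads still produces all `wlen` columns.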

This has now been fixed with the --wlen flag.