Restrict output of damage frequencies to limited number of positions
alexhbnr opened this issue · 1 comment
alexhbnr commented
At the moment, we output both the C>T and G>A frequencies for all positions in a read that were observed at least once. This comes with two problems.
- The list of positions is very long when a sample has a long read length distribution, and most of these positions are not very informative.
- When merging the output of pyDamage obtained from multiple samples with different read length distributions, we have to take care of missing columns.
Therefore, I would propose to use the command-line parameter -w, which specifies the window length for modelling, and to output only that number of positions as well.
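A minimal sketch of the idea, not pyDamage's actual code: the function name `restrict_to_window`, the `CtoT-<pos>` column naming, and the choice to fill unobserved positions with `None` are all assumptions made for illustration. Only C>T is shown; G>A would be handled the same way.

```python
def restrict_to_window(freqs, wlen):
    """Keep only positions 0..wlen-1 of a {position: frequency} mapping.

    Positions inside the window that were never observed are filled with
    None, so every sample yields exactly the same columns.
    (Hypothetical helper; column names are illustrative.)
    """
    return {f"CtoT-{pos}": freqs.get(pos) for pos in range(wlen)}


# Two samples with different read length distributions (made-up values).
sample_a = {0: 0.30, 1: 0.12, 2: 0.05, 3: 0.02, 4: 0.01}  # longer reads
sample_b = {0: 0.25, 1: 0.10}                             # shorter reads

wlen = 3  # window length, as it would be passed on the command line
row_a = restrict_to_window(sample_a, wlen)
row_b = restrict_to_window(sample_b, wlen)

# Both rows now share the same columns, so merging needs no special casing.
assert list(row_a) == list(row_b)
```

With a fixed window, the number of output columns depends only on the window length and no longer on the longest read seen in a given sample.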
maxibor commented
This has now been fixed with the --wlen flag.