slowikj/seqR

Add fill or complete parameter to count_kmers

michbur opened this issue · 2 comments

Add a fill or complete parameter to count_kmers that adds columns for k-mers that weren't present in the data. This would streamline the binding of k-mer matrices.

If I am reading the rbind method correctly, it indeed does not work as expected when the sets of column names are not the same.

However, it is important to note that the number of all possible columns grows exponentially with k (the alphabet size raised to the power k). Therefore, such a feature would drastically worsen performance and could even make some computations practically impossible to perform.
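To give a sense of scale, here is a small sketch in Python (used only for illustration; seqR itself is an R package). It assumes contiguous k-mers over a 4-letter nucleotide alphabet and a 20-letter amino acid alphabet, and the function name kmer_space_size is hypothetical, not part of seqR.

```python
def kmer_space_size(alphabet_size: int, k: int) -> int:
    """Number of distinct contiguous k-mers over an alphabet of the given size."""
    return alphabet_size ** k


# Columns a fully "filled" matrix would need for a few values of k.
for k in (1, 3, 5, 7):
    nucleotide_columns = kmer_space_size(4, k)
    amino_acid_columns = kmer_space_size(20, k)
    print(k, nucleotide_columns, amino_acid_columns)
```

For amino acid 5-mers alone this already means 3,200,000 columns, regardless of how many of them actually occur in the data.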

In my opinion, a better feature to consider is to add a function that merges two sparse matrices in an efficient and correct way.
Moreover, I am wondering whether such a feature should be implemented as a separate package, since this is a general issue not strongly related to the seqR package.
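A rough sketch of what such a merge could look like, written in Python purely for illustration (not the seqR API and not a proposed implementation). It assumes each matrix is stored as a list of column names plus a dictionary of non-zero entries; SparseCounts and rbind_sparse are hypothetical names. The idea is to take the union of the column names, remap the second matrix's column indices, and shift its row indices, so that no dense columns are ever materialized.

```python
from typing import Dict, List, NamedTuple, Tuple


class SparseCounts(NamedTuple):
    """Hypothetical stand-in for a sparse k-mer count matrix with named columns."""
    n_rows: int
    columns: List[str]                    # k-mer name of each column
    entries: Dict[Tuple[int, int], int]   # (row, column index) -> non-zero count


def rbind_sparse(a: SparseCounts, b: SparseCounts) -> SparseCounts:
    """Stack b below a, aligning columns by name instead of by position."""
    columns = list(a.columns)
    index = {name: j for j, name in enumerate(columns)}
    # Append only those columns of b that a does not already have.
    for name in b.columns:
        if name not in index:
            index[name] = len(columns)
            columns.append(name)
    # Entries of a keep their indices; entries of b are shifted and remapped.
    entries = dict(a.entries)
    for (i, j), value in b.entries.items():
        entries[(i + a.n_rows, index[b.columns[j]])] = value
    return SparseCounts(a.n_rows + b.n_rows, columns, entries)


a = SparseCounts(1, ["AA", "AC"], {(0, 0): 2, (0, 1): 1})
b = SparseCounts(1, ["AC", "GT"], {(0, 0): 3, (0, 1): 1})
merged = rbind_sparse(a, b)
print(merged.columns)
print(merged.entries)
```

The work done is proportional to the number of non-zero entries and observed columns, not to the number of all possible k-mers, which is the point of preferring a merge over a fill parameter.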

The decision is not to add a fill parameter, because of the exponential number of possible columns, but rather to add a feature that merges two output matrices correctly (issue #78).