trthatcher/DiscriminantAnalysis.jl

Rows or columns?

Closed this issue · 1 comment

First, this looks like a nice package. Great work.

From what I can tell, you're assuming that data points are the rows of the data matrix. This is common in statistics but the opposite of how several other Julia packages work (e.g., Distances.jl, MultivariateStats.jl), which store each data point as a column. I believe those packages chose that layout for performance: Julia arrays are stored in column-major order, so a data point stored as a column is contiguous in memory (this is especially relevant for Distances.jl, which needs to iterate over all pairs of data points in an O(N^2) fashion).
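To illustrate the performance point, here is a minimal sketch of a pairwise distance loop over a matrix whose columns are the data points. It is not taken from Distances.jl; the function name `pairwise_sqeuclidean` is hypothetical and exists only for this example.

```julia
# Hypothetical illustration: X is d × n, with one data point per column.
# Julia stores matrices in column-major order, so X[:, i] is contiguous in memory.
function pairwise_sqeuclidean(X::AbstractMatrix{T}) where {T<:Real}
    d, n = size(X)
    D = zeros(T, n, n)
    for j in 1:n, i in 1:j-1
        s = zero(T)
        for k in 1:d                      # innermost loop walks down columns i and j
            diff = X[k, i] - X[k, j]
            s += diff * diff
        end
        D[i, j] = s                       # the distance matrix is symmetric
        D[j, i] = s
    end
    return D
end

# Three 2-dimensional points stored as columns: (0, 0), (3, 4), (0, 1)
X = [0.0 3.0 0.0;
     0.0 4.0 1.0]
D = pairwise_sqeuclidean(X)
```

With data points as rows, the innermost loop would instead stride across memory, which is the usual argument for the column layout in O(N^2) code like this.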

I'm not saying you need to switch, but I am suggesting that you document your expectations clearly. In your documentation for lda, for example, you just call X a "matrix of floats," which says nothing about the layout you expect.

Thank you!

I ran into the same rows-versus-columns question in a package I wrote for kernel matrix computation. I ended up supporting both layouts, and it wasn't particularly arduous. I'll make the same enhancement to this package once I have the opportunity. For now, I've added a note to the documentation.
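One way to support both layouts can be sketched as follows. This is only an assumption about how such an enhancement might look, not the approach used in this package or in the kernel package: the keyword `obsdim` and the helpers `as_columns` and `feature_means` are hypothetical names.

```julia
# Hypothetical helper: return the data as a d × n matrix (data points as columns),
# whichever layout the caller used.
function as_columns(X::AbstractMatrix; obsdim::Integer = 1)
    obsdim == 1 && return permutedims(X)   # rows are data points: transpose into a copy
    obsdim == 2 && return X                # columns are data points: use as-is
    throw(ArgumentError("obsdim must be 1 or 2, got $obsdim"))
end

# Hypothetical routine built on top of it: the mean of each feature.
function feature_means(X::AbstractMatrix{<:Real}; obsdim::Integer = 1)
    Xc = as_columns(X; obsdim = obsdim)
    d, n = size(Xc)
    return [sum(view(Xc, k, :)) / n for k in 1:d]
end

Xrows = [1.0 2.0;
         3.0 4.0;
         5.0 6.0]                          # 3 data points as rows, 2 features
m_rows = feature_means(Xrows; obsdim = 1)
m_cols = feature_means(permutedims(Xrows); obsdim = 2)
```

Normalizing the layout once at the entry point keeps the rest of the code layout-agnostic, at the cost of one copy when the input is row-oriented.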