JuliaStats/KernelDensity.jl

Interface of KernelDensity.pdf is not consistent with Distributions.pdf

Opened this issue · 3 comments

I've noticed KernelDensity.pdf cannot be used in the same way as Distributions.pdf. I think this is very confusing for users and is a potential source of errors. In particular, pdf.(kde::UnivariateKDE, x::Vector) and pdf(kde::BivariateKDE, M::Matrix) do not work.

I could try adding them if there is agreement it should be done.

Yes, it is unfortunate that the package defines a pdf method that works on vectors. This predates broadcasting (I think). It would be best to remove pdf(ik::InterpKDE,xs::AbstractVector) & friends and make things work with broadcasting.

As for pdf(kde::BivariateKDE, M::Matrix), I would very much prefer the solution as above combined with eachcol and/or eachrow. There is no preferred/canonical way to iterate over vectors in matrices in Julia, it is best to be explicit.

I will look at this. Leaving it as it is would be asking for problems later, e.g. when more dimensional kde would be implemented. I'd strictly comply with the Distributions.jl interface.

This is actually simpler than I thought. The pdf method for vectors was not a problem. The problem was broadcast had no information about the types UnivariateKDE and InterpKDE. You only need to add information they are scalars

Base.broadcastable(s::InterpKDE) = Ref(s)

the same for UnivariateKDE. But then, the broadcast is very slow as it calculates interpolation for each point separately. Fortunately, fixing it is even simpler, but requires a breaking change: making InterpKDE non-mutable. Then the compiler knows making those interpolations is the same call and broadcast becomes fast, actually 50% faster than the old vector method!

I'd like to stress now I see no reasonable reason as why InterpKDE was mutable. It seems completely useless trying to modify it. The same for other KDE types; changing that behaviour is not required for this problem, but I suspect this can cause efficiency problems in other circumstances.

Adding matrix argument was done in pull request #102. Together this would make KernelDensity comply with Distributions.