`rw` normalization non-negotiable for `DiffusionMap`?
Opened this issue · 6 comments
for the diffusion map, the default is `α::Real=0.0`, so this code here to normalize the kernel matrix is not called.
I'm pretty sure the `normalize!(L, α=α, norm=:rw)` line is non-negotiable. otherwise, the rows of the kernel matrix will not sum to one, breaking the theory behind the diffusion map. pretty sure this step is needed to turn the kernel matrix into a Markov transition matrix. see Sec. 2.1 here. agree? disagree?
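to illustrate the point, here's a minimal standalone sketch (toy kernel matrix, plain `LinearAlgebra`, not the package's API) showing that the random-walk normalization `M = D⁻¹*K` is exactly what makes each row sum to one:

```julia
using LinearAlgebra

# toy symmetric kernel matrix (hypothetical values, for illustration only)
K = [1.0 0.5 0.2;
     0.5 1.0 0.4;
     0.2 0.4 1.0]

# degree matrix D from the row sums, then random-walk normalization M = D⁻¹*K
d = vec(sum(K, dims=2))
M = Diagonal(1 ./ d) * K

# each row of M now sums to one: M is a right-stochastic Markov matrix
@assert all(sum(M, dims=2) .≈ 1.0)
```

without this step, the rows of `K` sum to the node degrees, not to one, so `K` is not a valid transition matrix.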
I don't think the normalization below is correct, anyway:

```julia
normalize!(L, α=α, norm=:sym) # Lᵅ = D⁻ᵅ*L*D⁻ᵅ
normalize!(L, α=α, norm=:rw)  # M = (Dᵅ)⁻¹*Lᵅ
```

see Wikipedia.
`normalize!(L, α=α, norm=:sym)` is correct, but then it should be `normalize!(L, α=1, norm=:rw)`. I think this and the above issue can both be solved by putting `normalize!(L, α=1, norm=:rw)` after the `if α > 0 (...) end` block.
the `normalize!(L, α=α, norm=:sym)` line takes the kernel matrix and maps it to `L_new = D⁻ᵅ*L*D⁻ᵅ`. yes, correct. but then we should make sure the rows sum to one by doing `normalize!(L, α=1, norm=:rw)`. on Wikipedia, `D^(α)` denotes the degree matrix of the α-normalized kernel, not `D` raised to the power α. so this line should use `α=1` for the `:rw` normalization. agree? disagree?
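for concreteness, here is a standalone sketch of the two-step α-normalization (toy kernel, plain `LinearAlgebra`, not the package's `normalize!`): first `Kᵅ = D⁻ᵅ*K*D⁻ᵅ`, then a random-walk step using the degree matrix of `Kᵅ` with exponent 1:

```julia
using LinearAlgebra

# toy symmetric kernel matrix (hypothetical values, for illustration only)
K = [1.0 0.8 0.1;
     0.8 1.0 0.3;
     0.1 0.3 1.0]
α = 0.5

# step 1: Kᵅ = D⁻ᵅ*K*D⁻ᵅ, where D is the degree matrix of K
d = vec(sum(K, dims=2))
Kα = Diagonal(d .^ -α) * K * Diagonal(d .^ -α)

# step 2: random-walk normalize by the degree matrix of Kα (exponent 1, not α)
d̃ = vec(sum(Kα, dims=2))
M = Diagonal(1 ./ d̃) * Kα

# rows of M sum to one, as the diffusion-map theory requires
@assert all(sum(M, dims=2) .≈ 1.0)
```

note that applying `:rw` with exponent α instead of 1 in step 2 would not, in general, produce a stochastic matrix.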
another bug I found: `decompose` assumes that `L` is symmetric, but it is not necessarily. this introduces another error. here is my code, which does not assume `L` is symmetric:
```julia
# source: https://www.crcv.ucf.edu/gauss/info/project/ccs/diffusion-maps-finite.pdf
using LinearAlgebra, Statistics, Distances

function cory_diffusion_map(X::Matrix{Float64}, kernel::Function, d::Int; t::Int=1)
    # optionally, standardize the features:
    # n_features = size(X)[1]
    # for f = 1:n_features
    #     X[f, :] = (X[f, :] .- mean(X[f, :])) / std(X[f, :])
    # end

    # compute kernel matrix
    K = pairwise(kernel, eachcol(X), symmetric=true)

    # normalize, to be a Markov chain transition matrix
    D = diagm(vec(sum(K, dims=1)))
    D⁻¹ = diagm(1 ./ diag(D))
    L = D⁻¹ * K
    @assert all(sum.(eachrow(L)) .≈ 1.0)

    # eigen-decomposition (no symmetry assumed)
    decomp = eigen(L)

    # skip first (1.0) eigenvalue. sort highest to lowest otherwise.
    idx = sortperm(decomp.values, rev=true)[2:end]
    λs = decomp.values[idx][1:d]
    Vs = decomp.vectors[:, idx][:, 1:d] * diagm(λs .^ t)
    return Vs
end
```
I now see that the current setup of `fit(DiffMap, ...)` would only pass a symmetric matrix.
anyway, is a PR welcome with three changes?

- make sure `L` is a right-stochastic matrix:

  ```julia
  # normalize it
  if α > 0
      normalize!(L, α=α, norm=:sym) # L̃ = D⁻ᵅ*L*D⁻ᵅ
  end
  # random walk
  normalize!(L, norm=:rw) # M = D̃⁻¹ * L̃
  ```

- change the `decompose` call to use `skipfirst=true`. we need to skip the first, constant eigenvector; pretty sure the current setup of `skipfirst=false` is incorrect.
- change the very dangerous `decompose` function, which converts any matrix into a symmetric one (even if it is not) for eigendecomposition, so it can handle a non-symmetric `L`.
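a quick standalone check (toy matrix, not the package's code) of why the first eigenvector should be skipped: for any right-stochastic `M`, the constant vector is an eigenvector with eigenvalue 1, so it carries no coordinate information:

```julia
using LinearAlgebra

# toy right-stochastic (Markov) matrix, hypothetical values
M = [0.7 0.2 0.1;
     0.3 0.4 0.3;
     0.2 0.2 0.6]
@assert all(sum(M, dims=2) .≈ 1.0)

# the constant vector is an eigenvector of M with eigenvalue 1
v = ones(3)
@assert M * v ≈ v

# and 1 is the largest eigenvalue, so the leading eigenpair is trivial
λs = eigvals(M)
@assert maximum(real.(λs)) ≈ 1.0
```

embedding with this constant eigenvector included just adds a useless constant coordinate, which is why `skipfirst=true` is the right default.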
I'm reasonably confident the current set-up is broken since it cannot even handle the 2D S-shaped curve. my diffusion map code is able to beautifully recover the S-shaped curve.
thanks.
Just ran into the same issue and came to check if someone noticed the `skipfirst=false` in the source. Good catches @SimonEnsemble
here is the implementation of diffusion maps we're using for a project:
https://github.com/SimonEnsemble/DiffusionMap.jl