AutoGrad.Sparse type causes regression
ekinakyurek opened this issue · 10 comments
Earlier, I could accumulate my gradients across iterations. However, recent changes in AutoGrad break this, because I can't sum two gradient arrays when they are AutoGrad.Sparse. There may be other issues with this type that I haven't tested yet. In general, I believe one should get a gradient that is capable of everything the corresponding parameter type can do.
julia> function foo(w)
return w[1][1]+w[2][1]
end
foo (generic function with 1 method)
julia> w = [param(3,3),param(3,3)]
2-element Array{Param{KnetArray{Float32,2}},1}:
P(KnetArray{Float32,2}(3,3))
P(KnetArray{Float32,2}(3,3))
julia> J = @diff foo(w)
T(-0.32367945)
julia> grad(J,w[1])
Sparse(KnetArray{Float32,2}(3,3)())
julia> grad(J,w[1]) + grad(J,w[2])
ERROR: MethodError: +(::AutoGrad.Sparse{Float32,2}, ::AutoGrad.Sparse{Float32,2}) is ambiguous. Candidates:
+(a::AbstractArray, s::AutoGrad.Sparse) in AutoGrad at /home/gridsan/eakyurek/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:73
+(s::AutoGrad.Sparse, a::AbstractArray) in AutoGrad at /home/gridsan/eakyurek/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:74
Possible fix, define
+(::AutoGrad.Sparse, ::AutoGrad.Sparse)
Stacktrace:
[1] top-level scope at REPL[28]:1
julia> grad(J,w[1]) + grad(J,w[1])
ERROR: MethodError: +(::AutoGrad.Sparse{Float32,2}, ::AutoGrad.Sparse{Float32,2}) is ambiguous. Candidates:
+(a::AbstractArray, s::AutoGrad.Sparse) in AutoGrad at /home/gridsan/eakyurek/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:73
+(s::AutoGrad.Sparse, a::AbstractArray) in AutoGrad at /home/gridsan/eakyurek/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:74
Possible fix, define
+(::AutoGrad.Sparse, ::AutoGrad.Sparse)
Stacktrace:
[1] top-level scope at REPL[29]:1
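For readers unfamiliar with this failure mode: the call fails because Julia's dispatcher finds two applicable methods, neither more specific than the other, and refuses to pick one. Below is a toy sketch of the same pattern and of the fix the error message suggests; the type Grad and the function combine are made up for illustration and are not AutoGrad code.

struct Grad; x::Float64; end
combine(a::Grad, b) = a.x             # analogous to +(s::Sparse, a::AbstractArray)
combine(a, b::Grad) = b.x             # analogous to +(a::AbstractArray, s::Sparse)
# combine(Grad(1.0), Grad(2.0))       # MethodError: ambiguous, both methods match
# The suggested fix is a method strictly more specific than both candidates:
combine(a::Grad, b::Grad) = a.x + b.x
combine(Grad(1.0), Grad(2.0))         # 3.0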
yeah, full
Works for me! Though, the problematic thing about this interface is that you don't know in advance what your gradient type will be.
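Until Sparse + Sparse is defined, a workaround sketch for accumulating gradients across iterations is to keep a dense accumulator and add each gradient into it. This relies only on the +(::AbstractArray, ::AutoGrad.Sparse) method listed as a candidate in the error above (which presumably returns a dense array), so it works whether grad returns a dense array or a Sparse; it assumes CPU Array parameters, and KnetArray parameters would need the analogous Knet method.

acc = zero(value(w[1]))       # dense accumulator with the parameter's shape
acc = acc + grad(J, w[1])     # AbstractArray + Sparse is defined, result is dense
acc = acc + grad(J, w[1])     # later iterations keep adding into the dense buffer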
I realized that this has also broken Knet: when you use the Adam optimizer with gclip and get a Sparse gradient, gclip fails.
I can't replicate this; the following works fine. Please provide a minimal example.
using Knet
# Load data (mnistdata basically replicates mnist.ipynb)
include(Knet.dir("data","mnist.jl"))
dtrn,dtst = mnistdata(xsize=(784,:),xtype=Array)
struct Foo; w; end
model = Foo(param(10,784))
# We turn Foo instances into callable objects for prediction:
(m::Foo)(x) = (I = (a->a[1]).(vec(argmax(x,dims=1))); m.w[:,I])
# model(x) gives predictions, let model(x,y) give the loss
(m::Foo)(x, y) = nll(m(x), y)
@info "training..."
@time Knet.minimize!(model, dtst, Adam(lr=0.0001,gclip=0.1))
The dy/sparsebugs branch implements + for two Sparse values, please test.
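To try the branch, the standard Pkg commands should suffice (a sketch; the branch name is taken from the comment above):

using Pkg
Pkg.add(PackageSpec(name="AutoGrad", rev="dy/sparsebugs"))
# restart Julia, then rerun the failing line, e.g.
# grad(J, w[1]) + grad(J, w[2])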
Although I didn't run your example, I believe you didn't get the error because your gradients don't exceed the gclip value. Here is a simpler example you can replicate without downloading anything.
julia> using Knet
julia> function foo(w)
s = 0.0
for i=1:length(w); s+=w[i]; end
return s
end
foo (generic function with 1 method)
julia> w = Param(randn(2,2))
2×2 Param{Array{Float64,2}}:
0.427868 0.657678
-0.332868 -1.50003
julia> J = @diff foo(w)
T(-0.7473544438700652)
julia> update!(value(w), grad(J,w), Adam(gclip=0.1))
ERROR: MethodError: lmul!(::Float64, ::AutoGrad.Sparse{Float64,2}) is ambiguous. Candidates:
lmul!(a, x::AutoGrad.Sparse{T,N}) where {T, N} in AutoGrad at /kuacc/users/eakyurek13/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:51
lmul!(s::Number, X::AbstractArray) in LinearAlgebra at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/LinearAlgebra/src/generic.jl:100
Possible fix, define
lmul!(::Number, ::AutoGrad.Sparse{T,N})
Stacktrace:
[1] gclip!(::AutoGrad.Sparse{Float64,2}, ::Float64) at /kuacc/users/eakyurek13/.julia/packages/Knet/IIjk8/src/update.jl:613
[2] update!(::Array{Float64,2}, ::AutoGrad.Sparse{Float64,2}, ::Adam) at /kuacc/users/eakyurek13/.julia/packages/Knet/IIjk8/src/update.jl:537
[3] top-level scope at REPL[6]:1
You are right, it was an ambiguity issue. I will create a PR now.
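For reference, a sketch of the kind of disambiguating definition that resolves it (not the actual patch): restricting the scalar argument to Number makes the method strictly more specific than both candidates in the error above; invoke is used here only to delegate back to AutoGrad's existing untyped-scalar method.

using AutoGrad
import LinearAlgebra
LinearAlgebra.lmul!(a::Number, x::AutoGrad.Sparse) =
    invoke(LinearAlgebra.lmul!, Tuple{Any,AutoGrad.Sparse}, a, x)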
Fixed in current master.