AutoGrad.Sparse type causes regression
ekinakyurek opened this issue · 10 comments
Earlier, I could accumulate my gradients across iterations. However, recent changes in AutoGrad break this, because I can't sum two gradient arrays when they are AutoGrad.Sparse. There may be other issues with this type that I haven't tested yet. In general, I believe one should get a gradient that is capable of everything the corresponding parameter type can do.
julia> function foo(w)
return w[1][1]+w[2][1]
end
foo (generic function with 1 method)
julia> w = [param(3,3),param(3,3)]
2-element Array{Param{KnetArray{Float32,2}},1}:
P(KnetArray{Float32,2}(3,3))
P(KnetArray{Float32,2}(3,3))
julia> J = @diff foo(w)
T(-0.32367945)
julia> grad(J,w[1])
Sparse(KnetArray{Float32,2}(3,3)())
julia> grad(J,w[1]) + grad(J,w[2])
ERROR: MethodError: +(::AutoGrad.Sparse{Float32,2}, ::AutoGrad.Sparse{Float32,2}) is ambiguous. Candidates:
+(a::AbstractArray, s::AutoGrad.Sparse) in AutoGrad at /home/gridsan/eakyurek/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:73
+(s::AutoGrad.Sparse, a::AbstractArray) in AutoGrad at /home/gridsan/eakyurek/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:74
Possible fix, define
+(::AutoGrad.Sparse, ::AutoGrad.Sparse)
Stacktrace:
[1] top-level scope at REPL[28]:1
julia> grad(J,w[1]) + grad(J,w[1])
ERROR: MethodError: +(::AutoGrad.Sparse{Float32,2}, ::AutoGrad.Sparse{Float32,2}) is ambiguous. Candidates:
+(a::AbstractArray, s::AutoGrad.Sparse) in AutoGrad at /home/gridsan/eakyurek/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:73
+(s::AutoGrad.Sparse, a::AbstractArray) in AutoGrad at /home/gridsan/eakyurek/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:74
Possible fix, define
+(::AutoGrad.Sparse, ::AutoGrad.Sparse)
Stacktrace:
[1] top-level scope at REPL[29]:1
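For readers unfamiliar with this failure mode: the call fails because Julia's dispatcher finds two applicable methods, neither more specific than the other, and refuses to pick one. Below is a toy sketch of the same pattern and of the fix the error message suggests; the type Grad and the function combine are made up for illustration and are not AutoGrad code.

struct Grad; x::Float64; end
combine(a::Grad, b) = a.x             # analogous to +(s::Sparse, a::AbstractArray)
combine(a, b::Grad) = b.x             # analogous to +(a::AbstractArray, s::Sparse)
# combine(Grad(1.0), Grad(2.0))       # MethodError: ambiguous, both methods match
# The suggested fix is a method strictly more specific than both candidates:
combine(a::Grad, b::Grad) = a.x + b.x
combine(Grad(1.0), Grad(2.0))         # 3.0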
yeah, full
Works for me! Though, the problematic thing about this interface is that you don't know in advance what your gradient type will be.
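Until Sparse + Sparse is defined, a workaround sketch for accumulating gradients across iterations is to keep a dense accumulator and add each gradient into it. This relies only on the +(::AbstractArray, ::AutoGrad.Sparse) method listed as a candidate in the error above (which presumably returns a dense array), so it works whether grad returns a dense array or a Sparse; it assumes CPU Array parameters, and KnetArray parameters would need the analogous Knet method.

acc = zero(value(w[1]))       # dense accumulator with the parameter's shape
acc = acc + grad(J, w[1])     # AbstractArray + Sparse is defined, result is dense
acc = acc + grad(J, w[1])     # later iterations keep adding into the dense buffer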
I realized that this has also broken Knet: when you use the Adam optimizer with gclip and get a Sparse gradient, gclip fails.
I can't replicate this; the following works fine. Please provide a minimal example.
using Knet
# Load data (mnistdata basically replicates mnist.ipynb)
include(Knet.dir("data","mnist.jl"))
dtrn,dtst = mnistdata(xsize=(784,:),xtype=Array)
struct Foo; w; end
model = Foo(param(10,784))
# We turn Foo instances into callable objects for prediction:
(m::Foo)(x) = (I = (a->a[1]).(vec(argmax(x,dims=1))); m.w[:,I])
# model(x) gives predictions, let model(x,y) give the loss
(m::Foo)(x, y) = nll(m(x), y)
@info "training..."
@time Knet.minimize!(model, dtst, Adam(lr=0.0001,gclip=0.1))
The dy/sparsebugs branch implements + for two Sparse values, please test.
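To try the branch, the standard Pkg commands should suffice (a sketch; the branch name is taken from the comment above):

using Pkg
Pkg.add(PackageSpec(name="AutoGrad", rev="dy/sparsebugs"))
# restart Julia, then rerun the failing line, e.g.
# grad(J, w[1]) + grad(J, w[2])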
Although I didn't run your example, I believe you didn't get the error because your gradients don't exceed the gclip value. Here is a simpler example you can replicate without downloading anything.
julia> using Knet
julia> function foo(w)
s = 0.0
for i=1:length(w); s+=w[i]; end
return s
end
foo (generic function with 1 method)
julia> w = Param(randn(2,2))
2×2 Param{Array{Float64,2}}:
0.427868 0.657678
-0.332868 -1.50003
julia> J = @diff foo(w)
T(-0.7473544438700652)
julia> update!(value(w), grad(J,w), Adam(gclip=0.1))
ERROR: MethodError: lmul!(::Float64, ::AutoGrad.Sparse{Float64,2}) is ambiguous. Candidates:
lmul!(a, x::AutoGrad.Sparse{T,N}) where {T, N} in AutoGrad at /kuacc/users/eakyurek13/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:51
lmul!(s::Number, X::AbstractArray) in LinearAlgebra at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/LinearAlgebra/src/generic.jl:100
Possible fix, define
lmul!(::Number, ::AutoGrad.Sparse{T,N})
Stacktrace:
[1] gclip!(::AutoGrad.Sparse{Float64,2}, ::Float64) at /kuacc/users/eakyurek13/.julia/packages/Knet/IIjk8/src/update.jl:613
[2] update!(::Array{Float64,2}, ::AutoGrad.Sparse{Float64,2}, ::Adam) at /kuacc/users/eakyurek13/.julia/packages/Knet/IIjk8/src/update.jl:537
[3] top-level scope at REPL[6]:1
You are right, it was an ambiguity issue. I will create a PR now.
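For reference, a sketch of the kind of disambiguating definition that resolves it (not the actual patch): restricting the scalar argument to Number makes the method strictly more specific than both candidates in the error above; invoke is used here only to delegate back to AutoGrad's existing untyped-scalar method.

using AutoGrad
import LinearAlgebra
LinearAlgebra.lmul!(a::Number, x::AutoGrad.Sparse) =
    invoke(LinearAlgebra.lmul!, Tuple{Any,AutoGrad.Sparse}, a, x)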
Fixed in current master.