BACKPROP SUMMARY (c) Deniz Yuret, 2014
======================================

Dependencies:

  ____________
 |            V
a[l]  w[l]--->a[l+1]
 |     \/       |
 |     /\       |
 V    V  V      V
∇z[l] ∇w[l]<--∇z[l+1]
 ∧              |
  --------------

Notation - vectorized:

M                   # number of examples in the input batch
l = 1...L           # layers: 1 is the input, (L+1) is the output
N[l]                # size of layer l=1..(L+1)
z[l] : N[l],M       # linear output of layer l=2..(L+1): z[l] = w[l-1]*a[l-1] + b[l-1]
a[l] : N[l],M       # output of layer l=1..(L+1): a[1] is the input, a[l]=f(z[l]) for l=2..L,
                    # a[L+1] is given by the output layer (softmax below)
w[l] : N[l+1],N[l]  # weight matrix connecting layer l=1..L to layer l+1
b[l] : N[l+1],1     # bias vector connecting layer l=1..L to layer l+1
f(z), ∂f(z)         # activation function and its derivative
J(z[L+1],y)         # cost, where y is the desired output
∇z[l] : N[l],M      # gradient of the cost wrt z[l], same size as z[l]

Notation - scalar:

z[l,i,m]            # linear output at layer l, unit i, instance m
a[l,i,m]            # output of layer l, unit i, instance m
w[l,j,i]            # weight from layer l unit i to layer l+1 unit j
b[l,j]              # bias for layer l+1 unit j
∂J / ∂z[l,i,m]      # partial derivative of the cost wrt z[l,i,m]

Forward - vectorized form: a[1] = input
(Note that + needs singleton expansion, or b[l] needs repmat.)

a[l]   = f(z[l])              # for l=2..L
z[l+1] = w[l] * a[l] + b[l]   # for l=1..L

Forward - scalar form:

a[l,i,m]   = f(z[l,i,m])
z[l+1,j,m] = b[l,j] + Σ_i w[l,j,i] * a[l,i,m]

Backward ∇z - vectorized form: for l=L..2; ∇z[L+1] is computed directly from the cost (see Softmax below)

∇z[l] = (∂a[l] / ∂z[l]) .* ((∂z[l+1] / ∂a[l])' * ∇z[l+1])
      = ∂f(z[l]) .* (w[l]' * ∇z[l+1])

Backward ∇z - scalar form:

∂J / ∂z[l,i,m] = (∂a[l,i,m] / ∂z[l,i,m]) * Σ_j (∂z[l+1,j,m] / ∂a[l,i,m]) * (∂J / ∂z[l+1,j,m])
               = ∂f(z[l,i,m]) * Σ_j w[l,j,i] * (∂J / ∂z[l+1,j,m])

Backward ∇w - vectorized form: for l=L..1

∇w[l] = ∇z[l+1] * (∂z[l+1] / ∂w[l])'
      = ∇z[l+1] * a[l]'

Backward ∇w - scalar form:

∂J / ∂w[l,j,i] = Σ_m (∂J / ∂z[l+1,j,m]) * (∂z[l+1,j,m] / ∂w[l,j,i])
               = Σ_m (∂J / ∂z[l+1,j,m]) * a[l,i,m]

Backward ∇b - vectorized form: for l=L..1
(Note that b[l] can be considered a matrix of size N[l+1],M created with repmat, hence the Σ_m.)

∇b[l] = ∇z[l+1] * (∂z[l+1] / ∂b[l])'
      = Σ_m ∇z[l+1]

Backward ∇b - scalar form:

∂J / ∂b[l,j] = Σ_m (∂J / ∂z[l+1,j,m]) * (∂z[l+1,j,m] / ∂b[l,j])
             = Σ_m (∂J / ∂z[l+1,j,m])

Softmax ∇z[L+1] - scalar form:
(Assuming y[m] is the correct class for instance m, and z1 = z[L+1].)

J = (1/M) * Σ_m (log(Σ_j exp(z1[j,m])) - z1[y[m],m])
∂J / ∂z1[i,m] = (1/M) * ((1/(Σ_j exp(z1[j,m]))) * exp(z1[i,m]) - [y[m]==i])
              = (1/M) * (p(y[m]==i) - [y[m]==i])

Softmax ∇z[L+1] - vectorized form:
(Assuming Y[i,m]=1 if instance m is in class i; ./ needs repmat or singleton expansion.)

a[L+1]  = exp(z[L+1]) ./ Σ_i exp(z[L+1])
∇z[L+1] = (1/M) * (a[L+1] - Y)

General ∇z[l] = ∂f(z[l]) .* (w[l]' * ∇z[l+1])
RELU    ∇z[l] = [a[l]>0] .* (w[l]' * ∇z[l+1])
Sigmoid ∇z[l] = a[l] .* (1 - a[l]) .* (w[l]' * ∇z[l+1])
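
Matlab sketch - forward/backward pass:
(A minimal illustrative sketch of the formulas above, not the rnet source. The names
backprop_sketch, w, b, x, Y are assumptions: w and b are cell arrays of the L weight
matrices and bias vectors, x is the N[1]xM input batch, Y is the N[L+1]xM one-hot
label matrix. RELU hidden units and the Softmax cost as above.)

function [dw, db, J] = backprop_sketch(w, b, x, Y)
  L = numel(w);                             % layers 1..L connect to layer L+1
  M = size(x, 2);                           % batch size
  a = cell(1, L+1);
  a{1} = x;                                 % a[1] = input
  for l = 1:L
    z = bsxfun(@plus, w{l} * a{l}, b{l});   % z[l+1] = w[l]*a[l] + b[l]
    if l < L
      a{l+1} = max(z, 0);                   % RELU hidden units: a[l+1] = f(z[l+1])
    else
      z = bsxfun(@minus, z, max(z, [], 1)); % subtract column max for stability
      a{l+1} = bsxfun(@rdivide, exp(z), sum(exp(z), 1)); % softmax output a[L+1]
    end
  end
  J = -sum(sum(Y .* log(a{L+1}))) / M;      % cross-entropy form of the cost J above
  dz = (a{L+1} - Y) / M;                    % ∇z[L+1] = (1/M)*(a[L+1] - Y)
  dw = cell(1, L); db = cell(1, L);
  for l = L:-1:1
    dw{l} = dz * a{l}';                     % ∇w[l] = ∇z[l+1] * a[l]'
    db{l} = sum(dz, 2);                     % ∇b[l] = Σ_m ∇z[l+1]
    if l > 1
      dz = (a{l} > 0) .* (w{l}' * dz);      % RELU: ∇z[l] = [a[l]>0] .* (w[l]' * ∇z[l+1])
    end
  end
end

(Gradients computed this way are easy to verify against finite differences of J.)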
denizyuret/rnet
Matlab code for feed-forward neural networks with RELU hidden units and a Softmax cost function.
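
(Hypothetical training-step sketch built on the backprop_sketch function above;
the learning rate eta and the data x, Y are assumptions, not the rnet interface.)

eta = 0.1;                                  % step size (assumed)
[dw, db, J] = backprop_sketch(w, b, x, Y);  % gradients from one batch
for l = 1:numel(w)
  w{l} = w{l} - eta * dw{l};                % gradient-descent update
  b{l} = b{l} - eta * db{l};
end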