sseung0703/KD_methods_with_TF

Choice of SVD gradient

Closed this issue · 1 comments

Can you give me some insight into your choice of SVD derivative versus the TensorFlow implementation (https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf) and the PyTorch implementation (https://j-towns.github.io/papers/svd-derivative.pdf)?

Are the three comparable or interchangeable? Sorry, I am not a linear algebra person, so my question may sound odd.

Basically, all three derivative functions are based on the following paper:

Ionescu, Catalin, Orestis Vantzos, and Cristian Sminchisescu. "Training deep networks with structured layers by matrix backpropagation." arXiv preprint arXiv:1509.07838 (2015).

The TensorFlow implementation follows the paper exactly, but in some cases it has redundant computation and can produce NaN values.
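For context, a likely source of the NaN values (my own illustration, not code from this repo): the SVD backward pass from Ionescu et al. involves an off-diagonal matrix of terms like 1/(s_i² − s_j²), which overflows to inf when two singular values (nearly) coincide. A minimal NumPy sketch of that term:

```python
import numpy as np

def k_matrix(s):
    """Off-diagonal K_ij = 1 / (s_i^2 - s_j^2), zero on the diagonal.

    This is the term in the standard SVD gradient that blows up for a
    degenerate spectrum (repeated singular values).
    """
    s2 = s ** 2
    diff = s2[:, None] - s2[None, :]        # s_i^2 - s_j^2
    with np.errstate(divide="ignore"):
        k = 1.0 / diff                      # 1/0 -> inf when s_i == s_j, i != j
    np.fill_diagonal(k, 0.0)                # diagonal is defined as zero
    return k

s_distinct = np.array([3.0, 2.0, 1.0])
s_repeated = np.array([2.0, 2.0, 1.0])      # degenerate: two equal values

print(np.isfinite(k_matrix(s_distinct)).all())   # True
print(np.isfinite(k_matrix(s_repeated)).all())   # False: inf appears off-diagonal
```

Once an inf enters the gradient, subsequent multiplications by zero or further sums turn it into NaN, which then propagates through training.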

So I implemented a new derivative function that solves these problems.

I don't know PyTorch's derivative function in detail, but I think it is the same as TensorFlow's.

If you want the details of my implementation, please check my paper.