
Wrong implementation of the Xavier initializer

wddabc opened this issue · 0 comments

The Xavier init in
is wrong. According to Glorot and Bengio (2010), Eq. (1) and Eq. (16), the weights are sampled from a uniform distribution, not a normal distribution (I assume npr.randn samples from a normal distribution, as numpy.random.randn does).

I tried changing L33 to:
ret = npr.uniform(low=-var, high=var, size=shape)
but that raised another exception in MXNet, which might be a separate issue.
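
For reference, here is a minimal NumPy sketch of the uniform sampling described in Eq. (16) of the paper, where the bound is sqrt(6 / (fan_in + fan_out)). The function name `glorot_uniform` and its parameters are hypothetical and are not taken from the code in question; this only illustrates the formula, not how the fix should be wired into the library.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Sample a (fan_in, fan_out) weight matrix from U[-limit, limit].

    Hypothetical helper for illustration: limit follows Eq. (16) of
    Glorot and Bengio (2010), limit = sqrt(6 / (fan_in + fan_out)).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Note that limit is a bound on the values, not a variance.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(low=-limit, high=limit, size=(fan_in, fan_out))

# Example: weights for a layer with 256 inputs and 128 outputs.
w = glorot_uniform(256, 128, rng=np.random.default_rng(0))
```

A uniform distribution on [-a, a] has variance a^2 / 3, so this gives Var(W) = 2 / (fan_in + fan_out), which is the target variance the paper derives.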

Update: Even if the above is fixed, it is still wrong, because the correct scale for Xavier initialization depends on the activation function. See clab/dynet#295 and clab/dynet#348 for discussion.
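
One way to express the activation dependence is a multiplicative gain on the bound. The sketch below assumes that approach; the names `GAINS` and `glorot_limit` are hypothetical, and the gain values are illustrative assumptions (1.0 for the linear/tanh regime the paper analyses, sqrt(2) for ReLU in the style of He et al.), not values taken from the linked discussions.

```python
import numpy as np

# Illustrative gains only (an assumption, not a library's table).
GAINS = {
    "linear": 1.0,
    "tanh": 1.0,
    "relu": np.sqrt(2.0),
}

def glorot_limit(fan_in, fan_out, activation="tanh"):
    """Return the uniform bound gain * sqrt(6 / (fan_in + fan_out))."""
    gain = GAINS[activation]
    return gain * np.sqrt(6.0 / (fan_in + fan_out))

def scaled_glorot_uniform(fan_in, fan_out, activation="tanh", rng=None):
    """Sample weights from U[-limit, limit] with an activation-dependent limit."""
    rng = np.random.default_rng() if rng is None else rng
    limit = glorot_limit(fan_in, fan_out, activation)
    return rng.uniform(low=-limit, high=limit, size=(fan_in, fan_out))

# Example: the same layer shape gets a wider range under ReLU.
w_tanh = scaled_glorot_uniform(256, 128, "tanh", rng=np.random.default_rng(0))
w_relu = scaled_glorot_uniform(256, 128, "relu", rng=np.random.default_rng(0))
```

Exposing the gain as a parameter, rather than hard-coding one bound, would let callers pick the scale that matches their nonlinearity.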