mfaruqui/retrofitting

Retrofit Update Function


Hello,
I have a question about the update function. In the code, every word vector is updated to the mean of the centroid of all its neighbors and the initial vector, which matches the update function proposed in the paper. However, I do not see how to arrive at this update function. The paper states that it is derived from the loss function by taking the partial derivative with respect to a vector x_i, setting it to zero, and solving for x_i. The loss function is defined as follows:
sum[i=1..n](alpha * ||x_i - x_i'||^2 + sum[j:(i,j) in E](beta_ij * ||x_i - x_j||^2))

Here x_i' is the initialization of x_i. The graph is undirected, so (i,j) in E implies (j,i) in E, which means that every edge is counted twice in the loss (once with x_i as the left term of the distance and once with x_i appearing as a neighbor x_j). Taking the derivative of this loss with respect to x_i, I came up with the following formula:
x_i = (alpha * x_i' + sum[j:(i,j) in E]((beta_ij + beta_ji) * x_j)) / (alpha + sum[j:(i,j) in E](beta_ij + beta_ji))
which is different from the update used in the code and the paper:
x_i = (alpha * x_i' + sum[j:(i,j) in E](beta_ij * x_j)) / (alpha + sum[j:(i,j) in E](beta_ij))
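For completeness, here is the derivative step behind my version (my own reconstruction; the beta_ji terms come from the loss terms of the neighbors, where x_i appears as the right term of the distance):

```math
\frac{\partial L}{\partial x_i} = 2\alpha (x_i - x_i') + \sum_{j:(i,j) \in E} 2\beta_{ij} (x_i - x_j) + \sum_{j:(j,i) \in E} 2\beta_{ji} (x_i - x_j)
```

Setting this to zero and solving for x_i yields my formula above.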
I then ran the algorithm with both update functions on a small set of one-dimensional vectors and computed the loss after 100 iterations.
As a result, I got a lower loss with the new update function. Am I misunderstanding something in the formulas?
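For reference, this is roughly the toy experiment I ran (a minimal sketch with made-up 1-D vectors, alpha = 1, and a constant beta_ij = 1 for every edge; not code from this repo):

```python
# Toy comparison of the two update rules on the loss defined above.
x_init = [0.0, 1.0, 4.0, 9.0]                   # the x_i' (initial vectors)
edges = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}  # symmetric adjacency lists
alpha, beta = 1.0, 1.0

def loss(x):
    # sum_i (alpha*||x_i - x_i'||^2 + sum_{j:(i,j) in E} beta*||x_i - x_j||^2)
    return sum(alpha * (x[i] - x_init[i]) ** 2
               + sum(beta * (x[i] - x[j]) ** 2 for j in edges[i])
               for i in range(len(x)))

def retrofit(double_beta, iters=100):
    # double_beta=False -> the paper's update (beta_ij only);
    # double_beta=True  -> my derivative-based update (beta_ij + beta_ji).
    x = list(x_init)
    b = (2.0 if double_beta else 1.0) * beta
    for _ in range(iters):
        for i in range(len(x)):
            x[i] = ((alpha * x_init[i] + b * sum(x[j] for j in edges[i]))
                    / (alpha + b * len(edges[i])))
    return x

print("loss with the paper's update:   ", loss(retrofit(False)))
print("loss with the derivative update:", loss(retrofit(True)))
```

On this setup the derivative-based update ends with the lower loss, which is the behaviour I described above.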

How could I find the value of beta if I am using pretrained word embeddings?
As I understand it, you are not using pretrained embeddings but handcrafted embeddings as described in the paper, so it is easy to take the co-occurrence probability as the value of beta.
I think that for pretrained embeddings, a similarity measure could be used as the value of beta. Is that right?
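To make the idea concrete, here is a small sketch of what I mean (hypothetical vectors and lexicon edges; note that the paper itself simply sets beta_ij to the inverse degree of node i):

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical pretrained vectors and lexicon edges.
vectors = {"dog": [0.9, 0.1], "puppy": [0.8, 0.2], "car": [0.1, 0.9]}
edges = [("dog", "puppy"), ("dog", "car")]

# Clamp at zero so beta stays a non-negative weight.
beta = {(w1, w2): max(cosine(vectors[w1], vectors[w2]), 0.0)
        for (w1, w2) in edges}
print(beta)  # {('dog', 'puppy'): 0.99..., ('dog', 'car'): 0.21...}
```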