inference method in TADW
junyachen opened this issue · 7 comments
In the TADW training code, what inference method is used in the following snippet, and what is the mathematical theory behind it? Thank you.
def train(self):
    self.adj = self.getAdj()
    # M = (A + A^2)/2, where A is the row-normalized adjacency matrix
    self.M = (self.adj + np.dot(self.adj, self.adj)) / 2
    # T is a feature_size x node_num matrix of text features
    self.T = self.getT()
    self.node_size = self.adj.shape[0]
    self.feature_size = self.features.shape[1]
    self.W = np.random.randn(self.dim, self.node_size)
    self.H = np.random.randn(self.dim, self.feature_size)
    # Alternately update W and H
    for i in range(20):
        print('Iteration ', i)
        # Update W by conjugate gradient, holding H fixed
        B = np.dot(self.H, self.T)
        drv = 2 * np.dot(np.dot(B, B.T), self.W) - \
            2 * np.dot(B, self.M.T) + self.lamb * self.W
        Hess = 2 * np.dot(B, B.T) + self.lamb * np.eye(self.dim)
        drv = np.reshape(drv, [self.dim * self.node_size, 1])
        rt = -drv
        dt = rt
        vecW = np.reshape(self.W, [self.dim * self.node_size, 1])
        while np.linalg.norm(rt, 2) > 1e-4:
            dtS = np.reshape(dt, (self.dim, self.node_size))
            Hdt = np.reshape(np.dot(Hess, dtS),
                             [self.dim * self.node_size, 1])
            at = np.dot(rt.T, rt) / np.dot(dt.T, Hdt)
            vecW = vecW + at * dt
            rtmp = rt
            rt = rt - at * Hdt
            bt = np.dot(rt.T, rt) / np.dot(rtmp.T, rtmp)
            dt = rt + bt * dt
        self.W = np.reshape(vecW, (self.dim, self.node_size))
        # Update H by the same conjugate-gradient scheme, holding W fixed
        drv = np.dot((np.dot(np.dot(np.dot(self.W, self.W.T), self.H), self.T)
                      - np.dot(self.W, self.M.T)), self.T.T) + self.lamb * self.H
        drv = np.reshape(drv, (self.dim * self.feature_size, 1))
        rt = -drv
        dt = rt
        vecH = np.reshape(self.H, (self.dim * self.feature_size, 1))
        while np.linalg.norm(rt, 2) > 1e-4:
            dtS = np.reshape(dt, (self.dim, self.feature_size))
            Hdt = np.reshape(np.dot(np.dot(np.dot(self.W, self.W.T), dtS),
                                    np.dot(self.T, self.T.T))
                             + self.lamb * dtS, (self.dim * self.feature_size, 1))
            at = np.dot(rt.T, rt) / np.dot(dt.T, Hdt)
            vecH = vecH + at * dt
            rtmp = rt
            rt = rt - at * Hdt
            bt = np.dot(rt.T, rt) / np.dot(rtmp.T, rtmp)
            dt = rt + bt * dt
        self.H = np.reshape(vecH, (self.dim, self.feature_size))
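From the gradient `drv = 2*B@B.T@W - 2*B@M.T + lamb*W`, the W-subproblem appears to be the regularized least-squares objective ||M - Wᵀ H T||² + (lamb/2)||W||², whose minimizer satisfies the linear system (2BBᵀ + lamb·I)·W = 2B·Mᵀ with B = HT. A minimal sketch checking this reading on small random matrices (all sizes and values here are made up for illustration):

```python
import numpy as np

# Hypothetical small sizes, just for illustration
rng = np.random.default_rng(0)
dim, node_size, feature_size = 4, 10, 6

M = rng.standard_normal((node_size, node_size))
T = rng.standard_normal((feature_size, node_size))
H = rng.standard_normal((dim, feature_size))
lamb = 0.2

B = H @ T                                    # dim x node_size
Hess = 2 * B @ B.T + lamb * np.eye(dim)      # SPD for lamb > 0
W_star = np.linalg.solve(Hess, 2 * B @ M.T)  # closed-form minimizer

# The gradient of the W-subproblem should vanish at W_star
grad = 2 * (B @ B.T) @ W_star - 2 * B @ M.T + lamb * W_star
print(np.linalg.norm(grad))  # close to machine precision
```

Because `Hess` is only dim × dim, the code in the issue solves this system iteratively instead of forming an explicit inverse, which stays cheap even for large graphs.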
Yes, I have read that paper, but I am still not clear about the inference algorithm used here. Do you use Newton optimization in this code? The paper says they alternately optimize W and H, but is not specific. I would appreciate it if you could tell me more.
Hello, I have the same confusion as you: I don't know what optimization method is used for W and H. Have you figured it out? I hope we can discuss it together. Looking forward to your reply, thanks.
They used conjugate gradient descent. Look it up; it is the standard Fletcher-Reeves method.
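The inner `while` loops above match the Fletcher-Reeves conjugate-gradient pattern: the step size `at` comes from an exact line search, and `bt = rtᵀrt / rtmpᵀrtmp` is the Fletcher-Reeves coefficient. A self-contained sketch of the same scheme on a generic symmetric positive-definite system (the function name and test matrix are my own, for illustration only):

```python
import numpy as np

def fletcher_reeves_cg(A, b, tol=1e-8):
    """Solve A x = b for symmetric positive-definite A by conjugate gradients."""
    x = np.zeros_like(b)
    r = b - A @ x                        # residual = -gradient of 0.5 x'Ax - b'x
    d = r.copy()                         # first direction: steepest descent
    while np.linalg.norm(r) > tol:
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)       # exact line search along d (the code's `at`)
        x = x + alpha * d
        r_new = r - alpha * Ad
        beta = (r_new @ r_new) / (r @ r) # Fletcher-Reeves coefficient (the code's `bt`)
        d = r_new + beta * d
        r = r_new
    return x

rng = np.random.default_rng(1)
Q = rng.standard_normal((5, 5))
A = Q @ Q.T + 5 * np.eye(5)              # make the system SPD
b = rng.standard_normal(5)
x = fletcher_reeves_cg(A, b)
print(np.allclose(A @ x, b))  # True
```

In TADW's code the unknown is a matrix, so `W` is flattened into a vector (`vecW`) and the Hessian-vector product `Hdt` is computed by reshaping rather than by building the full (dim·node_size)² Hessian.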
Thank you so much for taking the time to reply; this resolved a doubt I had for a long time. I have been reading papers in this area recently, but my background and coding experience are limited, so I often hit dead ends, and none of my classmates work in the same direction, so I have no one to ask. If possible, I would very much like to keep in touch and have more exchanges like this, because when I am stuck I really wish I had someone to discuss with. Sorry if this comes across as abrupt.
Feel free to add me on WeChat; send me your contact and I will add you.