thunlp/OpenNE

inference method in TADW

junyachen opened this issue · 7 comments

In the TADW training step, what inference method is used in the following code? What is the mathematical theory behind it? Thank you.

def train(self):
    self.adj = self.getAdj()
    # M = (A + A^2)/2 where A is the row-normalized adjacency matrix
    self.M = (self.adj + np.dot(self.adj, self.adj))/2
    # T is feature_size*node_num, text features
    self.T = self.getT()
    self.node_size = self.adj.shape[0]
    self.feature_size = self.features.shape[1]
    self.W = np.random.randn(self.dim, self.node_size)
    self.H = np.random.randn(self.dim, self.feature_size)
    # Update W and H alternately
    for i in range(20):
        print('Iteration ', i)
        # Update W
        B = np.dot(self.H, self.T)
        # gradient of the objective with respect to W
        drv = 2 * np.dot(np.dot(B, B.T), self.W) - \
            2 * np.dot(B, self.M.T) + self.lamb * self.W
        # Hessian of the W-subproblem, applied column-wise
        Hess = 2 * np.dot(B, B.T) + self.lamb * np.eye(self.dim)
        drv = np.reshape(drv, [self.dim * self.node_size, 1])
        rt = -drv  # residual = negative gradient
        dt = rt    # initial search direction
        vecW = np.reshape(self.W, [self.dim * self.node_size, 1])
        while np.linalg.norm(rt, 2) > 1e-4:
            dtS = np.reshape(dt, (self.dim, self.node_size))
            Hdt = np.reshape(np.dot(Hess, dtS), [self.dim * self.node_size, 1])
            at = np.dot(rt.T, rt)/np.dot(dt.T, Hdt)  # step size (exact line search)
            vecW = vecW + at*dt
            rtmp = rt
            rt = rt - at*Hdt  # updated residual
            bt = np.dot(rt.T, rt)/np.dot(rtmp.T, rtmp)  # Fletcher-Reeves coefficient
            dt = rt + bt * dt  # next conjugate search direction
        self.W = np.reshape(vecW, (self.dim, self.node_size))

        # Update H
        drv = np.dot((np.dot(np.dot(np.dot(self.W, self.W.T), self.H), self.T)
                      - np.dot(self.W, self.M.T)), self.T.T) + self.lamb*self.H
        drv = np.reshape(drv, (self.dim*self.feature_size, 1))
        rt = -drv
        dt = rt
        vecH = np.reshape(self.H, (self.dim*self.feature_size, 1))
        while np.linalg.norm(rt, 2) > 1e-4:
            dtS = np.reshape(dt, (self.dim, self.feature_size))
            Hdt = np.reshape(np.dot(np.dot(np.dot(self.W, self.W.T), dtS), np.dot(self.T, self.T.T))
                             + self.lamb*dtS, (self.dim*self.feature_size, 1))
            at = np.dot(rt.T, rt)/np.dot(dt.T, Hdt)
            vecH = vecH + at*dt
            rtmp = rt
            rt = rt - at*Hdt
            bt = np.dot(rt.T, rt)/np.dot(rtmp.T, rtmp)
            dt = rt + bt * dt
        self.H = np.reshape(vecH, (self.dim, self.feature_size))
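
For reference, the objective from the TADW paper that this routine appears to minimize is

$$\min_{W,H}\ \|M - W^{\top}HT\|_F^2 + \frac{\lambda}{2}\left(\|W\|_F^2 + \|H\|_F^2\right),$$

so with $B = HT$ the gradient with respect to $W$ is $2BB^{\top}W - 2BM^{\top} + \lambda W$ and the column-wise Hessian is $2BB^{\top} + \lambda I$, which matches what the `drv` and `Hess` lines compute.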
zzy14 commented

You can refer to the original TADW paper.

Yes, I have read that paper, but I am still not clear about the optimization algorithm used here. Is this Newton's method? The paper says W and H are optimized alternately, but gives no specifics. I would appreciate it if you could tell me more.

Hello, I have the same confusion as you: I don't know what optimization method is used for W and H. Have you figured it out by now? I'd like to discuss it with you. Looking forward to your reply, thank you.


It uses conjugate gradient descent. Look it up; it is the standard Fletcher-Reeves method.
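
For anyone else who lands here: a minimal standalone sketch of that scheme on a quadratic objective, mirroring the `rt`/`dt`/`at`/`bt` update pattern of the two inner while-loops above (the function name and toy matrices here are my own, not from OpenNE):

import numpy as np

def fletcher_reeves_cg(hess, grad, x0, tol=1e-4):
    # Minimize 0.5 * x.T @ hess @ x + grad.T @ x by Fletcher-Reeves
    # conjugate gradient, i.e. solve hess @ x = -grad.
    x = x0.copy()
    rt = -(grad + hess @ x)             # residual = negative gradient
    dt = rt.copy()                      # initial search direction
    while np.linalg.norm(rt, 2) > tol:
        Hdt = hess @ dt
        at = (rt @ rt) / (dt @ Hdt)     # step size (exact line search)
        x = x + at * dt
        rtmp = rt
        rt = rt - at * Hdt              # updated residual
        bt = (rt @ rt) / (rtmp @ rtmp)  # Fletcher-Reeves coefficient
        dt = rt + bt * dt               # next conjugate direction
    return x

# Toy check on a small symmetric positive-definite system:
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = fletcher_reeves_cg(A, b, np.zeros(2))
print(x, np.linalg.solve(A, -b))        # the two should agree

In TADW the Hessian of each subproblem (e.g. 2*B@B.T + lamb*I for W) is symmetric positive definite, which is exactly the setting where this iteration is guaranteed to converge.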

Thank you so much for replying despite your busy schedule; you have resolved a question that puzzled me for a long time. I have been reading papers in this area lately, but my background is limited and my coding experience is thin, so I often run into dead ends. My classmates work on different topics, so I have no one to ask. If possible, I would very much like to keep having exchanges like this with you in the future, because when I am stuck I really long for a friend to discuss things with. This may sound a bit abrupt; please forgive me.


Sure, we can add each other on WeChat; post yours and I'll add you.