williamleif/graphsage-simple

Do you have plans to implement an unsupervised version of GraphSAGE?

chenwgen opened this issue · 12 comments

Hi, do you have plans to implement an unsupervised version of GraphSAGE? Thanks.

You just need to change the loss function to Eq. 1 from the original paper.
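
For reference, Eq. 1 is the negative-sampling objective J(z_u) = -log σ(z_u · z_v) - Q · E_{v_n ~ P_n(v)} [log σ(-z_u · z_{v_n})], where v co-occurs with u on a short random walk and P_n is the negative-sampling distribution. A rough PyTorch sketch (the function name and tensor shapes below are illustrative, not code from this repo):

import torch
import torch.nn.functional as F

def unsup_graphsage_loss(z_u, z_pos, z_neg):
    # z_u:   (batch, dim) embeddings of the anchor nodes u
    # z_pos: (batch, dim) embeddings of nodes v that co-occur with u on walks
    # z_neg: (batch, Q, dim) embeddings of Q negative samples drawn from P_n(v)
    pos_score = (z_u * z_pos).sum(dim=-1)                        # z_u . z_v
    neg_score = torch.bmm(z_neg, z_u.unsqueeze(-1)).squeeze(-1)  # z_u . z_vn
    Q = z_neg.size(1)
    pos_loss = -F.logsigmoid(pos_score)
    neg_loss = -Q * F.logsigmoid(-neg_score).mean(dim=-1)
    return (pos_loss + neg_loss).mean()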

@unsuthee have you implemented it? I changed the loss function to Eq. 1, but the embeddings do not make sense: connected nodes do not have close embeddings, and they differ from what the TensorFlow version gives.

I've implemented the unsupervised version by training the model on random walks or network edges. However, the convergence is weird. Discussions are welcome.
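
A rough sketch of how such positive pairs can be generated from random walks (illustrative only; adj_lists maps a node to its neighbor set, as in this repo):

import random

def random_walk_pairs(adj_lists, nodes, walk_len=5, walks_per_node=10):
    # Collect (anchor, context) positive pairs by running short random walks
    # from each anchor node; every node visited on the walk is a positive.
    pairs = []
    for node in nodes:
        for _ in range(walks_per_node):
            curr = node
            for _ in range(walk_len):
                neighbors = list(adj_lists[curr])
                if not neighbors:
                    break
                curr = random.choice(neighbors)
                if curr != node:
                    pairs.append((node, curr))
    return pairs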

One thing that is necessary is to constrain the embeddings to be unit length; this is mentioned in the appendix of the paper, I think. For instance, you can use cosine similarity instead of the dot product to achieve this. Though this is a minor thing, it can have a big impact on convergence.
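
For example, something along these lines (an illustrative snippet, not code from this repo):

import torch.nn.functional as F

def cosine_scores(z_u, z_v):
    # L2-normalize both embeddings so their dot product equals cosine
    # similarity; this keeps scores in [-1, 1] and stabilizes convergence.
    z_u = F.normalize(z_u, p=2, dim=-1)
    z_v = F.normalize(z_v, p=2, dim=-1)
    return (z_u * z_v).sum(dim=-1)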

Sadly, I don't plan on implementing the unsupervised any time soon, but pull requests are welcome! :)

@williamleif That makes sense now! Thanks for pointing that out!

fs302 commented

Hi @HongxuChenUQ, is it possible for you to share the loss code you wrote for the unsupervised version?

Actually, I tried combining loss_label and loss_network and found that the F1 score rises from 0.88 to 0.93. But when I use loss_network alone, no gradient reaches the model's weights. Since I am new to PyTorch, I can't figure out the problem.

Below is my loss code, where nodes and negtive_samples are node lists.

def loss(self, nodes, negtive_samples, num_neighs, labels):
        loss_list = []
        z_negtive_samples = self.enc(negtive_samples).t()
        z_querys = self.enc(nodes).t()
        for i,query in enumerate(nodes):
            z_query = z_querys[i]
            neighbors = list(self.adj_lists[int(query)])[:num_neighs]
            z_neighbors = self.enc(neighbors).t()
            pos = torch.min(torch.sigmoid(torch.tensor([torch.dot(z_query,z_neighbor) for z_neighbor in z_neighbors]))).requires_grad_()
            neg = torch.max(torch.sigmoid(torch.tensor([torch.dot(z_query,z_ns) for z_ns in z_negtive_samples]))).requires_grad_()
            loss_list.append(torch.max(Variable(torch.tensor(0.0)),neg-pos+self.margin))
        loss_net = Variable(torch.mean(torch.tensor(loss_list)),requires_grad=True)
        scores = self.forward(nodes)
        loss_sup = self.xent(scores, labels.squeeze())
        return loss_sup+loss_net
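
For what it's worth, the most likely reason no gradient reaches the weights above is that torch.tensor([...]) and Variable(...) copy the values out of the autograd graph, so requires_grad_() only creates new leaf tensors that are disconnected from self.enc. A sketch of the same margin loss built from differentiable ops (assuming the same self.enc, self.adj_lists and self.margin attributes):

def loss_network(self, nodes, negtive_samples, num_neighs):
    # Same margin-based ranking idea, but built only from differentiable
    # ops (torch.mv, min/max, clamp, stack) so gradients reach self.enc.
    loss_list = []
    z_negtive_samples = self.enc(negtive_samples).t()      # (num_neg, dim)
    z_querys = self.enc(nodes).t()                         # (batch, dim)
    for i, query in enumerate(nodes):
        z_query = z_querys[i]
        neighbors = list(self.adj_lists[int(query)])[:num_neighs]
        z_neighbors = self.enc(neighbors).t()              # (num_neighs, dim)
        pos = torch.sigmoid(torch.mv(z_neighbors, z_query)).min()
        neg = torch.sigmoid(torch.mv(z_negtive_samples, z_query)).max()
        loss_list.append(torch.clamp(neg - pos + self.margin, min=0.0))
    return torch.stack(loss_list).mean()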
fs302 commented

@HongxuChenUQ Really appreciate that! What F1 score do you get with it?

@fs302 I've evaluated it with AUC, and the performance is good. You will have to train a downstream classifier if you want to evaluate the F1 score.

fs302 commented

@HongxuChenUQ Yes, I use a 2-layer NN as the downstream classifier, but only achieve F1=0.31, which is much lower than the end-to-end supervised version (F1=0.84) under the same embedding setting.

I wonder if the gap comes from the difference in positive & negative pair sampling.

fs302 commented

Thanks to @HongxuChenUQ: after tuning the learning rate of the downstream classifier and generating more robust negative samples, the best F1 reached 0.76 for the unsupervised version.
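
For anyone else trying this, one common way to make the negatives more robust (not necessarily exactly what was done here) is to sample uniformly while excluding a node's own neighborhood:

import random

def sample_negatives(adj_lists, node, num_nodes, num_neg=20):
    # Draw negatives uniformly from nodes that are neither `node` itself nor
    # one of its direct neighbors, so positives never leak into the negatives.
    forbidden = set(adj_lists[int(node)]) | {int(node)}
    negatives = []
    while len(negatives) < num_neg:
        cand = random.randrange(num_nodes)
        if cand not in forbidden:
            negatives.append(cand)
    return negatives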

@fs302 @HongxuChenUQ is it possible to share the code for the unsupervised version?