williamleif/GraphSAGE

Does this implementation of the GraphSAGE algorithm really work? I doubt it!

homehehe opened this issue · 2 comments

So, I ran example_unsupervised.sh with the hinge loss, and the loss trend looks like this:

Epoch: 0001
Iter: 0000 train_loss= 2.14502 train_mrr= 0.22259 train_mrr_ema= 0.22259 time= 1.17050
Iter: 0050 train_loss= 2.42778 train_mrr= 0.17273 train_mrr_ema= 0.21027 time= 0.08527
Iter: 0100 train_loss= 2.35766 train_mrr= 0.18598 train_mrr_ema= 0.20077 time= 0.07464
Iter: 0150 train_loss= 2.28739 train_mrr= 0.19115 train_mrr_ema= 0.19425 time= 0.07058
Iter: 0200 train_loss= 1.84928 train_mrr= 0.24786 train_mrr_ema= 0.19023 time= 0.06793
Iter: 0250 train_loss= 2.13344 train_mrr= 0.20242 train_mrr_ema= 0.18814 time= 0.06637
Iter: 0300 train_loss= 2.33268 train_mrr= 0.19176 train_mrr_ema= 0.18753 time= 0.06522
Iter: 0350 train_loss= 2.27680 train_mrr= 0.20565 train_mrr_ema= 0.18688 time= 0.06435
Iter: 0400 train_loss= 2.37595 train_mrr= 0.19267 train_mrr_ema= 0.18856 time= 0.06373
Iter: 0450 train_loss= 2.18717 train_mrr= 0.18997 train_mrr_ema= 0.18781 time= 0.06335
Iter: 0500 train_loss= 1.97818 train_mrr= 0.21563 train_mrr_ema= 0.18772 time= 0.06307
Iter: 0550 train_loss= 2.04497 train_mrr= 0.20323 train_mrr_ema= 0.18829 time= 0.06280
Iter: 0600 train_loss= 2.14987 train_mrr= 0.19552 train_mrr_ema= 0.18920 time= 0.06247
Iter: 0650 train_loss= 2.08283 train_mrr= 0.18723 train_mrr_ema= 0.18896 time= 0.06223
Iter: 0700 train_loss= 2.01260 train_mrr= 0.20139 train_mrr_ema= 0.18837 time= 0.06199

And running example_unsupervised.sh with the xent loss, the loss trend looks like this:
Epoch: 0001
Iter: 0000 train_loss= 18.96014 train_mrr= 0.22259 train_mrr_ema= 0.22259 time= 1.25577
Iter: 0050 train_loss= 18.76009 train_mrr= 0.17460 train_mrr_ema= 0.21034 time= 0.08849
Iter: 0100 train_loss= 18.55639 train_mrr= 0.18638 train_mrr_ema= 0.20079 time= 0.07623
Iter: 0150 train_loss= 17.97637 train_mrr= 0.18695 train_mrr_ema= 0.19437 time= 0.07148
Iter: 0200 train_loss= 17.25497 train_mrr= 0.23274 train_mrr_ema= 0.19028 time= 0.06911
Iter: 0250 train_loss= 17.02276 train_mrr= 0.19407 train_mrr_ema= 0.18802 time= 0.06776
Iter: 0300 train_loss= 16.97402 train_mrr= 0.19147 train_mrr_ema= 0.18748 time= 0.06685
Iter: 0350 train_loss= 16.58705 train_mrr= 0.20570 train_mrr_ema= 0.18639 time= 0.06603
Iter: 0400 train_loss= 16.23674 train_mrr= 0.19959 train_mrr_ema= 0.18713 time= 0.06545
Iter: 0450 train_loss= 15.92830 train_mrr= 0.18490 train_mrr_ema= 0.18618 time= 0.06496
Iter: 0500 train_loss= 15.54055 train_mrr= 0.21570 train_mrr_ema= 0.18709 time= 0.06473
Iter: 0550 train_loss= 15.47228 train_mrr= 0.20580 train_mrr_ema= 0.18705 time= 0.06442
Iter: 0600 train_loss= 15.25948 train_mrr= 0.18497 train_mrr_ema= 0.18760 time= 0.06424
Iter: 0650 train_loss= 15.14235 train_mrr= 0.17513 train_mrr_ema= 0.18732 time= 0.06409

The result looks good, but if you take a close look you will find that tf.reduce_sum(true_xent), the pos_loss, increases, while tf.reduce_sum(negative_xent), the neg_loss, decreases. If you set the 'neg_sample_size' variable to 1, you will instead find that pos_loss decreases and neg_loss increases!
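
For context, here is a minimal sketch of how I read the xent loss (the shapes, the random inputs, and the names pos_loss/neg_loss are mine for illustration; the actual computation is in graphsage/prediction.py, if I am reading it right):

```python
import tensorflow as tf

batch_size = 4
neg_sample_size = 20   # try 1 to reproduce the second behavior I described

# aff: affinity (dot product) between each node and its positive co-occurring node
# neg_aff: affinities between each node and the sampled negative nodes
aff = tf.random_normal([batch_size])
neg_aff = tf.random_normal([batch_size, neg_sample_size])

# positive pairs are pushed toward label 1, negative pairs toward label 0
true_xent = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=tf.ones_like(aff), logits=aff)
negative_xent = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=tf.zeros_like(neg_aff), logits=neg_aff)

pos_loss = tf.reduce_sum(true_xent)      # sums over batch_size terms
neg_loss = tf.reduce_sum(negative_xent)  # sums over batch_size * neg_sample_size terms
loss = pos_loss + neg_loss
```

Because neg_loss sums over neg_sample_size times more terms than pos_loss, it dominates the total with the default setting, which is why I would expect the optimizer to spend most of its effort on it.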

So, can the author explain this?

I believe this behavior is a result of the custom loss function GraphSAGE uses to learn graph embeddings in which similar nodes end up close together and dissimilar nodes end up far apart. The loss is roughly J(z_u) = -log(σ(z_u^T z_v)) - Q · E_{v_n ~ P_n(v)}[log(σ(-z_u^T z_{v_n}))], where P_n is the negative-sampling distribution and Q is the number of negative samples drawn. Changing the negative sample size therefore directly changes the weight of the negative-sample term relative to the positive term, which shifts how the optimizer trades off the two, while the loss as a whole still encourages embeddings z_u in which nearby nodes are similar and distant nodes are dissimilar.
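
To make that concrete, here is a toy numpy sketch (the affinity scores are made up, not taken from the model) showing how growing the number of negatives Q grows the negative term relative to the positive term, so the optimizer has more incentive to drive the negative term down even at the cost of letting the positive term rise:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss_terms(pos_score, neg_scores):
    """Toy version of J(z_u): returns the positive and negative terms separately."""
    pos_term = -np.log(sigmoid(pos_score))            # -log(sigma(z_u . z_v))
    neg_term = -np.sum(np.log(sigmoid(-neg_scores)))  # summed over the Q negatives
    return pos_term, neg_term

# Made-up affinities: the positive pair is moderately similar,
# the negatives are slightly dissimilar.
for q in (1, 20):
    pos_term, neg_term = loss_terms(1.0, np.full(q, -0.5))
    print("Q=%2d  pos_term=%.3f  neg_term=%.3f" % (q, pos_term, neg_term))

# With Q=1 the two terms are comparable; with Q=20 the negative term dominates,
# so small improvements on the negatives outweigh small regressions on the positives.
```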


Hello, were you able to get similar embeddings for nearby nodes after training?