Issue with understanding
I am currently trying to learn how Graph Neural Networks work, but I have been stuck on this topic for days. Maybe one of you can help me out.
I am using Zachary's Karate Club as the graph dataset. The goal is node classification: determine which node (person) is loyal to which instructor (Node 0 or Node 33).
For this purpose I am using the InteractionNetwork module with Linear modules for the node and edge updates. I assumed (and maybe this is where I misunderstood something about GNNs) that if I put a sigmoid activation function after the node update, the nodes would have either 0 (loyal to Node 0) or 1 (loyal to Node 33) as values. Instead I get floating-point values in between.
Below is the code that I am using:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tree
from pyvis.network import Network
from graph_nets import blocks
from graph_nets import graphs
from graph_nets import modules
from graph_nets import utils_np
from graph_nets import utils_tf
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import sonnet as snt
import tensorflow as tf
import functools
# making GraphsTuple from karate club dataset
# get dataset from nx
karate_graph = nx.karate_club_graph()
karate_graph_tupel = karate_graph
# getting node informations
# labeling the nodes
nodes = []
for i in range(34):
    if i == 0:
        nodes.append(0)   # instructor Node 0
    elif i == 33:
        nodes.append(1)   # instructor Node 33
    else:
        nodes.append(-1)  # unlabeled
nodes = np.reshape(nodes, (len(nodes), 1))
nodes_float = tf.cast(nodes, dtype=tf.float64)
# getting sender and receiver information
# make the graph undirected by storing every edge in both directions
sender = []
receiver = []
for u, v in karate_graph.edges:
    sender.extend([u, v])
    receiver.extend([v, u])
# getting edge information
# one constant feature per directed edge, so edges, senders and receivers have the same length
edges = [[0.0] for _ in range(len(sender))]
# create GraphTuple from received informations
data_dict = {
    "nodes": nodes_float,
    "edges": edges,
    "senders": sender,
    "receivers": receiver
}
graphs_tuple = utils_np.data_dicts_to_graphs_tuple([data_dict])
graphs_tuple = tree.map_structure(lambda x: tf.constant(x) if x is not None else None, graphs_tuple)
# defining graph network
graph_network = modules.InteractionNetwork(
    node_model_fn=lambda: snt.Sequential([snt.Linear(output_size=1), tf.nn.sigmoid]),
    edge_model_fn=lambda: snt.Sequential([snt.Linear(output_size=1)])
)
# optimizer and loss function
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
# the node model already ends in a sigmoid, so the loss receives probabilities, not logits
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=False)
# learning loop
for epoch in range(50):
    with tf.GradientTape() as tape:
        output_graph = graph_network(graphs_tuple)
        # Loss for labeled nodes
        labeled_nodes = [0, 33]
        labeled_indices = [i for i in labeled_nodes if graphs_tuple.nodes[i] != -1]
        loss = loss_fn(tf.gather(graphs_tuple.nodes, labeled_indices), tf.gather(output_graph.nodes, labeled_indices))
    # calculate gradient
    gradients = tape.gradient(loss, graph_network.trainable_variables)
    # apply gradient
    optimizer.apply_gradients(zip(gradients, graph_network.trainable_variables))
    # Loss output
    print("Epoch %d | Loss: %.4f" % (epoch, loss.numpy()))
print(output_graph.nodes)
print(output_graph.edges)
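Because the node model ends in tf.nn.sigmoid, its output is a probability, not a logit, and the loss has to be configured to match. The following is only a minimal NumPy sketch, not the Keras implementation: the helper names are made up for illustration, and the "perfect" predictions are hypothetical. It compares a cross-entropy that takes probabilities with one that applies a second sigmoid first.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_from_probs(y_true, p, eps=1e-7):
    """Binary cross-entropy when p is already a probability in [0, 1]."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

def bce_from_logits(y_true, z):
    """Binary cross-entropy when z is a raw logit; applies sigmoid internally."""
    return bce_from_probs(y_true, sigmoid(z))

y_true = np.array([0.0, 1.0])         # labels of Node 0 and Node 33
perfect_probs = np.array([0.0, 1.0])  # hypothetical ideal sigmoid outputs

loss_ok = bce_from_probs(y_true, perfect_probs)       # probabilities used as probabilities
loss_double = bce_from_logits(y_true, perfect_probs)  # probabilities squashed a second time

print(loss_ok, loss_double)
```

In this sketch the probability-based loss is essentially zero for perfect predictions, while the doubled-sigmoid loss cannot drop below roughly 0.50. A loss that flattens out near that value may therefore point to a logits/probabilities mismatch rather than to the model itself.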
This is the output that I get:
Loss function:
Epoch 0 | Loss: 0.6619
Epoch 1 | Loss: 0.6547
Epoch 2 | Loss: 0.6478
Epoch 3 | Loss: 0.6412
Epoch 4 | Loss: 0.6351
Epoch 5 | Loss: 0.6292
Epoch 6 | Loss: 0.6233
Epoch 7 | Loss: 0.6172
Epoch 8 | Loss: 0.6110
Epoch 9 | Loss: 0.6048
Epoch 10 | Loss: 0.5988
Epoch 11 | Loss: 0.5931
Epoch 12 | Loss: 0.5877
Epoch 13 | Loss: 0.5826
Epoch 14 | Loss: 0.5777
Epoch 15 | Loss: 0.5728
Epoch 16 | Loss: 0.5680
Epoch 17 | Loss: 0.5633
Epoch 18 | Loss: 0.5589
Epoch 19 | Loss: 0.5549
Nodes:
[[0.09280719]
[0.04476126]
[0.03025987]
[0.13695013]
[0.34953291]
[0.26353402]
[0.26353402]
[0.26353402]
[0.22878334]
[0.47378198]
[0.34953291]
[0.54787342]
[0.44657832]
[0.22878334]
[0.47378198]
[0.47378198]
[0.41969087]
[0.44657832]
[0.47378198]
[0.40082739]
[0.47378198]
[0.44657832]
[0.47378198]
[0.21003225]
[0.32505647]
[0.32505647]
[0.47378198]
[0.28533633]
[0.37482885]
[0.28533633]
[0.28533633]
[0.16495933]
[0.01520448]
[0.83080503]], shape=(34, 1), dtype=float64)
I did not include the edge output, because I don't think it is relevant for this issue and it would be too much information.
Hi! Sigmoid is a smooth, continuous function, so it is expected that you get floating-point numbers.
At train time you can use that floating-point value as the "mean" parameter of a Bernoulli distribution and maximize the log likelihood (which is equivalent to what you are doing when minimizing the BinaryCrossentropy).
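To make that equivalence concrete, here is a small NumPy sketch. The labels and probabilities are made-up example values, and the two helper functions are written out by hand for illustration rather than taken from any library.

```python
import numpy as np

def bernoulli_log_likelihood(y, p):
    """Per-element log-probability of label y under Bernoulli(p)."""
    return np.where(y == 1, np.log(p), np.log(1.0 - p))

def binary_cross_entropy(y, p):
    """Mean binary cross-entropy for probabilities p and 0/1 labels y."""
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

y = np.array([0.0, 1.0, 1.0])  # hypothetical labels
p = np.array([0.2, 0.7, 0.9])  # hypothetical sigmoid outputs

mean_ll = float(np.mean(bernoulli_log_likelihood(y, p)))
bce = binary_cross_entropy(y, p)

# The cross-entropy is the negated mean log likelihood.
print(bce, -mean_ll)
```

Both printed numbers agree, so lowering the cross-entropy and raising the Bernoulli log likelihood are the same objective.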
At evaluation time, you can either sample from the Bernoulli distribution, or you can use the greedy approach of just rounding to 0 or 1, e.g. sigmoid(output) > 0.5 will return True or False (and then you can cast to an integer to get 0 or 1).
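A minimal NumPy sketch of both evaluation options, using a few values rounded from the node output posted above. The values are treated as probabilities that already passed through the sigmoid, and the random seed is an arbitrary choice.

```python
import numpy as np

# A few node outputs from the post, rounded to two decimals.
probs = np.array([0.09, 0.47, 0.55, 0.83])

# Greedy decision: threshold at 0.5, then cast the booleans to integers.
greedy_labels = (probs > 0.5).astype(int)

# Stochastic decision: draw each label from Bernoulli(p).
rng = np.random.default_rng(seed=0)
sampled_labels = (rng.random(probs.shape) < probs).astype(int)

print(greedy_labels)
print(sampled_labels)
```

The greedy labels are deterministic, while the sampled ones can differ between runs for probabilities close to 0.5.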
Thanks a lot, then it was indeed a misunderstanding on my side. I thought sigmoid would act like a step function.
This raises now another question:
I changed the value of the "unlabeled" nodes to 0.5, because with -1 the nodes tended to be classified with label 0.
I thought that a value of 0.5 could be interpreted as "not sure if loyal to Node 0 or Node 33", but the results still stay close to 0.5, and sometimes nodes close to Node 0 end up above 0.5 and nodes close to Node 33 below.
Shouldn't at least the direct neighbors become loyal to the labeled nodes right after the first message-passing layer?
My guess is that the other neighbors, which are also labeled with 0.5, affect their results.
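That guess can be illustrated on a toy graph. The sketch below is not the graph_nets implementation and uses a hypothetical five-node graph rather than the karate club. It makes the simplifying assumption that the message on an edge is just the sender's feature, and sums the incoming messages per receiver, which as far as I understand is the default aggregation in InteractionNetwork.

```python
import numpy as np

# Toy graph (hypothetical): node 0 is labeled 1.0, nodes 1-4 are unlabeled (0.5).
# Node 1 is a neighbor of the labeled node 0 and of the unlabeled nodes 2, 3, 4.
node_features = np.array([1.0, 0.5, 0.5, 0.5, 0.5])
undirected = [(0, 1), (1, 2), (1, 3), (1, 4)]

# Store each undirected edge in both directions.
senders = np.array([u for u, v in undirected] + [v for u, v in undirected])
receivers = np.array([v for u, v in undirected] + [u for u, v in undirected])

# Simplifying assumption: the message on an edge is the sender's feature.
messages = node_features[senders]

# Sum the incoming messages per receiver (same idea as a segment-sum reducer).
aggregated = np.zeros_like(node_features)
np.add.at(aggregated, receivers, messages)

# Average incoming message, to show how the labeled signal gets diluted.
in_degree = np.bincount(receivers, minlength=len(node_features))
mean_incoming = aggregated / in_degree

print(aggregated)
print(mean_incoming)
```

Node 1 receives 2.5 in total, of which only 1.0 comes from the labeled node, and its average incoming message is 0.625. In this simplified setting the single labeled neighbor is outweighed by the three unlabeled ones, and the real model additionally passes everything through learned linear layers.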
But how can I fix that, if that's the case?
I really appreciate your help!