A graph neural network (GNN) is a class of neural network for processing data that can be represented as graphs. GNNs were popularized by their use in supervised learning on molecular properties. Since their inception, several variants of the simple message passing neural network (MPNN) framework have been proposed. More recently, GNNs have gained popularity in domains such as social networks, knowledge graphs, life sciences and recommender systems. Their ability to model dependencies between the nodes of a graph has driven advances in graph analysis. This repository contains several graph machine learning projects solved using deep graph neural networks.
Node classification is a supervised task in which the network predicts the label of each node. First, a deep graph neural network outputs an embedding for each node. The embeddings are then passed through a multilayer perceptron (MLP) head, which predicts the node labels.
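The two stages described above can be sketched as a forward pass in pure Python. This is a minimal illustration, not the repository's model: the toy graph, the hand-picked weights, the mean-aggregation layer, and all function names are assumptions, and a single linear layer stands in for the MLP head.

```python
# Minimal node-classification forward pass (no training).
# The graph, weights, and function names are illustrative assumptions.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, v) for v in x]

def gnn_layer(features, neighbors, W):
    """One message-passing layer: average each node with its neighbors,
    then apply a linear map and a ReLU."""
    out = []
    for node, feat in enumerate(features):
        group = [feat] + [features[n] for n in neighbors[node]]
        mean = [sum(col) / len(group) for col in zip(*group)]
        out.append(relu(matvec(W, mean)))
    return out

def predict_label(embedding, W_out):
    """Linear head (stand-in for the MLP): index of the highest score."""
    scores = matvec(W_out, embedding)
    return scores.index(max(scores))

# Toy graph: 4 nodes, undirected edges 0-1 and 2-3.
neighbors = {0: [1], 1: [0], 2: [3], 3: [2]}
features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
W = [[1.0, 0.0], [0.0, 1.0]]      # identity weights, chosen by hand
W_out = [[1.0, 0.0], [0.0, 1.0]]  # one row of weights per class

embeddings = gnn_layer(features, neighbors, W)
predictions = [predict_label(e, W_out) for e in embeddings]
print(predictions)
```

In a trained model the weights would be learned from the labeled nodes, and several such layers would typically be stacked before the head.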
The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words. Nodes represent documents and edges represent citation links.
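To make the data layout concrete, here is a small sketch of how 0/1-valued word vectors and citation links can be represented. The four-word dictionary, the three documents, and the helper names are hypothetical stand-ins; the real dataset uses a 1433-word dictionary and 2708 publications.

```python
# Toy version of a Cora-style dataset: binary word vectors plus citation links.
# The dictionary, documents, and links below are made up for illustration.

def bag_of_words(doc_words, dictionary):
    """Return a 0/1 vector marking which dictionary words occur in the document."""
    present = set(doc_words)
    return [1 if word in present else 0 for word in dictionary]

def build_adjacency(num_nodes, links):
    """Turn a list of (a, b) links into an undirected adjacency list."""
    adjacency = {node: [] for node in range(num_nodes)}
    for a, b in links:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency

dictionary = ["graph", "neural", "protein", "learning"]
documents = {
    0: ["graph", "learning"],
    1: ["neural", "graph", "graph"],  # repeated words still map to 1
    2: ["protein"],
}
citation_links = [(0, 1), (1, 2)]

node_features = {doc: bag_of_words(words, dictionary)
                 for doc, words in documents.items()}
adjacency = build_adjacency(len(documents), citation_links)
```

Treating the links as undirected is an assumption made here for simplicity; citations are directional, and a model may or may not keep that direction.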
The best model achieved 84% validation accuracy on Cora.
Graph classification is a supervised task in which the network predicts a single label for an entire graph. First, a deep graph neural network outputs an embedding for each node. The node embeddings are then pooled into one graph-level vector and passed through a multilayer perceptron (MLP) head, which predicts the graph label.
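The pooling step can be sketched as follows. The text does not say which pooling operator is used, so mean pooling is an assumption here, as are the function names, the example embeddings, and the hand-picked weights of the linear layer that stands in for the MLP head.

```python
# Graph-level readout: pool node embeddings, then score the pooled vector.
# Mean pooling and all values below are illustrative assumptions.

def mean_pool(node_embeddings):
    """Average the node embeddings into a single graph-level vector."""
    count = len(node_embeddings)
    return [sum(col) / count for col in zip(*node_embeddings)]

def classify_graph(node_embeddings, W_out):
    """Pool the nodes, apply a linear head, return the best-scoring class."""
    graph_vector = mean_pool(node_embeddings)
    scores = [sum(w * v for w, v in zip(row, graph_vector)) for row in W_out]
    return scores.index(max(scores))

# Two graphs with different numbers of nodes map to vectors of the same size.
graph_a = [[1.0, 0.0], [3.0, 2.0]]
graph_b = [[0.0, 4.0], [0.0, 2.0], [3.0, 3.0]]
W_out = [[1.0, 0.0], [0.0, 1.0]]  # one row of weights per class

label_a = classify_graph(graph_a, W_out)
label_b = classify_graph(graph_b, W_out)
```

Pooling is what lets one head handle graphs of different sizes: whatever the node count, the head always receives a fixed-length vector. Sum and max pooling are common alternatives.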
The MUTAG dataset consists of 188 chemical compounds divided into two classes according to their mutagenic effect on a bacterium. The chemical data was obtained from http://cdb.ics.uci.edu and converted to graphs, where vertices represent atoms and edges represent chemical bonds. Explicit hydrogen atoms have been removed and vertices are labeled by atom type and edges by bond type (single, double, triple or aromatic).
The best model achieved 83.6% validation accuracy on MUTAG.
PROTEINS is a dataset of proteins that are classified as enzymes or non-enzymes. Nodes represent the amino acids and two nodes are connected by an edge if they are less than 6 Angstroms apart. It consists of 1113 graphs with 39.06 nodes and 72.82 edges per graph on average.
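The edge rule above (connect two amino acids when they are less than 6 Angstroms apart) can be sketched with a simple distance threshold. The coordinates are made up, the function name is hypothetical, and the choice of a single 3D point per amino acid is an assumption, since the text does not say how each position is defined.

```python
import math
from itertools import combinations

# Build contact edges from 3D positions using a distance cutoff.
# Coordinates and names are illustrative; one point per amino acid is assumed.

def contact_edges(coords, cutoff=6.0):
    """Return (i, j) pairs whose Euclidean distance is strictly below cutoff."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if math.dist(a, b) < cutoff:
            edges.append((i, j))
    return edges

coords = [
    (0.0, 0.0, 0.0),
    (3.0, 4.0, 0.0),   # 5 Angstroms from the first point
    (10.0, 0.0, 0.0),  # farther than 6 Angstroms from both others
]
edges = contact_edges(coords)
print(edges)
```

The comparison is strict, so a pair exactly 6 Angstroms apart is not connected, matching the "less than" wording.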
The best model achieved 74.1% validation accuracy on PROTEINS.