Master Thesis - Structural Literal Representation

By: Janneke van Baden

This GitHub repository contains files to run Graph Neural Network models with various structural literal representations. Graph data must be given in either .nt or .nt.gz format.
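To get a feel for the expected input, the following is a minimal sketch of how a .nt or .nt.gz file could be inspected with the Python standard library. It is not the loader used in this repository; the function names parse_nt_line and read_triples are hypothetical, and the parser is deliberately naive (it only splits each line into subject, predicate, and object).

```python
import gzip
from pathlib import Path


def parse_nt_line(line):
    """Split one N-Triples line into (subject, predicate, object).

    Returns None for blank lines, comments, and lines with fewer than
    three terms. Naive sketch: it does not validate the terms.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    parts = line.split(None, 2)
    if len(parts) < 3:
        return None
    subject, predicate, rest = parts
    # The object is everything up to the terminating " ."
    rest = rest.rstrip()
    if rest.endswith("."):
        rest = rest[:-1].rstrip()
    return subject, predicate, rest


def read_triples(path):
    """Yield triples from a plain (.nt) or gzipped (.nt.gz) file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            triple = parse_nt_line(line)
            if triple is not None:
                yield triple


# Small demonstration on a single, made-up line.
example = '<http://example.org/a> <http://example.org/p> "some text" .'
print(parse_nt_line(example))
```

Literal objects keep their quotes and any datatype suffix, so the third element can be passed on unchanged to whatever literal handling is needed.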

Set-up:

The code for this research was run using Python 3.9. A requirements file is provided; the packages it lists must be installed to run the experiments. In addition, a data folder needs to be added to this directory, or the file paths in the code have to be changed. For instructions on how to create this data folder, refer to the "Creating a 'data' folder" section below.

Creating a 'data' folder:

The data folder has to be in the main directory and must contain four folders: 'aifb', 'mutag', 'dmg777k', and 'synth'. The data for 'aifb', 'mutag', and 'dmg777k' can be retrieved from: https://gitlab.com/wxwilcke/mmkg. That repository provides .gz files containing the triples and the training, validation, and test sets. The folders called 'aifb', 'mutag', and 'dmg777k' should each contain a folder named 'gz_files', which holds these .gz files. For the full file paths, consult the run_experiment.py file.

The 'synth' dataset is created using the repository https://gitlab.com/wxwilcke/graphsynth. With this package, the dataset is generated by running the following command:

generate.py -n 75000 -r 10 -t 10000 -c 4

This results in a dataset with 75,000 nodes, 10 relation types, 10,000 target nodes, and 4 classes. A folder called 'synth' has to be added to the 'data' folder, containing four .gz files: 'context.nt.gz', 'train.nt.gz', 'valid.nt.gz', and 'test.nt.gz'.
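The folder layout described in this section can be checked before running any experiment. The sketch below is not part of the repository; the function name missing_data_paths is hypothetical. Because the exact file names inside 'gz_files' are only listed in run_experiment.py, the sketch merely assumes that each 'gz_files' folder should contain at least one .gz file.

```python
from pathlib import Path

# Datasets that keep their files in a 'gz_files' subfolder.
GZ_DATASETS = ("aifb", "mutag", "dmg777k")
# Files expected directly inside the 'synth' folder.
SYNTH_FILES = ("context.nt.gz", "train.nt.gz", "valid.nt.gz", "test.nt.gz")


def missing_data_paths(data_dir="data"):
    """Return a list of expected paths that are absent under data_dir."""
    root = Path(data_dir)
    missing = []
    for name in GZ_DATASETS:
        gz_dir = root / name / "gz_files"
        if not gz_dir.is_dir():
            missing.append(gz_dir)
        elif not any(gz_dir.glob("*.gz")):
            # Assumption: an empty 'gz_files' folder counts as incomplete.
            missing.append(gz_dir / "*.gz")
    for file_name in SYNTH_FILES:
        path = root / "synth" / file_name
        if not path.is_file():
            missing.append(path)
    return missing


# Report anything missing relative to the current working directory.
for path in missing_data_paths("data"):
    print("missing:", path)
```

An empty result means the layout matches the description above; it does not verify the contents of the files.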

Running the code:

A single experiment can be run using the following command:

python run_experiment.py

To see which parameters are available and how to set them, run:

python run_experiment.py -h

Alternatively, to run 10 experiments in a row with the same configuration, using random seeds 1 through 10, use:

python run_final_experiments_loop.py

Or, to do the same with inverse relations inserted, use:

python run_final_experiments_loop_transposed.py
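As an illustration of what inserting inverse relations means, the sketch below adds, for every edge (subject, relation, object), a reversed edge with a new relation index. This is a common convention for relational graph models and is given under that assumption only; the function name add_inverse_relations is hypothetical and the repository's own implementation may differ.

```python
def add_inverse_relations(triples, num_relations):
    """Return the triples plus one inverse edge per original edge.

    triples: iterable of (subject, relation, object) integer indices.
    num_relations: number of distinct relation types in the input.
    The inverse of relation r gets index r + num_relations
    (an assumed numbering scheme, chosen for this sketch).
    """
    triples = list(triples)
    inverse = [(o, r + num_relations, s) for s, r, o in triples]
    return triples + inverse


# Two edges over two relation types become four edges over four types.
edges = [(0, 0, 1), (1, 1, 2)]
print(add_inverse_relations(edges, num_relations=2))
```

With this numbering, the number of relation types doubles, which also doubles the number of relation-specific adjacency matrices a model has to handle.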

Paths are currently implemented for the four datasets used in this thesis: 'aifb', 'mutag', 'dmg777k', and 'synth'. Note that, before running any experiment, either the paths in the code need to be changed to where the user stored the files, or the user needs to create a 'data' folder, as specified earlier. The mapping technique can be set to 'filtered', 'collapsed', 'all-to-one', or 'separate'.

About the code:

The code itself is contained in the main directory. Each model has its own file, containing the classes used to create that model. Moreover, code to create the adjacency matrices with different layouts is given, as well as code to run the models.

The directories are structured as follows:

  • notebooks contains notebooks for the data analysis, which plot the results of the experiments. Note: the Cochran's Q test used in these notebooks uses the file 'modcomp.py' from https://gitlab.com/wxwilcke/mlstats. If you want to run this test, make sure to add the 'modcomp.py' file to the notebooks folder.
  • results is created while running experiments, and contains the files of the experiment results in .csv format, as well as some plots used for preliminary analysis.
  • plots is created during the running of the notebooks, and contains the plots of the data analysis.
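For readers unfamiliar with the Cochran's Q test mentioned for the notebooks, the sketch below computes the textbook Q statistic from a table of binary outcomes (for example, one row per test sample and one column per model, with 1 for a correct prediction). It is a standalone illustration and not the code in 'modcomp.py', whose interface is not described here; the function name cochran_q is hypothetical.

```python
def cochran_q(outcomes):
    """Compute Cochran's Q statistic and its degrees of freedom.

    outcomes: list of rows, each a list of 0/1 values, one per treatment.
    Returns (q, df), where df = number of treatments - 1.
    """
    k = len(outcomes[0])
    if any(len(row) != k for row in outcomes):
        raise ValueError("all rows must have the same length")
    column_totals = [sum(row[j] for row in outcomes) for j in range(k)]
    row_totals = [sum(row) for row in outcomes]
    grand_total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in column_totals) - grand_total ** 2)
    denominator = k * grand_total - sum(r * r for r in row_totals)
    if denominator == 0:
        # Happens when every row is all zeros or all ones.
        raise ValueError("Q is undefined: no row differs across treatments")
    return numerator / denominator, k - 1


# Made-up outcomes for six samples and three models.
table = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
print(cochran_q(table))  # → (6.0, 2)
```

Under the null hypothesis that all treatments perform equally, Q approximately follows a chi-squared distribution with the returned degrees of freedom, from which a p-value can be derived.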