This is a work by Luisa Werner, Nabil Layaïda, Pierre Genevès, Jérôme Euzenat, Damien Graux.
This repository contains our reproduction of the experiments conducted with Knowledge Enhanced Neural Networks (KENN) on the CiteSeer dataset, together with a re-implementation of these experiments in PyTorch and PyTorch Geometric. We also extended the experiments to the Cora and PubMed datasets.
Name | Description | #Nodes | #Edges | #Features | #Classes | Task |
CiteSeer | from Planetoid, citation network | 3,327 | 9,104 | 3,703 | 6 | Node classification |
Cora | from Planetoid, citation network | 2,708 | 10,556 | 1,433 | 7 | Node classification |
PubMed | from Planetoid, citation network | 19,717 | 88,648 | 500 | 3 | Node classification |
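As a sanity check, the statistics in the table above can be compared with what PyTorch Geometric reports after loading a dataset. The following is a minimal sketch, not the preprocessing script of this repository: the names `EXPECTED_STATS`, `planetoid_stats` and `matches_table`, as well as the `data` download folder, are assumptions made for illustration.

```python
# Expected statistics, copied from the table above:
# name -> (#nodes, #edges, #features, #classes)
EXPECTED_STATS = {
    "CiteSeer": (3327, 9104, 3703, 6),
    "Cora": (2708, 10556, 1433, 7),
    "PubMed": (19717, 88648, 500, 3),
}


def planetoid_stats(name, root="data"):
    """Download a Planetoid dataset and return its statistics as a tuple.

    Needs torch_geometric and network access. The import is done lazily so
    that the table check below also works without it. `root` is a
    hypothetical download folder.
    """
    from torch_geometric.datasets import Planetoid

    dataset = Planetoid(root=root, name=name)
    graph = dataset[0]  # each Planetoid dataset holds a single graph
    return (graph.num_nodes, graph.num_edges,
            dataset.num_features, dataset.num_classes)


def matches_table(name, stats):
    """Return True if `stats` equals the table entry for `name`."""
    return EXPECTED_STATS[name] == tuple(stats)


# Usage (downloads the data on first call):
#   matches_table("Cora", planetoid_stats("Cora"))
```

Keeping the expected values in one dictionary makes a mismatch after a library update easy to spot before any experiment is run.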
This repository contains two sub-directories, one for the experiments conducted with the initial implementation and one for the re-implementation.
- The initial_implementation directory contains the code of the initial experiments, extended to Cora and PubMed.
- The re_implementation directory contains the re-implementation of the experiments.
The results of both approaches are stored in the respective /results directories.
The KENN layers are taken from an existing PyTorch implementation.
- To make sure that the right environment is used, the necessary Python packages and their versions are specified in requirements.txt. We use Python 3.9. To install the packages, go to the project directory, create a conda environment, and run the following command.
pip install -r requirements.txt
The full list of packages in our environment, including all dependencies, is specified in system_packages.txt.
- The CiteSeer dataset used by the initial implementation is already stored in the repository, whereas the Cora and PubMed datasets have to be loaded from PyTorch Geometric and preprocessed. Run the following command from the project directory to get PubMed and Cora.
- To run the initial experiments on all three datasets with the parameters specified in the paper, run the following command from the project directory:
cd initial_implementation
- To get an overview of the results of the initial implementation, run the following command. All experiments must have finished beforehand.
cd initial_implementation
We use Weights and Biases (WandB) as our experiment tracking tool. The experiments can be run WITHOUT or WITH WandB.
- To run the experiments without WandB, run the following command.
cd re-implementation
python conf.json
(By default, "wandb_use": false is set in re-implementation/conf.json.)
- If you want to use Weights and Biases, specify the following parameters in the configuration file:
"wandb_use" : true,
"wandb_label": "<your label>",
"wandb_project" : "<your project>",
"wandb_entity": "<your entity>"
Then use the following command to run the experiments:
cd re-implementation
python conf.json
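The configuration change described above can also be scripted instead of edited by hand. This is a minimal sketch under the assumption that conf.json is a flat JSON object holding the four WandB keys at the top level; the helper names `with_wandb` and `enable_wandb_in_file` are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def with_wandb(conf, label, project, entity):
    """Return a copy of `conf` with WandB tracking switched on."""
    updated = dict(conf)  # leave the caller's dictionary untouched
    updated.update({
        "wandb_use": True,
        "wandb_label": label,
        "wandb_project": project,
        "wandb_entity": entity,
    })
    return updated


def enable_wandb_in_file(conf_path, label, project, entity):
    """Rewrite the JSON file at `conf_path` with WandB tracking switched on."""
    path = Path(conf_path)
    conf = json.loads(path.read_text(encoding="utf-8"))
    updated = with_wandb(conf, label, project, entity)
    path.write_text(json.dumps(updated, indent=4), encoding="utf-8")
    return updated


# Usage (label, project and entity are placeholders to replace):
#   enable_wandb_in_file("re-implementation/conf.json",
#                        "<your label>", "<your project>", "<your entity>")
```

All other keys in the file are preserved, so the hyperparameters stay as configured.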
Interpret the results
To get an overview of the results of the re-implementation, run the following command:
cd re_implementation
To compare the results of both approaches (comparison type 2 reproducibility), go to the project folder and run
In the file re-implementation/conf.json, the hyperparameters and settings of the runs are configured and saved. By default, conf.json contains the parameters mentioned in the paper. The last column indicates whether a parameter can be modified in this implementation and which values it may take.
Parameter | Description | Default | State |
adam_beta1 | Adam optimizer parameter | 0.9 | modifiable |
adam_beta2 | Adam optimizer parameter | 0.999 | modifiable |
adam_eps | Adam optimizer parameter | 1e-07 | modifiable |
bias init | Initialization of bias in NN | "zeroes" | |
binary_preactivation | constant value for activation of binary predicate | 500.0 | modifiable |
clause_weight | initialization of clause weight | 0.5 | modifiable |
dataset | dataset | CiteSeer_reproduce | modifiable: [Cora_reproduce, PubMed_reproduce, CiteSeer_reproduce] |
device | index of the GPU device used for computation | 1 | modifiable |
dropout | dropout rate | 0.0 | modifiable (0.0, 1.0) |
epochs | number of epochs in training | 300 | modifiable (1, ...) |
es_enabled | early stopping activated flag | false | modifiable [true, false] |
es_min_delta | early stopping delta threshold | 0.001 | modifiable (0, ...) |
es_patience | early stopping patience | 10 | modifiable (0, ...) |
eval_steps | interval (in steps) at which progress is printed during training | 10 | modifiable (1, ...) |
hidden_channels | hidden layer dimension | 50 | modifiable (1, ...) |
loss function | loss function | categorical cross-entropy | |
lr | learning rate | 0.001 | modifiable (0.0, 1.0) |
min_weight | clause weight clipping minimum value | 0.0 | modifiable (..., max_weight) |
max_weight | clause weight clipping maximum value | 500.0 | modifiable (min_weight, ...) |
mode | training mode | "transductive" | |
model | Standard (base NN only) or KENN_Standard (KENN with Standard as base NN) | Standard | modifiable: [Standard, KENN_Standard] |
num_kenn_layers | number of KENN layers | 3 | modifiable (0, ...) |
num_layers | number of layers of base NN | 3 | modifiable (1, ...) |
optimizer | optimizer | adam | |
runs | number of runs | 30 | modifiable (1, ...) |
seed | random seed | 0 | modifiable (0,...) |
training_dimension | training dimension | 0.1 | modifiable: [0.1, 0.25, 0.5, 0.75, 0.9] |
valid_dim | validation set dimension | 0.2 | modifiable in: (0.0, 1.0) |
wandb_use | if weights and biases should be used | false | modifiable: [true, false] |
wandb_label | label for weights and biases | "None" | modifiable depending on custom WandB settings |
wandb_project | project name for weights and biases | "None" | modifiable depending on custom WandB settings |
wandb_entity | entity name for weights and biases | "None" | modifiable depending on custom WandB settings |
weight init | initialization of weights in NN | xavier uniform | |
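Before starting a long series of runs, it can help to check a configuration against the value ranges listed in the table. The sketch below covers only a few rows and is not part of this repository: `ALLOWED_VALUES` and `validate_conf` are hypothetical names, and treating `min_weight == max_weight` as acceptable is an assumption, since the table does not say whether the bounds are inclusive.

```python
# Enumerated choices, copied from the "State" column of the table above.
ALLOWED_VALUES = {
    "dataset": {"CiteSeer_reproduce", "Cora_reproduce", "PubMed_reproduce"},
    "model": {"Standard", "KENN_Standard"},
    "training_dimension": {0.1, 0.25, 0.5, 0.75, 0.9},
}

# Parameters whose range in the table starts at 1.
AT_LEAST_ONE = ("epochs", "eval_steps", "hidden_channels", "num_layers", "runs")


def validate_conf(conf):
    """Return a list of problems found in `conf`; an empty list means none."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in conf and conf[key] not in allowed:
            problems.append(f"{key}={conf[key]!r} is not one of {sorted(allowed)}")
    for key in AT_LEAST_ONE:
        if key in conf and conf[key] < 1:
            problems.append(f"{key} must be at least 1")
    # Assumption: equal bounds are accepted, only min > max is rejected.
    if "min_weight" in conf and "max_weight" in conf:
        if conf["min_weight"] > conf["max_weight"]:
            problems.append("min_weight must not exceed max_weight")
    if "valid_dim" in conf and not 0.0 < conf["valid_dim"] < 1.0:
        problems.append("valid_dim must lie strictly between 0.0 and 1.0")
    return problems


# Usage with a parsed conf.json:
#   import json
#   with open("re-implementation/conf.json") as f:
#       print(validate_conf(json.load(f)))
```

Returning a list instead of raising on the first error shows all misconfigured parameters at once.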
This work can be cited as follows:
@article{Werner_Layaïda_Genevès_Euzenat_Graux_2024,
  title={Reproduce, Replicate, Reevaluate. The Long but Safe Way to Extend Machine Learning Methods},
  volume={38},
  url={},
  DOI={10.1609/aaai.v38i14.29515},
  abstractNote={Reproducibility is a desirable property of scientific research. On the one hand, it increases confidence in results. On the other hand, reproducible results can be extended on a solid basis. In rapidly developing fields such as machine learning, the latter is particularly important to ensure the reliability of research. In this paper, we present a systematic approach to reproducing (using the available implementation), replicating (using an alternative implementation) and reevaluating (using different datasets) state-of-the-art experiments. This approach enables the early detection and correction of deficiencies and thus the development of more robust and transparent machine learning methods. We detail the independent reproduction, replication, and reevaluation of the initially published experiments with a method that we want to extend. For each step, we identify issues and draw lessons learned. We further discuss solutions that have proven effective in overcoming the encountered problems. This work can serve as a guide for further reproducibility studies and generally improve reproducibility in machine learning.},
  number={14},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  author={Werner, Luisa and Layaïda, Nabil and Genevès, Pierre and Euzenat, Jérôme and Graux, Damien},
  year={2024},
  month={Mar.},
  pages={15850-15858}
}
The works of KENN can be cited as follows:
# Knowledge Enhanced Neural Networks for Relational Domains
author="Daniele, Alessandro
and Serafini, Luciano",
editor="Dovier, Agostino
and Montanari, Angelo
and Orlandini, Andrea",
title="Knowledge Enhanced Neural Networks for Relational Domains",
booktitle="AIxIA 2022 -- Advances in Artificial Intelligence",
publisher="Springer International Publishing",
# Knowledge Enhanced Neural Networks
author="Daniele, Alessandro
and Serafini, Luciano",
editor="Nayak, Abhaya C.
and Sharma, Alok",
title="Knowledge Enhanced Neural Networks",
booktitle="PRICAI 2019: Trends in Artificial Intelligence",
publisher="Springer International Publishing",