This repository contains the implementation code for the model QLogicE. The model jointly combines the translation embedding model TransE and the quantum logic model E2R in order to capture more features and improve expressiveness on the knowledge graph completion task.
First, clone the code from the GitHub repository with the following command:

```
git clone https://github.com/gzupanda/QLogicE.git
```
The code, which is modified from the baseline model E2R, mainly depends on Python 3 and PyTorch 1.1.0 or later. The repository contains seven directories, one for each dataset.
The datasets used in our experiments are as follows.
Dataset | #Entities | #Relations | #Train | #Valid | #Test | #Triples |
---|---|---|---|---|---|---|
Kinship | 104 | 26 | 8,544 | 1,068 | 1,074 | 10,686 |
UMLS | 135 | 49 | 5,216 | 652 | 661 | 6,529 |
FB15k | 14,951 | 1,345 | 483,142 | 50,000 | 59,071 | 592,213 |
WN18 | 40,943 | 18 | 141,442 | 5,000 | 5,000 | 151,442 |
FB15k237 | 14,505 | 237 | 272,115 | 17,535 | 20,466 | 310,116 |
WN18RR | 40,943 | 11 | 86,835 | 3,034 | 3,134 | 93,003 |
YAGO3-10 | 123,182 | 37 | 1,079,040 | 5,000 | 5,000 | 1,089,040 |
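For orientation, triple files in knowledge graph completion datasets like these are commonly stored as tab-separated `head relation tail` lines. The sketch below assumes that format and a hypothetical file name `train.txt`; it is an illustration, not this repository's parser:

```python
# Minimal sketch: load tab-separated (head, relation, tail) triples.
# The file name and the column order are assumptions about the data layout.
def load_triples(path):
    triples = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split("\t")
            if len(parts) == 3:
                triples.append(tuple(parts))
    return triples

triples = load_triples("train.txt")
entities = {h for h, _, _ in triples} | {t for _, _, t in triples}
relations = {r for _, r, _ in triples}
print(len(triples), "triples,", len(entities), "entities,", len(relations), "relations")
```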
In other words, every directory is self-contained: it holds the training, validation, and test data as well as the trained model. It also contains the code for data parsing, training, testing, evaluation, etc. To show how to quickly run the experiments, we take the dataset UMLS as an example.
The seven directories in this repository correspond to the seven datasets and are divided into three groups. Fortunately, running the experiments is simple. For every dataset, we follow the same steps, taking the dataset umls as an example:
Every directory contains eight Python files and two subdirectories. First, change into the target directory to run the corresponding code; in this case, it is the directory umls:
```
cd QLogicE
cd umls
```
After that, run the following command to train on the dataset, in this case UMLS. The trained model is saved in the directory save.
```
python Train.py
```
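Since the checkpoint is written to save, it can later be inspected with PyTorch. The checkpoint file name below is an assumption, not necessarily the name this code uses:

```python
import torch

# Load a saved checkpoint for inspection; the exact file name under save/
# is a placeholder and depends on what Train.py actually writes.
checkpoint = torch.load("save/model.pt", map_location="cpu")
print(type(checkpoint))
```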
Next, test the training results. We only need to run the following command.
```
python Test.py
```
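The tables below report MRR (mean reciprocal rank) and Hits@k (in percent). For reference, here is a minimal sketch of the standard definitions of these metrics, computed from the rank of each correct entity; it is an illustration, not the repository's exact evaluation code:

```python
# Standard link-prediction metrics, computed from the rank (1 = best)
# assigned to the correct entity for each test query.
def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 3, 2, 10, 1]  # hypothetical example ranks
print(f"MRR: {mrr(ranks):.3f}")
print(f"Hits@1: {hits_at_k(ranks, 1):.2%}, Hits@10: {hits_at_k(ranks, 10):.2%}")
```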
Model | FB15k MRR | FB15k Hits@1 (%) | FB15k Hits@10 (%) | WN18 MRR | WN18 Hits@1 (%) | WN18 Hits@10 (%) | FB15k237 MRR | FB15k237 Hits@1 (%) | FB15k237 Hits@10 (%) | WN18RR MRR | WN18RR Hits@1 (%) | WN18RR Hits@10 (%) | YAGO3-10 MRR | YAGO3-10 Hits@1 (%) | YAGO3-10 Hits@10 (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TransE | 0.628 | 49.36 | 84.73 | 0.646 | 40.56 | 94.87 | 0.310 | 21.72 | 49.65 | 0.206 | 2.79 | 49.52 | 0.501 | 40.57 | 67.39 |
E2R | 0.964 | 96.40 | 96.40 | 0.710 | 71.10 | 71.10 | 0.584 | 58.40 | 58.40 | 0.477 | 47.70 | 47.70 | 0.830 | 83.00 | 83.00 |
QLogicE | 0.969 | 96.93 | 96.93 | 0.927 | 92.70 | 92.70 | 0.949 | 94.89 | 94.89 | 0.928 | 92.79 | 92.79 | 0.937 | 93.74 | 93.74 |
As this table shows, the proposed model achieves outstanding results, especially on the challenging datasets FB15k237, WN18RR, and YAGO3-10.
Model | FB15k MRR | FB15k Hits@1 (%) | FB15k Hits@10 (%) | WN18 MRR | WN18 Hits@1 (%) | WN18 Hits@10 (%) | FB15k237 MRR | FB15k237 Hits@1 (%) | FB15k237 Hits@10 (%) | WN18RR MRR | WN18RR Hits@1 (%) | WN18RR Hits@10 (%) | YAGO3-10 MRR | YAGO3-10 Hits@1 (%) | YAGO3-10 Hits@10 (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ConvE | 0.688 | 59.46 | 84.94 | 0.945 | 93.89 | 95.68 | 0.305 | 21.90 | 47.62 | 0.427 | 38.99 | 50.75 | 0.488 | 39.93 | 65.75 |
ConvKB | 0.211 | 11.44 | 40.83 | 0.709 | 52.89 | 94.89 | 0.230 | 13.98 | 41.46 | 0.249 | 5.63 | 52.50 | 0.420 | 32.16 | 60.47 |
ConvR | 0.773 | 70.57 | 88.55 | 0.950 | 94.56 | 95.85 | 0.346 | 25.56 | 52.63 | 0.467 | 43.73 | 52.68 | 0.527 | 44.62 | 67.33 |
CapsE | 0.087 | 1.934 | 21.78 | 0.890 | 84.55 | 95.08 | 0.160 | 7.34 | 35.60 | 0.415 | 33.69 | 55.98 | 0.000 | 0.00 | 0.00 |
RSN | 0.777 | 72.34 | 87.01 | 0.928 | 91.23 | 95.10 | 0.280 | 19.84 | 44.44 | 0.395 | 34.59 | 48.34 | 0.511 | 42.65 | 66.43 |
TransE | 0.628 | 49.36 | 84.73 | 0.646 | 40.56 | 94.87 | 0.310 | 21.72 | 49.65 | 0.206 | 2.79 | 49.52 | 0.501 | 40.57 | 67.39 |
STransE | 0.543 | 39.77 | 79.60 | 0.656 | 43.12 | 93.45 | 0.315 | 22.48 | 49.56 | 0.226 | 10.13 | 42.21 | 0.049 | 3.28 | 7.35 |
CrossE | 0.702 | 60.08 | 86.23 | 0.834 | 73.28 | 95.03 | 0.298 | 21.21 | 47.05 | 0.405 | 38.07 | 44.99 | 0.446 | 33.09 | 65.45 |
TorusE | 0.746 | 68.85 | 83.98 | 0.947 | 94.33 | 95.44 | 0.281 | 19.62 | 44.71 | 0.463 | 42.68 | 53.35 | 0.342 | 27.43 | 47.44 |
RotatE | 0.791 | 73.93 | 88.10 | 0.949 | 94.43 | 96.02 | 0.336 | 23.83 | 53.06 | 0.475 | 42.60 | 57.35 | 0.498 | 40.52 | 67.07 |
DistMult | 0.784 | 73.68 | 86.32 | 0.824 | 72.60 | 94.61 | 0.313 | 22.44 | 49.01 | 0.433 | 39.68 | 50.22 | 0.501 | 41.26 | 66.12 |
ComplEx | 0.848 | 81.56 | 90.53 | 0.949 | 94.53 | 95.50 | 0.349 | 25.72 | 52.97 | 0.458 | 42.55 | 52.12 | 0.576 | 50.48 | 70.35 |
Analogy | 0.726 | 65.59 | 83.74 | 0.934 | 92.61 | 94.42 | 0.202 | 12.59 | 35.38 | 0.366 | 35.82 | 38.00 | 0.283 | 19.21 | 45.65 |
SimplE | 0.726 | 66.13 | 83.63 | 0.938 | 93.25 | 94.58 | 0.179 | 10.03 | 34.35 | 0.398 | 38.27 | 42.65 | 0.453 | 35.76 | 63.16 |
HolE | 0.800 | 75.85 | 86.78 | 0.938 | 93.11 | 94.94 | 0.303 | 21.37 | 47.64 | 0.432 | 40.28 | 48.79 | 0.502 | 41.84 | 65.19 |
TuckER | 0.788 | 72.89 | 88.88 | 0.951 | 94.64 | 95.80 | 0.352 | 25.90 | 53.61 | 0.459 | 42.95 | 51.40 | 0.544 | 46.56 | 68.09 |
QLogicE | 0.969 | 96.93 | 96.93 | 0.914 | 91.42 | 91.42 | 0.949 | 94.89 | 94.89 | 0.928 | 92.79 | 92.79 | 0.937 | 93.74 | 93.74 |
From this table, we can see that the proposed model QLogicE achieves state-of-the-art results over the existing models. Notably, its performance on the dataset WN18 is not as strong as that of the best existing models, although it performs outstandingly on the other datasets such as FB15k, FB15k237, and WN18RR. Even on WN18, however, QLogicE outperforms the baseline model E2R by a large margin, as the ablation results show. In summary, the proposed model is significantly better than the baseline models on every dataset except WN18.
Inspired by the surprising performance of QLogicE, we formulate a dense-feature model framework from the perspective of information theory to guide the design of future knowledge graph completion models. Its key concept is expressive density, which serves as the criterion of a dense-feature model; we compute the corresponding critical value. Expressive density also shows how much room for improvement a model has on a specific dataset.
In the directories of the datasets FB15k237, WN18RR, and YAGO3-10, there is a text file named triples. This file contains all the triples from the training, validation, and test sets. It is needed because our model runs under the closed-world assumption: entities and relations outside the knowledge graph cannot be the target of link prediction.
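A minimal sketch of how such a combined triples file could be produced from the three splits; the split file names train.txt, valid.txt, and test.txt are assumptions about the directory layout:

```python
# Minimal sketch: merge the train/valid/test splits into one `triples` file.
# The split file names are assumptions; adjust them to the actual layout.
with open("triples", "w", encoding="utf-8") as out:
    for split in ("train.txt", "valid.txt", "test.txt"):
        with open(split, "r", encoding="utf-8") as f:
            out.write(f.read())
```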
Due to GitHub's file-size limit, files larger than 25 MB are compressed into .zip archives before being uploaded. As a result, you have to uncompress them before running the models.
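For example, the archives can be extracted with Python's standard zipfile module (the archive name below is a placeholder):

```python
import zipfile

# Extract a .zip archive into the current directory;
# replace "data.zip" with the actual archive name.
with zipfile.ZipFile("data.zip") as zf:
    zf.extractall(".")
```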
This software comes under a non-commercial use license; please see the LICENSE file.