StarE encodes a hyper-relational fact by first passing each qualifier pair through a composition function; the results are then summed and transformed. The resulting vector is merged with the relation vector via the gamma function and then composed with the object vector. Finally, node Q937 aggregates messages from this and other hyper-relational edges. Please refer to the paper for details.
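The encoding steps described above can be illustrated with a minimal, dependency-free sketch. It is not the repository's implementation: element-wise multiplication standing in for both composition functions, an identity transformation after the qualifier sum, and a weighted sum with a hypothetical weight `alpha` for the merge step are all assumptions made for illustration. The paper defines the actual choices.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def compose(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Assumed composition function: element-wise multiplication."""
    return [x * y for x, y in zip(a, b)]


def aggregate_qualifiers(pairs: Sequence[Tuple[Vector, Vector]], dim: int) -> Vector:
    """Compose each (qualifier relation, qualifier value) pair, then sum.

    The transformation applied after the sum is taken to be the identity
    here (an assumption), so the sum is returned unchanged.
    """
    total = [0.0] * dim
    for q_rel, q_val in pairs:
        total = [t + c for t, c in zip(total, compose(q_rel, q_val))]
    return total


def merge(rel: Sequence[float], qual: Sequence[float], alpha: float = 0.8) -> Vector:
    """Assumed merge step: weighted sum of relation and qualifier vectors."""
    return [alpha * r + (1.0 - alpha) * q for r, q in zip(rel, qual)]


def message(obj: Sequence[float], rel: Sequence[float],
            pairs: Sequence[Tuple[Vector, Vector]], alpha: float = 0.8) -> Vector:
    """Build the message one hyper-relational edge sends to its target node."""
    qual = aggregate_qualifiers(pairs, len(rel))
    return compose(obj, merge(rel, qual, alpha))
```

A node representation would then be obtained by aggregating such messages over all incident edges, which this sketch leaves out.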
- Python>=3.9
- PyTorch 2.1.1
- torch-geometric 2.4.0
- torch-scatter 2.1.2
- tqdm
- wandb
Create a new conda environment and execute setup.sh.
Alternatively:
pip install -r requirements.txt
The dataset can be found in data/clean/wd50k.
Its derivatives can be found there as well:
- wd50k_33 - approx. 33% of statements have qualifiers
- wd50k_66 - approx. 66% of statements have qualifiers
- wd50k_100 - 100% of statements have qualifiers
More information is available in the dataset README.
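To check what share of a split carries qualifiers, a small sketch like the following may help. It assumes each statement is stored as one comma-separated line of the form subject,relation,object followed by optional qualifier relation/value pairs. That layout and the function name `qualifier_ratio` are assumptions, so consult the dataset README for the actual format.

```python
from typing import Iterable


def qualifier_ratio(lines: Iterable[str]) -> float:
    """Return the fraction of non-empty statements with more than three elements."""
    total = 0
    with_quals = 0
    for line in lines:
        parts = [p for p in line.strip().split(",") if p]
        if not parts:
            continue  # skip blank lines
        total += 1
        if len(parts) > 3:  # anything beyond subject, relation, object is a qualifier
            with_quals += 1
    return with_quals / total if total else 0.0


# Made-up identifiers, for illustration only.
sample = [
    "Q1,P1,Q2",
    "Q1,P2,Q3,P4,Q5",
    "Q6,P1,Q7",
    "Q6,P3,Q8,P4,Q9,P5,Q10",
]
ratio = qualifier_ratio(sample)
```

For the made-up sample, two of the four statements have qualifiers, so `ratio` is 0.5.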
Specified as MODEL_NAME in the running script:
- stare_transformer - main model StarE (H) + Transformer (H) [default]
- stare_stats_baseline - baseline model Transformer (H)
- stare_trans_baseline - baseline model Transformer (T)
Specified as DATASET in the running script:
- jf17k
- wikipeople
- wd50k [default]
- wd50k_33
- wd50k_66
- wd50k_100
It is advised to run experiments on a GPU; otherwise training might take a long time.
Use DEVICE cuda to turn on GPU support; the default is cpu.
Don't forget to specify CUDA_VISIBLE_DEVICES before python if you use cuda.
Currently tested on cuda==12.1.
Three parameters control the triple/hyper-relational nature and the max fact length:
- STATEMENT_LEN: -1 for hyper-relational [default], 3 for triples
- MAX_QPAIRS: max fact length (3+2*quals), e.g., 15 denotes a fact with 6 qualifiers (3+2*6=15). 15 is the default for the wd50k datasets and jf17k; set 7 for wikipeople, and set 3 for triples (in combination with STATEMENT_LEN 3)
- SAMPLER_W_QUALIFIERS: True for hyper-relational models [default], False for triple-based models only
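The length formula above is easy to sanity-check. The helper names below are hypothetical and not part of the repository; only the 3+2*quals relation comes from the text.

```python
def statement_length(num_qualifiers: int) -> int:
    """Length of a flattened fact: 3 for (s, p, o) plus 2 per qualifier pair."""
    if num_qualifiers < 0:
        raise ValueError("num_qualifiers must be non-negative")
    return 3 + 2 * num_qualifiers


def qualifiers_that_fit(max_qpairs: int) -> int:
    """Number of complete qualifier pairs that fit into a MAX_QPAIRS budget."""
    if max_qpairs < 3:
        raise ValueError("a fact needs at least subject, relation and object")
    return (max_qpairs - 3) // 2
```

For example, `qualifiers_that_fit(7)` gives 2 for the wikipeople setting, and `qualifiers_that_fit(3)` gives 0 for plain triples.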
The following scripts will train StarE (H) + Transformer (H) for 400 epochs and evaluate on the test set:
- StarE (H) + Transformer (H)
python run.py DATASET wd50k
- StarE (H) + Transformer (H) with a GPU.
CUDA_VISIBLE_DEVICES=0 python run.py DEVICE cuda DATASET wd50k
- You can switch to a dataset with a higher ratio of quals by changing DATASET to one of the names listed above
python run.py DATASET wd50k_33
- On JF17K
python run.py DATASET jf17k CLEANED_DATASET False
- On WikiPeople
python run.py DATASET wikipeople CLEANED_DATASET False MAX_QPAIRS 7 EPOCHS 500
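When launching several of these runs, the commands above can be assembled programmatically. The sketch below only builds the argument list and environment. The helper `build_run` and the `EXTRA_ARGS` table are hypothetical conveniences, while the flag names and values are copied from the commands listed above.

```python
import os
from typing import Dict, List, Optional, Tuple

# Dataset-specific flags, taken from the example commands in this README.
EXTRA_ARGS: Dict[str, List[str]] = {
    "jf17k": ["CLEANED_DATASET", "False"],
    "wikipeople": ["CLEANED_DATASET", "False", "MAX_QPAIRS", "7", "EPOCHS", "500"],
}


def build_run(dataset: str = "wd50k",
              gpu: Optional[int] = None) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for one training run; nothing is executed here."""
    argv = ["python", "run.py"]
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    if gpu is not None:
        argv += ["DEVICE", "cuda"]
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    argv += ["DATASET", dataset] + EXTRA_ARGS.get(dataset, [])
    return argv, env
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)` from the repository root.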
Triple-based models can be started with this basic set of params:
python run.py DATASET wd50k STATEMENT_LEN 3 MAX_QPAIRS 3 SAMPLER_W_QUALIFIERS False
More hyperparams are available in the CONFIG dictionary in run.py.
If you want to adjust StarE encoder params, prepend GCN_ to the params in the STAREARGS dict, e.g.,
python run.py DATASET wd50k GCN_GCN_DIM 80 GCN_QUAL_AGGREGATE concat
will construct StarE with a hidden dim of 80 and concat as the gamma function from the paper.
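The prefix convention can be pictured with a short sketch. How run.py actually parses its arguments is not shown here; the code below merely assumes that arguments arrive as KEY VALUE pairs, that the GCN_ prefix is stripped once to obtain the STAREARGS key, and that values stay strings. The function name `split_overrides` is hypothetical.

```python
from typing import Dict, List, Tuple

PREFIX = "GCN_"


def split_overrides(args: List[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split KEY VALUE pairs into top-level and encoder (STAREARGS) overrides."""
    if len(args) % 2 != 0:
        raise ValueError("arguments must come in KEY VALUE pairs")
    config: Dict[str, str] = {}
    stare_args: Dict[str, str] = {}
    for key, value in zip(args[0::2], args[1::2]):
        if key.startswith(PREFIX):
            # GCN_GCN_DIM -> GCN_DIM, GCN_QUAL_AGGREGATE -> QUAL_AGGREGATE
            stare_args[key[len(PREFIX):]] = value
        else:
            config[key] = value
    return config, stare_args


config, stare_args = split_overrides(
    "DATASET wd50k GCN_GCN_DIM 80 GCN_QUAL_AGGREGATE concat".split()
)
```

Under these assumptions, `config` holds the DATASET override, while `stare_args` holds GCN_DIM and QUAL_AGGREGATE.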
It's there out of the box! Create an account on WANDB. Then make sure you install the latest version of the package:
pip install wandb
Locate your API_KEY in the user settings and activate it:
wandb login <api_key>
Then just use the CLI argument WANDB True; it will:
- Create a wikidata-embeddings project in your active team
- Create a run with a random name and log results there
@inproceedings{StarE,
title={Message Passing for Hyper-Relational Knowledge Graphs},
author={Galkin, Mikhail and Trivedi, Priyansh and Maheshwari, Gaurav and Usbeck, Ricardo and Lehmann, Jens},
booktitle={EMNLP},
year={2020}
}
For any further questions, please contact: mikhail.galkin@iais.fraunhofer.de