EMNLP 2020: Message Passing for Hyper-Relational Knowledge Graphs

StarE

Message Passing for Hyper-Relational Knowledge Graphs.

Figure: Overview of StarE.

StarE encodes a hyper-relational fact by first passing the qualifier pairs through a composition function, then summing the results and applying a transformation. The resulting qualifier vector is merged with the relation vector via the gamma function, and the merged vector is in turn composed with the object vector. Finally, node Q937 (the example node in the overview figure) aggregates messages from this and other hyper-relational edges. Please refer to the paper for details.
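The flow described above can be sketched in a few lines of plain Python. This is only an illustration, not the repository's implementation: element-wise multiplication as the composition function, a weighted sum as the gamma function, the weight alpha, and all function names are assumptions made for this example, and the learned transformations are left out.

```python
# Illustrative sketch only: every function choice below is an assumption,
# and the learned weight matrices of the real model are omitted.

def compose(a, b):
    """Assumed composition function: element-wise product of two vectors."""
    return [x * y for x, y in zip(a, b)]


def vec_sum(vectors, dim):
    """Sum a list of vectors of length `dim`."""
    total = [0.0] * dim
    for v in vectors:
        total = [t + x for t, x in zip(total, v)]
    return total


def encode_qualifiers(qualifier_pairs, dim):
    """Compose each (qualifier relation, qualifier value) pair, then sum."""
    return vec_sum([compose(qr, qv) for qr, qv in qualifier_pairs], dim)


def gamma(h_rel, h_qual, alpha=0.5):
    """Assumed merge: weighted sum of the relation and qualifier vectors."""
    return [alpha * r + (1 - alpha) * q for r, q in zip(h_rel, h_qual)]


def message(h_obj, h_rel, qualifier_pairs):
    """Message sent along one hyper-relational edge."""
    h_qual = encode_qualifiers(qualifier_pairs, len(h_rel))
    return compose(h_obj, gamma(h_rel, h_qual))


def aggregate(messages, dim):
    """A node sums the messages arriving over all of its edges."""
    return vec_sum(messages, dim)


# Toy 2-dimensional vectors for one edge with two qualifier pairs.
h_rel, h_obj = [1.0, 2.0], [2.0, 1.0]
quals = [([1.0, 1.0], [2.0, 0.0]), ([0.5, 2.0], [2.0, 1.0])]
edge_message = message(h_obj, h_rel, quals)
node_state = aggregate([edge_message, [1.0, 1.0]], dim=2)
```

With concat as the merge, the relation and qualifier vectors would be concatenated instead of summed, which changes the dimensionality of the merged vector.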

Requirements

  • Python>=3.9
  • PyTorch 2.1.1
  • torch-geometric 2.4.0
  • torch-scatter 2.1.2
  • tqdm
  • wandb

Create a new conda environment and execute setup.sh. Alternatively, install the dependencies with pip:

pip install -r requirements.txt

WD50K Dataset

The dataset can be found in data/clean/wd50k. Its derivatives can be found there as well:

  • wd50k_33 - approx 33% of statements have qualifiers
  • wd50k_66 - approx 66% of statements have qualifiers
  • wd50k_100 - 100% of statements have qualifiers
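As a rough illustration of what these percentages mean, the sketch below computes the share of statements that carry qualifiers. It assumes each statement is held in memory as a list of the form [subject, relation, object, qual_relation, qual_value, ...]; the function name and the sample identifiers are hypothetical and are not taken from the dataset files.

```python
def qualifier_ratio(statements):
    """Return the fraction of statements longer than a plain triple."""
    if not statements:
        return 0.0
    with_qualifiers = sum(1 for s in statements if len(s) > 3)
    return with_qualifiers / len(statements)


# Hypothetical sample: one of three statements has a qualifier pair.
sample = [
    ["Q1", "P1", "Q2"],
    ["Q1", "P2", "Q3", "P3", "Q4"],
    ["Q5", "P1", "Q6"],
]
ratio = qualifier_ratio(sample)
```

On this sample the ratio is one third, which is the kind of split wd50k_33 approximates.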

More information is available in the dataset README.

Running Experiments

Available models

Specified as MODEL_NAME in the running script

  • stare_transformer - main model StarE (H) + Transformer (H) [default]
  • stare_stats_baseline - baseline model Transformer (H)
  • stare_trans_baseline - baseline model Transformer (T)

Datasets

Specified as DATASET in the running script

  • jf17k
  • wikipeople
  • wd50k [default]
  • wd50k_33
  • wd50k_66
  • wd50k_100

Starting training and evaluation

It is advised to run experiments on a GPU; otherwise training might take a long time. Use DEVICE cuda to turn on GPU support (the default is cpu). If you use cuda, don't forget to set CUDA_VISIBLE_DEVICES before python. Currently tested on cuda==12.1.

Three parameters control triple/hyper-relational nature and max fact length:

  • STATEMENT_LEN: -1 for hyper-relational [default], 3 for triples
  • MAX_QPAIRS: maximum fact length, computed as 3+2*quals; e.g., 15 denotes a fact with 5 qualifiers (3+2*5=15). The default is 15, which suits the wd50k datasets and jf17k; set 7 for wikipeople, and 3 for triples (in combination with STATEMENT_LEN 3)
  • SAMPLER_W_QUALIFIERS: True for hyper-relational models [default], False for triple-based models only

The following scripts will train StarE (H) + Transformer (H) for 400 epochs and evaluate on the test set:

  • StarE (H) + Transformer (H)
python run.py DATASET wd50k
  • StarE (H) + Transformer (H) with a GPU.
CUDA_VISIBLE_DEVICES=0 python run.py DEVICE cuda DATASET wd50k
  • To use a dataset with a different ratio of qualifiers, set DATASET to one of the names listed above
python run.py DATASET wd50k_33
  • On JF17K
python run.py DATASET jf17k CLEANED_DATASET False
  • On WikiPeople
python run.py DATASET wikipeople CLEANED_DATASET False MAX_QPAIRS 7 EPOCHS 500

Triple-based models can be started with this basic set of params:

python run.py DATASET wd50k STATEMENT_LEN 3 MAX_QPAIRS 3 SAMPLER_W_QUALIFIERS False

More hyperparameters are available in the CONFIG dictionary in run.py.
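The commands above pass settings as alternating KEY VALUE pairs. The sketch below shows one way such pairs could be merged into a dictionary of defaults; it is not the parser used in run.py. DEFAULTS lists only the default values stated in this README, and parse_overrides is a hypothetical helper.

```python
# Defaults taken from this README; the real CONFIG in run.py has more keys.
DEFAULTS = {
    "DATASET": "wd50k",
    "DEVICE": "cpu",
    "EPOCHS": 400,
    "MAX_QPAIRS": 15,
    "STATEMENT_LEN": -1,
    "SAMPLER_W_QUALIFIERS": True,
}


def parse_overrides(argv, defaults):
    """Merge alternating KEY VALUE arguments into a copy of `defaults`."""
    if len(argv) % 2 != 0:
        raise ValueError("arguments must come in KEY VALUE pairs")
    config = dict(defaults)
    for key, raw in zip(argv[0::2], argv[1::2]):
        if key not in defaults:
            raise KeyError(f"unknown parameter: {key}")
        default = defaults[key]
        if isinstance(default, bool):
            # Check bool first because bool is a subclass of int in Python.
            config[key] = raw.lower() == "true"
        else:
            # Cast the raw string to the type of the default value.
            config[key] = type(default)(raw)
    return config


args = "DATASET wd50k STATEMENT_LEN 3 MAX_QPAIRS 3 SAMPLER_W_QUALIFIERS False"
config = parse_overrides(args.split(), DEFAULTS)
```

Casting by the type of the default keeps numeric and boolean settings from ending up as strings, which matters for values such as SAMPLER_W_QUALIFIERS False.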

If you want to adjust the StarE encoder params, prepend GCN_ to the param names in the STAREARGS dict, e.g.,

python run.py DATASET wd50k GCN_GCN_DIM 80 GCN_QUAL_AGGREGATE concat

will construct StarE with a hidden dim of 80 and concat as the gamma function from the paper.
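The prefix convention can be pictured as a routing step: keys that start with GCN_ go to the encoder settings with the prefix removed once, so GCN_GCN_DIM becomes GCN_DIM. The sketch below illustrates that idea only; split_encoder_args is a hypothetical helper, and the single-strip behaviour is an assumption inferred from the example command rather than taken from run.py.

```python
def split_encoder_args(overrides, prefix="GCN_"):
    """Separate encoder overrides (prefixed keys) from general ones."""
    general, encoder = {}, {}
    for key, value in overrides.items():
        if key.startswith(prefix):
            # Strip the prefix once: "GCN_GCN_DIM" -> "GCN_DIM".
            encoder[key[len(prefix):]] = value
        else:
            general[key] = value
    return general, encoder


overrides = {
    "DATASET": "wd50k",
    "GCN_GCN_DIM": "80",
    "GCN_QUAL_AGGREGATE": "concat",
}
general, encoder = split_encoder_args(overrides)
```

After the split, general holds DATASET and encoder holds GCN_DIM and QUAL_AGGREGATE.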

Integration with Weights & Biases (WANDB)

It's there out of the box! Create an account on WANDB. Then, make sure you install the latest version of the package:

pip install wandb

Locate your API_KEY in the user settings and activate it:

wandb login <api_key>

Then just use the CLI argument WANDB True. It will:

  • Create a wikidata-embeddings project in your active team
  • Create a run with a random name and log results there

When using this codebase or dataset, please cite:

@inproceedings{StarE,
  title={Message Passing for Hyper-Relational Knowledge Graphs},
  author={Galkin, Mikhail and Trivedi, Priyansh and Maheshwari, Gaurav and Usbeck, Ricardo and Lehmann, Jens},
  booktitle={EMNLP},
  year={2020}
}

For any further questions, please contact: mikhail.galkin@iais.fraunhofer.de