The splits provided in mpqe are tightly tailored to that implementation. This script takes that data (the three datasets AM, AIFB, and MUTAG) and converts it into a more accessible form.
The files generated for each of these datasets are listed below; a short loading sketch follows the list.
The triples

- `test.nt`: the test triples in N-Triples format
- `train.nt`: the training triples in N-Triples format
- `valid.nt`: the validation triples in N-Triples format
Mappings to unique IDs

- `entoid`: a mapping from each entity used in any of the three files above to a unique ID
- `reltoid`: a mapping from each relation used in any of the three files above to a unique ID
The types of entities

- `entity_url_typing.txt`: a type label for each entity occurring in the files above
- `entity_id_typing.txt`: a type label for each entity occurring in the files above, indexed by the ID from `entoid`
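As a rough illustration, here is a minimal Python sketch for loading one converted dataset. It assumes `entoid` and `reltoid` are plain-text files with one tab-separated `label<TAB>id` pair per line (an assumption about the on-disk format, not something documented here), uses `rdflib` to parse the N-Triples files, and the `data/AIFB` directory layout is hypothetical.

```python
from pathlib import Path

import rdflib  # pip install rdflib


def load_id_mapping(path):
    """Read a label-to-ID mapping, assuming one tab-separated pair per line."""
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            label, idx = line.rstrip("\n").split("\t")  # assumed format
            mapping[label] = int(idx)
    return mapping


def load_split(data_dir):
    """Load the ID mappings and the train/valid/test triples for one dataset."""
    data_dir = Path(data_dir)
    entoid = load_id_mapping(data_dir / "entoid")
    reltoid = load_id_mapping(data_dir / "reltoid")

    graphs = {}
    for split in ("train", "valid", "test"):
        g = rdflib.Graph()
        g.parse(str(data_dir / f"{split}.nt"), format="nt")
        graphs[split] = g

    return entoid, reltoid, graphs


if __name__ == "__main__":
    entoid, reltoid, graphs = load_split("data/AIFB")  # hypothetical path
    print(len(entoid), "entities,", len(reltoid), "relations")
    print({split: len(g) for split, g in graphs.items()})
```

If the mapping files turn out to be stored differently (for example as pickled dictionaries), only `load_id_mapping` needs to change.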
These converted datasets are also included in this repository, in `./data/compressed_triple_splits.zip`.
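For instance, the archive can be unpacked with the Python standard library (a minimal sketch; check the extracted layout, since the archive's internal structure is not documented here):

```python
import zipfile

# Extract the bundled splits into the data folder next to the archive.
with zipfile.ZipFile("data/compressed_triple_splits.zip") as archive:
    archive.extractall("data")
```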
To run the conversion yourself, get the data from mpqe and extract it into the folder called `data`.
Then, install the dependencies using:

```
pip install -e .
```

Then run:

```
mpqe-clean
```