- A benchmark for Spatial Knowledge Graph Completion (SKGC) extracted from DBpedia.
- It can be used to evaluate Spatial Knowledge Graph Embedding or Completion methods.
- The S-DBpedia baseline dataset contains two types of datasets.
- Data scale: S-DBpedia_small, S-DBpedia_medium, S-DBpedia_large, S-DBpedia.
- Data Sparsity: S-DBpedia_GT5E, S-DBpedia_GT10E, S-DBpedia_GT20E, S-DBpedia_GT50E.
Dataset Statstics and Analysis
Dataset | Relation | Entity | Training Set | Validation Set | Test Set |
---|---|---|---|---|---|
S-DBpedia_small | 174 | 241,167 | 227,213 | 5,945 | 5,976 |
S-DBpedia_medium | 182 | 412,896 | 454,426 | 21,329 | 21,149 |
S-DBpedia_large | 189 | 648,866 | 908,853 | 69,602 | 69,096 |
S-DBpedia | 189 | 878,779 | 1,817,706 | 194,560 | 194,869 |
S-DBpedia_GT5E | 131 | 44,127 | 128,258 | 15,296 | 15,268 |
S-DBpedia_GT10E | 166 | 90,487 | 266,529 | 32,316 | 32,301 |
S-DBpedia_GT20E | 173 | 183,659 | 541,295 | 66,295 | 66,283 |
S-DBpedia_GT50E | 182 | 461,140 | 1,219,286 | 147,897 | 148,128 |
- Data scale: S-DBpedia_small, S-DBpedia_medium, S-DBpedia_large, S-DBpedia.
- Data Sparsity: S-DBpedia_GT5E, S-DBpedia_GT10E, S-DBpedia_GT20E, S-DBpedia_GT50E.
Attribute information
We extracted all attributes of entities in the dataset from DBpedia. It contains text, numerical, and image information.
The full dataset with attribute information is available in Zenodo.
The data in zenodo includes the dataset file (dataset name) and all attribute files of the entity (Attribute.tar.gz)
Assessment of the benchmark quality
Detailed instructions for data quality assessment can be viewed in Baseline_Quality_Assessment_Instructions.md.
DataProcessCode/
: This folder contains the detailed process of creating S-DBpedia.Code/
: This folder contains the experimental code for knowledge graph completion in the paper and detailed real use cases.Additional-Information/
: This folder contains detailed information about the data processing part of the paper.Quality_Assessment_code/
: This folder contains code used to evaluate benchmark quality.
dataset available: https://doi.org/10.5281/zenodo.7431612
We conduct link prediction experiments on all datasets using five models: TransE, DistMult, ConvE, TransE-GDR and SSLP. We have provided detailed experimental use cases.
The experimental code and related instructions can be viewed at README under Code/
. The configurations for each of the models are given in the Code/
diectory.
Additional details about the dataset can be found in another README.