The SPARQL query endpoint of GAKG is gakg/sparql, with the graph IRI https://www.acekg.cn/.
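The sketch below shows one way to query the endpoint over the standard SPARQL 1.1 Protocol. It builds an HTTP GET URL that passes the query in the query parameter and the graph IRI in default-graph-uri. The README gives only the path gakg/sparql, so the host in ENDPOINT is a hypothetical placeholder that you must replace with the real address. It is also an assumption that the endpoint honours default-graph-uri. The query is generic and assumes no GAKG-specific vocabulary.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical host: the README only gives the path "gakg/sparql".
ENDPOINT = "https://example.org/gakg/sparql"
GRAPH_IRI = "https://www.acekg.cn/"

# Generic query: fetch a handful of triples without assuming any GAKG vocabulary.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_query_url(endpoint: str, graph_iri: str, query: str) -> str:
    """Encode a SPARQL 1.1 Protocol GET request restricted to one default graph."""
    return endpoint + "?" + urlencode({"query": query, "default-graph-uri": graph_iri})


def run_query(url: str, timeout: float = 30.0) -> dict:
    """Send the request and parse the standard SPARQL JSON results format."""
    request = Request(url, headers={"Accept": "application/sparql-results+json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


url = build_query_url(ENDPOINT, GRAPH_IRI, QUERY)
print(url)
# With a real endpoint: rows = run_query(url)["results"]["bindings"]
```

The snippet separates URL construction from the network call. You can inspect or log the request before sending it, and swap run_query for any HTTP client.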
GAKG is a multimodal Geoscience Academic Knowledge Graph framework built by fusing the illustrations, text, and bibliometric data of papers. To our knowledge, GAKG is currently the largest and most comprehensive geoscience academic knowledge graph, consisting of more than 68 million triples. Figure 1 shows an overview of GAKG. To explore the entire GAKG, visit https://gakg.acemap.info.
To better serve the data mining and knowledge discovery communities, GAKG provides several datasets and GAKG-oriented resources:
No. | Resource Name | Resource Type | Link |
---|---|---|---|
1 | GAKG Datasets | data dump | info, Google Drive, ftp |
2 | GA16K | Knowledge Representation Benchmark | info, /benchmarks/GA16K/, ftp |
3 | GPCN | Geoscience Papers Citation Network | info, ftp |
4 | GACN | Geoscience Authors Cooperation Network | info, ftp |
5 | GAKG SPARQL Endpoint | Query Endpoint | GAKG Snorql |
6 | Pipeline Supplements | Models and code | Google Drive |
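- Statistics (GPCN and GACN; the -lwcc and -lc rows are restricted to the largest weakly connected component and the largest connected component, respectively)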
Dataset | Nodes | Edges | Connected components | Clustering coefficient | Triangles | Longest DAG path |
---|---|---|---|---|---|---|
GPCN | 842,121 | 16,034,510 | 1,450 | 0.0699 | 38,789,469 | 176 |
GPCN-lwcc | 838,219 | 16,031,892 | 1 | 0.0701 | 38,789,345 | 176 |
GACN | 860,280 | 5,381,861 | 32,609 | 0.690 | 43,502,542 | 15 |
GACN-lc | 752,718 | 5,231,507 | 1 | 0.700 | 43,332,307 | 15 |
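The standard-library sketch below reproduces the node, edge, component, clustering, triangle, and longest-path statistics on a small graph. The README does not state the conventions behind the reported numbers, so the ones used here are assumptions:

- components are weakly connected, so edge direction is ignored;
- clustering is the average local clustering coefficient, with degree-0 and degree-1 nodes counted as 0;
- triangles are counted on the undirected view;
- the longest path is counted in edges over the directed graph, which must be acyclic.

The triangle loop is quadratic in node degree. For graphs the size of GPCN, a dedicated library such as networkx or igraph is the practical choice.

```python
from collections import defaultdict, deque
from itertools import combinations


def graph_stats(edges):
    """Nodes, edges, components, clustering, triangles and longest DAG path.

    Assumed conventions (not confirmed by the README): simple graph without
    self-loops; weakly connected components; average local clustering with
    degree < 2 nodes counted as 0; longest path measured in edges.
    """
    succ = defaultdict(set)  # directed adjacency
    und = defaultdict(set)   # undirected view
    for u, v in edges:
        succ[u].add(v)
        und[u].add(v)
        und[v].add(u)
    nodes = list(und)
    n_edges = sum(len(vs) for vs in succ.values())

    # Weakly connected components: BFS over the undirected view.
    seen, n_components = set(), 0
    for s in nodes:
        if s in seen:
            continue
        n_components += 1
        seen.add(s)
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in und[x] - seen:
                seen.add(y)
                queue.append(y)

    # Triangles through each node: neighbour pairs that are themselves linked.
    tri = {x: sum(1 for a, b in combinations(und[x], 2) if b in und[a]) for x in nodes}
    n_triangles = sum(tri.values()) // 3  # each triangle is seen from its 3 corners
    clustering = sum(
        tri[x] / (len(und[x]) * (len(und[x]) - 1) / 2) if len(und[x]) > 1 else 0.0
        for x in nodes
    ) / max(len(nodes), 1)

    # Longest path in the DAG: Kahn's topological order plus dynamic programming.
    indeg = {x: 0 for x in nodes}
    for u in list(succ):
        for v in succ[u]:
            indeg[v] += 1
    queue = deque(x for x in nodes if indeg[x] == 0)
    longest = {x: 0 for x in nodes}
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ.get(u, ()):
            longest[v] = max(longest[v], longest[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(nodes):
        raise ValueError("directed cycle found: longest DAG path is undefined")

    return {
        "nodes": len(nodes),
        "edges": n_edges,
        "components": n_components,
        "clustering": clustering,
        "triangles": n_triangles,
        "daglongest": max(longest.values(), default=0),
    }


print(graph_stats([(1, 2), (2, 3), (1, 3), (3, 4), (5, 6)]))
```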
- Baselines
Baseline | Source Code | Paper |
---|---|---|
Louvain | https://sites.google.com/site/findcommunities/ | Fast unfolding of communities in large networks [1] |
Map Equation | https://www.mapequation.org/code.html | Maps of random walks on complex networks reveal community structure [2] |
LPA | https://github.com/zhuo931077127/LPA-algorithm-Demo | Near linear time algorithm to detect community structures in large-scale networks [3] |
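Of the three baselines, label propagation is the simplest to sketch. The code below is a from-scratch illustration of asynchronous LPA as described by Raghavan et al. It visits nodes in a random order, breaks ties uniformly at random, and stops once every node holds a label that is most frequent among its neighbours. It is not the linked LPA-algorithm-Demo code. The max_sweeps cap and seed parameter are assumptions, added to keep runs bounded and reproducible.

```python
import random
from collections import Counter, defaultdict


def label_propagation(edges, max_sweeps=100, seed=0):
    """Asynchronous label propagation on an undirected edge list; returns node -> label."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    labels = {x: x for x in adj}  # every node starts in its own community
    order = list(adj)

    def majority_labels(x):
        """Labels that occur most often among x's neighbours."""
        counts = Counter(labels[y] for y in adj[x])
        top = max(counts.values())
        return [lab for lab, c in counts.items() if c == top]

    for _ in range(max_sweeps):
        rng.shuffle(order)  # fresh random visiting order every sweep
        for x in order:
            labels[x] = rng.choice(majority_labels(x))  # ties broken at random
        # Converged when each node already holds one of its neighbours' majority labels.
        if all(labels[x] in majority_labels(x) for x in order):
            break
    return labels


# Toy graph: two disjoint triangles should end up as two communities.
communities = label_propagation([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)])
print(communities)
```

The result depends on the visiting order, so on real networks different seeds can yield different partitions. This is a known property of LPA, not a defect of the sketch.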
The complete benchmarks can also be accessed from Google Drive.
- Statistics (GA16K, with WN18 and FB15K for comparison)
Benchmark | Relation Types | Entities | Triples |
---|---|---|---|
GA16K | 10 | 16,363 | 151,662 |
WN18 | 18 | 40,943 | 141,442 |
FB15K | 1,345 | 14,951 | 483,142 |
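The three statistics columns follow directly from a list of triples. The sketch below assumes a hypothetical tab-separated layout with one head, relation, tail triple per line. The column order of the actual GA16K, WN18, and FB15K files may differ, so check it before use. The sketch counts duplicate triples once, which is a design choice. The sample lines are made-up placeholders.

```python
import io


def kg_stats(lines):
    """Count relation types, entities and distinct triples in tab-separated lines.

    Hypothetical format: "head<TAB>relation<TAB>tail" per line; adjust the
    unpacking below if the real files use a different column order.
    """
    entities, relations, triples = set(), set(), set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, relation, tail = line.split("\t")
        entities.update((head, tail))
        relations.add(relation)
        triples.add((head, relation, tail))  # duplicates collapse here
    return {"relation_types": len(relations), "entities": len(entities), "triples": len(triples)}


sample = io.StringIO(
    "e1\tr1\te2\n"
    "e2\tr2\te3\n"
    "e1\tr1\te2\n"  # repeated triple, counted once
)
print(kg_stats(sample))  # → {'relation_types': 2, 'entities': 3, 'triples': 2}
```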
- Baselines
Baseline | Source Code | Paper |
---|---|---|
RESCAL | https://github.com/mnick/rescal.py | A Three-Way Model for Collective Learning on Multi-Relational Data [4] |
TransE | https://github.com/zqhead/TransE | Translating Embeddings for Modeling Multi-relational Data [5] |
TransH | https://github.com/zqhead/TransH | Knowledge Graph Embedding by Translating on Hyperplanes [6] |
SimplE | https://github.com/baharefatemi/SimplE | SimplE Embedding for Link Prediction in Knowledge Graphs [7] |
RotatE | https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding | RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space [8] |
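For orientation, the TransE baseline models a relation as a translation in embedding space. A true triple (h, r, t) should satisfy h + r ≈ t, and training minimises a margin ranking loss between a true triple and a corrupted one. The pure-Python sketch below shows only that scoring function and loss on toy 2-dimensional vectors. It is not the code of the linked repository. It omits the training loop, negative sampling, and the entity-norm constraint used in the original paper. The other baselines (RESCAL, TransH, SimplE, RotatE) use different scoring functions.

```python
import math


def transe_distance(h, r, t, norm=1):
    """Dissimilarity d(h + r, t): L1 norm if norm == 1, otherwise L2."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return math.sqrt(sum(d * d for d in diffs))


def margin_loss(pos, neg, r, margin=1.0, norm=1):
    """Hinge loss max(0, margin + d(positive) - d(corrupted)) for one pair of (h, t)."""
    (h, t), (h_neg, t_neg) = pos, neg
    return max(0.0, margin + transe_distance(h, r, t, norm) - transe_distance(h_neg, r, t_neg, norm))


# Toy 2-d embeddings: h + r lands exactly on t, so the true triple scores 0.
h, r, t = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
t_corrupt = [0.0, 0.0]  # tail replaced by another entity's embedding
print(transe_distance(h, r, t))                            # → 0.0
print(transe_distance(h, r, t_corrupt))                    # → 2.0
print(margin_loss((h, t), (h, t_corrupt), r))              # → 0.0
print(margin_loss((h, t), (h, t_corrupt), r, margin=3.0))  # → 1.0
```

The loss is zero once the corrupted triple is at least margin farther from its translation target than the true one. Only pairs that violate the margin produce gradients.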
The Paper Knowledge Extraction pipeline is in the folder /code/paperknowledge/:
- BERT-based QA (/code/paperknowledge/bertQA)
  - Run single QA (parameters: train_batch_size, learning_rate):
    python run_squad.py --do_predict
  - Run batch QA (parameters: start file, end file, and year):
    CUDA_VISIBLE_DEVICES=1 python run_test.py 116 125 2015
  - Download /code/paperknowledge/bertQA/model via Google Drive.
- Paper Knowledge Entity Extraction (/code/paperknowledge/paperentity)
  - Run:
    python example.py
  - Download /code/paperknowledge/paperentity/dde and related files via Google Drive.
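The batch-QA command in the BERT-based QA step takes three positional arguments: start file, end file, and year. The sketch below wraps that command so several ranges can be queued on one GPU. The helper names and the second range are hypothetical. It assumes it is run from /code/paperknowledge/bertQA, and by default it only prints the commands (dry_run=True) rather than launching them.

```python
import os
import subprocess


def batch_qa_command(start_file, end_file, year, gpu=1):
    """Return argv and environment for one run of run_test.py (argument order from the README)."""
    argv = ["python", "run_test.py", str(start_file), str(end_file), str(year)]
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))  # pin the run to one GPU
    return argv, env


def run_batches(jobs, gpu=1, workdir=".", dry_run=True):
    """Run (start, end, year) jobs sequentially; with dry_run=True only print them."""
    for start, end, year in jobs:
        argv, env = batch_qa_command(start, end, year, gpu)
        print(f"CUDA_VISIBLE_DEVICES={gpu}", " ".join(argv))
        if not dry_run:
            # Assumes workdir is /code/paperknowledge/bertQA so run_test.py resolves.
            subprocess.run(argv, cwd=workdir, env=env, check=True)


# The first job reproduces the README command; the second range is hypothetical.
run_batches([(116, 125, 2015), (126, 135, 2015)])
```

The jobs run sequentially with check=True, so a failing range stops the queue. This keeps a single GPU from being oversubscribed.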
Our paper has been accepted as a resource paper at CIKM 2021:
@inproceedings{deng2021gakg,
title={GAKG: A Multimodal Geoscience Academic Knowledge Graph},
author={Deng, Cheng and Jia, Yuting and Xu, Hui and Zhang, Chong and Tang, Jingyao and Fu, Luoyi and Zhang, Weinan and Zhang, Haisong and Wang, Xinbing and Zhou, Chenghu},
booktitle={Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
pages={4445--4454},
year={2021}
}