The SPARQL query endpoint of GAKG is gakg/sparql, with the graph IRI https://www.acekg.cn/.
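A minimal sketch of querying the endpoint over the standard SPARQL 1.1 Protocol (an HTTP GET with a `query` parameter), using only the Python standard library. The full endpoint URL is not given above, so `ENDPOINT` is a hypothetical placeholder. The graph IRI is passed as `default-graph-uri`, and the sample query lists raw triples without assuming anything about GAKG's vocabulary.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the full URL of the gakg/sparql endpoint.
ENDPOINT = "https://example.org/gakg/sparql"
GRAPH_IRI = "https://www.acekg.cn/"

def build_request(endpoint, query, graph_iri=GRAPH_IRI):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "default-graph-uri": graph_iri})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )

def parse_bindings(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts.

    Unbound variables are simply absent from a row, as in the JSON format.
    """
    variables = results["head"]["vars"]
    return [
        {v: row[v]["value"] for v in variables if v in row}
        for row in results["results"]["bindings"]
    ]

def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the flattened rows (performs network I/O)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return parse_bindings(json.load(resp))
```

For example, `run_query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")` should return up to five rows once `ENDPOINT` points at the real service.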
GAKG is a multimodal Geoscience Academic Knowledge Graph framework built by fusing the illustrations, text, and bibliometric data of papers. To our knowledge, GAKG is currently the largest and most comprehensive geoscience academic knowledge graph, consisting of more than 68 million triples. Figure 1 shows an overview of GAKG. To explore the entire GAKG, visit https://gakg.acemap.info.
To better serve the data mining and knowledge discovery communities, GAKG provides several datasets and GAKG-oriented resources:
No. | Resource Name | Resource Type | Link |
---|---|---|---|
1 | GAKG Datasets | data dump | info, Google Drive, ftp |
2 | GA16K | Knowledge Representation Benchmark | info, /benchmarks/GA16K/, ftp |
3 | GPCN | Geoscience Papers Citation Network | info, ftp |
4 | GACN | Geoscience Authors Cooperation Network | info, ftp |
5 | GAKG SPARQL Endpoint | Query Endpoint | GAKG Snorql |
6 | Pipeline Supplements | models and code | Google Drive |
- Statistics of GPCN and GACN
Dataset | Nodes | Edges | Connected components (num. lcc) | Clustering coefficient | Triangles | Longest DAG path (daglongest) |
---|---|---|---|---|---|---|
GPCN | 842,121 | 16,034,510 | 1,450 | 0.0699 | 38,789,469 | 176 |
GPCN-lwcc | 838,219 | 16,031,892 | 1 | 0.0701 | 38,789,345 | 176 |
GACN | 860,280 | 5,381,861 | 32,609 | 0.690 | 43,502,542 | 15 |
GACN-lc | 752,718 | 5,231,507 | 1 | 0.700 | 43,332,307 | 15 |
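The column definitions are not spelled out in the table, so the sketch below is one plausible reading, assuming: `num. lcc` counts weakly connected components, `clustering` is the average local clustering coefficient, `triangle` counts triangles, and `daglongest` is the longest path (in edges) of a DAG such as the citation network. Triangles and clustering are taken on the undirected view of the graph. It uses only the standard library so it runs on toy inputs; at the scale of GPCN and GACN, NetworkX functions such as `number_weakly_connected_components`, `average_clustering`, `triangles` and `dag_longest_path_length` are the more practical route.

```python
from collections import defaultdict, deque
from itertools import combinations

def graph_stats(edges):
    """Table-style statistics for a directed edge list (e.g. citing -> cited).

    Assumptions: no self-loops; duplicate edges count once; components are
    weakly connected; triangles/clustering use the undirected view; the
    longest path is only defined if the graph is a DAG.
    """
    out_adj = defaultdict(set)  # directed adjacency
    und = defaultdict(set)      # undirected view
    for u, v in edges:
        out_adj[u].add(v)
        und[u].add(v)
        und[v].add(u)
    nodes = set(und)
    n_edges = sum(len(vs) for vs in out_adj.values())

    # Weakly connected components: BFS over the undirected view.
    seen, n_components = set(), 0
    for s in nodes:
        if s in seen:
            continue
        n_components += 1
        seen.add(s)
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in und[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)

    # Triangles through each node, then the average local clustering coefficient.
    tri = {x: sum(1 for a, b in combinations(und[x], 2) if b in und[a]) for x in nodes}
    n_triangles = sum(tri.values()) // 3  # each triangle is seen from its 3 corners

    def local_clustering(x):
        k = len(und[x])
        return 0.0 if k < 2 else 2 * tri[x] / (k * (k - 1))

    avg_clustering = sum(local_clustering(x) for x in nodes) / len(nodes) if nodes else 0.0

    # Longest path (number of edges) by dynamic programming over Kahn's topological order.
    indeg = {x: 0 for x in nodes}
    for vs in out_adj.values():
        for v in vs:
            indeg[v] += 1
    dist = {x: 0 for x in nodes}
    queue = deque(x for x in nodes if indeg[x] == 0)
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v in out_adj[u]:
            dist[v] = max(dist[v], dist[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if processed != len(nodes):
        raise ValueError("graph contains a cycle; longest DAG path is undefined")

    return {
        "node": len(nodes), "edge": n_edges, "num_components": n_components,
        "clustering": avg_clustering, "triangle": n_triangles,
        "daglongest": max(dist.values(), default=0),
    }
```

Restricting to the largest component (as in the `-lwcc` and `-lc` rows) would mean keeping the biggest BFS component before computing the remaining columns.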
- Baselines
Baseline | Source Code | Paper |
---|---|---|
Louvain | https://sites.google.com/site/findcommunities/ | Fast unfolding of communities in large networks [1] |
Map Equation | https://www.mapequation.org/code.html | Maps of random walks on complex networks reveal community structure [2] |
LPA | https://github.com/zhuo931077127/LPA-algorithm-Demo | Near linear time algorithm to detect community structures in large-scale networks [3] |
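For readers unfamiliar with the community detection baselines, here is a minimal sketch of asynchronous label propagation, the idea behind the LPA baseline. It is not the linked demo code: the tie-breaking rule (a node keeps its current label when that label is already among the most frequent in its neighbourhood) and the `max_sweeps` cap are implementation choices assumed here.

```python
import random
from collections import Counter, defaultdict

def label_propagation(edges, seed=0, max_sweeps=100):
    """Asynchronous label propagation on an undirected edge list.

    Every node starts with its own label. In each sweep, nodes are visited in
    random order and adopt the label most frequent among their neighbours;
    ties are broken at random unless the current label is already a best one.
    Stops when a full sweep changes nothing (or after `max_sweeps`).
    """
    rng = random.Random(seed)
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops carry no neighbourhood information
            adj[u].add(v)
            adj[v].add(u)
    labels = {x: x for x in adj}
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        changed = False
        for x in nodes:
            counts = Counter(labels[y] for y in adj[x])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if labels[x] not in best:
                labels[x] = rng.choice(best)
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for x, lab in labels.items():
        communities[lab].add(x)
    return list(communities.values())
```

Each sweep touches every edge a constant number of times, which is where the near-linear running time of the method comes from; the result can vary with `seed` on graphs without clear community structure.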
The complete benchmarks can also be accessed from Google Drive.
- Statistics (GA16K compared with WN18 and FB15K)
Benchmark | Relation Types | Entities | Triples |
---|---|---|---|
GA16K | 10 | 16,363 | 151,662 |
WN18 | 18 | 40,943 | 141,442 |
FB15K | 1,345 | 14,951 | 483,142 |
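The file layout of GA16K is not described here; assuming one tab-separated `head relation tail` triple per line (the layout of the original WN18 and FB15K releases), counts like those in the table can be reproduced with a few lines. The WN18 and FB15K triple counts above match the sizes of those datasets' training splits, so it is worth checking which split files a count covers.

```python
def kg_stats(lines, sep="\t"):
    """Count relation types, entities and triples in a triple file.

    Assumed format (hypothetical for GA16K): one `head<sep>relation<sep>tail`
    triple per line; blank lines are skipped and duplicate triples count once.
    """
    entities, relations, triples = set(), set(), set()
    for line in lines:
        if not line.strip():
            continue
        head, relation, tail = line.rstrip("\r\n").split(sep)
        entities.update((head, tail))
        relations.add(relation)
        triples.add((head, relation, tail))
    return {"relations": len(relations), "entities": len(entities), "triples": len(triples)}
```

Usage, with a hypothetical file name: `with open("train.txt", encoding="utf-8") as f: print(kg_stats(f))`.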
- Baselines
Baseline | Source Code | Paper |
---|---|---|
RESCAL | https://github.com/mnick/rescal.py | A Three-Way Model for Collective Learning on Multi-Relational Data [4] |
TransE | https://github.com/zqhead/TransE | Translating Embeddings for Modeling Multi-relational Data [5] |
TransH | https://github.com/zqhead/TransH | Knowledge Graph Embedding by Translating on Hyperplanes [6] |
SimplE | https://github.com/baharefatemi/SimplE | SimplE Embedding for Link Prediction in Knowledge Graphs [7] |
RotatE | https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding | RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space [8] |
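To make the difference between the translation-based and rotation-based baselines concrete, here is a minimal sketch of the TransE and RotatE distance functions together with the margin-based ranking loss used to train TransE. It is written against plain Python lists for readability and is not taken from the linked repositories; the embeddings in any usage are toy values.

```python
import cmath
import math

def transe_distance(h, r, t, norm=1):
    """TransE: a true triple should satisfy h + r ≈ t, so distance = ||h + r - t||_p."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    if norm == 2:
        return math.sqrt(sum(d * d for d in diffs))
    raise ValueError("norm must be 1 or 2")

def rotate_distance(h, phases, t):
    """RotatE: entities are complex vectors and a relation rotates each
    coordinate by a unit-modulus factor e^{i*theta}; here the distance sums
    the moduli of the per-coordinate differences |h_k * e^{i*theta_k} - t_k|."""
    return sum(abs(hk * cmath.exp(1j * th) - tk) for hk, th, tk in zip(h, phases, t))

def margin_ranking_loss(pos_dist, neg_dist, margin=1.0):
    """Hinge loss that pushes a corrupted triple at least `margin` farther than the true one."""
    return max(0.0, margin + pos_dist - neg_dist)
```

Because a rotation by θ followed by one by -θ is the identity, RotatE can represent inverse relations directly, which a single translation vector shared by both directions cannot.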
The paper knowledge extraction pipeline is in the folder `/code/paperknowledge/`.
- BERT-based QA (`/code/paperknowledge/bertQA`)
  - Run single QA (parameters: `train_batch_size`, `learning_rate`): `python run_squad.py --do_predict`
  - Run batch QA (parameters: start file, end file, and year): `CUDA_VISIBLE_DEVICES=1 python run_test.py 116 125 2015`
  - The model folder `/code/paperknowledge/bertQA/model` can be downloaded via Google Drive.
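`run_squad.py` is the name of the SQuAD fine-tuning script distributed with BERT, which reads questions and contexts from a SQuAD v1.1-style JSON file. Assuming the GAKG copy keeps that input format (this section does not say so explicitly), a prediction file for a batch of papers could be built as below. The question text, the `"<paper>_<key>"` id scheme and the output path are all hypothetical.

```python
import json

def build_squad_predict_file(papers, questions):
    """Build a SQuAD v1.1-style dict: one paragraph per paper, one qa per question.

    `papers` maps a paper id to its text and `questions` maps a short key to a
    question string; both are hypothetical inputs. Answers are omitted because
    they are only needed for training, not prediction.
    """
    data = []
    for paper_id, text in papers.items():
        qas = [{"id": f"{paper_id}_{key}", "question": q} for key, q in questions.items()]
        data.append({"title": paper_id, "paragraphs": [{"context": text, "qas": qas}]})
    return {"version": "1.1", "data": data}

def write_predict_file(path, papers, questions):
    """Serialize the prediction file as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_squad_predict_file(papers, questions), f, ensure_ascii=False)
```

In the upstream BERT script such a file is passed with `--predict_file`; whether the GAKG copy uses the same flag should be checked against its source.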
- Paper Knowledge Entities Extraction (`/code/paperknowledge/paperentity`)
  - Run: `python example.py`
  - `/code/paperknowledge/paperentity/dde` and related files can be downloaded via Google Drive.
Our paper is currently under review at the CIKM Resource Track:
@inproceedings{GAKG,
title = "GAKG: A Multimodal Geoscience Academic Knowledge Graph",
author = "Cheng Deng and Yuting Jia and Chong Zhang and Jingyao Tang and Hui Xu and Luoyi Fu and Weinan Zhang and Haisong Zhang and Xinbing Wang and Chenghu Zhou",
}
- Coming soon: the Python package `gakg`.