

Primary language: Python. License: Apache License 2.0 (Apache-2.0).


GAKG: A Multimodal Geoscience Academic Knowledge Graph 😈

The SPARQL query endpoint of GAKG is gakg/sparql, and its graph IRI is https://www.acekg.cn/.
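As a minimal sketch of querying the endpoint over the standard SPARQL 1.1 Protocol (query via HTTP GET), the snippet below uses only the Python standard library. The endpoint URL is passed in by the caller because only the path gakg/sparql is given above; the example URL, the function names, and the assumptions that the endpoint honors `default-graph-uri` and can return `application/sparql-results+json` are illustrative, not documented GAKG behavior.

```python
import json
import urllib.parse
import urllib.request

# Graph IRI stated in this README.
GAKG_GRAPH_IRI = "https://www.acekg.cn/"


def build_sparql_request(endpoint, query, graph_iri=GAKG_GRAPH_IRI):
    """Build a SPARQL 1.1 Protocol query-via-GET request.

    `endpoint` is the full endpoint URL (supplied by the caller). The graph IRI is
    sent as `default-graph-uri`, which we assume the endpoint supports.
    """
    params = urllib.parse.urlencode({"query": query, "default-graph-uri": graph_iri})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def run_select(endpoint, query, timeout=30):
    """Send a SELECT query and return its result bindings (performs a network call)."""
    with urllib.request.urlopen(build_sparql_request(endpoint, query), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]


if __name__ == "__main__":
    # Hypothetical endpoint URL; replace it with the real GAKG endpoint.
    req = build_sparql_request(
        "https://example.org/sparql", "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
    )
    print(req.full_url)
```

Only `run_select` touches the network; building the request separately makes it easy to inspect the exact URL before sending it.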

Overview

GAKG is a multimodal Geoscience Academic Knowledge Graph framework built by fusing papers' illustrations, text, and bibliometric data. To our knowledge, GAKG is currently the largest and most comprehensive geoscience academic knowledge graph, consisting of more than 68 million triples. Figure 1 shows an overview of GAKG. To explore the entire GAKG, visit https://gakg.acemap.info.

Figure 1. Overview of the Multimodal Geoscience Academic Knowledge Graph (GAKG)

To better serve the data mining and knowledge discovery communities, GAKG provides the following datasets and GAKG-oriented resources.

No. | Resource Name | Resource Type | Link
1 | GAKG Datasets | data dump | info, Google Drive, ftp
2 | GA16K | Knowledge Representation Benchmark | info, /benchmarks/GA16K/, ftp
3 | GPCN | Geoscience Papers Citation Network | info, ftp
4 | GACN | Geoscience Authors Cooperation Network | info, ftp
5 | GAKG SPARQL Endpoint | Query Endpoint | GAKG Snorql
6 | Pipeline Supplements | models and code | Google Drive
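The layout of the benchmark files is not described here. Assuming a GA16K-style split stores one tab-separated head, relation, tail triple per line (an assumption; check the files under /benchmarks/GA16K/ for the actual format), the counts reported in the benchmark statistics (relation types, entities, triples) could be recomputed roughly as follows. The function names and the sample relation names are hypothetical.

```python
import io


def read_triples(lines):
    """Parse one tab-separated (head, relation, tail) triple per line.

    Lines that do not split into exactly three fields (e.g. blank lines) are skipped.
    """
    triples = []
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples


def kg_statistics(triples):
    """Count distinct relation types, distinct entities (heads and tails), and triples."""
    relations = {r for _, r, _ in triples}
    entities = {h for h, _, _ in triples} | {t for _, _, t in triples}
    return {"relation_types": len(relations), "entities": len(entities), "triples": len(triples)}


if __name__ == "__main__":
    # Hypothetical sample in the assumed TSV layout.
    sample = io.StringIO(
        "paper_1\thas_study_area\tTibetan_Plateau\n"
        "paper_1\tuses_method\tInSAR\n"
        "paper_2\tuses_method\tInSAR\n"
    )
    print(kg_statistics(read_triples(sample)))
    # {'relation_types': 2, 'entities': 4, 'triples': 3}
```

For a real file, `with open(path, encoding="utf-8") as f: kg_statistics(read_triples(f))` works the same way. Whether the numbers match the published statistics depends on whether those were computed over all splits or only the training split.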

Community Detection

  • Statistics
Dataset | Nodes | Edges | Num. LCC | Clustering | Triangles | DAG longest path
GPCN | 842,121 | 16,034,510 | 1,450 | 0.0699 | 38,789,469 | 176
GPCN-lwcc | 838,219 | 16,031,892 | 1 | 0.0701 | 38,789,345 | 176
GACN | 860,280 | 5,381,861 | 32,609 | 0.690 | 43,502,542 | 15
GACN-lc | 752,718 | 5,231,507 | 1 | 0.700 | 43,332,307 | 15
  • Baselines
Baseline | Source Code | Paper
Louvain | https://sites.google.com/site/findcommunities/ | Fast unfolding of communities in large networks [1]
Map Equation | https://www.mapequation.org/code.html | Maps of random walks on complex networks reveal community structure [2]
LPA | https://github.com/zhuo931077127/LPA-algorithm-Demo | Near linear time algorithm to detect community structures in large-scale networks [3]
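The LPA baseline can be sketched in a few lines. The version below follows the asynchronous update rule described in the LPA paper: nodes are visited in random order, each adopts the label that is most frequent among its neighbours (ties broken at random), and the process stops once every node already carries one of its neighbourhood's most frequent labels. It is an illustration on an adjacency dictionary (a hypothetical input format), not the linked LPA-algorithm-Demo code.

```python
import random
from collections import Counter


def label_propagation(adj, max_iter=100, seed=0):
    """Asynchronous label propagation on an undirected graph.

    adj maps each node to the set of its neighbours; returns node -> community label.
    """
    rng = random.Random(seed)
    labels = {u: u for u in adj}  # every node starts in its own community
    nodes = list(adj)

    def top_labels(u):
        # Labels that occur most often among u's neighbours.
        counts = Counter(labels[v] for v in adj[u])
        best = max(counts.values())
        return [lab for lab, c in counts.items() if c == best]

    for _ in range(max_iter):
        rng.shuffle(nodes)  # fresh random visiting order each sweep
        for u in nodes:
            if adj[u]:  # isolated nodes keep their own label
                labels[u] = rng.choice(top_labels(u))  # ties broken uniformly at random
        # Stop when every node already holds one of its most frequent neighbour labels.
        if all(not adj[u] or labels[u] in top_labels(u) for u in adj):
            break
    return labels


if __name__ == "__main__":
    # Two disjoint triangles and one isolated node.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}, 6: set()}
    print(label_propagation(adj))
```

Because labels only spread along edges, the two triangles always end up with different labels; which label each community keeps depends on the random seed, so LPA results are usually averaged over several runs.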

The entire benchmark can also be accessed from Google Drive.
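To show what the columns of the community detection statistics table measure, here is a pure-Python sketch on a toy graph. It is an illustrative re-implementation, not the script that produced the table, and the column interpretations are assumptions: "num. lcc" is read as the number of connected components (the largest-component variants report 1), "triangle" as a triangle count, "clustering" as the mean local clustering coefficient, and "daglongest" as the number of edges on the longest path of the directed graph. The exact triangle-counting convention behind the reported numbers may differ. At GPCN scale a graph library such as NetworkX is the practical choice.

```python
from collections import defaultdict, deque


def undirected_adjacency(nodes, edges):
    """Symmetric adjacency sets; edge direction and self-loops are ignored."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def num_connected_components(adj):
    """Number of connected components of the undirected view (BFS)."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return count


def triangles_per_node(adj):
    """T(u): triangles through u, i.e. connected pairs of u's neighbours."""
    # Each unordered neighbour pair (v, w) is seen twice, hence the // 2.
    return {u: sum(1 for v in adj[u] for w in adj[u] if w in adj[v]) // 2 for u in adj}


def average_clustering(adj, tri):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            total += 2 * tri[u] / (k * (k - 1))
    return total / len(adj)


def dag_longest_path_length(nodes, edges):
    """Number of edges on the longest path of a DAG (Kahn topological order)."""
    succ = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    dist = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            dist[v] = max(dist[v], dist[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(dist):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return max(dist.values(), default=0)


if __name__ == "__main__":
    # Toy directed "citation" graph: 1 -> 2 -> 3 -> 4, 1 -> 3, plus a separate edge 5 -> 6.
    nodes = [1, 2, 3, 4, 5, 6]
    edges = [(1, 2), (2, 3), (1, 3), (3, 4), (5, 6)]
    adj = undirected_adjacency(nodes, edges)
    tri = triangles_per_node(adj)
    print(num_connected_components(adj))           # 2
    print(sum(tri.values()) // 3)                  # 1 distinct triangle
    print(round(average_clustering(adj, tri), 4))  # 0.3889
    print(dag_longest_path_length(nodes, edges))   # 3
```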

Knowledge Representation Learning

  • Statistics
Benchmark | Relation types | Entities | Triples
GA16K | 10 | 16,363 | 151,662
WN18 | 18 | 40,943 | 141,442
FB15K | 1,345 | 14,951 | 483,142
  • Baselines
Baseline | Source Code | Paper
RESCAL | https://github.com/mnick/rescal.py | A Three-Way Model for Collective Learning on Multi-Relational Data [4]
TransE | https://github.com/zqhead/TransE | Translating Embeddings for Modeling Multi-relational Data [5]
TransH | https://github.com/zqhead/TransH | Knowledge Graph Embedding by Translating on Hyperplanes [6]
SimplE | https://github.com/baharefatemi/SimplE | SimplE Embedding for Link Prediction in Knowledge Graphs [7]
RotatE | https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding | RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space [8]
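To make the translational and rotational baselines concrete, their scoring functions are written out below in plain Python over small vectors, following the formulations in the cited papers (TransE: h + r ≈ t; TransH: translation on a relation-specific hyperplane; RotatE: element-wise rotation in complex space). This is a sketch for intuition only: training (negative sampling, margin or self-adversarial loss) is omitted, the choice of norms is ours, and the linked repositories remain the reference implementations.

```python
import cmath
import math


def transe_score(h, r, t, p=1):
    """TransE: -||h + r - t||_p; a true triple should have h + r close to t."""
    return -(sum(abs(hi + ri - ti) ** p for hi, ri, ti in zip(h, r, t)) ** (1 / p))


def transh_score(h, w_r, d_r, t):
    """TransH: project h and t onto the hyperplane with unit normal w_r, then translate by d_r.

    Returns -||h_perp + d_r - t_perp||_2^2.
    """
    def project(e):
        dot = sum(ei * wi for ei, wi in zip(e, w_r))
        return [ei - dot * wi for ei, wi in zip(e, w_r)]

    h_p, t_p = project(h), project(t)
    return -sum((hi + di - ti) ** 2 for hi, di, ti in zip(h_p, d_r, t_p))


def rotate_score(h, phases, t):
    """RotatE: r_i = exp(i * theta_i) has modulus 1, so h_i * r_i rotates each complex coordinate.

    Returns -sum_i |h_i * r_i - t_i| (an L1 distance over complex moduli).
    """
    return -sum(abs(hi * cmath.exp(1j * th) - ti) for hi, th, ti in zip(h, phases, t))


if __name__ == "__main__":
    print(transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))       # perfect translation
    print(transe_score([0.0, 0.0], [0.0, 0.0], [3.0, 4.0], p=2))  # -5.0
    print(rotate_score([1 + 0j], [math.pi / 2], [1j]))             # ~0: rotating 1 by 90 degrees gives i
```

Higher scores mean more plausible triples; link prediction ranks all candidate tails (or heads) by this score.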

Paper Knowledge Extraction

The Paper Knowledge Extraction pipeline is in the folder /code/paperknowledge/.

  • BERT-based QA (/code/paperknowledge/bertQA)

    • Run single QA (parameters: train_batch_size, learning_rate):
    python run_squad.py --do_predict
    • Run batch QA (arguments: start file, end file, and year):
    CUDA_VISIBLE_DEVICES=1 python run_test.py 116 125 2015
    • The model directory /code/paperknowledge/bertQA/model can be downloaded from Google Drive.
  • Paper Knowledge Entity Extraction (/code/paperknowledge/paperentity)

    • Run:
    python example.py
    • /code/paperknowledge/paperentity/dde and related files can be downloaded from Google Drive.
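Assuming run_squad.py follows the original Google BERT script and reads prediction input in SQuAD v1.1 JSON format (an assumption; the repository's exact input handling is not documented here), a passage from a paper and the questions to ask about it could be packaged as below. The question wording, ids, and title are hypothetical.

```python
import json


def make_squad_example(context, questions, title="paper", id_prefix="q"):
    """Wrap one passage and its questions in the SQuAD v1.1 layout.

    Answers are left empty because only prediction (--do_predict) is run.
    """
    qas = [
        {"id": f"{id_prefix}{i}", "question": q, "answers": []}
        for i, q in enumerate(questions)
    ]
    return {
        "version": "1.1",
        "data": [{"title": title, "paragraphs": [{"context": context, "qas": qas}]}],
    }


if __name__ == "__main__":
    doc = make_squad_example(
        "We mapped landslides in the Three Gorges area using InSAR time series.",
        ["What is the study area?", "Which method is used?"],
    )
    print(json.dumps(doc, ensure_ascii=False, indent=2))
```

Saved to a file, this JSON would be passed to the script's prediction-file flag (`--predict_file` in the original BERT release); the predicted answer spans are then keyed by the question ids.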

Citation

Our paper has been accepted as a resource paper at CIKM 2021:

@inproceedings{deng2021gakg,
  title={GAKG: A Multimodal Geoscience Academic Knowledge Graph},
  author={Deng, Cheng and Jia, Yuting and Xu, Hui and Zhang, Chong and Tang, Jingyao and Fu, Luoyi and Zhang, Weinan and Zhang, Haisong and Wang, Xinbing and Zhou, Chenghu},
  booktitle={Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
  pages={4445--4454},
  year={2021}
}

System Introduction

We now introduce the GAKG platform: the GAKG Knowledge Navigation System.