graph-dive

📕 Predict publication trends of AI journals / conferences using GNNs
Baseline paper: Structured Citation Trend Prediction Using Graph Neural Network

Members

👑차지수
윤수진
조현우
진현빈
박수빈
김산
김민서

Requirements

Versions (Recommended)

  • Python 3.7.x
  • PyTorch 1.12.1+cu113
  • PyTorch Geometric (torch_geometric) 2.1.0
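
To see whether an existing environment matches the recommended versions, a small check along these lines may help. This is only a sketch and not part of the repository: the names `RECOMMENDED`, `installed_version` and `check_versions` are hypothetical, and it assumes the packages are registered with pip as `torch` and `torch-geometric`.

```python
import sys

try:  # Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7 needs the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Recommended versions copied from the list above.
# The distribution names are an assumption about how pip registers them.
RECOMMENDED = {"torch": "1.12.1+cu113", "torch-geometric": "2.1.0"}
RECOMMENDED_PYTHON = (3, 7)


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_versions(recommended=RECOMMENDED):
    """Map each package name to (installed, recommended, matches)."""
    report = {}
    for name, wanted in recommended.items():
        found = installed_version(name)
        report[name] = (found, wanted, found == wanted)
    return report


if __name__ == "__main__":
    py_ok = sys.version_info[:2] == RECOMMENDED_PYTHON
    print(f"python {sys.version.split()[0]} (recommended 3.7.x): "
          f"{'ok' if py_ok else 'differs'}")
    for name, (found, wanted, ok) in check_versions().items():
        print(f"{name}: installed={found} recommended={wanted} "
              f"{'ok' if ok else 'differs'}")
```

A mismatch does not necessarily mean the code will fail; the versions are listed as recommended, not required.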

Docker

We recommend using our Dockerfile to get started easily.

## build docker image
$ docker build -t graph-dive:latest . 

## execute docker container
$ docker run --name graph-dive --ipc=host -it -v <working_dir>:/workspace -w /workspace graph-dive:latest /bin/bash

Model

We follow the architecture of the baseline paper, which is based on GATs and GCNs.
[Training stage] train_figure

[Prediction stage] test_figure
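
For readers unfamiliar with GCNs, the propagation step of a single GCN layer can be sketched in NumPy as below. This is an illustration of the standard rule ReLU(D^-1/2 (A + I) D^-1/2 X W), not the model code of this repository; the function name `gcn_layer`, the toy graph, and the placeholder weights are all assumptions made for the example. A GAT layer differs in that it replaces the fixed normalisation coefficients with learned attention weights per edge.

```python
import numpy as np


def gcn_layer(adj, features, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])         # add self-loops
    deg = a_hat.sum(axis=1)                    # degrees including self-loops
    d_inv_sqrt = np.diag(deg ** -0.5)
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt   # symmetric normalisation
    return np.maximum(a_norm @ features @ weight, 0.0)


# Toy graph with 3 papers: paper 0 is linked to papers 1 and 2 (hypothetical data).
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
features = np.eye(3)        # one-hot placeholder node features
weight = np.ones((3, 2))    # stands in for a learned weight matrix
out = gcn_layer(adj, features, weight)
print(out.shape)
```

Each output row mixes a node's own features with those of its neighbours, which is how citation links influence the node representations.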

Dataset

MAG(Microsoft Academic Graph)

We use the authors, affiliations, citation count, title, abstract, and publication year of each paper as raw inputs.

MAG schema

Please check this webpage for more information.

Data directory tree

Directory tree including data should be as follows:

├─graph-dive/
└─data/
	├─ affiliationembedding.csv
	├─ edge_data/
	│   ├─ 1158167855_refs.csv #{CVPR_conference_id}_refs.csv
	│   ├─ 1184914352_refs.csv #{AAAI_conference_id}_refs.csv
	│   └─ ...
	├─ year_data/
	│   ├─ 1158167855.csv #{CVPR_conference_id}.csv
	│   ├─ 1184914352.csv #{AAAI_conference_id}.csv
	│   └─ ...
	├─ json_1158167855/ # CVPR
	│   ├─ {paper_id1}.json
	│   ├─ {paper_id2}.json
	│   └─ ...
	├─ json_1184914352/ # AAAI
	│   └─ ...
	...
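
The naming pattern in the tree above can be turned into a small helper that builds the expected paths for one conference ID and reports which ones are missing. This is a sketch under assumptions: `conference_paths` and `missing_paths` are hypothetical names that do not come from the repository, and the `data` argument is assumed to point at the `data/` directory shown above.

```python
from pathlib import Path


def conference_paths(data_root, conference_id):
    """Return the expected data locations for one conference ID."""
    root = Path(data_root)
    cid = str(conference_id)
    return {
        "affiliation_embedding": root / "affiliationembedding.csv",
        "edges": root / "edge_data" / f"{cid}_refs.csv",
        "years": root / "year_data" / f"{cid}.csv",
        "papers": root / f"json_{cid}",
    }


def missing_paths(data_root, conference_id):
    """List the names of expected entries that do not exist on disk."""
    paths = conference_paths(data_root, conference_id)
    return [name for name, path in paths.items() if not path.exists()]


if __name__ == "__main__":
    # 1158167855 is the CVPR conference ID used in the tree above.
    print(missing_paths("data", 1158167855))
```

Running a check like this before training can catch a misplaced file early, before a run script fails partway through.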

The conference ID and number of nodes for each journal/conference are as follows:

Conference   Conference ID   # of nodes
ICML         1180662882      8653
ICASSP       1121227772      16997
NeurIPS      1127325140      8113
AAAI         1184914352      13766
EMNLP        1192655580      5667
CVPR         1158167855      13058
ICDM         1183478919      4169
CIKM         1194094125      4201
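
When scripting against these IDs, it can be convenient to keep the table in one place. The mapping below copies the values listed above; the names `CONFERENCES` and `lookup` are hypothetical and are not taken from the repository.

```python
# Conference name -> (conference ID, number of nodes), copied from the table above.
CONFERENCES = {
    "ICML": (1180662882, 8653),
    "ICASSP": (1121227772, 16997),
    "NeurIPS": (1127325140, 8113),
    "AAAI": (1184914352, 13766),
    "EMNLP": (1192655580, 5667),
    "CVPR": (1158167855, 13058),
    "ICDM": (1183478919, 4169),
    "CIKM": (1194094125, 4201),
}

# Lower-cased index so that "cvpr" and "CVPR" resolve to the same entry.
_BY_LOWER = {name.lower(): name for name in CONFERENCES}


def lookup(name):
    """Return (conference_id, node_count) for a case-insensitive name."""
    key = _BY_LOWER.get(name.lower())
    if key is None:
        raise KeyError(f"unknown conference: {name!r}")
    return CONFERENCES[key]


if __name__ == "__main__":
    conf_id, n_nodes = lookup("cvpr")
    print(conf_id, n_nodes)
```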

Run

Command examples

# CVPR
$ bash scripts/run_CVPR.sh

# ICASSP
$ bash scripts/run_ICASSP.sh

Note that the number of valid samples is smaller than the node counts stated above because the sources (OpenAlex API, MAG dataset, etc.) are incomplete.

SKILLS

Frameworks: