Bioinformatics 2020: Graph Neural Networks for DNA Sequence Classification

ChromeGCN: Graph Neural Networks for Genomics
Jack Lanchantin and Yanjun Qi
Bioinformatics 2020
[paper] [slides] [poster]

This repository contains a PyTorch implementation of ChromeGCN from "Graph Convolutional Networks for Epigenetic State Prediction Using Both Sequence and 3D Genome Data" (Lanchantin and Qi, 2019).

Get the data

Download the processed data using the following commands (13GB zipped, 90GB unzipped):

wget http://chromegcn.s3.amazonaws.com/processed_data.tar.gz
mkdir -p data/processed_data/
tar -xvf processed_data.tar.gz -C data/processed_data/
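
The same download-and-extract step can also be scripted from Python, which may be convenient on machines without wget. The following is a minimal sketch using only the standard library; the helper name `fetch_and_extract` is hypothetical and not part of this repository, and the download is skipped when the archive is already on disk.

```python
import os
import tarfile
import urllib.request

# URL taken from the wget command above.
DATA_URL = "http://chromegcn.s3.amazonaws.com/processed_data.tar.gz"


def fetch_and_extract(url, archive_path, dest_dir):
    """Download `url` to `archive_path` (if missing) and unpack it into `dest_dir`.

    Hypothetical helper for illustration; it mirrors the wget/mkdir/tar steps.
    """
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)
    # Create the destination, including missing parent directories.
    os.makedirs(dest_dir, exist_ok=True)
    # tarfile detects gzip compression automatically in the default read mode.
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))


# Example call (downloads about 13GB, so it is left commented out):
# fetch_and_extract(DATA_URL, "processed_data.tar.gz", "data/processed_data/")
```

Make sure roughly 90GB of free disk space is available before running the example call.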

(Optional) To process the raw data yourself, download it with the commands below and follow the instructions in data/README.md:

wget http://chromegcn.s3.amazonaws.com/data.tar.gz
mkdir -p data/orig_data/
tar -xvf data.tar.gz -C data/orig_data/

Pretrain the Independent Window Model (CNN)

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py -batch_size 64 -d_model 128 -epochs 100 -dropout 0.2 -lr 0.25 -window_model 'expecto' -optim 'sgd' -cell_type 'GM12878' -pretrain -shuffle_train -dataroot './data/processed_data/' -results_dir './results/'

Save features from best epoch

Use the same flags as for pretraining the CNN model, replacing -pretrain and -shuffle_train with -save_feats:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py -batch_size 64 -d_model 128 -epochs 100 -dropout 0.2 -lr 0.25 -window_model 'expecto' -optim 'sgd' -cell_type 'GM12878' -save_feats -dataroot './data/processed_data/' -results_dir './results/'
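
Because the feature-saving run must reuse the pretraining flags, it can help to build both commands from one shared list so they cannot drift apart. The sketch below is only an illustration: `SHARED_FLAGS` and `build_command` are hypothetical names, the flag values are copied from the two commands above, and the flag order differs from the originals on the assumption that main.py does not depend on argument order.

```python
import shlex

# Flags common to the pretraining and feature-saving runs (values from the commands above).
SHARED_FLAGS = [
    "-batch_size", "64", "-d_model", "128", "-epochs", "100", "-dropout", "0.2",
    "-lr", "0.25", "-window_model", "expecto", "-optim", "sgd",
    "-cell_type", "GM12878",
    "-dataroot", "./data/processed_data/", "-results_dir", "./results/",
]


def build_command(stage_flags):
    """Return the argument list for one stage: shared flags plus stage-specific ones."""
    return ["python", "main.py", *SHARED_FLAGS, *stage_flags]


pretrain_cmd = build_command(["-pretrain", "-shuffle_train"])
save_feats_cmd = build_command(["-save_feats"])

# Print shell-quoted versions for inspection or copy-paste.
print(shlex.join(pretrain_cmd))
print(shlex.join(save_feats_cmd))

# To launch a stage, something like the following could be used:
# import os, subprocess
# env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0,1,2,3"}
# subprocess.run(pretrain_cmd, env=env, check=True)
```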

Train the 3D Genome Chromosome Model (GCN)

CUDA_VISIBLE_DEVICES=0 python main.py -batch_size 64 -d_model 128 -epochs 1000 -dropout 0.2 -window_model 'expecto' -chrome_model 'gcn' -optim 'sgd' -lr 0.25 -load_pretrained -lr2 0.25 -optim2 'sgd' -gate -gcn_layers 2 -adj_type 'hic' -hicnorm 'SQRTVC' -cell_type 'GM12878' -overwrite -hicsize 500000 -dataroot './data/processed_data/' -results_dir './results/'
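
For readers unfamiliar with what the -gate and -gcn_layers options refer to conceptually, here is a rough, dependency-free sketch of one gated graph-convolution step over genomic windows connected by Hi-C contacts. It is not the repository's implementation: the gating form, the function names, and the symmetric degree normalization are all assumptions made for illustration. In particular, the normalization is a generic stand-in and is not the 'SQRTVC' Hi-C normalization selected by -hicnorm.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix (list of lists)."""
    n = len(adj)
    # Add self-loops so every window keeps its own features during aggregation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_gcn_layer(a_norm, h, w, w_gate):
    """One gated step (assumed form): H' = g * tanh(A H W) + (1 - g) * H,
    with g = sigmoid(A H W_gate). W and W_gate must be square (d x d)."""
    agg = matmul(a_norm, h)  # aggregate features from neighbouring windows
    update = [[math.tanh(v) for v in row] for row in matmul(agg, w)]
    gate = [[sigmoid(v) for v in row] for row in matmul(agg, w_gate)]
    return [
        [g * u + (1.0 - g) * x for g, u, x in zip(g_row, u_row, x_row)]
        for g_row, u_row, x_row in zip(gate, update, h)
    ]


# Toy example: three windows, with a single contact between windows 0 and 1.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # stand-in for saved CNN features
w = [[0.5, -0.5], [0.25, 0.75]]
w_gate = [[0.1, 0.0], [0.0, 0.1]]

a_norm = normalize_adjacency(adj)
h = feats
for _ in range(2):  # two layers, echoing -gcn_layers 2
    h = gated_gcn_layer(a_norm, h, w, w_gate)
print(h)
```

The gate lets each window interpolate between its own sequence-derived features and the features aggregated from its 3D neighbours, so windows with uninformative contacts can fall back on the CNN representation.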

Citation

@article{lanchantin2019graph,
  title={Graph Convolutional Networks for Epigenetic State Prediction Using Both Sequence and 3D Genome Data},
  author={Lanchantin, Jack and Qi, Yanjun},
  journal={BioRxiv},
  pages={840173},
  year={2019},
  publisher={Cold Spring Harbor Laboratory}
}