pre-training BERT with molecular data


A Merged Molecular Representation Learning

Pre-training BERT on molecular data to learn a merged molecular representation.

Demo Video: https://youtu.be/efDLkbdXvRg

Usage for pre-training

Make a directory for saving models.

After preparing data in SMILES format (e.g. ZINC) and modifying SmilesDataset in data_utils.py to load it, run pretraining.py or double_pretraining.py.

python double_pretraining.py --path {dataset_path} --save_path {model_path} --adjacency True --batch 256 --epoch 20 --seq 256 --layers 6 --nhead 16 --seed 7
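The shape of a dataset class for SMILES data can be sketched as follows. This is a minimal illustration, not the SmilesDataset from data_utils.py: the class name SimpleSmilesDataset, the one-SMILES-per-line file layout, and the character-level vocabulary are all assumptions made for the example. It pads or truncates every sequence to a fixed length, which is presumably the role of the --seq flag above.

```python
import os
import tempfile

PAD, UNK = "<pad>", "<unk>"


def build_vocab(smiles_list):
    """Map the special tokens and every character in the corpus to an integer id."""
    chars = sorted({ch for smi in smiles_list for ch in smi})
    return {tok: i for i, tok in enumerate([PAD, UNK] + chars)}


class SimpleSmilesDataset:
    """Hypothetical stand-in for SmilesDataset: reads one SMILES string per line."""

    def __init__(self, path, seq_len=256):
        with open(path, encoding="utf-8") as f:
            self.smiles = [line.strip() for line in f if line.strip()]
        self.seq_len = seq_len
        self.vocab = build_vocab(self.smiles)

    def __len__(self):
        return len(self.smiles)

    def __getitem__(self, idx):
        smi = self.smiles[idx][: self.seq_len]  # truncate overly long strings
        ids = [self.vocab.get(ch, self.vocab[UNK]) for ch in smi]
        ids += [self.vocab[PAD]] * (self.seq_len - len(ids))  # right-pad to seq_len
        return ids


def demo():
    """Write a two-molecule file to a temporary directory and encode the first entry."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "smiles.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("CCO\nc1ccccc1\n")
        ds = SimpleSmilesDataset(path, seq_len=8)
        return len(ds), ds[0]
```

Character-level tokenization splits multi-character atoms such as Cl into two tokens, so a real tokenizer would likely need to treat them as single units.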

Fine-tuning with a pre-trained model

Download the pre-trained model using git lfs pull or here.

To use the fine-tuning service, prepare an input directory and an output directory.

Place a dataset in CSV format and an input.json file in the input directory.

Example input file for the fine-tuning service

input.json - {"split-ratio":0.8, "task":"classification", "time":30}

dataset.csv - 2 columns (smiles, label)
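The two example files can be generated and checked with a short script. This is a sketch under assumptions: the helper names write_example_inputs and load_example_inputs are hypothetical, the three molecules and their labels are placeholders, and whether the service expects a header row in dataset.csv is not stated here, so the header is an assumption.

```python
import csv
import json
import os
import tempfile


def write_example_inputs(input_dir):
    """Write input.json and dataset.csv into input_dir."""
    os.makedirs(input_dir, exist_ok=True)
    # Keys and values follow the example above; "task" must be a quoted string.
    config = {"split-ratio": 0.8, "task": "classification", "time": 30}
    with open(os.path.join(input_dir, "input.json"), "w", encoding="utf-8") as f:
        json.dump(config, f)
    # Placeholder rows: the SMILES strings are valid, the labels are arbitrary.
    rows = [("CCO", 0), ("c1ccccc1", 1), ("CC(=O)O", 0)]
    csv_path = os.path.join(input_dir, "dataset.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["smiles", "label"])  # assumed header row
        writer.writerows(rows)


def load_example_inputs(input_dir):
    """Read both files back to confirm they parse."""
    with open(os.path.join(input_dir, "input.json"), encoding="utf-8") as f:
        config = json.load(f)
    csv_path = os.path.join(input_dir, "dataset.csv")
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return config, rows


def demo():
    """Round-trip the example files through a temporary input directory."""
    with tempfile.TemporaryDirectory() as tmp:
        input_dir = os.path.join(tmp, "input")
        write_example_inputs(input_dir)
        return load_example_inputs(input_dir)
```

Reading the files back with json.load catches an unquoted value such as classification before the service sees it.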