Rongzhi Dong, Nihang Fu, Jianjun Hu, Edirisuriya M. D. Siriwardane
Machine Learning and Evolution Laboratory
Department of Computer Science and Engineering
University of South Carolina
@article{dong2023matdiff,
title={Generative Design of inorganic compounds using deep diffusion language models},
author={Rongzhi Dong and Nihang Fu and Jianjun Hu and Edirisuriya M. D. Siriwardane},
journal={arXiv preprint arXiv:xxxxx},
year={2023}
}
- Install Diffusion-LM
conda install mpi4py
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install -e improved-diffusion/
pip install -e transformers/
pip install spacy==3.2.4
pip install datasets==1.8.0
pip install huggingface_hub==0.4.0
pip install wandb
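Optionally, you can confirm that the pinned packages above were installed at the expected versions by running a small check inside the same environment. This is a minimal sketch of our own and is not part of the repository. It compares only the three pinned pip versions listed above, using the standard-library `importlib.metadata` module (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Pins copied from the pip commands in the install step above.
PINS = {"spacy": "3.2.4", "datasets": "1.8.0", "huggingface_hub": "0.4.0"}


def installed_versions(packages) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found


def find_mismatches(installed: Dict[str, Optional[str]],
                    pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, found)} for every pin that is not met."""
    return {pkg: (want, installed.get(pkg))
            for pkg, want in pins.items()
            if installed.get(pkg) != want}


if __name__ == "__main__":
    problems = find_mismatches(installed_versions(PINS), PINS)
    if not problems:
        print("All pinned packages match.")
    for pkg, (want, got) in problems.items():
        print(f"{pkg}: expected {want}, found {got or 'not installed'}")
```

The check only prints its findings and never exits with an error. You can therefore run it at any point without interrupting a setup script.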
- Train Diffusion-LM:
cd code/Diffusion-LM/improved-diffusion
mkdir diffusion_models
sh run.sh
The trained model is saved in ./diffusion_models.
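Before decoding, you may want to locate the most recent checkpoint under ./diffusion_models. The helper below is a hypothetical sketch and is not part of the repository. It assumes that checkpoint file names end in the training step, as in `model010000.pt` or `ema_0.9999_010000.pt`, which we believe is the naming used by the upstream improved-diffusion trainer. Please verify the actual file names from your own run before relying on it.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Assumption: names end in the training step, e.g. "ema_0.9999_010000.pt".
STEP_RE = re.compile(r"(\d+)\.pt$")


def latest_checkpoint(paths: Iterable[Path], prefix: str = "ema") -> Optional[Path]:
    """Return the path whose file name starts with `prefix` and has the highest step."""
    best, best_step = None, -1
    for path in paths:
        match = STEP_RE.search(path.name)
        if match and path.name.startswith(prefix):
            step = int(match.group(1))
            if step > best_step:
                best, best_step = path, step
    return best


if __name__ == "__main__":
    # Search ./diffusion_models recursively; yields nothing if the folder is missing.
    ckpt = latest_checkpoint(Path("diffusion_models").rglob("*.pt"))
    print(ckpt if ckpt else "no checkpoint found under ./diffusion_models")
```

The helper selects checkpoints by the step encoded in the file name rather than by modification time. Copying or touching files therefore does not change which checkpoint it picks.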
- Sample from Diffusion-LM:
mkdir generation_outputs
sh decode.sh
The generated sequences are saved in ./generation_outputs.
python sq2formula.py
The sequences are then converted to formulas, and the results are saved to formulas.csv.
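To illustrate the idea behind the sequence-to-formula step, here is a minimal sketch of our own. It is not the repository's sq2formula.py. It assumes a hypothetical token format in which each sequence lists one element symbol per atom, separated by spaces (for example `Sr Ti O O O` for SrTiO3). It also assumes that any token that does not look like an element symbol is a special token to discard. The `formula` CSV header is likewise hypothetical.

```python
import csv
import re
from collections import Counter
from typing import Iterable

# Hypothetical token format: one element symbol per atom, space-separated.
# This pattern only checks the *shape* of a symbol, not that the element exists.
ELEMENT_RE = re.compile(r"[A-Z][a-z]?")


def sequence_to_formula(sequence: str) -> str:
    """Collapse an element sequence such as "Sr Ti O O O" into "SrTiO3".

    Tokens that do not look like element symbols (e.g. START or END) are
    dropped. Elements keep their first-appearance order because Counter
    preserves insertion order (Python 3.7+).
    """
    counts = Counter(tok for tok in sequence.split() if ELEMENT_RE.fullmatch(tok))
    return "".join(el if n == 1 else f"{el}{n}" for el, n in counts.items())


def write_formulas(sequences: Iterable[str], path: str = "formulas.csv") -> None:
    """Write one formula per row under a hypothetical 'formula' header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["formula"])
        for seq in sequences:
            formula = sequence_to_formula(seq)
            if formula:  # skip sequences that contained no element tokens
                writer.writerow([formula])


if __name__ == "__main__":
    print(sequence_to_formula("Sr Ti O O O"))
```

Elements are emitted in the order they first appear in the sequence, not in a canonical (e.g. Hill) order. A canonicalization step would be needed before comparing generated formulas with a reference set.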
- Install Diffusion-BERT
conda create --name DB python=3.8
conda activate DB
pip install -r requirements.txt
- Train Diffusion-BERT:
cd code/Diffusion-BERT
python word_freq.py
This computes the word frequencies of the text corpus.
sh run.sh
This trains the model for unconditional generation.
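word_freq.py precomputes how often each token occurs in the training corpus. In the DiffusionBERT paper, this kind of frequency information feeds the spindle noise schedule, which masks tokens at different rates according to how much information they carry. The sketch below illustrates only the counting step and is not the repository script. It assumes whitespace-separated tokens with one sequence per line, whereas word_freq.py may tokenize differently and store its output in its own format.

```python
from collections import Counter
from typing import Dict, Iterable


def token_frequencies(lines: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each whitespace-separated token across all lines."""
    counts = Counter(tok for line in lines for tok in line.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: n / total for tok, n in counts.items()}


if __name__ == "__main__":
    # Toy corpus in the same hypothetical one-sequence-per-line format.
    corpus = ["Sr Ti O O O", "Na Cl"]
    freqs = token_frequencies(corpus)
    for tok, freq in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{tok}\t{freq:.3f}")
```

Relative frequencies sum to 1, so frequent tokens such as O receive a larger share and rare elements a smaller one. A frequency-aware schedule can exploit exactly this difference.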
- Sample from Diffusion-BERT:
Pass the path to the checkpoint obtained during training to predict.py.
python predict.py
The generated sequences are saved to temp.txt.
python sq2formula.py
The sequences are then converted to formulas, and the results are saved to formulas.csv.
Our work builds on two text-generation diffusion models: Diffusion-LM ("Diffusion-LM Improves Controllable Text Generation") and DiffusionBERT ("DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models"). Their source code was written for text generation and is available at https://github.com/XiangLi1999/Diffusion-LM and https://github.com/Hzfinfdu/Diffusion-BERT. For the convenience of users, we include modified versions of both in this repository.