This is the PyTorch implementation of TEMPO from the paper: [TEMPO: A Transformer-based Mutation Prediction Framework for SARS-CoV-2 Evolution].
- pytorch
- sklearn
This data file contains the original and preprocessed protein sequence data for SARS-CoV-2, H1N1, H3N2 and H5N1, which is required to run the code. Before running the code, download data.zip separately; you can click here to download the data for convenience.
data.zip contains the following files:
- Preprocessed data used to reproduce the paper, including the SARS-CoV-2, H1N1, H3N2 and H5N1 datasets.
- Phylogenetic tree data for SARS-CoV-2, named "tree.txt".
- SARS-CoV-2 spike (S) protein sequence data aligned with MAFFT, named "spike_prot_processed.csv".
This is supplementary data that is not required to run the code, but it may help others understand the paper in more depth and build further work on it. The phylogenetic tree data for SARS-CoV-2 can be found here.
To run the code
- add "data.zip" to the root directory of the project (at the same level as "training.py").
- decompress the archive to get a folder named "data":
unzip data.zip
- modify the dataset paths defined in training.py (lines 14 to 31) so that they point to the data folder in your environment.
- train the model with the following command:
python training.py > output.txt
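On systems without an `unzip` command, the extraction step above can also be done from Python. The following is a minimal sketch using only the standard library; the helper name `extract_data` is hypothetical and is not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_data(zip_path, dest_dir="."):
    """Extract the archive into dest_dir and return the names of its entries.

    Hypothetical helper, not part of the TEMPO code base.
    """
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        # data.zip has to be downloaded separately before this step.
        raise FileNotFoundError(f"{zip_path} not found; download data.zip first")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

Calling `extract_data("data.zip")` from the project root should leave the data folder next to training.py, assuming the archive stores its files under a top-level data directory.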
Results are written every 10 epochs during training. The following metrics are recorded in output.txt:
- T_loss: training loss of this epoch
- T_acc: training accuracy of this epoch
- T_pre: training precision of this epoch
- T_rec: training recall of this epoch
- T_fscore: training F1 score of this epoch
- T_mcc: training Matthews correlation coefficient of this epoch
- V_loss: validation loss of this epoch
- V_acc: validation accuracy of this epoch
- V_pre: validation precision of this epoch
- V_rec: validation recall of this epoch
- V_fscore: validation F1 score of this epoch
- V_mcc: validation Matthews correlation coefficient of this epoch
- BEST_V_loss: best validation loss of all iterations so far
- BEST_V_acc: best validation accuracy of all iterations so far
- BEST_V_pre: best validation precision of all iterations so far
- BEST_V_rec: best validation recall of all iterations so far
- BEST_V_fscore: best validation F1 score of all iterations so far
- BEST_V_mcc: best validation Matthews correlation coefficient of all iterations so far
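For readers unfamiliar with the metrics listed above, the sketch below shows how they are conventionally defined for binary labels, and one way a "best so far" value could be tracked. It assumes 0/1 labels, and it assumes that each best value is tracked independently (lowest loss, highest value for everything else); training.py may do either differently. The names `binary_metrics` and `update_best` are hypothetical. In practice the same quantities are available from `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `matthews_corrcoef`).

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and MCC for 0/1 labels (illustrative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    # MCC is reported as 0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": acc, "pre": pre, "rec": rec, "fscore": f1, "mcc": mcc}


def update_best(best, current):
    """Update 'best so far' values in place: lowest loss, highest otherwise.

    Assumption: every metric is tracked on its own, not tied to one epoch.
    """
    for name, value in current.items():
        if name not in best:
            best[name] = value
        elif name == "loss":
            best[name] = min(best[name], value)
        else:
            best[name] = max(best[name], value)
    return best
```

MCC is often the more informative number when mutated and non-mutated sites are imbalanced, because it uses all four cells of the confusion matrix, whereas accuracy can stay high even if the minority class is never predicted.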