DNA-Deep Learning simulation study

Repository structure

This repository is organized in the following manner:

| Directory | Description |
| --- | --- |
| simulation-script | All msprime-related scripts that generate the sequences, plus the scripts on antibiotic resistance. |
| dataset | All datasets (sequences and labels) should be saved in this directory. Each sequence set and its corresponding label set should share a unique name, in the form sequence-NAME.in and label-NAME.in. The directory currently includes a script that generates fake data and labels. It will be removed from the finalized repository; it is here only to show our pipeline structure. |
| nn-model-script | The single script that runs all NN models. Differences between analyses are expressed only through arguments and JSON files. We use the same model script for every analysis to keep things simple. |
| model-spec | Each model (RNN, CNN, Transformer, ...) has a .json file containing all of its predefined parameters. Every JSON file includes the same set of entries (so the RNN file still has entries such as filterSize and strideSize). If a model does not use a certain parameter, its value is left empty in the JSON file (for the RNN, "filterSize": ""). |
| output | Save all output files and the scripts for plotting model performance here. Each output file should have the same name as its input sequence/label set, in the format NAME.out. The NAME.out files will be deleted from the finalized repository, and only the visualization scripts will be kept. |
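
The naming convention and the empty-value rule for model specs described above can be sketched in a few lines of Python. This is only an illustration, not the repository's actual code: the helper names `dataset_paths` and `load_model_spec`, the dataset name `demo`, and the `hiddenSize` entry are hypothetical, and the sketch assumes the directories sit directly under the repository root.

```python
import json
from pathlib import Path


def dataset_paths(name, root="."):
    """Build the file paths implied by the NAME convention (hypothetical helper)."""
    root = Path(root)
    return {
        "sequence": root / "dataset" / f"sequence-{name}.in",
        "label": root / "dataset" / f"label-{name}.in",
        "output": root / "output" / f"{name}.out",
    }


def load_model_spec(text):
    """Parse a model-spec JSON string, treating empty-string values as unset (None)."""
    raw = json.loads(text)
    return {key: (None if value == "" else value) for key, value in raw.items()}


# "demo" is a made-up dataset name used only for illustration.
paths = dataset_paths("demo")

# An RNN spec leaves filterSize empty; hiddenSize is a hypothetical entry.
spec = load_model_spec('{"filterSize": "", "hiddenSize": 64}')

print(paths["sequence"].name, paths["label"].name, paths["output"].name)
print(spec)
```

Mapping empty strings to `None` is one possible design choice: it lets the shared model script check `spec["filterSize"] is None` instead of comparing against `""` for every parameter a model does not use.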