/SimEvoNN

Neural Network modelling for Wright-Fisher Simulation of Evolution

This project implements Neural Networks that estimate the effective population size and mutation rate parameters of a Wright-Fisher simulator.

Installation

The project has two parts, each with its own environment:

[1] - The WF simulator, which is bulky and requires a Python 3.7 environment. This environment contains the ELFI, PhyloDeep and PyPy packages. Install it with

either

pip install -r requirements.txt

or

conda env create -f simulator_env.yml

[2] - The Neural Network, which requires PyTorch and a Python 3.10 environment. Install it with

either

pip install -r ./nn_model/requirements.txt

or

conda env create -f nn_env.yml

Usage

Simulator

The scripts listed here can be used in the 'simulator_env'. The data is generated by running the simulator with different parameters.

1. run_simulator.py

This script runs the Wright-Fisher simulator with different parameters, which are set in the script itself. It is mainly used to generate summary statistics for the Neural Network. The output is a .csv file containing the summary statistics and the parameters used to generate the data. Run it with

python run_simulator.py --input_fasta [str:fasta/file/path.fasta] --n_simulations [int:number of simulations] --outdir [str:output/dir/path]
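
For readers unfamiliar with the Wright-Fisher model, the following is a minimal single-locus sketch of the idea, not the simulator shipped in this repository (which works on FASTA sequences). The function name `wright_fisher` and all of its parameters are assumptions made for illustration. Each generation applies symmetric mutation and then resamples the whole population, so the population size controls the strength of drift and the mutation rate controls how fast variation is reintroduced.

```python
import random

def wright_fisher(pop_size, mutation_rate, generations, init_freq=0.5, seed=None):
    """Track the frequency of one allele in a haploid Wright-Fisher population.

    Hypothetical illustration: two alleles, symmetric mutation, no selection.
    """
    rng = random.Random(seed)
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # Symmetric mutation shifts the expected allele frequency before sampling.
        p = freq * (1 - mutation_rate) + (1 - freq) * mutation_rate
        # Genetic drift: each of the pop_size offspring carries the allele
        # with probability p (binomial sampling).
        count = sum(rng.random() < p for _ in range(pop_size))
        freq = count / pop_size
        trajectory.append(freq)
    return trajectory

trajectory = wright_fisher(pop_size=100, mutation_rate=0.01, generations=50, seed=1)
print(len(trajectory), trajectory[-1])
```

Smaller values of `pop_size` make the trajectory noisier, which is why the effective population size leaves a signal in summary statistics.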

2. run_get_tree_stats.py

This script generates summary statistics from a given Newick tree and also plots the tree. It outputs path.png and path_stats.json. Run it with

python run_get_tree_stats.py --infile_path [str:tree/file/path.tree] --output_path [str:output/tree/path.png]
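
To give a sense of what tree summary statistics can look like, here is a small standard-library sketch that parses a Newick string and computes a few simple quantities. It is an illustration only: the helpers `parse_newick` and `tree_stats` are hypothetical, the statistics chosen are assumptions, and the script in this repository may compute a different and larger set.

```python
import re

def parse_newick(text):
    """Parse a simple Newick string into nested dicts (name, length, children)."""
    text = re.sub(r"\s+", "", text)  # drop whitespace to simplify tokenising
    tokens = re.findall(r"[(),;]|[^(),;]+", text)
    delimiters = {"(", ")", ",", ";"}
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            pos += 1  # consume ")"
        name, length = "", 0.0
        if pos < len(tokens) and tokens[pos] not in delimiters:
            # A label has the form "name:length"; either part may be missing.
            name, _, raw_length = tokens[pos].partition(":")
            length = float(raw_length) if raw_length else 0.0
            pos += 1
        return {"name": name, "length": length, "children": children}

    return parse_node()

def tree_stats(root):
    """Return a few simple summary statistics of a parsed tree."""
    leaf_depths = []     # root-to-tip distances
    branch_lengths = []  # every branch except the root's own

    def walk(node, depth, is_root=False):
        if not is_root:
            branch_lengths.append(node["length"])
            depth += node["length"]
        if not node["children"]:
            leaf_depths.append(depth)
        for child in node["children"]:
            walk(child, depth)

    walk(root, 0.0, is_root=True)
    total = sum(branch_lengths)
    return {
        "n_leaves": len(leaf_depths),
        "tree_height": max(leaf_depths),
        "total_branch_length": total,
        "mean_branch_length": total / len(branch_lengths) if branch_lengths else 0.0,
    }

stats = tree_stats(parse_newick("((A:1.0,B:2.0):0.5,C:3.0);"))
print(stats)
```

The parser handles plain Newick only (no quoted labels or comments), which is enough to show how per-tree statistics are derived.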

3. run_bolfi.py

This script runs the Bayesian Optimisation for Likelihood-Free Inference (BOLFI) algorithm. Run it with

python run_bolfi.py --observed_data [str:input/dir/path.csv] --fasta [str:fasta/file/path.fasta]
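
The idea behind likelihood-free inference can be sketched with plain rejection ABC, a simpler technique than BOLFI: draw parameters from a prior, simulate, and keep the draws whose simulated summary is close to the observed one. BOLFI aims to reach a similar answer with far fewer simulator runs by fitting a Gaussian-process surrogate to the discrepancy. The code below does not use ELFI or this repository's simulator; the toy simulator, the uniform prior, and all names are assumptions for illustration.

```python
import random

def simulate_mutation_count(mutation_rate, n_sites, rng):
    """Toy stand-in for a simulator: number of mutated sites out of n_sites."""
    return sum(rng.random() < mutation_rate for _ in range(n_sites))

def rejection_abc(observed, n_sites, n_draws, tolerance, seed=None):
    """Keep prior draws whose simulated summary is within `tolerance` of `observed`."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        candidate = rng.uniform(0.0, 0.1)  # assumed uniform prior on the mutation rate
        simulated = simulate_mutation_count(candidate, n_sites, rng)
        if abs(simulated - observed) <= tolerance:  # discrepancy check
            accepted.append(candidate)
    return accepted

# Accepted draws approximate the posterior over the mutation rate.
posterior = rejection_abc(observed=20, n_sites=1000, n_draws=2000, tolerance=3, seed=0)
if posterior:
    print(len(posterior), sum(posterior) / len(posterior))
```

Most of the 2000 simulations here are rejected, which is the inefficiency that motivates surrogate-based methods such as BOLFI when each simulation is expensive.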

Data Analysis and Neural Network

The scripts listed here can be used in the 'nn_env'. Each script has a help function that can be accessed by running the script with the -h flag. The data is generated by running the simulator with different parameters.

1. run_preprocess.py

This script is used to preprocess the summary statistics data generated by the simulator. The script can be run by

python run_preprocess.py --input_csv [str:input/dir/path] --output_csv [str:output/path.csv]

2. run_train_nn.py

This script trains the Neural Network; the optimiser, loss function and other settings can be adjusted. The trained model is saved to the output path. Run it with

python run_train_nn.py --input_csv [str:input/dir/path.csv] --output_path [str:output/dir/path.pth]