This project uses machine learning to predict whether regulatory regions of DNA are active or inactive. The models are trained on two types of data: epigenomic data and sequence data. The epigenomic data describe the interactions between DNA and proteins, while the sequence data are the raw chromosomal sequences.
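As an illustration of how sequence data can be turned into numeric model input, the sketch below one-hot encodes a nucleotide string. This is a common encoding for sequence-based networks, but the text above does not state which encoding the project uses, so the helper `one_hot_encode`, the column order A, C, G, T, and the handling of unknown bases are all assumptions made for this example.

```python
# Hypothetical helper for illustration; not taken from the project code.
NUCLEOTIDES = "ACGT"  # assumed column order


def one_hot_encode(sequence):
    """Return one row of four 0/1 values per nucleotide (order A, C, G, T).

    Any other symbol, such as N for an unknown base, becomes an all-zero row.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == letter else 0 for letter in NUCLEOTIDES])
    return rows


if __name__ == "__main__":
    example = "ACGTN"
    for base, row in zip(example, one_hot_encode(example)):
        print(base, row)
```

Each sequence of length n becomes an n x 4 matrix, which is the kind of input a convolutional network can scan along the sequence axis.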
The genome data are preprocessed to improve model performance. The learning algorithms tested are decision tree, random forest, perceptron, multi-layer perceptron, feed-forward neural network, and convolutional neural network.
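To make the simplest of these models concrete, here is a minimal perceptron written in plain Python. It is a sketch only, not the project's implementation: the function names, the learning rate, the number of epochs, and the toy feature values are assumptions chosen for the example, with label 1 standing for an active region and 0 for an inactive one.

```python
# Minimal perceptron sketch; all names and values here are illustrative.

def predict(weights, bias, features):
    """Return 1 if the weighted sum plus bias is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=0.1):
    """Train a binary perceptron with the classic error-driven update rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            # Weights and bias change only when the prediction is wrong.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Toy, linearly separable data standing in for two epigenomic features.
toy_samples = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
toy_labels = [0, 0, 1, 1]

if __name__ == "__main__":
    weights, bias = train_perceptron(toy_samples, toy_labels)
    print([predict(weights, bias, sample) for sample in toy_samples])
```

A single perceptron can only separate classes with a linear boundary, which is one reason to compare it against multi-layer and convolutional networks.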
A detailed report is available in the latex folder.
micheleantonazzi/bioinformatics-project
Analyze the active regulatory region of DNA using FFNN and CNN
Python