

Phylogenetic Convolutional Neural Network

A novel architecture for metagenomic classification, the phylogenetic convolutional neural network, as presented in the paper "Phylogenetic Convolutional Neural Networks in Metagenomics".

Getting Started

These instructions will get you a copy of the project up and running on your local machine.

Clone this repo

git clone https://gitlab.fbk.eu/MPBA/phylogenetic-cnn.git 

The DAP (Data Analysis Protocol) Project is included in this repo as an external reference (i.e. Git Submodule).

After a plain clone, the dap directory in your copy is empty, so the Git Submodule must be initialised the first time the repo is cloned.

From the root of the cloned repository:

cd dap
git submodule init
git submodule update

Alternatively, the clone and the submodule initialisation can be done in a single command:

git clone --recursive https://gitlab.fbk.eu/MPBA/phylogenetic-cnn.git
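Whichever approach is used, it can be worth confirming that the dap directory was actually populated before running anything. The following is a minimal sketch, not part of the repository: the helper name submodule_is_populated is hypothetical, and it only assumes that an uninitialised submodule appears as an empty directory, as described above.

```python
from pathlib import Path


def submodule_is_populated(repo_root, name="dap"):
    """Return True if the submodule directory exists and contains any entry."""
    submodule_dir = Path(repo_root) / name
    # An uninitialised submodule shows up as an empty directory.
    return submodule_dir.is_dir() and any(submodule_dir.iterdir())


if __name__ == "__main__":
    # Assumption: the script is run from the root of the cloned repository.
    if submodule_is_populated("."):
        print("dap submodule is populated")
    else:
        print("dap is missing or empty: run the git submodule commands")
```

If the check fails, rerunning git submodule init and git submodule update should fill the directory.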

Prerequisites

A complete conda environment is provided as a .yml file in the folder envs.

The mlpy library is also required. Instructions for installing the MLPY 3.5.0 Python package are in the README.md file in the envs/deps folder.

Replication Package

disease
cdf       IBD data    IBD results    Synthetic data    Synthetic results
cdr       IBD data    IBD results    Synthetic data    Synthetic results
icdf      IBD data    IBD results    Synthetic data    Synthetic results
icdr      IBD data    IBD results    Synthetic data    Synthetic results
ucf       IBD data    IBD results    Synthetic data    Synthetic results
ucr       IBD data    IBD results    Synthetic data    Synthetic results

Running Experiments

Runners

The algorithm to use (SVM, random forest, MLP, ph-cnn) is selected simply by choosing which runner to execute.

  • multilayerperceptron_runner.py: Multi-Layer Perceptron
  • phylocnn_runner.py: Phylogenetic Convolutional Neural Network
  • randomforest_runner.py: Random forest
  • svm_runner.py: Support Vector Machine
  • transfer_learning_runner.py: Phylogenetic Convolutional Neural Network used for transfer learning. It assumes that pre-trained network weights are provided (see weights folder).
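The mapping from algorithm to runner can be sketched as a small lookup. This is an illustration only: the short keys and the runner_command helper are hypothetical, and the sketch assumes that each runner is launched without command-line arguments and takes its configuration from the settings files. Only the script names come from the list above.

```python
import sys

# Script names are taken from the list of runners; the short keys are illustrative.
RUNNERS = {
    "mlp": "multilayerperceptron_runner.py",
    "phcnn": "phylocnn_runner.py",
    "rf": "randomforest_runner.py",
    "svm": "svm_runner.py",
    "transfer": "transfer_learning_runner.py",
}


def runner_command(algorithm):
    """Build the command line for the runner matching `algorithm`."""
    try:
        script = RUNNERS[algorithm]
    except KeyError:
        raise ValueError(
            f"unknown algorithm {algorithm!r}; choose from {sorted(RUNNERS)}"
        )
    # Assumption: runners take no arguments and read their settings from files.
    return [sys.executable, script]
```

From the repository root, the resulting list could be passed to subprocess.run(runner_command("svm"), check=True) to launch the chosen runner with the current Python interpreter.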

Settings

To configure how the program runs, modify the following files:

  • settings.py - selects which type of data to load, where the data are located, where to write the output, and so on.
  • dap/settings.py - sets how the DAP operates. More information is available in the readme in the dap folder and in the paper.
  • dap/deep_learning_settings.py - holds all the settings specific to deep learning.

Notebooks

  • PhyloConv1D: This notebook provides code examples and explanations of how to use the new PhyloConv1D Keras layer, with experimental data as examples.

  • Embedding - ICDf data: This notebook reports results and plots of the embeddings of the Phylo-Convolutional layers computed on the ICDf disease data included in the IBD dataset, as reported in the paper.

Authors

License

This project is licensed under the GNU General Public License v3.0 (GNU GPLv3). See the LICENSE.txt file for details.