
Deep Learning-based Clustering Approaches for Bioinformatics


Code and supplementary materials for our paper "Deep Learning-based Clustering Approaches for Bioinformatics", published in the journal Briefings in Bioinformatics. This repo will be updated periodically; in particular, more complete Jupyter notebooks will be added. In this article, we reviewed deep learning-based approaches for cluster analysis, including network training, representation learning, parameter optimization, and the formulation of clustering quality metrics. We also discussed how representation learning based on different autoencoder architectures (e.g., vanilla, variational, LSTM, and convolutional) can be more effective than classical ML-based approaches (e.g., PCA) in different scenarios, such as bio-imaging, gene expression clustering, and clustering of biomedical texts.
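As a point of reference for the classical baseline mentioned above, a minimal NumPy sketch of "reduce with PCA, then cluster with k-means" could look like the following. It is an illustration under stated assumptions, not code from the paper or its notebooks: the function names `pca_embed` and `kmeans` are hypothetical, PCA is computed via SVD, k-means is plain Lloyd's algorithm, and the data are synthetic.

```python
import numpy as np


def pca_embed(X, n_components):
    """Project the rows of X onto their top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal axes, ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the embedding Z; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy "expression matrix": 2 groups of 50 samples x 200 features.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (50, 200)),
               rng.normal(3.0, 1.0, (50, 200))])
Z = pca_embed(X, n_components=2)
labels, _ = kmeans(Z, k=2)
print(np.bincount(labels))
```

An autoencoder-based pipeline of the kind the review discusses would keep the clustering step and replace `pca_embed` with the encoder of a trained autoencoder, so that the embedding can capture non-linear structure that a linear projection misses.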

Deep learning-based unsupervised/clustering methods: links to papers and code

We provide a list of deep learning-based unsupervised/clustering methods, with links to the papers and code. New articles proposing such approaches will be added as they are published, so stay tuned!

| Title | Article | Conference/Journal | Code |
|---|---|---|---|
| Deep Clustering with Convolutional Autoencoders (DCEC) | Link | ICONIP'2017 | GitHub |
| Unsupervised Data Augmentation for Consistency Training (UDA) | Link | arXiv'2019 | GitHub |
| Deep Clustering via joint convolutional autoencoder embedding and relative entropy minimization (DEPICT) | Link | ICCV'2017 | GitHub |
| Discriminatively Boosted Clustering (DBC) | Link | arXiv'2017 | N/A |
| Variational Deep Embedding (VaDE) | Link | IJCAI'2017 | GitHub |
| Convolutional Embedded Networks (CEN) | Link | arXiv'2018 | GitHub |
| Deep Subspace Clustering Networks (DSC-Nets) | Link | NIPS'2017 | GitHub |
| Graph Clustering with Dynamic Embedding (GRACE) | Link | arXiv'2017 | N/A |
| Deep Unsupervised Clustering Using Mixture of Autoencoders (MIXAE) | Link | arXiv'2017 | N/A |
| Deep Embedded Clustering (DEC) | Link | ICML'2016 | GitHub |
| A Survey of Clustering With Deep Learning: From the Perspective of Network Architecture | Link | IEEE Access'2018 | N/A |
| GEMSEC: Graph Embedding with Self Clustering | Link | arXiv'2018 | GitHub |
| Clustering with Deep Learning: Taxonomy and New Methods | Link | arXiv'2018 | GitHub |
| Deep Continuous Clustering (DCC) | Link | arXiv'2018 | GitHub |
| SpectralNet: Spectral Clustering Using Deep Neural Networks | Link | ICLR'2018 | GitHub |
| Subspace clustering using a low-rank constrained autoencoder (LRAE) | Link | Information Sciences'2018 | N/A |
| Clustering-driven Deep Embedding with Pairwise Constraints (CPAC) | Link | arXiv'2018 | GitHub |
| Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering | Link | PMLR'2017 | N/A |
| Deep Unsupervised Clustering With Gaussian Mixture Variational AutoEncoders (GMVAE) | Link | ICLR'2017 | GitHub |
| Is Simple Better?: Revisiting Simple Generative Models for Unsupervised Clustering | Link | NIPS'2017 Workshop | GitHub |
| Improved Deep Embedded Clustering (IDEC) | Link | IJCAI'2017 | GitHub |
| Deep Clustering Network (DCN) | Link | arXiv'2016 | GitHub |
| Joint Unsupervised Learning of Deep Representations and Image Clustering (JULE) | Link | CVPR'2016 | GitHub |
| Deep Embedding Network for Clustering (DEN) | Link | ICPR'2014 | N/A |
| Auto-encoder Based Data Clustering (ABDC) | Link | CIARP'2013 | GitHub |
| Learning Deep Representations for Graph Clustering | Link | AAAI'2014 | GitHub |

Running the provided Jupyter notebooks

To run the examples interactively, you need to install the following Python modules and libraries:

  • Python 3
  • scikit-learn
  • Keras
  • TensorFlow
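To find out which of these libraries are already available before opening the notebooks, a small standard-library script such as the following could help. It is a sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and it assumes the import names `sklearn`, `keras`, and `tensorflow` (which differ from some pip package names, e.g. `scikit-learn`).

```python
import importlib.util

# Import names (not pip package names) of the libraries listed above.
# Assumption: scikit-learn is imported as `sklearn`.
REQUIRED = ["sklearn", "keras", "tensorflow"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

`importlib.util.find_spec` only locates a module without importing it, so the check stays fast even for heavy libraries such as TensorFlow.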

To use Jupyter Notebook, get it from this link and install it on your machine. Then, assuming git is already installed, clone this repo with the following command:

git clone https://github.com/rezacsedu/Deep-learning-for-clustering-in-bioinformatics.git

Instead of installing the libraries above one by one, you can install all of them at once by issuing the following commands:

 cd Deep-learning-for-clustering-in-bioinformatics
 pip3 install -r requirements.txt
 cd Notebboks

Then start Jupyter Notebook by issuing the following command:

jupyter notebook

In the browser window that opens, go to the Jupyter tab and open the notebook:

LSTM_AE_Text_Clustering.ipynb

If you want to skip training, we will soon provide pre-trained weights that you can restore and then fine-tune. Happy coding! Leave a comment if you have any questions.

Acknowledgement

The ClusteringLayer class and the target_distribution function are based on DEC, as implemented by Xifeng Guo in https://github.com/XifengGuo/DCEC/blob/master/DCEC.py.
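For readers who want the math behind those two pieces, here is a minimal NumPy sketch of DEC's soft assignment (a Student's t kernel between each embedded point and each cluster centroid, with degrees of freedom `alpha`, set to 1.0 in DEC) and of its auxiliary target distribution, plus the KL-divergence clustering loss that DEC minimizes between them. It mirrors the formulas rather than reproducing the repository's Keras `ClusteringLayer`; the function names `soft_assignment` and `kl_divergence` are hypothetical.

```python
import numpy as np


def soft_assignment(Z, centroids, alpha=1.0):
    """DEC soft assignment q_ij: Student's t kernel between embedding z_i and centroid mu_j."""
    # Squared Euclidean distance from every embedded point to every centroid.
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)  # each row sums to 1


def target_distribution(q):
    """DEC target p_ij: square q, divide by soft cluster frequency f_j, renormalize rows."""
    weight = q ** 2 / q.sum(axis=0)  # q_ij^2 / f_j, with f_j = sum_i q_ij
    return weight / weight.sum(axis=1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Clustering loss KL(P || Q), summed over all samples and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Squaring `q` sharpens confident assignments, and dividing by the cluster frequency keeps large clusters from dominating; during training, `p` is held fixed for a number of iterations while the encoder and centroids are updated to reduce `kl_divergence(p, q)`.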

Citation request

If you use the code in this repository in your research, please consider citing the following paper:

@article{karim2021deep,
      title={Deep learning-based clustering approaches for bioinformatics},
      author={Karim, Md Rezaul and Beyan, Oya and Zappa, Achille and Costa, Ivan G and Rebholz-Schuhmann, Dietrich and Cochez, Michael and Decker, Stefan},
      journal={Briefings in bioinformatics},
      volume={22},
      number={1},
      pages={393--415},
      year={2021},
      publisher={Oxford University Press}
      }

Contributing

If you find related work that is not listed here, please create a PR or suggest it by filing an issue. Your contributions are highly appreciated. For any questions, feel free to open an issue or contact rezaul.karim@rwth-aachen.de.