Code and supplementary materials for our paper "Deep Learning-based Clustering Approaches for Bioinformatics", published in the journal Briefings in Bioinformatics. This repo will be updated periodically; in particular, more complete Jupyter notebooks will be added. In this article, we reviewed deep learning-based approaches for cluster analysis, including network training, representation learning, parameter optimization, and the formulation of clustering quality metrics. We also discussed how representation learning based on different autoencoder architectures (e.g., vanilla, variational, LSTM, and convolutional) can be more effective than classical ML-based approaches (e.g., PCA) in different scenarios, such as bio-imaging, gene expression clustering, and biomedical text clustering.
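The "learn a representation, then cluster it" idea can be sketched with scikit-learn alone. The snippet below is a minimal illustration, not code from this repo: it compresses the bundled digits dataset (`load_digits`) with PCA and with a small autoencoder (approximated here by scikit-learn's `MLPRegressor` trained to reconstruct its own input), then runs k-means on each embedding and scores it with the adjusted Rand index. The dataset, the embedding size `EMBED_DIM`, and the network settings are assumptions chosen for speed; which embedding scores higher depends on them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.neural_network import MLPRegressor

EMBED_DIM = 10  # assumed size of the learned representation

# Small bundled dataset (no download needed); scale pixel values to [0, 1].
digits = load_digits()
X = digits.data / 16.0
y = digits.target
n_clusters = len(np.unique(y))

# Baseline: linear dimensionality reduction with PCA.
z_pca = PCA(n_components=EMBED_DIM, random_state=0).fit_transform(X)

# Autoencoder stand-in: a one-hidden-layer MLP trained to map X back to X.
ae = MLPRegressor(hidden_layer_sizes=(EMBED_DIM,), activation="tanh",
                  max_iter=300, random_state=0)
ae.fit(X, X)

# Encoder output = hidden-layer activation, tanh(X @ W0 + b0).
z_ae = np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])


def cluster_and_score(z):
    """Run k-means on an embedding and compare it with the true labels (ARI)."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(z)
    return labels, adjusted_rand_score(y, labels)


labels_pca, ari_pca = cluster_and_score(z_pca)
labels_ae, ari_ae = cluster_and_score(z_ae)
print(f"PCA + k-means ARI: {ari_pca:.3f}")
print(f"AE  + k-means ARI: {ari_ae:.3f}")
```

Swapping in a deeper, convolutional, variational, or LSTM autoencoder changes only how `z_ae` is computed; the clustering and scoring steps stay the same.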
Below, we provide a list of deep learning-based unsupervised/clustering methods, with links to the papers and code. New articles proposing such approaches will be added as they appear, so stay tuned!
Title | Article | Conference/Journal | Code |
---|---|---|---|
Deep clustering with convolutional autoencoders (DCEC) | Link | ICONIP'2017 | GitHub |
Unsupervised Data Augmentation for Consistency Training (UDA) | Link | Arxiv'2019 | GitHub |
Deep Clustering via joint convolutional autoencoder embedding and relative entropy minimization (DEPICT) | Link | ICCV'2017 | GitHub |
Discriminatively Boosted Clustering (DBC) | Link | Arxiv'2017 | N/A |
Variational Deep Embedding (VADE) | Link | IJCAI'2017 | GitHub |
Convolutional Embedded Networks (CEN) | Link | Arxiv'2018 | GitHub |
Deep Subspace Clustering Networks (DSC-Nets) | Link | NIPS'2017 | GitHub |
Graph Clustering with Dynamic Embedding (GRACE) | Link | Arxiv'2017 | N/A |
Deep Unsupervised Clustering Using Mixture of Autoencoders (MIXAE) | Link | Arxiv'2017 | N/A |
Deep Embedded Clustering (DEC) | Link | ICML'2016 | GitHub |
A Survey of Clustering With Deep Learning: From the Perspective of Network Architecture | Link | IEEE Access'2018 | N/A |
GEMSEC: Graph Embedding with Self Clustering | Link | Arxiv'2018 | GitHub |
Clustering with Deep Learning: Taxonomy and New Methods | Link | Arxiv'2018 | GitHub |
Deep Continuous Clustering (DCC) | Link | Arxiv'2018 | GitHub |
SpectralNet: Spectral Clustering Using Deep Neural Networks | Link | ICLR'2018 | GitHub |
Subspace clustering using a low-rank constrained autoencoder (LRAE) | Link | Information Sciences'2018 | N/A |
Clustering-driven Deep Embedding with Pairwise Constraints (CPAC) | Link | Arxiv'2018 | GitHub |
Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering | Link | PMLR'2017 | N/A |
Deep Unsupervised Clustering With Gaussian Mixture Variational AutoEncoders (GMVAE) | Link | ICLR'2017 | GitHub |
Is Simple Better?: Revisiting Simple Generative Models for Unsupervised Clustering | Link | NIPS'2017 Workshop | GitHub |
Improved Deep Embedded Clustering (IDEC) | Link | IJCAI'2017 | GitHub |
Deep Clustering Network (DCN) | Link | Arxiv'2016 | GitHub |
Joint Unsupervised Learning of Deep Representations and Image Clustering (JULE) | Link | CVPR'2016 | GitHub |
Deep Embedding Network for Clustering (DEN) | Link | ICPR'2014 | N/A |
Auto-encoder Based Data Clustering (ABDC) | Link | CIARP'2013 | GitHub |
Learning Deep Representations for Graph Clustering | Link | AAAI'2014 | GitHub |
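Many of the methods in the table report unsupervised clustering accuracy (ACC), often alongside NMI. Because cluster IDs are arbitrary, ACC first maps each predicted cluster to a ground-truth class with the Hungarian algorithm and then counts matches. A minimal NumPy/SciPy sketch follows; the function name `cluster_acc` is our own choice for illustration, not an API from this repo.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cluster_acc(y_true, y_pred):
    """Unsupervised clustering accuracy under the best one-to-one label mapping."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    assert y_true.size == y_pred.size
    d = max(y_pred.max(), y_true.max()) + 1
    # w[i, j] counts samples assigned to cluster i whose true class is j.
    w = np.zeros((d, d), dtype=np.int64)
    for p, t in zip(y_pred, y_true):
        w[p, t] += 1
    # Minimizing (max - w) with the Hungarian algorithm maximizes total matches.
    row_ind, col_ind = linear_sum_assignment(w.max() - w)
    return w[row_ind, col_ind].sum() / y_pred.size


# Cluster IDs are permuted relative to the classes, yet the grouping is perfect.
print(cluster_acc([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1]))  # 1.0
```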
To run the examples interactively, you need the following Python modules and libraries:
- Python 3
- Scikit-learn
- Keras
- TensorFlow
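Before opening the notebooks, a quick check that these modules are importable can save a failed run. This is a small stdlib-only helper written for illustration, not part of the repo; the mapping from pip package names to import names (e.g., scikit-learn imports as `sklearn`) is assumed for these three packages.

```python
import importlib.util

# pip package name -> top-level import name (assumed for the packages listed above)
REQUIRED = {"scikit-learn": "sklearn", "keras": "keras", "tensorflow": "tensorflow"}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages found.")
```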
For the Jupyter notebook, get it from this link and install it on your machine. Then, assuming git is already installed, clone this repo using the following command:
git clone https://github.com/rezacsedu/Deep-learning-for-clustering-in-bioinformatics.git
Next, install all the required libraries at once and move into the notebooks directory by issuing the following commands:
cd Deep-learning-for-clustering-in-bioinformatics
pip3 install -r requirements.txt
cd Notebboks
Then start Jupyter Notebook by issuing the following command:
jupyter notebook
In the browser window that opens, go to the Jupyter tab and open the notebook:
LSTM_AE_Text_Clustering.ipynb
We will soon provide pre-trained weights, so you can skip training: restore the weights and start fine-tuning. Happy coding! Leave a comment if you have any questions.
The ClusteringLayer class and the target_distribution function are based on DEC, adapted from Xifeng Guo's code at https://github.com/XifengGuo/DCEC/blob/master/DCEC.py.
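For readers who want the math behind those two components: DEC's clustering layer computes a soft assignment q_ij of each embedded point z_i to each centroid mu_j using a Student's t kernel, and the target distribution p_ij sharpens q by squaring it and normalizing by soft cluster frequency; training then minimizes KL(P || Q). The NumPy sketch below mirrors those formulas for illustration only. It is not the Keras `ClusteringLayer` used in the notebooks, the function names and toy data are hypothetical, and the default `alpha=1.0` follows the value used in the DEC paper.

```python
import numpy as np


def soft_assignment(z, centroids, alpha=1.0):
    """Student's t similarity q_ij between embeddings z_i and centroids mu_j."""
    # Squared Euclidean distance from every point to every centroid, shape (n, k).
    sq_dist = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)  # each row sums to 1


def target_distribution(q):
    """p_ij = (q_ij^2 / f_j) / sum_j' (q_ij'^2 / f_j'), with f_j = sum_i q_ij."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Clustering loss KL(P || Q), summed over points and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))          # hypothetical toy embeddings
centroids = rng.normal(size=(3, 2))  # hypothetical toy cluster centres
q = soft_assignment(z, centroids)
p = target_distribution(q)
print(q.argmax(axis=1), kl_divergence(p, q))
```

In DEC-style training, P is recomputed from the current Q only every few iterations and held fixed in between, so the encoder and centroids chase a slowly moving, higher-confidence target.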
If you use the code in this repository in your research, please consider citing the following paper:
@article{karim2021deep,
title={Deep learning-based clustering approaches for bioinformatics},
author={Karim, Md Rezaul and Beyan, Oya and Zappa, Achille and Costa, Ivan G and Rebholz-Schuhmann, Dietrich and Cochez, Michael and Decker, Stefan},
journal={Briefings in bioinformatics},
volume={22},
number={1},
pages={393--415},
year={2021},
publisher={Oxford University Press}
}
If you find related work that is not listed here, please create a PR or suggest it by filing an issue. Your contributions are highly appreciated. For any questions, feel free to open an issue or contact rezaul.karim@rwth-aachen.de.