AI4Bio-Reading-List

This is an AI for Biology reading list maintained by the MBZUAI AI4Bio Group.

Note: For applications of diffusion methods in protein science, see the Diffusion reading list.

1. Protein Level

1.1 Protein Structure Prediction

  • Highly accurate protein structure prediction with AlphaFold. Nature. 2021. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. [Paper] [Slides]

  • Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021. Baek, M., DiMaio, F., Anishchenko, I., Dauparas, J., Ovchinnikov, S., Lee, G. R., ... & Baker, D. [Paper]

  • ColabFold: making protein folding accessible to all. Nature Methods. 2022. Mirdita, M., Schütze, K., Moriwaki, Y., Heo, L., Ovchinnikov, S., & Steinegger, M. [Paper]

  • Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv. 2022. Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., ... & Rives, A. [Paper] [Slides]

  • High-resolution de novo structure prediction from primary sequence. bioRxiv. 2022. Wu, R., Ding, F., Wang, R., Shen, R., Zhang, X., Luo, S., ... & Peng, J. [Paper] [Slides]

  • HelixFold-Single: MSA-free protein structure prediction by using protein language model as an alternative. arXiv. 2022. Fang, X., Wang, F., Liu, L., He, J., Lin, D., Xiang, Y., ... & Song, L. [Paper] [Slides]

1.2 Protein Function Prediction

  • Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. PNAS. 2021. Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., ... & Fergus, R. [Paper]

  • Leveraging implicit knowledge in neural networks for functional dissection and engineering of proteins. Nature Machine Intelligence. 2019. Upmeier zu Belzen, J., Bürgel, T., Holderbach, S., Bubeck, F., Adam, L., Gandor, C., ... & Eils, R. [Paper]

  • Protein function prediction is improved by creating synthetic feature samples with generative adversarial networks. Nature Machine Intelligence. 2020. Wan, C., & Jones, D. T. [Paper]

  • Protein function prediction for newly sequenced organisms. Nature Machine Intelligence. 2021. Torres, M., Yang, H., Romero, A. E., & Paccanaro, A. [Paper]

1.3 Protein Design

  • Expanding functional protein sequence spaces using generative adversarial networks. Nature Machine Intelligence. 2021. Repecka, D., Jauniskis, V., Karpus, L., Rembeza, E., Rokaitis, I., Zrimec, J., ... & Zelezniak, A. [Paper]

  • Transformer-based protein generation with regularized latent space optimization. Nature Machine Intelligence. 2022. Castro, E., Godavarthi, A., Rubinfien, J., Givechian, K., Bhaskar, D., & Krishnaswamy, S. [Paper]

  • A high-level programming language for generative protein design. bioRxiv. 2022-12. Hie, B., Candido, S., Lin, Z., Kabeli, O., Rao, R., Smetanin, N., ... & Rives, A. [Paper] [Slides]

  • A universal deep-learning model for zinc finger design enables transcription factor reprogramming. Nature Biotechnology. 2023. Ichikawa, D. M., Abdin, O., Alerasool, N., Kogenaru, M., Mueller, A. L., Wen, H., ... & Noyes, M. B. [Paper] [Slides]

2. Protein Interaction Level

  • Predicting drug–protein interaction using quasi-visual question answering system. Nature Machine Intelligence. 2020. Zheng, S., Li, Y., Chen, S., Xu, J., & Yang, Y. [Paper]

  • A topology-based network tree for the prediction of protein–protein binding affinity changes following mutation. Nature Machine Intelligence. 2020. Wang, M., Cang, Z., & Wei, G. W. [Paper]

  • Computed structures of core eukaryotic protein complexes. Science. 2021. Humphreys, I. R., Pei, J., Baek, M., Krishnakumar, A., Anishchenko, I., Ovchinnikov, S., ... & Baker, D. [Paper]

  • Harnessing protein folding neural networks for peptide–protein docking. Nature Communications. 2022. Tsaban, T., Varga, J. K., Avraham, O., Ben-Aharon, Z., Khramushin, A., & Schueler-Furman, O. [Paper]

  • Protein complex prediction with AlphaFold-Multimer. bioRxiv. 2022. Evans, R., O’Neill, M., Pritzel, A., Antropova, N., Senior, A., Green, T., ... & Hassabis, D. [Paper]

  • Improved prediction of protein-protein interactions using AlphaFold2. Nature Communications. 2022. Bryant, P., Pozzati, G., & Elofsson, A. [Paper]

  • AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nature Communications. 2022. Gao, M., Nakajima An, D., Parks, J. M., & Skolnick, J. [Paper]

  • Uni-Fold Symmetry: harnessing symmetry in folding large protein complexes. bioRxiv. 2022. Li, Z., Yang, S., Liu, X., Chen, W., Wen, H., Shen, F., ... & Zhang, L. [Paper]

  • Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search. Nature Communications. 2022. Bryant, P., Pozzati, G., Zhu, W., Shenoy, A., Kundrotas, P., & Elofsson, A. [Paper]

  • Improve the Protein Complex Prediction with Protein Language Models. bioRxiv. 2022. Chen, B., Xie, Z., Xu, J., Qiu, J., Ye, Z., & Tang, J. [Paper]

3. Cell Level

  • Clustering single-cell RNA-seq data with a model-based deep learning approach. Nature Machine Intelligence. 2019. Tian, T., Wan, J., Song, Q., & Wei, Z. [Paper]

  • An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data. Nature Machine Intelligence. 2020. Wang, L., Nie, R., Yu, Z., Xin, R., Zheng, C., Zhang, Z., ... & Cai, J. [Paper]

  • Iterative transfer learning with neural network for clustering and cell type classification in single-cell RNA-seq analysis. Nature Machine Intelligence. 2020. Hu, J., Li, X., Hu, G., Lyu, Y., Susztak, K., & Li, M. [Paper]

  • Simultaneous deep generative modelling and clustering of single-cell genomic data. Nature Machine Intelligence. 2021. Liu, Q., Chen, S., Jiang, R., & Wong, W. H. [Paper]

  • Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods. 2021. Avsec, Ž., Agarwal, V., Visentin, D., Ledsam, J. R., Grabska-Barwinska, A., Taylor, K. R., ... & Kelley, D. R. [Paper] [Slides]

  • scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nature Machine Intelligence. 2022. Yang, F., Wang, W., Wang, F., Fang, Y., Tang, D., Huang, J., ... & Yao, J. [Paper] [Slides]

  • GEARS: Predicting transcriptional outcomes of novel multi-gene perturbations. bioRxiv. 2022. Roohani, Y., Huang, K., & Leskovec, J. [Paper] [Slides]

  • A multi-use deep learning method for CITE-seq and single-cell RNA-seq data integration with cell surface protein prediction and imputation. Nature Machine Intelligence. 2022. Lakkis, J., Schroeder, A., Su, K., Lee, M. Y., Bashore, A. C., Reilly, M. P., & Li, M. [Paper]

  • Contrastive learning enables rapid mapping to multimodal single-cell atlas of multimillion scale. Nature Machine Intelligence. 2022. Yang, M., Yang, Y., Xie, C., Ni, M., Liu, J., Yang, H., ... & Wang, J. [Paper]

  • Interpreting the B-cell receptor repertoire with single-cell gene expression using Benisse. Nature Machine Intelligence. 2022. Zhang, Z., Chang, W. Y., Wang, K., Yang, Y., Wang, X., Yao, C., ... & Wang, T. [Paper]

  • Simultaneous dimensionality reduction and integration for single-cell ATAC-seq data using deep learning. Nature Machine Intelligence. 2022. Kopp, W., Akalin, A., & Ohler, U. [Paper]

  • Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding. Nature Machine Intelligence. 2022. Chen, X., Chen, S., Song, S., Gao, Z., Hou, L., Zhang, X., ... & Jiang, R. [Paper]

  • Deep learning of cross-species single-cell landscapes identifies conserved regulatory programs underlying cell types. Nature Genetics. 2022. Li, J., Wang, J., Zhang, P., Wang, R., Mei, Y., Sun, Z., Fei, L., et al. [Paper]

4. Others