NLP4Science Papers

Must-read papers on NLP for science (proteins and molecules). The paper list is mainly maintained by the ZJUNLP team.

NLP for Protein Papers

Nature and its series

  1. Large language models generate functional protein sequences across diverse families. Nature Biotechnology. Ali Madani, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos Jr., Caiming Xiong, Zachary Z. Sun, Richard Socher, James S. Fraser & Nikhil Naik. [pdf]. 2023.1.
  2. I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nature Protocols. Xiaogen Zhou, Wei Zheng, Yang Li, Robin Pearce, Chengxin Zhang, Eric W. Bell, Guijun Zhang & Yang Zhang. [pdf]. 2022.8.
  3. SANA: cross-species prediction of Gene Ontology GO annotations via topological network alignment. npj Systems Biology and Applications. Siyue Wang, Giles R. S. Atkinson & Wayne B. Hayes. [pdf]. 2022.7.
  4. Learning functional properties of proteins with language models. Nature Machine Intelligence. Serbulent Unsal, Heval Atas, Muammer Albayrak, Kemal Turhan, Aybar C. Acar & Tunca Doğan. [pdf]. 2022.3.
  5. Protein function prediction for newly sequenced organisms. Nature Machine Intelligence. Mateo Torres, Haixuan Yang, Alfonso E. Romero & Alberto Paccanaro. [pdf]. 2021.12.
  6. Structure-based protein function prediction using graph convolutional networks. Nature Communications. Vladimir Gligorijević, P. Douglas Renfrew, Tomasz Kosciolek, Julia Koehler Leman, Daniel Berenberg, Tommi Vatanen, Chris Chandler, Bryn C. Taylor, Ian M. Fisk, Hera Vlamakis, Ramnik J. Xavier, Rob Knight, Kyunghyun Cho & Richard Bonneau. [pdf]. 2021.4.
  7. Expanding functional protein sequence spaces using generative adversarial networks. Nature Machine Intelligence. Donatas Repecka, Vykintas Jauniskis, Laurynas Karpus, Elzbieta Rembeza, Irmantas Rokaitis, Jan Zrimec, Simona Poviloniene, Audrius Laurynenas, Sandra Viknander, Wissam Abuajwa, Otto Savolainen, Rolandas Meskys, Martin K. M. Engqvist & Aleksej Zelezniak. [pdf]. 2021.3.
  8. Protein function prediction is improved by creating synthetic feature samples with generative adversarial networks. Nature Machine Intelligence. Cen Wan & David T. Jones. [pdf]. 2020.8.

Other Journals and Conferences

  1. Exploring evolution-aware & -free protein language models as protein function predictors. NeurIPS 2022. Mingyang Hu, Fajie Yuan, Kevin K. Yang, Fusong Ju, Jin Su, Hui Wang, Fei Yang, Qiuyang Ding. [pdf], 2022.06
  2. A Semi-supervised Graph Deep Neural Network for Automatic Protein Function Annotation. Bioinformatics and Biomedical Engineering. Akrem Sellami, Bishnu Sarker, Salvatore Tabbone, Marie-Dominique Devignes & Sabeur Aridhi. [pdf], 2022.6
  3. ProTranslator: zero-shot protein function prediction using textual description. arXiv:2204.10286. Hanwen Xu, Sheng Wang. [pdf], 2022.5
  4. A deep learning framework for predicting protein functions with co-occurrence of GO terms. IEEE/ACM Trans Comput Biol Bioinform. Min Li, Wenbo Shi, Fuhao Zhang, Min Zeng, Yaohang Li. [pdf], 2022.5
  5. PANDA2: protein function prediction using graph neural networks. NAR Genomics and Bioinformatics. Chenguang Zhao, Tong Liu, Zheng Wang. [pdf], 2022.2
  6. DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction. Bioinformatics. Ronghui You, Shuwei Yao, Hiroshi Mamitsuka, Shanfeng Zhu. [pdf], 2021.7
  7. Accurate Protein Function Prediction via Graph Attention Networks with Predicted Structure Information. Briefings in Bioinformatics. Boqiao Lai, Jinbo Xu. [pdf], 2021.6
  8. Effusion: prediction of protein function from sequence similarity networks. Bioinformatics. Jeffrey M Yunes, Patricia C Babbitt. [pdf], 2019.2
  9. Protein function prediction from dynamic protein interaction network using gene expression data. Journal of Bioinformatics and Computational Biology. Sovan Saha, Abhimanyu Prasad, Piyali Chatterjee, Subhadip Basu and Mita Nasipuri. [pdf], 2018.12
  10. DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation. Methods. Ronghui You, Xiaodi Huang, Shanfeng Zhu. [pdf], 2018.08

NLP for Molecular Papers

Nature and its series

  1. A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals. Nature Communications. Zheni Zeng, Yuan Yao, Zhiyuan Liu, Maosong Sun. [pdf]. 2022.02
  2. Language models can learn complex molecular distributions. Nature Communications. Daniel Flam-Shepherd, Kevin Zhu, Alán Aspuru-Guzik. [pdf]. 2022.07
  3. Large-scale chemical language representations capture molecular structure and properties. Nature Machine Intelligence. Jerret Ross, Brian Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, Payel Das. [pdf]. 2022.12
  4. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nature Communications. Michael Moret, Irene Pachon Angona, Leandro Cotos, Shen Yan, Kenneth Atz, Cyrill Brunner, Martin Baumgartner, Francesca Grisoni, Gisbert Schneider. [pdf]. 2023.01

Other Journals and Conferences

  1. Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models. arXiv. Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, Huajun Chen. [pdf], 2023
  2. Domain-Agnostic Molecular Generation with Self-feedback. arXiv. Yin Fang, Ningyu Zhang, Zhuo Chen, Xiaohui Fan, Huajun Chen. [pdf], 2023
  3. Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective. arXiv. Jiatong Li, Yunqing Liu, Wenqi Fan, Xiao-Yong Wei, Hui Liu, Jiliang Tang, Qing Li. [pdf], 2023
  4. Translation between Molecules and Natural Language. EMNLP. Carl Edwards, Tuan Lai, Kevin Ros, Garrett Honke, Kyunghyun Cho, Heng Ji. [pdf], 2022
  5. Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology. Ross Irwin, Spyridon Dimitriadis, Jiazhen He and Esben Jannik Bjerrum. [pdf], 2022.01
  6. Multilingual Molecular Representation Learning via Contrastive Pre-training. ACL. Zhihui Guo, Pramod Sharma, Andy Martinez, Liang Du, Robin Abraham. [pdf], 2022

Contributors

We thank all members of the ZJUNLP team for their paper recommendations. Pull requests and issues are welcome!