NLP4Science Papers

Must-read papers on NLP for science (proteins and molecules). This paper list is maintained mainly by the ZJUNLP team.

NLP for Protein Papers

Nature and its series

  1. I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nature Protocols. Xiaogen Zhou, Wei Zheng, Yang Li, Robin Pearce, Chengxin Zhang, Eric W. Bell, Guijun Zhang & Yang Zhang. [pdf]. 2022.08.
  2. SANA: cross-species prediction of Gene Ontology GO annotations via topological network alignment. npj Systems Biology and Applications. Siyue Wang, Giles R. S. Atkinson & Wayne B. Hayes. [pdf]. 2022.07.
  3. Learning functional properties of proteins with language models. Nature Machine Intelligence. Serbulent Unsal, Heval Atas, Muammer Albayrak, Kemal Turhan, Aybar C. Acar & Tunca Doğan. [pdf]. 2022.03.
  4. Protein function prediction for newly sequenced organisms. Nature Machine Intelligence. Mateo Torres, Haixuan Yang, Alfonso E. Romero & Alberto Paccanaro. [pdf]. 2021.12.
  5. Structure-based protein function prediction using graph convolutional networks. Nature Communications. Vladimir Gligorijević, P. Douglas Renfrew, Tomasz Kosciolek, Julia Koehler Leman, Daniel Berenberg, Tommi Vatanen, Chris Chandler, Bryn C. Taylor, Ian M. Fisk, Hera Vlamakis, Ramnik J. Xavier, Rob Knight, Kyunghyun Cho & Richard Bonneau. [pdf]. 2021.04.
  6. Expanding functional protein sequence spaces using generative adversarial networks. Nature Machine Intelligence. Donatas Repecka, Vykintas Jauniskis, Laurynas Karpus, Elzbieta Rembeza, Irmantas Rokaitis, Jan Zrimec, Simona Poviloniene, Audrius Laurynenas, Sandra Viknander, Wissam Abuajwa, Otto Savolainen, Rolandas Meskys, Martin K. M. Engqvist & Aleksej Zelezniak. [pdf]. 2021.03.
  7. Protein function prediction is improved by creating synthetic feature samples with generative adversarial networks. Nature Machine Intelligence. Cen Wan & David T. Jones. [pdf]. 2020.08.

Other Journals and Conferences

  1. Exploring Evolution-aware & -free protein language models as protein function predictors. NeurIPS 2022. Mingyang Hu, Fajie Yuan, Kevin K. Yang, Fusong Ju, Jin Su, Hui Wang, Fei Yang, Qiuyang Ding. [pdf], 2022.06
  2. A Semi-supervised Graph Deep Neural Network for Automatic Protein Function Annotation. Bioinformatics and Biomedical Engineering. Akrem Sellami, Bishnu Sarker, Salvatore Tabbone, Marie-Dominique Devignes & Sabeur Aridhi. [pdf], 2022.06
  3. ProTranslator: zero-shot protein function prediction using textual description. arXiv:2204.10286. Hanwen Xu, Sheng Wang. [pdf], 2022.05
  4. A deep learning framework for predicting protein functions with co-occurrence of GO terms. IEEE/ACM Trans Comput Biol Bioinform. Min Li, Wenbo Shi, Fuhao Zhang, Min Zeng, Yaohang Li. [pdf], 2022.05
  5. PANDA2: protein function prediction using graph neural networks. NAR Genomics and Bioinformatics. Chenguang Zhao, Tong Liu, Zheng Wang. [pdf], 2022.02
  6. DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction. Bioinformatics. Ronghui You, Shuwei Yao, Hiroshi Mamitsuka, Shanfeng Zhu. [pdf], 2021.07
  7. Accurate Protein Function Prediction via Graph Attention Networks with Predicted Structure Information. Briefings in Bioinformatics. Boqiao Lai, Jinbo Xu. [pdf], 2021.06
  8. Effusion: prediction of protein function from sequence similarity networks. Bioinformatics. Jeffrey M Yunes, Patricia C Babbitt. [pdf], 2019.02
  9. Protein function prediction from dynamic protein interaction network using gene expression data. Journal of Bioinformatics and Computational Biology. Sovan Saha, Abhimanyu Prasad, Piyali Chatterjee, Subhadip Basu, Mita Nasipuri. [pdf], 2018.12
  10. DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation. Methods. Ronghui You, Xiaodi Huang, Shanfeng Zhu. [pdf], 2018.08

NLP for Molecular Papers

Nature and its series

  1. A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals. Nature Communications. Zheni Zeng, Yuan Yao, Zhiyuan Liu, Maosong Sun. [pdf]. 2022.02
  2. Language models can learn complex molecular distributions. Nature Communications. Daniel Flam-Shepherd, Kevin Zhu, Alán Aspuru-Guzik. [pdf]. 2022.07
  3. Large-scale chemical language representations capture molecular structure and properties. Nature Machine Intelligence. Jerret Ross, Brian Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, Payel Das. [pdf]. 2022.12
  4. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nature Communications. Michael Moret, Irene Pachon Angona, Leandro Cotos, Shen Yan, Kenneth Atz, Cyrill Brunner, Martin Baumgartner, Francesca Grisoni, Gisbert Schneider. [pdf]. 2023.01

Other Journals and Conferences

  1. Translation between Molecules and Natural Language. EMNLP. Carl Edwards, Tuan Lai, Kevin Ros, Garrett Honke, Kyunghyun Cho, Heng Ji. [pdf], 2022
  2. Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology. Ross Irwin, Spyridon Dimitriadis, Jiazhen He, Esben Jannik Bjerrum. [pdf], 2022.01
  3. Multilingual Molecular Representation Learning via Contrastive Pre-training. ACL. Zhihui Guo, Pramod Sharma, Andy Martinez, Liang Du, Robin Abraham. [pdf], 2022
  4. Molecular Language Model as Multi-task Generator. arXiv. Yin Fang, Ningyu Zhang, Zhuo Chen, Xiaohui Fan, Huajun Chen. [pdf], 2023

Contributors

We thank all members of the ZJUNLP team for their paper recommendations. Pull requests and issues are welcome!