BioIE-LLM

Biological Information Extraction from Large Language Models (LLMs)

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

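This is the official code of the following papers (full references are given in the Citation section):
  • Automated Extraction of Molecular Interactions and Pathway Knowledge using Large Language Model, Galactica: Opportunities and Challenges (BioNLP 2023)
  • Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge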

Installation

The code was implemented in Python 3.8, and the dependency versions are listed in requirements.txt.
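Checking an existing environment against requirements.txt can be scripted. The sketch below is not part of the repository: it assumes requirements.txt uses simple `name==version` pins (range specifiers such as `>=` are skipped), and the helper names `parse_pins` and `find_mismatches` are hypothetical. It relies only on the standard library (`importlib.metadata` is available from Python 3.8).

```python
import sys
from importlib import metadata
from pathlib import Path


def parse_pins(lines):
    """Map package name -> pinned version for `name==version` lines (assumed format)."""
    pins = {}
    for raw in lines:
        # Drop comments and environment markers, then surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # ranges (>=, ~=) and blank lines are not checked in this sketch
        name, version = (part.strip() for part in line.split("==", 1))
        pins[name] = version
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed or None)} for packages that differ from the pin."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 8):
        print(f"Note: running Python {sys.version.split()[0]}; the code was implemented on 3.8")
    req = Path("requirements.txt")
    if req.exists():
        pins = parse_pins(req.read_text().splitlines())
        for name, (pinned, installed) in find_mismatches(pins).items():
            print(f"{name}: pinned {pinned}, installed {installed}")
    else:
        print("requirements.txt not found in the current directory")
```

Run from the repository root, it lists every pinned package that is missing or installed at a different version.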

Datasets

  • STRING DB: the human (Homo sapiens) protein network for performing a protein-protein interaction (PPI) recognition task.
  • KEGG DB: the KEGG human pathways that a recent study identified as activated in response to low-dose radiation exposure.
  • INDRA DB: a set of human gene regulatory relation statements that represent mechanistic interactions between biological agents.

Reproduction

To reproduce the experimental results, run the bash script run.sh after changing the model and data paths in it to match your environment.

Results

Here are the results of the experiments, which were conducted on 8× NVIDIA V100 GPUs. Note that a different number of GPUs or a different batch size can produce slightly different results.

Recognizing Protein-Protein Interactions

  • STRING Task1 - Precision of the generated binding proteins over 1K protein samples.
  • STRING Task2 - Micro F-scores on randomly selected positive and negative pairs (i.e., 1K = 500 positive + 500 negative pairs).
  • Model prediction consistency between Task1 and Task2.
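The README does not spell out how the two STRING scores are computed, so the sketch below is only one plausible reading, not the repository's code: Task1 precision as the fraction of distinct generated protein names that appear among the query protein's STRING partners (how scores are aggregated over the 1K samples is not specified), and Task2 as micro-averaged F1 over the positive and negative classes. The function names are hypothetical, and the consistency metric is left out because its definition is not given here.

```python
def generation_precision(generated, gold):
    """Task1-style precision: share of distinct generated names found in the gold set."""
    generated = set(generated)  # duplicate generations are counted once
    if not generated:
        return 0.0
    return len(generated & set(gold)) / len(generated)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over every class, then compute F1."""
    labels = set(y_true) | set(y_pred)
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == p for t, p in pairs)  # summed over classes, TP = correct predictions
    fp = sum(p == c and t != c for t, p in pairs for c in labels)
    fn = sum(t == c and p != c for t, p in pairs for c in labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `generation_precision(["A", "B", "C", "D"], ["B", "C", "E"])` returns 0.5 (two of four generated names are correct), and `micro_f1(["yes", "no", "yes", "no"], ["yes", "yes", "yes", "no"])` returns 0.75. With one label per pair, every wrong prediction adds exactly one FP and one FN, so micro F1 coincides with accuracy.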
Model                  STRING Task1  STRING Task2  Consistency
Galactica (6.7B)       0.166         0.552         0.726
LLaMA (7B)             0.043         0.484         0.984
Alpaca (7B)            0.052         0.521         0.784
RST (11B)              0.146         0.529         1.000
BioGPT-Large (1.5B)    0.100         0.504         0.814
BioMedLM (2.7B)        0.069         0.643         0.861

KEGG Pathway Recognition

  • KEGG Task1 - Precision of the generated genes that belong to the top 20 pathways relevant to low-dose radiation exposure.
  • KEGG Task2 - Micro F-scores on randomly selected positive and negative pairs (i.e., 1K = 500 positive + 500 negative pairs).
  • Model prediction consistency between Task1 and Task2.
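A balanced 1K evaluation set of the kind used in STRING Task2 and KEGG Task2 (500 positive + 500 negative pairs) could be drawn as below. This is an illustrative sketch, not the repository's sampler: how negative pairs are constructed is not described here, so the function simply takes pre-built positive and negative pools, and the name `sample_balanced_pairs` and the fixed seed are assumptions.

```python
import random


def sample_balanced_pairs(positives, negatives, n_per_class=500, seed=0):
    """Draw n_per_class items from each pool, attach labels, and shuffle.

    Returns a list of (pair, label) tuples with label 1 = positive, 0 = negative.
    random.Random.sample raises ValueError if a pool has fewer than n_per_class items.
    """
    rng = random.Random(seed)  # fixed seed -> the same evaluation set on every run
    pos = rng.sample(list(positives), n_per_class)
    neg = rng.sample(list(negatives), n_per_class)
    samples = [(pair, 1) for pair in pos] + [(pair, 0) for pair in neg]
    rng.shuffle(samples)  # interleave classes so sample order carries no label signal
    return samples
```

Fixing the seed keeps the sampled pairs identical across runs, so every model is scored on the same 1K examples.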
Model                  KEGG Task1  KEGG Task2  Consistency
Galactica (6.7B)       0.256       0.564       0.917
LLaMA (7B)             0.180       0.562       0.881
Alpaca (7B)            0.268       0.522       1.0
RST (11B)              0.255       0.514       0.0
BioGPT-Large (1.5B)    0.550       0.497       0.923
BioMedLM (2.7B)        0.514       0.568       0.821

Evaluating Gene Regulatory Relations

  • INDRA Task - Micro F-scores with 1K samples for each class.
Model                  2-class  3-class  4-class  5-class  6-class
Galactica (6.7B)       0.704    0.605    0.567    0.585    0.597
LLaMA (7B)             0.351    0.293    0.254    0.219    0.212
Alpaca (7B)            0.736    0.645    0.556    0.636    0.535
RST (11B)              0.640    0.718    0.597    0.667    0.614
BioGPT-Large (1.5B)    0.474    0.390    0.293    0.328    0.288
BioMedLM (2.7B)        0.542    0.408    0.307    0.230    0.195
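These scores are easier to read against a chance baseline. Assuming each INDRA sample carries exactly one relation label, micro-averaged F1 equals accuracy, and a guesser that picks uniformly at random among k classes scores 1/k in expectation. The sketch below computes both; the function name and the toy labels are hypothetical.

```python
def micro_f1_with_chance(y_true, y_pred):
    """Return (micro F1, chance-level score) for a single-label classification task.

    With one label per sample, micro F1 reduces to accuracy. A guesser choosing
    uniformly at random among the k gold classes scores 1/k in expectation.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    k = len(set(y_true))  # number of classes present in the gold labels
    return correct / len(y_true), 1 / k


if __name__ == "__main__":
    gold = ["A", "B", "C"] * 2  # toy 3-class labels, 2 samples per class
    pred = ["A", "A", "C", "A", "B", "B"]
    score, chance = micro_f1_with_chance(gold, pred)
    print(f"micro F1 = {score:.3f}, chance = {chance:.3f}")  # micro F1 = 0.667, chance = 0.333
```

Under that assumption, the chance levels for the 2- to 6-class columns are 0.500, 0.333, 0.250, 0.200 and 0.167.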

Citation

@inproceedings{park2023automated,
  title={Automated Extraction of Molecular Interactions and Pathway Knowledge using Large Language Model, Galactica: Opportunities and Challenges},
  author={Park, Gilchan and Yoon, Byung-Jun and Luo, Xihaier and L{\'o}pez-Marrero, Vanessa and Johnstone, Patrick and Yoo, Shinjae and Alexander, Francis},
  booktitle={The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks},
  pages={255--264},
  year={2023}
}
@inproceedings{Park2023ComparativePE,
  title={Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge},
  author={Gilchan Park and Byung-Jun Yoon and Xihaier Luo and Vanessa L{\'o}pez-Marrero and Patrick Johnstone and Shinjae Yoo and Francis J. Alexander},
  year={2023}
}