Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery

Overview

To set up the environment for this repository, please follow the steps below:

Step 1: Create a Python environment (optional). If you want a dedicated Python environment, create one with conda:

conda create -n pyt1.11 python=3.8.5

Step 2: Install PyTorch with CUDA (optional). If you want CUDA support, install PyTorch with:

conda install pytorch==1.11 torchvision torchaudio cudatoolkit=11.3 -c pytorch

Step 3: Install Python dependencies. Install the required Python dependencies with:

pip install -r requirements.txt
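
After the three steps above, a quick sanity check can confirm which interpreter and PyTorch build are active. The snippet below is a minimal sketch and not part of the repository: describe_environment is a hypothetical helper name, and it only reports what it finds without enforcing the versions used in Steps 1 and 2.

```python
import importlib.util
import platform


def describe_environment():
    """Report the Python version and, if PyTorch is installed, its version and CUDA status."""
    info = {"python": platform.python_version(), "torch": None, "cuda_available": None}
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the check also works before Step 2

        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


print(describe_environment())
```

If Step 2 was skipped, the torch and cuda_available entries stay None unless requirements.txt installed PyTorch on its own.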

Unzip all the zip files located in the data folder, including its subfolders.
Place the following folders, extracted from their respective zip files, under the data folder: kg, ct, and gold_subset.
Locate the local_context_dataset folder unzipped from data/idea-sentence/local_context_dataset.zip. Move it to idea-sentence/models/T5.
Find the local_dataset folder unzipped from data/idea-node/local_dataset.zip. Place it in idea-node/models/Dual_Encoder.
Copy the file e2t.json into the following folders: idea-node/models/GPT3.5*/, idea-node/preprocess/, idea-sentence/models/GPT3.5*/, and idea-sentence/preprocess/.
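
The unzip and copy steps above can be scripted. The following is a minimal sketch using only the Python standard library, not a script shipped with the repository: extract_all_zips, copy_into_matching_dirs, and E2T_TARGETS are hypothetical names, and the location of e2t.json is left as an argument because the steps do not state where the file lives. Moving the extracted folders to their final places is still done by hand.

```python
import shutil
import zipfile
from pathlib import Path


def extract_all_zips(data_dir):
    """Extract every .zip under data_dir (recursively) next to the archive itself."""
    extracted = []
    for archive in sorted(Path(data_dir).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted


def copy_into_matching_dirs(src_file, repo_root, dir_patterns):
    """Copy src_file into every existing directory under repo_root that matches a glob pattern."""
    copied = []
    for pattern in dir_patterns:
        for target in sorted(Path(repo_root).glob(pattern)):
            if target.is_dir():
                copied.append(Path(shutil.copy(src_file, target)))
    return copied


# Destinations for e2t.json as listed in the steps above (forward slashes assumed).
E2T_TARGETS = [
    "idea-node/models/GPT3.5*",
    "idea-node/preprocess",
    "idea-sentence/models/GPT3.5*",
    "idea-sentence/preprocess",
]
```

From the repository root, calling extract_all_zips("data") and then copy_into_matching_dirs(path_to_e2t, ".", E2T_TARGETS) would cover the first and last steps, where path_to_e2t is wherever e2t.json ends up after extraction.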

The project data includes the following components:

data/local_context_dataset: This folder contains the training, validation, and testing files for idea sentence generation.
data/local_dataset: This folder contains the training, validation, and testing files for idea node prediction.
data/kg/*.json: These files store the original Information Extraction (IE) results for all paper abstracts.
data/ct/*.csv: These files represent the citation network for all papers.
data/gold_subset: This directory contains our gold annotation subsets.
idea-node/evaluation and idea-sentence/evaluation contain sample evaluation code.
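
Before training, it can help to confirm that the folders named above ended up where the setup steps put them. The sketch below is an assumption-laden illustration rather than repository code: check_layout and EXPECTED_DIRS are hypothetical names, and the check only looks at folder presence and file counts, since the file schemas are not described here.

```python
from pathlib import Path

# Locations named in the setup steps and the component list above.
EXPECTED_DIRS = [
    "data/kg",
    "data/ct",
    "data/gold_subset",
    "idea-sentence/models/T5/local_context_dataset",
    "idea-node/models/Dual_Encoder/local_dataset",
]


def count_files(folder, pattern):
    """Count files matching pattern directly inside folder; 0 if the folder is missing."""
    folder = Path(folder)
    return len(list(folder.glob(pattern))) if folder.is_dir() else 0


def check_layout(repo_root):
    """Return which expected folders exist, plus counts of kg JSON and ct CSV files."""
    root = Path(repo_root)
    report = {rel: (root / rel).is_dir() for rel in EXPECTED_DIRS}
    report["kg_json_files"] = count_files(root / "data" / "kg", "*.json")
    report["ct_csv_files"] = count_files(root / "data" / "ct", "*.csv")
    return report
```

Running check_layout(".") from the repository root returns a dictionary; any False entry or zero count points at a setup step that may have been missed.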

To train a model under */models/*, run the matching script:

bash finetune_*.sh

To test a model under */models/*, run the matching script:

bash eval_*.sh
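
Because the script names above contain wildcards, a small helper can list which scripts actually exist before running them. This is a sketch under stated assumptions, not repository code: find_scripts and run_script are hypothetical names, and the sketch assumes each script sits directly inside a <task>/models/<model>/ folder and is meant to be run from that folder.

```python
import subprocess
from pathlib import Path


def find_scripts(repo_root, prefix):
    """List scripts named <prefix>_*.sh under any <task>/models/<model>/ folder."""
    return sorted(Path(repo_root).glob(f"*/models/*/{prefix}_*.sh"))


def run_script(script, dry_run=True):
    """Run one script with bash from its own folder; with dry_run, only return the command."""
    command = ["bash", script.name]
    if not dry_run:
        # Assumption: the script expects its own folder as the working directory.
        subprocess.run(command, cwd=script.parent, check=True)
    return command
```

For example, find_scripts(".", "finetune") lists the available training scripts and find_scripts(".", "eval") the evaluation ones. Passing dry_run=False to run_script launches the chosen script, so the default keeps the helper side-effect free.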