This repository accompanies the paper "A Content-Based Novelty Measure for Scholarly Publications: A Proof of Concept." NovEval is a tool that automatically evaluates the novelty of an academic manuscript; its scores have been shown to significantly correlate with the judgments of human experts. NovEval estimates how rarely a sequence of words (i.e., the manuscript) occurs in the universe of scholarly discourse, using a GPT-2 model trained solely on English Wikipedia. See the paper for details.
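The core quantity behind this kind of rarity estimate is perplexity: a language model trained only on Wikipedia assigns low probability (hence high perplexity) to phrasing it has rarely seen. The repository's actual scoring code is not reproduced here; below is a minimal, self-contained sketch of the perplexity computation itself, with made-up token probabilities standing in for real model outputs.

```python
import math

def perplexity(log_probs):
    """Perplexity of a token sequence, given per-token natural-log
    probabilities: exp(-mean log p). Higher = the model finds the
    text more surprising, i.e., rarer in its training corpus."""
    if not log_probs:
        raise ValueError("empty sequence")
    return math.exp(-sum(log_probs) / len(log_probs))

# Hypothetical scores: a predictable sequence vs. a surprising one.
familiar = [math.log(0.5)] * 10   # each token had probability 0.5
novel = [math.log(0.05)] * 10     # each token had probability 0.05

print(perplexity(familiar))  # 2.0
print(perplexity(novel))     # 20.0
```

In practice the per-token log probabilities come from running the trained GPT-2 model over the manuscript; the arithmetic above is all that turns them into a single novelty score.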
The repository contains scripts to reproduce the GPT-2 model and the reported experiments.
Play with it now at NovEval.
The trained GPT-2 model is hosted at Zenodo.
To reproduce the model, run the following:
# loaded modules on IU's Big Red 200 cluster:
# 1) craype-x86-rome 5) gcc/11.2.0 9) cray-libsci/21.08.1.2 13) nano/6.4
# 2) libfabric/1.11.0.3.71 6) craype/2.7.14 10) PrgEnv-gnu/8.3.2 14) cudatoolkit/11.7
# 3) craype-network-ofi 7) cray-dsmml/0.2.2 11) xalt/2.10.34 15) python/3.10.5
# 4) perftools-base/21.12.0 8) cray-mpich/8.1.14 12) git/2.34
python3.10 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
torchrun --standalone --nproc_per_node=4 train.py config/train_gpt2_wikipedia_en.py
The model was trained during the update from PyTorch 1.0 to 2.0; the pinned dependencies may therefore conflict with one another.
# for token- and sentence-level sanity check
bash face_validity.sh
# for section-level sanity check
bash known_group_validity.sh
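The known-group check rests on a simple idea: groups of text expected a priori to differ in novelty (e.g., by section type) should receive reliably different scores. The repository's script is not reproduced here; as a sketch of the underlying logic, here is a small permutation test on the difference of group means, using hypothetical perplexity scores.

```python
import random

def permutation_test(a, b, n=10_000, seed=0):
    """Two-sided permutation test: how often does a random relabeling
    of the pooled scores produce a mean difference at least as large
    as the observed one?"""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n):
        rng.shuffle(pooled)
        d = abs(sum(pooled[:len(a)]) / len(a) - sum(pooled[len(a):]) / len(b))
        if d >= observed:
            hits += 1
    return hits / n

# Hypothetical perplexity scores for two groups that should differ.
high_novelty = [22.1, 25.4, 24.8, 23.9, 26.0]
low_novelty = [14.2, 15.1, 13.8, 16.0, 14.9]
print(permutation_test(high_novelty, low_novelty))  # small p-value
```

A small p-value here would indicate that the measure separates the known groups, which is what the section-level sanity check is probing.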
License: 0BSD
@inproceedings{wang2024noveval,
author = {Haining Wang},
title = {A Content-Based Novelty Measure for Scholarly Publications: A Proof of Concept},
booktitle = {Proceedings of iConference 2024: Wisdom, Well-Being, Win-Win},
year = {2024},
doi = {10.1007/978-3-031-57867-0_31},
publisher = {Springer Nature Switzerland},
address = {Cham},
pages = {409--420},
isbn = {978-3-031-57867-0},
}