-
Download IUI PDFs
./data_scripts/fetch_iui_data.sh
-
Download scisummnet papers
./data_scripts/fetch_scisumm_data.sh
-
Follow the instructions in the README.md of the
scienceparseplus
module. Specifically, make sure you:
- download the model weights
- create the Docker image
- start the service in a Docker container.
-
Process your PDFs with scienceparseplus
python src/run_spp.py --input_dir=<PATH TO PDFS>
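Before running this step, it can help to confirm that the input directory actually contains PDFs. The following is a minimal sketch, assuming the PDFs sit directly inside the directory and carry a `.pdf` extension; `list_pdfs` is a hypothetical helper and not part of this repository.

```python
from pathlib import Path


def list_pdfs(input_dir):
    """Return the sorted PDF paths found directly inside input_dir."""
    root = Path(input_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{input_dir} is not a directory")
    # Match the extension case-insensitively (.pdf, .PDF).
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

An empty result suggests the `--input_dir` path is wrong or the download step did not complete.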
-
Download the pretrained model
cd sequential_sentence_classification
gdown https://drive.google.com/uc?id=1bx9hl6AhQdQ6hId4-N4ENM-jt7cwpqi6
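A download that was interrupted can leave a truncated file behind, so a quick sanity check on the archive may save a confusing error later. This sketch assumes the downloaded file is the `model.tar.gz` referenced by `--path_to_model` in the classifier step; that filename is inferred rather than confirmed, and `looks_like_model_archive` is a hypothetical helper.

```python
import tarfile
from pathlib import Path


def looks_like_model_archive(path):
    """Return True if path exists and can be opened as a tar archive."""
    candidate = Path(path)
    # is_tarfile also recognises compressed archives such as .tar.gz.
    return candidate.is_file() and tarfile.is_tarfile(str(candidate))
```

For example, `looks_like_model_archive("sequential_sentence_classification/model.tar.gz")` should return `True` after a successful download, under the filename assumption above.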
-
Alternatively, follow the README.md in the
sequential_sentence_classification
module to re-train the model.
-
Run the classifier
python src/run_ssc.py --path_to_model=sequential_sentence_classification/model.tar.gz --test_jsonl_file=<path to input JSONL file> --output_file=<path to output JSON file>
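The classifier reads a JSONL file, meaning one JSON value per line. The sketch below checks only that each line parses as JSON; it makes no assumption about which fields `run_ssc.py` expects, and `find_invalid_jsonl_lines` is a hypothetical helper rather than part of this repository.

```python
import json


def find_invalid_jsonl_lines(path):
    """Return the 1-based numbers of non-blank lines that are not valid JSON."""
    bad = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # Blank lines are skipped, not reported.
            try:
                json.loads(line)
            except json.JSONDecodeError:
                bad.append(number)
    return bad
```

An empty list means every non-blank line parsed; any reported line numbers are worth fixing before passing the file to `--test_jsonl_file`.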