Trend Analyzing Algorithm designed alongside "A Survey of Zero-Knowledge Proofs in a Post-Quantum Context"

An Overview

This script mines the abstracts in the 'abstracts.csv' file to find common words or phrases. The abstracts are grouped by year. A keyword or keyphrase is recorded only if it occurs in more than one paper, so as to avoid recording words or phrases unique to a single paper.
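The counting rule described above can be sketched as follows. This is a minimal illustration, not the code in mine-year.py: the function name keyword_counts_by_year, the (year, abstract) row format, the tokenizing regular expression, and the choice to apply the "more than one paper" rule within each year are all assumptions made for the example. It also handles single words only, not phrases.

```python
import re
from collections import Counter, defaultdict


def keyword_counts_by_year(rows, blocklist=frozenset()):
    """Count words per year, keeping only words found in more than one abstract.

    rows is an iterable of (year, abstract) pairs (an assumed format).
    """
    total = defaultdict(Counter)   # year -> word -> total occurrences
    papers = defaultdict(Counter)  # year -> word -> number of abstracts containing it
    for year, abstract in rows:
        # Lowercase the text and split it into (possibly hyphenated) words.
        words = [
            w for w in re.findall(r"[a-z]+(?:-[a-z]+)*", abstract.lower())
            if w not in blocklist
        ]
        total[year].update(words)
        papers[year].update(set(words))  # each abstract counts at most once per word
    # Drop words that appear in only one abstract of that year.
    return {
        year: {w: n for w, n in total[year].items() if papers[year][w] > 1}
        for year in total
    }
```

Keeping two counters per year separates "how often a word occurs" from "how many papers use it", so a word repeated many times in a single abstract is still dropped.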

Repository Architecture

Important Folders

paper-sorted contains the curated CSVs that were used to derive certain figures in the paper.

unsorted-sanitized contains the CSVs generated by the Python script, which sanitizes the abstracts.csv file in order to perform frequency analysis.

Important Files

abstracts.csv is a CSV containing only the abstract and the year of each article. It is used with the Python script.

article-info.csv is a CSV containing authorship information and other metadata for the articles whose abstracts we mined. All of these abstracts are publicly available. We thank the authors of these articles and the publishers for making them so.

blocklist.txt is a newline-delimited file containing words that should be excluded from the frequency analysis.

mine-year.py is the main Python script. It takes no arguments or options.
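As an illustration of how a newline-delimited blocklist such as blocklist.txt could be applied, the sketch below parses the file into a set and filters a word list against it. The helper names parse_blocklist and filter_words are hypothetical, and the lowercasing and skipping of blank lines are assumptions; the actual script may treat the file differently.

```python
def parse_blocklist(lines):
    """Build a set of blocked words from an iterable of lines.

    Blank lines are skipped and words are lowercased (both are assumptions).
    """
    return {line.strip().lower() for line in lines if line.strip()}


def filter_words(words, blocklist):
    """Return the words that are not in the blocklist, ignoring case."""
    return [w for w in words if w.lower() not in blocklist]


# Possible usage, assuming blocklist.txt is in the working directory:
#
#     with open("blocklist.txt", encoding="utf-8") as handle:
#         blocklist = parse_blocklist(handle)
#     kept = filter_words(["the", "lattice", "proof"], blocklist)
```

A set gives constant-time membership checks, which matters when every word of every abstract is tested against the list.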

Usage

python3 mine-year.py

Procedures Used

To derive the results located in the 'unsorted-sanitized' folder, run the script.

The CSVs provided in the 'paper-sorted' folder were processed further. To create these files, we accounted for plural or alternate forms of words by adding the counts of these forms to the count of the root word.
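The merging step can be expressed in a few lines of Python. This README does not say whether the step was done by hand or by a script, so the following is only a sketch of the idea: the function merge_forms and the variant-to-root mapping are hypothetical and are not part of this repository.

```python
from collections import Counter


def merge_forms(counts, variants):
    """Fold the counts of plural or alternate forms into their root word.

    counts maps word -> frequency; variants maps each alternate form to its
    root word (a hand-written, hypothetical mapping).
    """
    merged = Counter()
    for word, n in counts.items():
        # Words without an entry in the mapping are kept under their own name.
        merged[variants.get(word, word)] += n
    return merged


# Possible usage with made-up counts:
#
#     merge_forms({"proof": 4, "proofs": 3}, {"proofs": "proof"})
```

An explicit mapping keeps the decision about which forms belong together in the hands of the person curating the data, rather than relying on an automatic stemmer.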