Trend Analyzing Algorithm designed alongside "A Survey of Zero-Knowledge Proofs in a Post-Quantum Context"
This script mines the abstracts in the 'abstracts.csv' file to find common words or phrases within them. The abstracts are grouped by year. Keyphrases or keywords are recorded only if they occur in more than one paper, so as to avoid recording words or phrases unique to a single paper.
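The filtering rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in mine-year.py: the function name `keyword_counts_by_year`, the `(year, abstract)` row layout, the simple regex tokenizer, and the choice to apply the "more than one paper" rule within each year are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def keyword_counts_by_year(rows, blocklist=frozenset(), min_papers=2):
    """Count words per year, keeping only words found in >= min_papers abstracts.

    rows is an iterable of (year, abstract) pairs (hypothetical layout).
    """
    totals = defaultdict(Counter)  # year -> total occurrences of each word
    papers = defaultdict(Counter)  # year -> number of abstracts containing each word
    for year, abstract in rows:
        # Assumed tokenizer: lowercase words, allowing internal hyphens.
        words = [
            w for w in re.findall(r"[a-z][a-z-]*", abstract.lower())
            if w not in blocklist
        ]
        totals[year].update(words)
        papers[year].update(set(words))  # each abstract counts once per word
    return {
        year: Counter(
            {w: n for w, n in totals[year].items() if papers[year][w] >= min_papers}
        )
        for year in totals
    }


# Made-up rows for illustration only.
rows = [
    ("2020", "Lattice proofs from lattice assumptions"),
    ("2020", "A lattice commitment scheme"),
    ("2021", "Hash-based proofs"),
]
result = keyword_counts_by_year(rows, blocklist={"a", "from"})
print(result["2020"])  # only "lattice" appears in two 2020 abstracts
```

Tracking per-paper presence separately from total occurrences is what lets a word that is repeated many times in a single abstract be excluded.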
paper-sorted
contains the curated CSVs that were used to derive certain figures in the paper.
unsorted-sanitized
contains the CSVs generated by the Python script, which sanitizes the abstracts.csv
file in order to perform frequency analysis.
abstracts.csv
is a CSV containing only the abstract and the year. It is used with the Python script.
article-info.csv
is a CSV containing authorship information and other metadata regarding the articles whose abstracts we mined. All of these abstracts are publicly available. We thank the authors of these articles and the publishers for making them so.
blocklist.txt
is a newline-delimited file containing words that should be excluded from the frequency analysis.
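A newline-delimited blocklist of this kind could be loaded along these lines. This is a sketch under assumptions: the helper name `load_blocklist` is hypothetical, and lowercasing entries and skipping blank lines are choices made for the example rather than confirmed behavior of mine-year.py. The demo writes a temporary file so that it does not depend on the repository's own blocklist.txt.

```python
import tempfile
from pathlib import Path


def load_blocklist(path):
    """Read one blocked word per line, ignoring blank lines (assumed behavior)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


# Demo with a throwaway file containing made-up entries.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "blocklist.txt"
    demo_path.write_text("the\nAnd\n\nof\n", encoding="utf-8")
    blocked = load_blocklist(demo_path)

print(sorted(blocked))
```

Storing the words in a set keeps the membership check cheap when it is applied to every token in every abstract.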
mine-year.py
is the main Python script. It takes no arguments or options.
python3 mine-year.py
To derive the results located in the 'unsorted-sanitized' folder, run the script.
The CSVs provided in the 'paper-sorted' folder received additional processing. To create these files, we accounted for plural or alternate forms of words by adding the counts of these forms to the count of the root word.
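The merging step described above can be illustrated with a short sketch. The function name `merge_variants`, the variant-to-root mapping, and the sample counts are all hypothetical; the source does not say whether this step was scripted or done by hand, so this only shows the arithmetic involved.

```python
from collections import Counter


def merge_variants(counts, variant_to_root):
    """Add the count of each variant form to the count of its root word."""
    merged = Counter()
    for word, n in counts.items():
        # Words without a mapping entry are kept under their own name.
        merged[variant_to_root.get(word, word)] += n
    return merged


# Made-up counts and mapping for illustration only.
counts = Counter({"proof": 4, "proofs": 3, "lattice": 2, "lattices": 1, "hash": 5})
variants = {"proofs": "proof", "lattices": "lattice"}
merged = merge_variants(counts, variants)
print(merged)
```

An explicit mapping avoids the false merges that automatic stemming can produce, at the cost of maintaining the list of variants manually.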