Content

The scripts in this repository analyze the abstracts of the articles published in the Special Sections of a scientific journal. The abstracts are processed with the slr-kit tool. Each script can be run independently of the others; they are presented below in the order in which they are expected to run during the workflow.

Scopus2csv.py

Converts a CSV file exported from Scopus into a slr-kit compatible CSV file.

  • INPUT: CSV file exported from Scopus
  • OUTPUT: CSV file slr-kit_abstracts.csv in slr-kit compatible format

Positional arguments:

  • input_file: CSV exported from Scopus

Example of usage:

python Scopus2csv.py Scopus.csv

PreprocBySpecSec.py

Splits the articles preprocessed by slr-kit according to the Special Section they belong to.

  • INPUT: CSV file containing articles and names of the Special Sections; CSV file containing the items preprocessed by slr-kit
  • OUTPUT: set of folders containing the preprocessed articles divided by Special Section; batch file named run_all_process.bat that runs postprocessing and then LDA on the articles (both steps are provided by slr-kit)

Positional arguments:

  • spec_sec_csv: CSV file containing articles and names of the Special Sections
  • preproc_file: CSV file containing the preprocessed articles of all the Special Sections
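The splitting step can be pictured as a join between the two input files followed by a group-by on the Special Section name. The sketch below is only an illustration of that idea, not the code of PreprocBySpecSec.py: the column names `id`, `special_section` and `abstract_lem`, the comma delimiter, and the helper `split_by_section` are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict


def split_by_section(section_rows, preproc_rows,
                     id_col="id", section_col="special_section"):
    """Group preprocessed rows by the Special Section of each article.

    Column names are hypothetical; adapt them to the real CSV headers.
    """
    # Map each article id to the name of its Special Section.
    section_of = {row[id_col]: row[section_col] for row in section_rows}
    groups = defaultdict(list)
    for row in preproc_rows:
        section = section_of.get(row[id_col])
        if section is not None:  # skip articles with no known section
            groups[section].append(row)
    return dict(groups)


# Tiny in-memory stand-ins for Spec_Sec.csv and SpecSec_preproc.csv.
spec_sec_csv = io.StringIO("id,special_section\n1,SS-A\n2,SS-B\n3,SS-A\n")
preproc_csv = io.StringIO(
    "id,abstract_lem\n1,text mining\n2,graph theory\n3,topic model\n"
)

groups = split_by_section(
    list(csv.DictReader(spec_sec_csv)),
    list(csv.DictReader(preproc_csv)),
)
for name, rows in sorted(groups.items()):
    print(name, [row["id"] for row in rows])
```

In the real script each group would be written to its own folder; the sketch stops at the in-memory grouping.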

Example of usage:

python PreprocBySpecSec.py Spec_Sec.csv SpecSec_preproc.csv

SpecSecFake.py

Creates fake (test) Special Sections from articles postprocessed by slr-kit.

  • INPUT: folders containing the postprocessed articles divided by Special Section; CSV file containing the postprocessed articles of all the Special Sections
  • OUTPUT: folders containing the postprocessed articles of the fake Special Sections

Positional arguments:

  • directories_special_section: folders containing all postprocessed articles divided by Special Section
  • postprocess_file: CSV file containing the postprocessed articles of all the Special Sections

Optional arguments:

  • -fake: number of fake Special Sections to be created; default = 200
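One plausible way to build fake Special Sections is to draw random groups of articles from the pool of all postprocessed articles, so that they can serve as a baseline for the real ones. The sketch below assumes exactly that, and further assumes that each fake section takes the size of a randomly chosen real section; neither detail is confirmed by this README. The function `make_fake_sections` and its parameters are hypothetical.

```python
import random


def make_fake_sections(all_ids, real_sizes, n_fake=200, seed=None):
    """Return n_fake lists of article ids sampled at random.

    Assumptions: sampling is without replacement inside a fake section,
    and each size is picked from the sizes of the real sections.
    """
    rng = random.Random(seed)
    fakes = []
    for _ in range(n_fake):
        size = rng.choice(real_sizes)            # mimic a real section size
        fakes.append(rng.sample(all_ids, size))  # no duplicates in a section
    return fakes


all_ids = list(range(50))  # stand-in for the ids of all articles
fakes = make_fake_sections(all_ids, real_sizes=[4, 6, 8], n_fake=5, seed=0)
for i, fake in enumerate(fakes):
    print(f"fake section {i}: {len(fake)} articles")
```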

Example of usage:

python SpecSecFake.py (Get-ChildItem -Path "SpecSec\SpecSec*").FullName SpecSec_postproc.csv

SpecSecHist.py

Creates a histogram illustrating the number of items in each Special Section or fake Special Section.

  • INPUT: folders containing the postprocessed articles divided by Special Section or fake Special Section
  • OUTPUT: histogram in PNG format

Positional arguments:

  • directories: list of directories of Special Sections or fake Special Sections

Example of usage:

python SpecSecHist.py (Get-ChildItem -Path "SpecSec\SpecSec*").FullName
python SpecSecHist.py (Get-ChildItem -Path "SpecSecFake\SpecSecFake*").FullName

SpecSecGraph.py

Creates a graph for each Special Section or fake Special Section.

  • INPUT: folders containing the postprocessed articles divided by Special Section or fake Special Section
  • OUTPUT: graphs in PNG format

The weight of each edge of a graph is the number of words that the postprocessed abstracts of the two connected articles have in common.
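The edge weights can be illustrated with a few lines of plain Python. This is a minimal sketch under two assumptions that the README does not state: a postprocessed abstract is a whitespace-separated string of terms, and shared words are counted once each (set intersection). The helper `shared_word_weights` is hypothetical and is not taken from SpecSecGraph.py.

```python
from itertools import combinations


def shared_word_weights(abstracts):
    """Return {(id_a, id_b): number of distinct words in common}."""
    # Assumption: terms are separated by whitespace.
    word_sets = {art_id: set(text.split()) for art_id, text in abstracts.items()}
    weights = {}
    for a, b in combinations(sorted(word_sets), 2):
        common = len(word_sets[a] & word_sets[b])
        if common > 0:  # pairs with no shared word get no edge
            weights[(a, b)] = common
    return weights


abstracts = {
    "a1": "topic model text mining",
    "a2": "text mining cluster",
    "a3": "graph theory",
}
weights = shared_word_weights(abstracts)
print(weights)
```

Each key of the resulting dictionary corresponds to one edge, and its value is the weight that would be attached to that edge.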

Positional arguments:

  • directories: list of folders containing postprocessed articles divided by Special Section or fake Special Section

Example of usage:

python SpecSecGraph.py (Get-ChildItem -Path "SpecSec\SpecSec*").FullName
python SpecSecGraph.py (Get-ChildItem -Path "SpecSecFake\SpecSecFake*").FullName

SpecSecElab.py

Calculates the coherence parameters between the articles of each Special Section and of each fake Special Section.

  • INPUT: folders containing the postprocessed articles divided by Special Section; folders containing the postprocessed articles divided by fake Special Section
  • OUTPUT: CSV file called Spec_Sec_metrics.csv containing the parameters of coherence between the articles of the Special Sections; CSV file called Spec_Sec_fake_metrics.csv containing the parameters of coherence between the articles of the fake Special Sections

Named arguments:

  • --spec_sec: list of Special Section folders
  • --spec_sec_fake: list of fake Special Section folders

Optional arguments:

  • -th: integer threshold used when calculating the coherence, i.e., the minimum number of words that two articles must have in common in their postprocessed abstracts; default = 10
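The role of the threshold can be shown with a small example. The README does not define the coherence parameters written to the metrics files, so the measure below (the fraction of article pairs that share at least `th` words) is only a hypothetical stand-in chosen to show how such a threshold acts; `linked_pair_fraction` is not a function of SpecSecElab.py.

```python
from itertools import combinations


def linked_pair_fraction(word_sets, th=10):
    """Fraction of article pairs sharing at least `th` distinct words.

    Hypothetical measure, used here only to illustrate the threshold.
    """
    pairs = list(combinations(word_sets, 2))
    if not pairs:  # fewer than two articles: nothing to compare
        return 0.0
    linked = sum(1 for a, b in pairs if len(a & b) >= th)
    return linked / len(pairs)


# Three toy articles, each reduced to the set of words in its abstract.
section = [
    {"w1", "w2", "w3"},
    {"w2", "w3", "w4"},
    {"w5"},
]
fraction = linked_pair_fraction(section, th=2)
print(fraction)
```

Raising `th` can only keep or lower this fraction, which is why the default of 10 has a direct effect on how coherent a section appears.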

Example of usage:

python SpecSecElab.py --spec_sec (Get-ChildItem -Path "SpecSec\SpecSec*").FullName --spec_sec_fake (Get-ChildItem -Path "SpecSecFake\SpecSecFake*").FullName

SpecSecBoxPlot.py

Creates a box plot for the Special Sections and a box plot for the fake Special Sections.

  • INPUT: CSV file for the Special Sections generated by SpecSecElab.py; CSV file for fake Special Sections generated by SpecSecElab.py
  • OUTPUT: box plots named SpecSec_BoxPlot.png and SpecSecFake_BoxPlot.png

Positional arguments:

  • spec_sec_metrics: CSV file with the coherence parameters of the Special Sections
  • spec_sec_fake_metrics: CSV file with the coherence parameters of the fake Special Sections

Example of usage:

python SpecSecBoxPlot.py Spec_Sec_metrics.csv Spec_Sec_fake_metrics.csv

SpecSecPlot.py

Creates a plot with two curves comparing the average coherence values of the Special Sections and of the fake Special Sections.

  • INPUT: CSV file for the Special Sections generated by SpecSecElab.py; CSV file for fake Special Sections generated by SpecSecElab.py
  • OUTPUT: graph named Plot.png

Positional arguments:

  • spec_sec_metrics: CSV file with the coherence parameters of the Special Sections
  • spec_sec_fake_metrics: CSV file with the coherence parameters of the fake Special Sections

Example of usage:

python SpecSecPlot.py Spec_Sec_metrics.csv Spec_Sec_fake_metrics.csv