/plos_corpus

Parsing the PLoS corpus dump of fall 2016 (Python + R)

PLoS Parsing

A mix of Python scripts (first stage: fixing problems, parsing the XML, and calculating initial statistics) and R scripts (later statistics and plotting).
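
As a rough illustration of the first stage, here is a minimal sketch of parsing one article and collecting a couple of basic statistics. It is not the code in this repository: the function name `summarize_article`, the sample document, and the element names (`article-title`, `body`, `p`, assuming JATS-style article XML) are assumptions made for the example. Malformed XML is reported as `None` so that a failing article can be counted rather than stopping the run.

```python
import xml.etree.ElementTree as ET


def summarize_article(xml_text):
    """Return basic statistics for one article, or None if the XML is malformed.

    Hypothetical helper; element names assume JATS-style markup.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Treat malformed XML as a parsing failure instead of crashing.
        return None

    # The title may contain inline markup (e.g. <italic>), so join all text nodes.
    title_el = root.find(".//article-title")
    title = "".join(title_el.itertext()).strip() if title_el is not None else None

    # Count paragraphs anywhere inside the article body, if there is one.
    body = root.find("body")
    n_paragraphs = len(body.findall(".//p")) if body is not None else 0

    return {"title": title, "n_paragraphs": n_paragraphs}


# Tiny made-up document used only to exercise the function.
SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>A <italic>sample</italic> title</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <sec><p>First paragraph.</p><p>Second paragraph.</p></sec>
  </body>
</article>"""

if __name__ == "__main__":
    print(summarize_article(SAMPLE))
```

Per-article dictionaries like this could be written out as CSV rows for the R scripts to pick up, though the actual hand-off format used here may differ.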

Comments and help are welcome. This is all at a very early stage: a fair number of articles are still missing or fail to parse. Everything here could be made more elaborate.