sciscraper
is a Python package and command line interface (CLI) for scraping scientific articles from various sources.
It should work on Python 3.8+; on MacOS, Windows, and Linux.
You can install sciscraper via pip:
pip install sciscraper
sciscraper
offers the following scraping choices:
- directory: takes a directory of .pdf files, and returns a .csv file of bibliographic data for each;
- wordscore: takes a .csv file of bibliographic data for multiple papers and returns a .csv with a percentage value of its relevance to the configured query;
- citations: takes a .csv file of bibliographic data for multiple papers and returns a .csv of their citations (i.e. the ensuing papers that cited them);
- reference: takes a .csv file of bibliographic data for multiple papers and returns a .csv of their references (i.e. the papers that were referenced in the originals);
- download: experimental takes a .csv file of bibliographic data for multiple papers, attempts to download .pdfs of each into a directory; and,
- images: takes a .csv file of bibliographic data for multiple papers, attempts to download charts and figures of the papers from SemanticScholar.
You can initialize sciscraper
from the terminal by entering sciscraper
followed by -m
, the designated scraping choice (see above), and the target file. For example:
sciscraper -m directory <folder pathname goes here...>
Or alternatively:
sciscraper -m wordscore <filename.csv>
And so forth.
- PART ONE: -> https://youtu.be/MXM6VEtf8SE
- PART TWO: -> https://www.youtube.com/watch?v=6ac4Um2Vicg
- ArjanCodes
- Michele Cotrufo
- Nathan Lippi
- Jon Watson Rooney
- Colin Meret
- James Murphy (mCoding)
- Micael Jarniac
John Fallot john.fallot@gmail.com
The MIT License Copyright (c) 2021- John Fallot