This tool is used for:
- Text processing and analysis of multiple folders of multiple PDF files (currently not working)
- Combining search results from multiple journal databases and annotating articles with journal rankings
- Google Scholar too often returns "unscholarly" articles.
- Using "scholarly" journal databases instead produces duplicate results across databases, and the separate result sets must then be combined into a single spreadsheet.
If a researcher had two research questions:
RQ1: How are literature reviews automated?
RQ2: What are the meta concepts of literature reviews?
Then search keyword sets would look something like:
SKS1: (literature AND review) AND (automated)
SKS2: (literature AND review) AND (meta)
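As a minimal sketch of how such keyword sets could be assembled programmatically, the snippet below joins terms with `AND` and wraps each concept group in parentheses. The function name `build_sks` is hypothetical and is not part of this codebase.

```python
def build_sks(*concept_groups):
    """Join the terms in each group with AND, then join the groups with AND."""
    return " AND ".join(
        "(" + " AND ".join(terms) + ")" for terms in concept_groups
    )


# Hypothetical helper applied to the two research questions above.
sks1 = build_sks(["literature", "review"], ["automated"])
sks2 = build_sks(["literature", "review"], ["meta"])

print(sks1)  # (literature AND review) AND (automated)
print(sks2)  # (literature AND review) AND (meta)
```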
Searching for these SKSs across five journal databases would result in 10 result sets (2 keyword sets × 5 databases), which would then have to be:
- Checked for duplicates
- Checked for journal reputation
- Combined into a single usable spreadsheet
With this tool, the 10 result sets must still be searched for and downloaded, but the three steps above are automated.
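To make the three steps concrete, here is a rough sketch of deduplicating, ranking, and combining result sets. It is an illustration under stated assumptions, not this tool's actual implementation: the record fields, the `JOURNAL_RANKS` table, the sample articles, and the title-based duplicate check are all placeholders chosen for the example.

```python
import re

# Hypothetical ranking table; the tool's real ranking source is not shown here.
JOURNAL_RANKS = {"journal of examples": "A", "example letters": "B"}


def normalize_title(title):
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def combine_result_sets(result_sets):
    """Merge result sets, drop duplicates by normalized title, attach a rank."""
    combined = {}
    for source, records in result_sets.items():
        for record in records:
            key = normalize_title(record["title"])
            if key in combined:
                # Duplicate: keep the first record, remember where else it appeared.
                combined[key]["sources"].append(source)
                continue
            journal = record["journal"]
            combined[key] = {
                "title": record["title"],
                "journal": journal,
                "rank": JOURNAL_RANKS.get(journal.lower(), "unranked"),
                "sources": [source],
            }
    return list(combined.values())


# Placeholder data standing in for two downloaded result sets.
result_sets = {
    "sks1/webofknowledge": [
        {"title": "Automating Literature Reviews", "journal": "Journal of Examples"},
    ],
    "sks1/scopus": [
        {"title": "Automating literature reviews.", "journal": "Journal of Examples"},
        {"title": "A Second Article", "journal": "Unknown Quarterly"},
    ],
}

rows = combine_result_sets(result_sets)
for row in rows:
    print(row["title"], "|", row["rank"], "|", row["sources"])
```

Matching on a normalized title is only one possible duplicate test; a DOI comparison would be stricter when the exports include one.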
- Download this codebase
- Go to webofknowledge.com and search for `(literature AND review) AND (automated)`.
- Click on "Export" and download the Excel file of the results.
- Create a folder in `input` called `sks1`
- Move your downloaded file into `sks1`
- Repeat the previous steps on scopus.com
- Repeat the previous steps with the search `(literature AND review) AND (meta)` and folder name `sks2`
- Run the program
- Open `combined_searches.xlsx` and see that your search keyword set results have been combined, duplicates have been removed, journal rankings have been assigned, and the data has been normalized.
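The folder layout described in these steps (one subfolder of `input` per search keyword set) can be inspected with a short sketch like the one below. The function `find_result_files` and the export file names are hypothetical; how `runner.py` actually discovers files is not documented here.

```python
from pathlib import Path
import tempfile


def find_result_files(input_dir):
    """Map each keyword set folder (e.g. sks1) to the export files inside it."""
    found = {}
    for folder in sorted(Path(input_dir).iterdir()):
        if folder.is_dir():
            found[folder.name] = sorted(
                p.name for p in folder.iterdir() if p.is_file()
            )
    return found


# Build a throwaway layout for demonstration; file names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    input_dir = Path(tmp) / "input"
    example = {"sks1": ["wok_export.xls", "scopus_export.xls"], "sks2": ["wok_export.xls"]}
    for sks, files in example.items():
        (input_dir / sks).mkdir(parents=True)
        for name in files:
            (input_dir / sks / name).touch()
    layout = find_result_files(input_dir)

print(layout)
```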
- Check that `python` and `git` are properly installed. If not, google how to do that.
- Check that `pip` is installed. If not, google how to do that.
- Git clone this project
- Run `pip install -r requirements.txt`
- Run `python3 runner.py`
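If you are unsure whether the prerequisites are on your `PATH`, a quick check along these lines may help. This is a convenience sketch using the standard library's `shutil.which`; the helper `check_tools` is hypothetical and not shipped with this project.

```python
import shutil


def check_tools(names):
    """Return {tool name: True/False} depending on whether it is found on PATH."""
    return {name: shutil.which(name) is not None for name in names}


status = check_tools(["python3", "git", "pip"])
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```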
- Keyword search functionality across folders/PDFs
- Make metadata an xlsx file instead of JSON for better UX