A Python program for analyzing bibliographic data in your browser. The current version supports data exported from the Thomson Reuters Web of Science.
The long-term purpose of this program is to create similar functionality as in the Histcite package, but using free software instead of proprietary. This way, it will be easier for researchers to modify specific functions for pursuing their research questions.
It should be noted that this software should be interpreted more as a suggestion or concept, not as a user-friendly and ready-made application.
Features include:
- Sort publications by Author, Years, Keywords, Source, Times Cited.
- Analyze Cited References.
- Export word co-occurrence networks and co-author networks as .gexf files for further analysis in Gephi.
- Easy-to-share output (a few static html files).
For a live demo, please visit this page.
This program is written in Python 3. The following external packages are required:
- pandas
- bokeh
- lxml
- pygexf
Depending on your platform, lxml and pygexf may have to be compiled from source rather than relying on the standard package managers. Please google a bit, things may be especially complicated in Windows regarding lxml.
This program uses Tsorter by Terrill Dent, which is licensed under the MIT licence.
Input data: Use with data exported as .tsv files from Web of Science. I have only tried with the "Mac OS/UTF-8" output option.
python3 Cite.py YOURDATA.tsv
The script will automatically start up a webserver running on port 8000. Go to localhost:8000
with your favourite browser (with support for Javascript). You can use the included example_data.tsv
as a test dataset.
The script will create a handful of html files and gexf network files.
- Large datasets (more than tens of thousands of articles) will eventually make the Javascript sorting function very slow. Currently the software will limit the number of records to the 500 most cited. Change the value in
for record in dict(collections.Counter(indexbodydict).most_common(500))
to increase this number.
- Enable larger data sets (se above).
- Enable more input files, such as Scopus, SwePub etc.
- Add more useful functionality.
- ??? (please send me a message to suggest further improvements).