
Tools for abusing the Thomson Reuters (aka ISI) Web of Science

Primary language: Python. License: MIT.

ISI Tools

These are a bunch of tools for dealing with the ISI Web of Science. You can use them to extract, clean, and process article records in the ISI Flat File format.

Some examples of the sorts of work that can be done with this data:

The tools are very alpha at this stage and have a heavy Unix bias. Please submit bug reports and feature requests. I would love for these to be useful to the wider world.

ISI Scraper

Example

[kousu@galleon isi]$ ./isi_scrape.py [user name] [library barcode] SU=Sociology PY="2006-2015"
In using this to download records from the Web of Science, you should be aware of the terms of service:

Thomson Reuters determines a "reasonable amount" of data to download by comparing your download activity
against the average annual download rates for all Thomson Reuters clients using the product in question.
Thomson Reuters determines an "insubstantial portion" of downloaded data to mean an amount of data taken
from the product which (1) would not have significant commercial value of its own; and (2) would not act
as a substitute for access to a Thomson Reuters product for someone who does not have access to the product.

The authors of this software take no responsibility for your use of it. Don't get b&.

Started new UW Library Proxy session 'oUeCh7QJ89pR15H'
Logged into ISI as UW:[user name].
Got 69220 results
Ripping results.
Exporting records [1,501) to 2AiZ7oSbJ2Y7a2MctLA_0001.isi
Exporting records [501,1001) to 2AiZ7oSbJ2Y7a2MctLA_0501.isi
Exporting records [1001,1501) to 2AiZ7oSbJ2Y7a2MctLA_1001.isi
Exporting records [1501,2001) to 2AiZ7oSbJ2Y7a2MctLA_1501.isi
Exporting records [2001,2501) to 2AiZ7oSbJ2Y7a2MctLA_2001.isi
Exporting records [2501,3001) to 2AiZ7oSbJ2Y7a2MctLA_2501.isi
[...]
[kousu@galleon isi]$ ls PY\=2006-2015_SU\=Sociology/
2AiZ7oSbJ2Y7a2MctLA_0001.isi  2AiZ7oSbJ2Y7a2MctLA_1001.isi  2AiZ7oSbJ2Y7a2MctLA_2001.isi  parameters.txt
2AiZ7oSbJ2Y7a2MctLA_0501.isi  2AiZ7oSbJ2Y7a2MctLA_1501.isi  2AiZ7oSbJ2Y7a2MctLA_2501.isi  [...]
[kousu@galleon isi]$ cat PY\=2006-2015_SU\=Sociology/parameters.txt 
ISI scrape
==========

Query: PY=2006-2015 SU=Sociology
Records: 69220
ISI Session: 2AiZ7oSbJ2Y7a2MctLA
Date: 2015-03-12 13:20:38.785762

[kousu@galleon isi]$ 
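
If you want to use the scrape metadata from a script, parameters.txt is easy to read back. The following is a minimal sketch, assuming the file always consists of "Key: value" lines as in the sample above; read_parameters is a hypothetical helper and is not part of these tools.

```python
def read_parameters(path):
    """Parse the 'Key: value' lines of a parameters.txt into a dict.

    Hypothetical helper: the layout is inferred from the sample output above.
    """
    params = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Split on the first colon only, so the time in "Date:" stays intact.
            key, sep, value = line.partition(":")
            if sep and value.strip():
                params[key.strip()] = value.strip()
    # The record count is the one field that is clearly numeric.
    if "Records" in params:
        params["Records"] = int(params["Records"])
    return params
```

Title and underline lines contain no colon, so they are skipped. For the sample file above, read_parameters(path)["Records"] would be the integer 69220.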

A tip: once you are comfortable with the tool, write yourself a wrapper script, isi_scrape.logined.sh, like this:

#!/bin/sh
# Wrapper that supplies your login; replace the two placeholders below.

HERE=$(dirname "$0")
"$HERE/isi_scrape.py" -q <username> <password> "$@"

ISI Verify

Verifies the integrity of a scraped corpus.

TODO (not written yet)
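
Until the tool exists, one check you can run yourself is whether every expected batch file is present. This is only a sketch of one possible integrity check, not the planned implementation. It assumes the batch size of 500 and the "<session>_<first record, zero-padded to four digits>.isi" naming seen in the scraper example; missing_batches is a hypothetical name.

```python
import os


def missing_batches(directory, session, records, size=500):
    """Return the expected batch filenames that are absent from `directory`.

    Assumption: batches start at records 1, 1 + size, 1 + 2*size, ... and are
    named "<session>_<start>.isi" with the start padded to four digits.
    """
    expected = [
        f"{session}_{start:04d}.isi"
        for start in range(1, records + 1, size)
    ]
    present = set(os.listdir(directory))
    return [name for name in expected if name not in present]
```

The session and record count are the "ISI Session" and "Records" values from parameters.txt. An empty list means no batch file is missing; it says nothing about whether each file is complete.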

ISI Join

Combine separate .isi files into a single file. This is needed for processing with Sci2 (the Science of Science Tool).

Example

TODO
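
In the meantime, here is a rough sketch of what joining involves. It is not the actual tool. It assumes each .isi file opens with FN and VR header lines, ends with an EF line, and puts every field tag in the first two columns of its line; join_isi is a hypothetical name.

```python
def join_isi(paths, out):
    """Write the records from every file in `paths` to `out` as one ISI file.

    Assumption: each input has FN/VR header lines and a closing EF line.
    """
    header_done = False
    for path in paths:
        # utf-8-sig drops a leading byte-order mark if the file has one.
        with open(path, encoding="utf-8-sig") as f:
            for line in f:
                tag = line[:3].rstrip()
                if tag == "EF":
                    # Drop per-file end markers; one is written at the end.
                    continue
                if tag in ("FN", "VR"):
                    # Keep the file header only once, from the first file.
                    if not header_done:
                        out.write(line)
                    continue
                header_done = True
                out.write(line)
    out.write("EF\n")
```

Call it with a sorted list of batch files and an open text file, for example join_isi(sorted(glob.glob("*.isi")), open("joined.isi", "w", encoding="utf-8")) after importing glob.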

ISI Count

Counts the number of records in a set of ISI files.

Example

TODO
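
As a stopgap, counting can be approximated by counting end-of-record markers. This is a minimal sketch, not the actual tool. It assumes every record in an ISI flat file ends with a line containing only ER; count_records and count_corpus are hypothetical names.

```python
def count_records(path):
    """Count records in one ISI file by counting its ER (end of record) lines."""
    with open(path, encoding="utf-8-sig") as f:
        # Continuation lines are indented, so they never equal "ER" here.
        return sum(1 for line in f if line.rstrip() == "ER")


def count_corpus(paths):
    """Total number of records across several ISI files."""
    return sum(count_records(p) for p in paths)
```

Comparing count_corpus over a scrape directory with the "Records" line in its parameters.txt is a quick way to spot a truncated download.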