Exact Editions Issue Pages Scraper
Simple scraper to download pdf pages from Exact Editions magazines. Using CasperJS and PhantomJs. Only usable with a subscription.
Please only use if you have a subscription to an existing magazine and do not use to scrape for distribution. Respect intellectual property -- writers and artists gotta make a living too :)
Please read Exact Edition's terms of service if in doubt.
Installation
Dependencies
Npm
To install dependencies
npm install -g
Usage
###Fetch all pages of one issue Clone this repo. Cd to the folder.
Then run in terminal
casperjs getissue.js --username=<your EE username> --password=<your EE password> <issue_link_1>...<issue_link_n>
###Fetch specific pages of one issue Usually when a few pages from getissue.js fails.
Run in terminal
casperjs getissue.js --username=<your EE username> --password=<your EE password> --pages=<page 1>:<prefix 1>,<page 2>:<prefix 2> <issue_link>
Note that prefix is used in the following way for file naming (to be consistent with getissue.js named files:
<issue title>-<prefix>-<page label number>.pdf
For example:
casperjs getpage.js --username=name@example.com --password=example --pages=OFC:001,11:ABC http://www.exacteditions.com/read/popshot/the-time-issue-40247
Will download the following
- The Time Issue-001-OFC.pdf
- The Time Issue-ABC-11.pdf
Files will be downloaded to 'download' child directory
Compiling to one PDF
Used PDFTK on command line for this. You will have to install it.
pdftk *.pdf cat output output.pdf