# pdf_scraper

Scrape web pages and download PDF files.


## Parsing HTML and downloading PDF links

A simple Ruby script that downloads the PDF files linked from a single web page.
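The source of `cjp_scraper.rb` is not reproduced here, so the following is only a minimal sketch of the link-collection step, using nothing but Ruby's standard library. The method name `pdf_links` is hypothetical, and the regex over `href` attributes is a rough stand-in for a real HTML parser (such as Nokogiri), which the actual script may use instead.

```ruby
require "uri"

# Hypothetical helper: return the absolute URLs of all <a href="..."> targets
# on a page whose path ends in ".pdf" (case-insensitive), without duplicates.
# Assumption: a regex replaces a proper HTML parser; only quoted href values
# are recognised.
def pdf_links(html, base_url)
  hrefs = html.scan(/<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/i).flatten
  links = hrefs.map do |href|
    begin
      # Resolve relative links against the page URL.
      url = URI.join(base_url, href.strip)
    rescue URI::Error
      next nil # skip hrefs that are not valid URIs
    end
    url.to_s if url.path.to_s.downcase.end_with?(".pdf")
  end
  links.compact.uniq
end
```

For example, on a page at `http://example.com/issue/index.html`, a link to `papers/a.pdf` resolves to `http://example.com/issue/papers/a.pdf`, while a link to `about.html` is ignored.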

Sample output:

```
jamies-air:pdf_scraper jxberc$ ruby cjp_scraper.rb
There are 22 papers in Volume 52, Number 1-I:
Fetching...file 1: Historical Review on Analytic,
  ...Success, saved as downloads/Volume 52, Number 1-I/Historical Review on Analytic, .pdf
Fetching...file 2: Dynamics of the Quantum Correla
  ...Success, saved as downloads/Volume 52, Number 1-I/Dynamics of the Quantum Correla.pdf
Fetching...file 3: Entanglement Percolation of Sma
  ...Success, saved as downloads/Volume 52, Number 1-I/Entanglement Percolation of Sma.pdf
Fetching...file 4: Evolution of the Universe with
  ...Success, saved as downloads/Volume 52, Number 1-I/Evolution of the Universe with .pdf
Fetching...file 5: An Optimal Choice of Reference
  ...Success, saved as downloads/Volume 52, Number 1-I/An Optimal Choice of Reference .pdf
Fetching...file 6: Muon-Electron Hyperfine Couplin
  ...Success, saved as downloads/Volume 52, Number 1-I/Muon-Electron Hyperfine Couplin.pdf
Fetching...file 7: Photodetachment of a Hydrogen N
  ...Success, saved as downloads/Volume 52, Number 1-I/Photodetachment of a Hydrogen N.pdf

...truncated for brevity...

Fetching...file 22:
  ...Success, saved as downloads/Volume 52, Number 1-I/.pdf

Downloads Complete (Volume 52, Number 1-I)

There are 23 papers in Volume 52, Number 1-II:
Fetching...file 1: Preface
  ...Success, saved as downloads/Volume 52, Number 1-II/Preface.pdf
Fetching...file 2: Probing and Controlling Autoion
  ...Success, saved as downloads/Volume 52, Number 1-II/Probing and Controlling Autoion.pdf
jamies-air:pdf_scraper jxberc$
```
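The sample output suggests that each PDF is saved as `downloads/<issue>/<first 31 characters of the title>.pdf`. That would explain the trailing space in `Historical Review on Analytic, .pdf` and the bare `.pdf` for file 22, which apparently had an empty title. The sketch below reproduces that naming and the progress messages under this assumption; `TITLE_LIMIT`, `save_path`, `download_pdf`, and `fetch_issue` are hypothetical names, not taken from `cjp_scraper.rb`.

```ruby
require "fileutils"
require "net/http"
require "uri"

# Assumption inferred from the sample output: file names use the first
# 31 characters of the paper title.
TITLE_LIMIT = 31

# Hypothetical helper: build downloads/<issue>/<truncated title>.pdf
def save_path(issue, title, root: "downloads")
  File.join(root, issue, "#{title[0, TITLE_LIMIT]}.pdf")
end

# Hypothetical helper: fetch url and write the body to path, creating parent
# directories. Returns true on an HTTP 2xx response, false otherwise
# (redirects are not followed).
def download_pdf(url, path)
  response = Net::HTTP.get_response(URI(url))
  return false unless response.is_a?(Net::HTTPSuccess)

  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, response.body)
  true
end

# Print progress in the same shape as the sample output. `papers` is an array
# of [title, url] pairs; `fetch` can be swapped out (e.g. in tests).
# The "...Failed" line is a hypothetical message, not one the script is known
# to print.
def fetch_issue(issue, papers, fetch: method(:download_pdf))
  puts "There are #{papers.size} papers in #{issue}:"
  papers.each.with_index(1) do |(title, url), i|
    path = save_path(issue, title)
    puts "Fetching...file #{i}: #{title[0, TITLE_LIMIT]}"
    puts(fetch.call(url, path) ? "  ...Success, saved as #{path}" : "  ...Failed")
  end
  puts "\nDownloads Complete (#{issue})"
end
```

Truncating titles keeps file names short, but it does not sanitize them: a title containing `/` would create an unintended subdirectory, and two titles sharing the same first 31 characters would overwrite each other.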