google-patents-scraper

A simple scraper for the Google Patents website that I wrote as a freelance project. It saves each patent's HTML, images, and PDF in a directory.

  1. Requirements
  2. Command line parameters:
  -h, --help            show this help message and exit
  --start START         start patent id (default: None)
  --end END             end patent id (inclusive) (default: None)
  --output_dir OUTPUT_DIR
                        output directory (default: ./)
  --org {EP,US,WO,DE}   prefix of the organization publishing the patent
                        (default: EP)
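
The option listing above looks like standard `argparse` help output, and the `(default: ...)` suffixes are what `argparse.ArgumentDefaultsHelpFormatter` produces. As a rough sketch only (not the actual contents of `scraper.py`), a parser that accepts the same options could look like this. The function name `build_parser`, the description string, and the `int` type for `--start` and `--end` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the options listed above (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Scrape patents from Google Patents",  # assumed wording
        # This formatter appends "(default: ...)" to every help string.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # Assumption: ids are parsed as integers; the README only shows numeric values.
    parser.add_argument("--start", type=int, help="start patent id")
    parser.add_argument("--end", type=int, help="end patent id (inclusive)")
    parser.add_argument("--output_dir", default="./", help="output directory")
    parser.add_argument(
        "--org",
        choices=["EP", "US", "WO", "DE"],
        default="EP",
        help="prefix of the organization publishing the patent",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--start", "234", "--end", "1872"])
print(args.start, args.end, args.output_dir, args.org)  # 234 1872 ./ EP
```

Because `--org` uses `choices`, any value outside `EP`, `US`, `WO`, `DE` is rejected by `argparse` with a usage error before any scraping starts.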

Example command line (uses the defaults `--org EP` and `--output_dir ./`):
python scraper.py --start 234 --end 1872
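
Since `--end` is inclusive, the example covers ids 234 through 1872. The sketch below illustrates that range expansion and one possible per-patent directory layout. It is an illustration under assumptions, not the scraper's real logic: the helper name `patent_targets`, the id format (organization prefix followed by the unpadded number), and the one-directory-per-patent layout are all hypothetical.

```python
from pathlib import Path
from typing import Iterator, Tuple


def patent_targets(
    start: int, end: int, org: str, output_dir: str
) -> Iterator[Tuple[str, Path]]:
    """Yield (patent id, target directory) for every id from start to end inclusive."""
    for number in range(start, end + 1):  # +1 because --end is inclusive
        # Assumed id format: prefix + number, with no zero padding or kind code.
        patent_id = f"{org}{number}"
        # Assumed layout: one subdirectory per patent under the output directory.
        yield patent_id, Path(output_dir) / patent_id


targets = list(patent_targets(234, 236, "EP", "./"))
print([patent_id for patent_id, _ in targets])  # ['EP234', 'EP235', 'EP236']
```

With the values from the example command, the same loop would visit 1872 - 234 + 1 = 1639 ids.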