Google Patents PDF Downloader

Download patents as PDF documents from Google Patents

Installation

You can install the development version from GitHub with:

pip install git+https://github.com/lorenzbr/GooglePatentsPdfDownloader.git

Make sure Google Chrome and the matching ChromeDriver (chromedriver.exe, see here) are installed. Selenium uses them to access the website.
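Before running the tool, it can help to confirm that a driver binary is actually reachable. The following is a minimal sketch using only the standard library. The helper name `find_chromedriver` is hypothetical and not part of this package.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_chromedriver(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a usable chromedriver path, or None if none is found.

    Hypothetical helper for illustration; not part of GooglePatentsPdfDownloader.
    """
    # Prefer an explicitly given path (e.g. the value you would pass to --driver).
    if explicit_path and Path(explicit_path).is_file():
        return str(Path(explicit_path).resolve())
    # Otherwise search the PATH for the usual executable names.
    for name in ("chromedriver", "chromedriver.exe"):
        found = shutil.which(name)
        if found:
            return found
    return None


if __name__ == "__main__":
    print(find_chromedriver() or "chromedriver not found; pass its path via --driver")
```

If the function returns None, the driver path most likely has to be given explicitly with --driver.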

Run GooglePatentsPdfDownloader

python -m GooglePatentsPdfDownloader
  patent      Patent number(s) to be downloaded

optional arguments:
  --driver    Path and file name of the Chrome driver exe
  --brave     Switch application from Google Chrome to Brave.
  --output    An output path where documents are saved. Default ./pdf
  --time      Waiting time in seconds for each request.
  --rm-kind   A list containing the patent kind codes which should be removed from patent numbers
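
To illustrate what --rm-kind is meant to do, here is a minimal sketch of stripping kind codes from patent numbers. It assumes a kind code is removed only when it appears as a suffix of the number. The function `strip_kind_code` is hypothetical and may differ from the package's actual implementation.

```python
from typing import Iterable


def strip_kind_code(patent: str, kind_codes: Iterable[str]) -> str:
    """Remove a trailing kind code (e.g. "A1") from a patent number.

    Hypothetical helper; assumes kind codes only ever appear as suffixes.
    """
    # Try longer codes first so that "A1" wins over a shorter code such as "1".
    for code in sorted(kind_codes, key=len, reverse=True):
        if code and patent.endswith(code):
            return patent[: -len(code)]
    # No listed kind code matched: leave the number untouched.
    return patent


print(strip_kind_code("US4405829A1", ["A1"]))  # US4405829
print(strip_kind_code("EP0551921B1", ["A1"]))  # EP0551921B1
```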

Examples

Download a single patent to the default output directory ./pdf. US4405829A1 is not found with its kind code, so the kind code A1 is removed.

python -m GooglePatentsPdfDownloader US4405829A1 --rm-kind A1
python -m GooglePatentsPdfDownloader EP0551921B1

Download multiple patents from a list of inputs to the directory ./patents.

python -m GooglePatentsPdfDownloader US4405829 EP0551921B1 --output "./patents"

With the Brave browser, download multiple patents listed in a txt file to the directory ./pdf.

python -m GooglePatentsPdfDownloader docs/data/patents.txt --brave
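
The layout of the input file is not described here. As an assumption, the sketch below writes and reads one patent number per line. The helper names and the one-number-per-line layout are illustrative only and should be checked against docs/data/patents.txt in the repository.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Union


def write_patent_list(path: Union[str, Path], patents: Iterable[str]) -> Path:
    """Write patent numbers to a text file, one per line (assumed layout)."""
    seen = set()
    cleaned = []
    for number in patents:
        number = number.strip()
        # Skip blank entries and duplicates while keeping the original order.
        if number and number not in seen:
            seen.add(number)
            cleaned.append(number)
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return target


def read_patent_list(path: Union[str, Path]) -> List[str]:
    """Read the file back, ignoring empty lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_patent_list(Path(tmp) / "patents.txt", ["US4405829", "EP0551921B1"])
        print(read_patent_list(path))
```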

Examples (modular)

from GooglePatentsPdfDownloader import PatentDownloader
patent_downloader = PatentDownloader(chrome_driver='chromedriver.exe', brave=True)

# Download a single patent to the current working directory (US4405829A1 is not found with its kind code, so A1 is removed)
patent_downloader.download(patent="US4405829A1", remove_kind_codes=['A1'])
patent_downloader.download(patent="EP0551921B1")


# Download multiple patents using a list of inputs to the directory ./pdf_files
patent_downloader.download(
    patent=["US4405829A1", "EP0551921B1", "EP1304824B1"],
    output_path="./pdf_files",
    remove_kind_codes=["A1"]
)

# Download multiple patents using a txt file to the current working directory
patent_downloader.download(
    patent="docs/data/patents.txt",
    output_path="",
    remove_kind_codes=["A1"]
)

License

This repository is licensed under the MIT license.

See here for further information.