Extracts the text from a scanned and OCRed PDF (typically a letter), detects and parses the dates in it, and renames the file based on the first date detected in the document. The new name uses big-endian (i.e. YYYYMMDD) date format.
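The steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `first_date`, `dated_path`, and `rename_by_first_date` are hypothetical, and it assumes the new name is the bare YYYYMMDD date plus the original extension and that an existing file is never overwritten. It relies on `extract_text` from pdfminer.six and `search_dates` from dateparser.

```python
from datetime import datetime
from pathlib import Path


def first_date(text):
    """Return the first date found in text, or None if there is none."""
    # Imported lazily so the rest of this sketch works without dateparser.
    from dateparser.search import search_dates

    matches = search_dates(text)  # list of (matched_string, datetime), or None
    return matches[0][1] if matches else None


def dated_path(pdf_path, date):
    """Build the new path: same folder, YYYYMMDD name, original extension."""
    pdf_path = Path(pdf_path)
    return pdf_path.with_name(date.strftime("%Y%m%d") + pdf_path.suffix)


def rename_by_first_date(pdf_path):
    """Rename one PDF after the first date in its OCR text layer."""
    from pdfminer.high_level import extract_text

    date = first_date(extract_text(str(pdf_path)))
    if date is None:
        return None  # no date detected: leave the file untouched
    target = dated_path(pdf_path, date)
    if target.exists():
        return None  # assumption: never overwrite an existing file
    Path(pdf_path).rename(target)
    return target


# Naming step only; no PDF or third-party package needed.
print(dated_path("letters/scan_001.pdf", datetime(2024, 3, 15)).name)  # 20240315.pdf
```

Keeping the naming step separate from text extraction makes it easy to check the resulting file name without a real PDF at hand.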
- Python 3
- Python packages:
  - pdfminer.six
  - dateparser
Install the Python dependencies with pip:
$ pip install pdfminer.six dateparser
To rename a single file, document.pdf:
$ python pdf_date_renamer.py document.pdf
To rename all PDFs in the current directory (recursively):
$ python pdf_date_renamer.py .
To rename all PDFs in a specified directory:
$ python pdf_date_renamer.py path_to_directory
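For illustration, one way a single path argument could be resolved into the list of PDFs to process is sketched below. The helper name `collect_pdfs` is hypothetical, and the sketch assumes that every directory argument is searched recursively, as the current-directory example is; the script itself may behave differently.

```python
from pathlib import Path


def collect_pdfs(target):
    """Return the PDFs named by target: the file itself, or all PDFs below a directory."""
    target = Path(target)
    if target.is_file():
        return [target] if target.suffix.lower() == ".pdf" else []
    # rglob descends into subdirectories; sorting keeps the order stable.
    return sorted(
        p for p in target.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling `collect_pdfs(".")` would then yield every PDF under the current directory, while `collect_pdfs("document.pdf")` yields just that one file.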