aidenybai/docscan

👓 Scans documents and returns strings

Language: Python. License: MIT.

Docscan

Docscan is a lightweight document scanner. It opens a document, returns the text inside it as a string, and lets you search that text with regular expressions.

Requirements (all part of the Python standard library):

zipfile
io
re
xml
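The requirements hint at how extraction can work for Word files: a .docx is a ZIP archive whose body text lives in word/document.xml. The sketch below is an assumption, not Docscan's actual code. The helpers extract_docx_text and make_demo_docx are hypothetical, and the demo archive is built in memory with io.BytesIO so the example runs without a file on disk (it is not a complete .docx that Word could open).

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the <w:p> paragraphs and <w:t> text runs
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(source):
    """Hypothetical helper: return the body text of a .docx file.

    `source` may be a path or a binary file-like object such as io.BytesIO.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join the text runs inside each paragraph into one line
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def make_demo_docx(text):
    """Build a minimal in-memory archive with one paragraph (demo only)."""
    document_xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f"<w:body><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    buffer.seek(0)
    return buffer


text = extract_docx_text(make_demo_docx("Invoice 2024-001"))
print(text)  # Invoice 2024-001
print(re.findall(r"\d{4}-\d{3}", text))  # ['2024-001']
```

Each of the four listed modules has a role here: zipfile opens the archive, io supplies the in-memory file, xml parses the document part, and re searches the resulting text.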

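Usage:

Note: fileName must point to an existing file, either a file name in the current directory or a full path. Example: Docscan(r"C:\Users\You\Desktop\folder1\test.pdf")
(The r prefix makes this a raw string, so the backslashes in a Windows path are not read as escape sequences; without it, "C:\Users" is a syntax error in Python 3.)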

Create an instance with variable = Docscan('fileName'), then call its methods:
print(variable.returnFileText())
print(variable.executeRegex('regex here'))
print(variable.executeHeaderRegex('regex here'))
print(variable.executeFooterRegex('regex here'))
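The usage above prints a list of matches for each regex call. Assuming executeRegex behaves like re.findall applied to the text from returnFileText(), the result looks like the sketch below. search_text is a hypothetical stand-in, not part of Docscan, and the sample text is made up for illustration.

```python
import re


def search_text(text, regex_expression):
    """Hypothetical stand-in for executeRegex: every match in `text`, as a list."""
    return re.findall(regex_expression, text)


sample = "Contact: jane@example.com, ops@example.org. Call 555-0100."

# Raw strings (r"...") keep regex backslashes such as \w and \d intact
emails = search_text(sample, r"[\w.]+@[\w.]+\.\w+")
phones = search_text(sample, r"\d{3}-\d{4}")
print(emails)  # ['jane@example.com', 'ops@example.org']
print(phones)  # ['555-0100']
```

One re.findall detail worth knowing if Docscan wraps it: when the pattern contains capture groups, the list holds the captured groups rather than the whole matches.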

Methods:

returnFileText() - Returns the text of the document as a string.
executeRegex(regexExpression) - Returns a list of all matches of regexExpression in the document text.
executeHeaderRegex(regexExpression) - Returns a list of all matches of regexExpression in the header XML.
executeFooterRegex(regexExpression) - Returns a list of all matches of regexExpression in the footer XML.
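In a .docx, headers and footers are stored in their own parts (word/header1.xml, word/footer1.xml, and so on), which is presumably why the header and footer searches are separate methods. The sketch below assumes that layout. regex_in_parts and make_demo_docx are hypothetical names, and this version searches only the text inside the XML parts; Docscan itself may run the pattern over the raw XML instead.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace shared by document, header and footer parts
W_NS_URI = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS_URI + "}"


def regex_in_parts(source, part_prefix, regex_expression):
    """Hypothetical helper: find regex matches in word/<prefix>N.xml parts.

    part_prefix is "header" or "footer"; a .docx may contain several of each,
    so the matches from every such part are collected into one list.
    """
    matches = []
    with zipfile.ZipFile(source) as archive:
        for name in sorted(archive.namelist()):
            if not re.fullmatch(rf"word/{part_prefix}\d+\.xml", name):
                continue
            root = ET.fromstring(archive.read(name))
            # One line of text per paragraph, built from its <w:t> runs
            text = "\n".join(
                "".join(t.text or "" for t in p.iter(W + "t"))
                for p in root.iter(W + "p")
            )
            matches.extend(re.findall(regex_expression, text))
    return matches


def make_demo_docx():
    """Minimal in-memory archive with one header and one footer part
    (demo only, not a complete .docx that Word could open)."""

    def part(tag, text):
        return (
            f'<w:{tag} xmlns:w="{W_NS_URI}">'
            f"<w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:{tag}>"
        )

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/header1.xml", part("hdr", "Confidential - Draft 3"))
        archive.writestr("word/footer1.xml", part("ftr", "Page 1 of 4"))
    buffer.seek(0)
    return buffer


header_hits = regex_in_parts(make_demo_docx(), "header", r"Draft \d+")
footer_hits = regex_in_parts(make_demo_docx(), "footer", r"\d+")
print(header_hits)  # ['Draft 3']
print(footer_hits)  # ['1', '4']
```

Searching extracted text rather than raw XML avoids accidental matches on tag or attribute names, at the cost of losing any formatting information stored in the markup.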