extract-ti-from-reports

Convert PDFs to text, then transform that text into structured JSON objects for Threat Intelligence.

Primary language: Jupyter Notebook

pdf_to_text

Uses the pdfminer.six library to convert .PDF files to .TXT.

pip install pdfminer.six
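
A minimal sketch of the conversion step, assuming one .TXT file is written per .PDF. The function names `txt_path_for` and `pdf_to_text` and the output-directory layout are illustrative assumptions, not taken from the notebook; `extract_text` is the high-level helper that pdfminer.six provides.

```python
from pathlib import Path


def txt_path_for(pdf_path, out_dir):
    """Return the .txt path that mirrors a given .pdf name (hypothetical layout)."""
    return Path(out_dir) / (Path(pdf_path).stem + ".txt")


def pdf_to_text(pdf_path, out_dir):
    """Extract the text of one PDF and save it as UTF-8 text."""
    # Imported here so the path helper above works even without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    text = extract_text(str(pdf_path))
    out_path = txt_path_for(pdf_path, out_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

Calling `pdf_to_text("reports/sample.pdf", "txt")` would write `txt/sample.txt`; both paths are placeholders.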

text_to_json_ti

Converts .TXT to .JSON, using regular expressions to split the text into JSON items under predetermined fields.
URL and filename items are extracted, and any that turn out to be incorrect (not malicious) are collected into a whitelist array that is used for filtering.
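
The extraction and filtering described above could look roughly like the following. The field names, the regular expressions, and the whitelist entries are all assumptions made for illustration; the notebook's actual patterns and fields are not shown in this README.

```python
import json
import re

# Hypothetical patterns for two fields; real reports may need stricter expressions.
PATTERNS = {
    "url": re.compile(r"https?://[^\s<>]+"),
    "filename": re.compile(
        r"\b[\w-]+\.(?:exe|dll|docx?|xlsx?|pdf|zip|js|ps1)\b", re.IGNORECASE
    ),
}

# Hypothetical whitelist: matches known to be benign, to be filtered out.
WHITELIST = ["https://www.microsoft.com", "report.pdf"]


def text_to_ti(text, whitelist=WHITELIST):
    """Return a dict of field -> unique matches, minus whitelisted values."""
    blocked = {item.lower() for item in whitelist}
    result = {}
    for field, pattern in PATTERNS.items():
        kept = []
        for match in pattern.findall(text):
            value = match.rstrip(".,;)")  # drop trailing sentence punctuation
            if value.lower() in blocked or value in kept:
                continue
            kept.append(value)
        result[field] = kept
    return result


def save_ti(text, json_path):
    """Write the extracted items to a .JSON file."""
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(text_to_ti(text), handle, indent=2)
```

Keeping the whitelist as plain data, separate from the patterns, means false positives found during review can be added without touching the expressions.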

field_to_excel

Reads the data for the selected fields from the .JSON files, loads it into a dataframe, and saves it to an .XLSX file, adding statistics as needed.
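
As a rough sketch of this step, assuming each .JSON file maps a field name to a list of values and that pandas (with an Excel engine such as openpyxl) is installed. The function names, the sheet names, and the choice of a per-value count as the "statistics" are assumptions for illustration only.

```python
import json
from pathlib import Path


def collect_field(json_dir, field):
    """Gather one field's values from every .json file in a directory."""
    rows = []
    for path in sorted(Path(json_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        for value in data.get(field, []):
            rows.append({"source": path.name, field: value})
    return rows


def field_to_excel(json_dir, field, xlsx_path):
    """Save the collected values and a simple frequency table to one workbook."""
    # Imported here so collect_field stays usable without pandas installed.
    import pandas as pd

    df = pd.DataFrame(collect_field(json_dir, field), columns=["source", field])
    # Example statistic: how often each value appears across all reports.
    counts = df[field].value_counts().rename_axis(field).reset_index(name="count")
    with pd.ExcelWriter(xlsx_path) as writer:
        df.to_excel(writer, sheet_name="items", index=False)
        counts.to_excel(writer, sheet_name="counts", index=False)
    return df
```

For example, `field_to_excel("json", "url", "url.xlsx")` would produce a workbook with one sheet of raw items and one sheet of counts; the directory and file names are placeholders.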