PDF-crawler

Primary language: Jupyter Notebook

PDF extraction and JSON conversion

main.py is the entry point. It runs the PDF link extraction from crawler.py, then converts the PDFs to JSON with pdf2json.py.
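The two stages can be sketched roughly as follows. This is only an illustration and not the code in crawler.py or pdf2json.py: the names PdfLinkParser, extract_pdf_links and pages_to_json are hypothetical, the JSON layout is an assumption, and the conversion step assumes the page text has already been pulled out of the PDF by some other tool.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Hypothetical helper: collects <a href> targets that point at .pdf files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Check the URL path so a query string does not hide the extension.
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.links:
            self.links.append(absolute)


def extract_pdf_links(html, base_url):
    """Return the unique PDF URLs found in an HTML page, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


def pages_to_json(source_url, pages):
    """Serialize already-extracted page text into an assumed JSON layout."""
    record = {
        "source": source_url,
        "pages": [
            {"number": number, "text": text}
            for number, text in enumerate(pages, start=1)
        ],
    }
    return json.dumps(record, indent=2)
```

A driver in the spirit of main.py would fetch a page, pass its HTML to extract_pdf_links, download each PDF, extract its text, and write the output of pages_to_json to disk.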

Planned addition: clustering the documents by format, using clustering.py.