PDF extraction and JSON conversion

main.py runs PDF link extraction from crawler.py, then converts the linked PDFs to JSON with pdf2json.py. A planned addition is clustering the documents by format using clustering.py.
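
The two-stage flow described above (link extraction, then conversion) could be wired together roughly as follows. This is only a minimal sketch using the standard library: the names `extract_pdf_links`, `run_pipeline`, and `placeholder_convert` are hypothetical and are not taken from crawler.py, pdf2json.py, or main.py, and the converter is a stand-in that does not parse any PDF.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> targets whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        # Compare the URL path only, so query strings do not hide the extension.
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.links:
            self.links.append(absolute)


def extract_pdf_links(html, base_url):
    """Hypothetical stand-in for the link extraction step in crawler.py."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


def placeholder_convert(url):
    """Hypothetical stand-in for pdf2json.py; a real converter would
    download and parse the PDF instead of returning an empty record."""
    return {"source": url, "pages": []}


def run_pipeline(html, base_url, convert=placeholder_convert):
    """Assumed main.py flow: extract PDF links, convert each, emit JSON."""
    records = [convert(url) for url in extract_pdf_links(html, base_url)]
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    page = '<a href="/docs/report.pdf">Report</a> <a href="about.html">About</a>'
    print(run_pipeline(page, "https://example.com/"))
```

Passing the converter in as an argument keeps the orchestration independent of how PDFs are actually parsed, so a later clustering step could consume the same JSON records without changes to the extraction code.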