Dark Army AI Tutor Webscraper

Preprocessing Pipeline

Convert scientific papers into JSON with a common structure using s2orc-doc2json: https://github.com/allenai/s2orc-doc2json
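After conversion, each paper can be flattened into per-section text for later steps in the tutor pipeline. The sketch below is a minimal illustration, not part of s2orc-doc2json. It assumes the output follows the S2ORC-style layout, where `pdf_parse.abstract` and `pdf_parse.body_text` are lists of paragraph objects with `text` and `section` fields. The helper names `paper_sections` and `load_paper` are hypothetical. Check the repo's README for the actual conversion command and confirm the schema against a real output file before relying on it.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def paper_sections(doc: dict[str, Any]) -> dict[str, str]:
    """Group paragraphs of one converted paper by section heading.

    Assumption: doc["pdf_parse"]["abstract"] and doc["pdf_parse"]["body_text"]
    are lists of dicts with "text" and (optionally) "section" keys.
    """
    parse = doc.get("pdf_parse", {})
    sections: dict[str, list[str]] = {}
    # Abstract paragraphs are collected under a fixed heading.
    for para in parse.get("abstract", []):
        sections.setdefault("Abstract", []).append(para["text"])
    # Body paragraphs keep their original section order (dicts preserve insertion order).
    for para in parse.get("body_text", []):
        heading = para.get("section") or "Untitled"
        sections.setdefault(heading, []).append(para["text"])
    # Join paragraphs of each section with blank lines.
    return {name: "\n\n".join(paras) for name, paras in sections.items()}


def load_paper(path: Path) -> dict[str, str]:
    """Read one JSON file produced by the converter and flatten it."""
    with path.open(encoding="utf-8") as f:
        return paper_sections(json.load(f))
```

Grouping by the `section` field keeps headings such as "Introduction" intact, which makes it easy to chunk papers by topic rather than by fixed token windows.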