by Junqi Ma (jxm844@case.edu) and Tim Henderson (tim.tadh@gmail.com)
With "-n 30000", about 700 files larger than 300 KB are generated.
Example:
./WarcExtractor -n 30000 --file crawl-file.warc.gz -o result-dir
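To check how many extracted files in the output directory exceed 300 KB, a small helper like the one below may be useful. It is a sketch, not part of WarcExtractor. The function name count_large_files is hypothetical, and the helper assumes a find that supports -size with the k suffix (GNU and BSD find both do). With the k suffix, find rounds each size up to whole KiB (1024 bytes), so +300k matches files strictly larger than 300 KiB. The count you get depends on the crawl file used.

```shell
# Hypothetical helper (not part of WarcExtractor): count regular files under
# a directory whose size, rounded up to KiB, is greater than 300 KiB.
# Defaults to "result-dir", the -o directory from the example above.
count_large_files() {
  dir="${1:-result-dir}"
  # -type f: regular files only; -size +300k: larger than 300 KiB.
  # tr strips the padding some wc implementations add to the count.
  find "$dir" -type f -size +300k | wc -l | tr -d ' '
}
```

For example, run `count_large_files result-dir` after the extraction finishes.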
TODO: add a command-line option to specify the HTML file size.