All the details are in our paper.
- Supports batch processing by HEPS.
- Usage:
$ ruby batch.rb <path_to_PhantomJS_binary> ./html-dir ./target-dir
- It was developed using:
  - CentOS release 6.5
  - Ruby 2.1.2p95
  - PhantomJS 2.0.1-development
- This implementation ignores the childNodes of IFRAME and NOSCRIPT elements as well as SCRIPT and STYLE elements.
- The current parameter values are roughly optimized for our entire data set (not only the training data set explained in our paper).
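
The element-skipping rule noted above can be illustrated with a small, self-contained Ruby sketch. It is not the code used by this implementation: the `Node` struct, the constants, and `visible_nodes` are hypothetical names introduced here. It also assumes one reading of the note, namely that SCRIPT and STYLE elements are dropped entirely, while IFRAME and NOSCRIPT elements are kept but their child nodes are not visited.

```ruby
# Hypothetical node type for illustration only; the real implementation
# works on the DOM inside PhantomJS, not on this struct.
Node = Struct.new(:name, :children)

# Assumption: these elements are dropped together with everything below them.
SKIPPED_ELEMENTS = %w[SCRIPT STYLE].freeze
# Assumption: these elements are kept, but their child nodes are never visited.
OPAQUE_ELEMENTS = %w[IFRAME NOSCRIPT].freeze

# Depth-first traversal that collects the nodes a consumer would see.
def visible_nodes(node, acc = [])
  name = node.name.upcase
  return acc if SKIPPED_ELEMENTS.include?(name)

  acc << node
  return acc if OPAQUE_ELEMENTS.include?(name)

  node.children.each { |child| visible_nodes(child, acc) }
  acc
end

tree = Node.new("BODY", [
  Node.new("P", []),
  Node.new("SCRIPT", [Node.new("#text", [])]),
  Node.new("IFRAME", [Node.new("DIV", [])]),
  Node.new("DIV", [Node.new("STYLE", []), Node.new("SPAN", [])])
])

puts visible_nodes(tree).map(&:name).join(" ")
# prints: BODY P IFRAME DIV SPAN
```

If the other reading is intended (child nodes of all four element types are ignored, but the elements themselves are kept), moving SCRIPT and STYLE into `OPAQUE_ELEMENTS` would model it.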