Limit the parsing depth of the HTML parser to avoid out-of-memory situations
GoogleCodeExporter opened this issue · 1 comments
GoogleCodeExporter commented
What steps will reproduce the problem?
(using version 1.2.0)
1. Parse the HTML page at "http://worldwidescience.org/topicpages/s.html". ArticleExtractor
is sufficient for demonstration purposes.
Even with 8 GB of JVM memory, this results in an out-of-memory error.
Attached is a patch that limits the number of TextBlocks
created or appended by boilerpipe. Once that limit is reached, boilerpipe
ignores all further content from the parsed input.
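For readers without access to the attachment, the idea of capping the number of blocks can be sketched roughly as follows. This is only an illustration and not the attached patch: BoundedBlockCollector, maxBlocks and isLimitReached are hypothetical names, plain strings stand in for boilerpipe's TextBlocks, and nothing here is part of the boilerpipe API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical collector (not part of boilerpipe): accepts text blocks
 * until a fixed cap is reached and silently drops everything after that.
 */
final class BoundedBlockCollector {
    private final int maxBlocks;                       // assumed, caller-chosen limit
    private final List<String> blocks = new ArrayList<>();
    private boolean limitReached = false;

    BoundedBlockCollector(int maxBlocks) {
        if (maxBlocks < 0) {
            throw new IllegalArgumentException("maxBlocks must be >= 0");
        }
        this.maxBlocks = maxBlocks;
    }

    /** Returns true if the block was stored, false if it was dropped. */
    boolean add(String blockText) {
        if (blocks.size() >= maxBlocks) {
            limitReached = true;                       // remember that input was truncated
            return false;
        }
        blocks.add(blockText);
        return true;
    }

    /** True once at least one block has been dropped because of the cap. */
    boolean isLimitReached() {
        return limitReached;
    }

    /** Read-only view of the blocks that were kept. */
    List<String> getBlocks() {
        return Collections.unmodifiableList(blocks);
    }
}
```

A parser callback could check the return value of add and stop building further blocks once it is false, so memory use stays bounded by the cap rather than by the size of the input page.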
Original issue reported on code.google.com by mstr...@gmail.com
on 25 Nov 2013 at 4:29
Attachments:
GoogleCodeExporter commented
Please change the issue type to "enhancement".
Original comment by mstr...@gmail.com
on 26 Nov 2013 at 8:13