hybrid extractor?
GoogleCodeExporter opened this issue · 0 comments
Christian,
We have a corpus that is a mixture of news articles and other web pages, some
of which contain tables. The ArticleExtractor has trouble with many of these
other pages. Is there a hybrid extractor that detects when it would be better
to run KeepEverythingExtractor and when better to run ArticleExtractor?
Perhaps we should just use KeepEverything for now...?
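To make the question concrete, here is a rough sketch of the kind of hybrid we have in mind. It is not an existing boilerpipe class. `HybridExtractor`, the `minRatio` threshold, and the length-ratio heuristic are all assumptions made up for illustration, and the two functions only stand in for ArticleExtractor and KeepEverythingExtractor. The idea is to run both, and fall back to the keep-everything text whenever the article text covers too small a share of the page.

```java
import java.util.function.Function;

/**
 * Hypothetical helper (not part of boilerpipe): chooses between two
 * extraction strategies based on how much text the article-style one keeps.
 */
final class HybridExtractor {
    // Stand-in for an article-style extractor (assumption: HTML in, text out).
    private final Function<String, String> articleExtractor;
    // Stand-in for a keep-everything extractor (assumption: HTML in, text out).
    private final Function<String, String> keepEverythingExtractor;
    // Illustrative threshold: minimum share of the full text that the
    // article text must cover before we trust it.
    private final double minRatio;

    HybridExtractor(Function<String, String> articleExtractor,
                    Function<String, String> keepEverythingExtractor,
                    double minRatio) {
        this.articleExtractor = articleExtractor;
        this.keepEverythingExtractor = keepEverythingExtractor;
        this.minRatio = minRatio;
    }

    /** Returns the article text if it covers enough of the page, else the full text. */
    String extract(String html) {
        String everything = keepEverythingExtractor.apply(html);
        int total = everything.trim().length();
        if (total == 0) {
            // Nothing to compare against; the full text is as good as anything.
            return everything;
        }
        String article = articleExtractor.apply(html);
        double ratio = (double) article.trim().length() / total;
        // A low ratio suggests the page is not article-like (for example,
        // mostly tables), so keep everything instead.
        return ratio >= minRatio ? article : everything;
    }
}
```

In practice each function would presumably wrap the corresponding extractor's text call, and a threshold such as 0.3 would need tuning on a sample of our corpus. A character-length ratio is only one possible signal; it is chosen here because it needs nothing beyond the two outputs.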
Thanks!
jrf
Original issue reported on code.google.com by j...@mit.edu on 27 Apr 2012 at 3:08