textfilter
A Hadoop program that analyzes an XML file containing a large corpus of Wikipedia pages and keeps only the pages that contain certain keywords (matched case-insensitively).
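The filtering step can be pictured with the plain-Java sketch below. It leaves out Hadoop entirely and only shows the matching logic. Several details are assumptions, not taken from the project. The class `PageFilter` and its methods are hypothetical. Pages are taken to be `<page>...</page>` elements. A page is kept if it contains at least one of the keywords as a substring. Case is ignored by lower-casing both sides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the keyword filter; not the project's actual code.
public class PageFilter {
    // Matches each <page>...</page> element; DOTALL lets '.' span newlines.
    private static final Pattern PAGE = Pattern.compile("<page>(.*?)</page>", Pattern.DOTALL);

    // Assumption: a page matches if it contains ANY keyword, ignoring case.
    static boolean matches(String text, List<String> keywords) {
        String lower = text.toLowerCase(Locale.ROOT);
        for (String kw : keywords) {
            if (lower.contains(kw.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    // Extracts every <page> element from the XML and keeps the matching ones.
    static List<String> filterPages(String xml, List<String> keywords) {
        List<String> kept = new ArrayList<>();
        Matcher m = PAGE.matcher(xml);
        while (m.find()) {
            String page = m.group(1);
            if (matches(page, keywords)) {
                kept.add(page);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String xml = "<mediawiki><page><title>Hadoop</title></page>"
                   + "<page><title>Spark</title></page></mediawiki>";
        // "HADOOP" matches the first page despite the different case.
        List<String> result = filterPages(xml, List.of("HADOOP"));
        System.out.println(result.size()); // prints 1
    }
}
```

In a real MapReduce job, this check would most likely run inside the mapper, once per page record.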
hadoop jar textfilter-0.0.1-SNAPSHOT.jar input output keyword1 keyword2 keyword3
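The command line has the form `input output keyword1 keyword2 ...`. The sketch below shows one way a driver could split those arguments. The class `JobArgs` and its usage message are hypothetical. So is the rule that at least one keyword is required. The project's own driver may handle its arguments differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical argument holder; mirrors: <input> <output> <keyword1> [keyword2 ...]
public class JobArgs {
    final String input;
    final String output;
    final List<String> keywords;

    JobArgs(String input, String output, List<String> keywords) {
        this.input = input;
        this.output = output;
        this.keywords = keywords;
    }

    // Assumption: at least one keyword must follow the input and output paths.
    static JobArgs parse(String[] args) {
        if (args.length < 3) {
            throw new IllegalArgumentException("usage: <input> <output> <keyword>...");
        }
        // Everything after the first two arguments is treated as a keyword.
        List<String> keywords = Arrays.asList(Arrays.copyOfRange(args, 2, args.length));
        return new JobArgs(args[0], args[1], keywords);
    }

    public static void main(String[] args) {
        JobArgs a = parse(new String[] {"input", "output", "keyword1", "keyword2", "keyword3"});
        System.out.println(a.input + " " + a.output + " " + a.keywords);
        // prints: input output [keyword1, keyword2, keyword3]
    }
}
```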
Language: Java. License: Apache-2.0.