Creating an index for Wikisource with MapReduce
The dom4j library is no longer used because it builds the whole document in memory, which is not possible for a large file. SAX is used instead to transform the XML to text.
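
A minimal sketch of streaming XML to text with SAX, using only the JDK. It is not this project's actual converter. The class name SaxToText, the method extract, and the assumption that article bodies live in <text> elements (as in the MediaWiki export format) are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical converter: copies the character data of every <text> element
// to a Writer as it is parsed, so the document is never held in memory.
class SaxToText {

    static class TextHandler extends DefaultHandler {
        private final Writer out;
        private boolean inText = false; // true only between <text> and </text>

        TextHandler(Writer out) {
            this.out = out;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("text".equals(qName)) {
                inText = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (!inText) {
                return; // ignore titles, ids and other metadata
            }
            try {
                out.write(ch, start, length);
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (!"text".equals(qName)) {
                return;
            }
            inText = false;
            try {
                out.write('\n'); // newline after each article body
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    // Parses the stream and writes the extracted text to 'out'.
    static void extract(InputStream in, Writer out) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(in, new TextHandler(out));
            out.flush();
        } catch (Exception e) {
            throw new IllegalStateException("XML to text conversion failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<pages><page><title>A</title><text>hello world</text></page></pages>";
        StringWriter sw = new StringWriter();
        extract(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), sw);
        System.out.print(sw);
    }
}
```

For a real dump, the ByteArrayInputStream would be replaced by a FileInputStream and the StringWriter by a buffered file writer. Memory use then stays roughly constant regardless of file size.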
PosProcessor: Calculates the positions of the words in each article, as well as the total counts of words and articles.
DFandTFProcessor: Calculates the DF (document frequency) and TF (term frequency) of each word.
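
The quantities the two processors compute can be sketched in plain Java, without Hadoop, as follows. This is an illustration under stated assumptions, not the project's code. The class IndexSketch and its methods are hypothetical, the tokenizer (lower-casing and splitting on non-alphanumeric characters) is a simplification, TF is taken as the raw count of a word in one article, and DF as the number of articles containing the word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory version of the per-word statistics.
class IndexSketch {

    // Assumed tokenizer: lower-case, split on runs of non-letter/non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // word -> zero-based token positions within one article.
    // The size of each list is the raw TF of that word in the article.
    static Map<String, List<Integer>> positions(String article) {
        Map<String, List<Integer>> pos = new TreeMap<>();
        List<String> tokens = tokenize(article);
        for (int i = 0; i < tokens.size(); i++) {
            pos.computeIfAbsent(tokens.get(i), k -> new ArrayList<>()).add(i);
        }
        return pos;
    }

    // word -> number of articles that contain the word at least once (DF).
    static Map<String, Integer> documentFrequency(List<String> articles) {
        Map<String, Integer> df = new TreeMap<>();
        for (String article : articles) {
            for (String word : positions(article).keySet()) {
                df.merge(word, 1, Integer::sum);
            }
        }
        return df;
    }

    public static void main(String[] args) {
        List<String> articles = Arrays.asList("the cat sat on the mat", "the dog");
        Map<String, List<Integer>> pos = positions(articles.get(0));
        System.out.println("positions of 'the': " + pos.get("the"));
        System.out.println("TF of 'the': " + pos.get("the").size());
        System.out.println("DF of 'the': " + documentFrequency(articles).get("the"));
    }
}
```

In a MapReduce job the same logic would typically be split so that the map step emits (word, article, position) records and the reduce step aggregates them per word. How this project divides the work is not stated here.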