This project implements a word co-occurrence algorithm on Hadoop. It follows the book's tutorial and implements both the Pairs and the Stripes approaches.
$ yarn jar <hadoop>.jar [pairs | stripes] <input_file>
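For readers new to the two approaches, here is a minimal in-memory sketch of how Pairs and Stripes differ. It is not this repo's Hadoop code: the function names, the whitespace tokenization, and the neighbor window size are all assumptions made for illustration. Pairs emits one `((word, neighbor), 1)` record per co-occurrence, while Stripes emits one map of neighbor counts per word; both should reduce to the same totals.

```python
from collections import Counter, defaultdict


def neighbors(words, i, window):
    """Words within `window` positions of index i, excluding words[i] itself."""
    start = max(0, i - window)
    end = min(len(words), i + window + 1)
    return [words[j] for j in range(start, end) if j != i]


def pairs_map(line, window=2):
    """Pairs mapper (hypothetical): emit ((word, neighbor), 1) per co-occurrence."""
    words = line.split()  # assumption: whitespace tokenization
    for i, word in enumerate(words):
        for other in neighbors(words, i, window):
            yield (word, other), 1


def pairs_reduce(emitted):
    """Pairs reducer: sum the counts for each (word, neighbor) key."""
    counts = Counter()
    for key, value in emitted:
        counts[key] += value
    return counts


def stripes_map(line, window=2):
    """Stripes mapper (hypothetical): emit (word, {neighbor: count}) per position."""
    words = line.split()
    for i, word in enumerate(words):
        yield word, Counter(neighbors(words, i, window))


def stripes_reduce(emitted):
    """Stripes reducer: merge all stripes for a word by element-wise addition."""
    merged = defaultdict(Counter)
    for word, stripe in emitted:
        merged[word].update(stripe)
    return merged


# Tiny made-up input, window of 1 word on each side.
lines = ["a b c", "a b"]
pair_counts = pairs_reduce(kv for line in lines for kv in pairs_map(line, window=1))
stripes = stripes_reduce(kv for line in lines for kv in stripes_map(line, window=1))
```

In this sketch Pairs produces many small records and Stripes produces fewer, larger ones, which is the usual trade-off between the two when they run as MapReduce jobs.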
- Book - Data-Intensive Text Processing with MapReduce
- Original Blog - Calculating a Co-Occurrence Matrix With Hadoop
- Adapted for this repo - 以 Hadoop 計算共現矩陣 (Calculating a Co-Occurrence Matrix with Hadoop)