fbukevin/hadoop-cooccurrence

This is a Hadoop project that implements the word co-occurrence algorithm.

Java

Introduction

This project implements the word co-occurrence algorithm with Hadoop. It follows the tutorial in the book listed below and implements both the Pairs and the Stripes approaches.
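To make the difference between the two approaches concrete, here is a minimal in-memory sketch in plain Java. It is not this repository's code and does not use the Hadoop API: the class name `CooccurrenceSketch`, the `window` parameter, and the whitespace tokenizer are assumptions made for illustration, and the map and reduce phases are collapsed into a single loop. Pairs keeps one counter per (word, neighbor) key, while Stripes keeps one map of neighbor counts per word.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CooccurrenceSketch {

    // Pairs: one counter per "word,neighbor" key.
    // In a MapReduce job the mapper would emit ((word, neighbor), 1)
    // and the reducer would sum the ones for each key.
    static Map<String, Integer> pairs(List<String> lines, int window) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] tokens = tokenize(line);
            for (int i = 0; i < tokens.length; i++) {
                int from = Math.max(0, i - window);
                int to = Math.min(tokens.length - 1, i + window);
                for (int j = from; j <= to; j++) {
                    if (j == i) continue; // skip the word itself
                    counts.merge(tokens[i] + "," + tokens[j], 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Stripes: one map of neighbor counts ("stripe") per word.
    // In a MapReduce job the mapper would emit (word, stripe)
    // and the reducer would add the stripes element by element.
    static Map<String, Map<String, Integer>> stripes(List<String> lines, int window) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (String line : lines) {
            String[] tokens = tokenize(line);
            for (int i = 0; i < tokens.length; i++) {
                int from = Math.max(0, i - window);
                int to = Math.min(tokens.length - 1, i + window);
                for (int j = from; j <= to; j++) {
                    if (j == i) continue; // skip the word itself
                    result.computeIfAbsent(tokens[i], k -> new TreeMap<>())
                          .merge(tokens[j], 1, Integer::sum);
                }
            }
        }
        return result;
    }

    // Assumed tokenizer: lower-case the line and split on whitespace.
    static String[] tokenize(String line) {
        String trimmed = line.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a b c", "a b");
        System.out.println(pairs(lines, 1));
        System.out.println(stripes(lines, 1));
    }
}
```

Both methods compute the same counts. The trade-off is that Pairs produces many small records, whereas Stripes produces fewer but larger ones that must fit in memory per word.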

Usage

$ yarn jar <hadoop>.jar [pairs | stripes] <input_file>

Tutorial

Book - Data-Intensive Text Processing with MapReduce
Original Blog - Calculating a Co-Occurrence Matrix With Hadoop
Adaptation for this repo - Calculating a Co-Occurrence Matrix with Hadoop (以 Hadoop 計算共現矩陣)