
Wikipedia Inputformat and other useful Hadoop-related stuff

Primary language: Java. License: GNU General Public License v3.0 (GPL-3.0).

wikipedia-hadoop

A Wikipedia InputFormat and some useful Wikipedia utilities for Hadoop.

Usage

First, set WikiInputFormat as your job's InputFormat:

job.setInputFormatClass(WikiInputFormat.class);

Your Mapper's input key type must be LongWritable and its input value type must be WikiRevisionWritable.