Wikipedia InputFormat and some useful Wikipedia Hadoop utils.
First, set WikiInputFormat as your job's InputFormat:
job.setInputFormatClass(WikiInputFormat.class);
Your Mapper's input key and value types must be LongWritable and WikiRevisionWritable, respectively.