whym/wikihadoop
Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop
Python
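Since wikihadoop supplies a stream-based Hadoop InputFormat for compressed Wikipedia XML dumps, it would typically be plugged into a Hadoop Streaming job. The sketch below is a hypothetical invocation: the jar name, the InputFormat class name (`org.wikimedia.wikihadoop.StreamWikiDumpInputFormat`), the dump path, and `mapper.py` are all assumptions for illustration, not confirmed by this listing.

```shell
# Hypothetical Hadoop Streaming job using wikihadoop's InputFormat.
# Jar path, class name, and input/output paths are placeholders; check the
# repository's build output and README for the actual names.
hadoop jar "$HADOOP_HOME"/contrib/streaming/hadoop-streaming.jar \
  -libjars wikihadoop.jar \
  -inputformat org.wikimedia.wikihadoop.StreamWikiDumpInputFormat \
  -input  /dumps/enwiki-latest-pages-meta-history.xml.bz2 \
  -output /out/diffs \
  -mapper mapper.py \
  -file   mapper.py
```

The point of the custom InputFormat is that Hadoop cannot split a compressed XML dump on arbitrary byte boundaries; wikihadoop splits the stream on page/revision boundaries so each mapper receives well-formed records.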
Issues
- compatibility with elastic mapreduce? (#11, opened by GabrielF00, 20 comments)
- Using with "current" dump (#7, opened by DataJunkie, 3 comments)
- Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileStatus.isDirectory()Z (#12, opened by ravisg, 2 comments)
- download link broken (#10, opened by GabrielF00, 4 comments)
- Using cloudera distribution (#8, opened by Fkawala, 2 comments)
- Missing revisions (#2, opened by whym, 0 comments)
- Non-uniform progress report (#6, opened by whym, 0 comments)
- Generalize the splitter for non-Wikipedia XMLs (#5, opened by whym, 0 comments)
- Connect to the Python differ (#4, opened by whym, 0 comments)
- Connect to the Python differ (#3, opened by whym)