A pure Python implementation of a SequenceFile reader and writer that lets you read and write Hadoop sequence files without using Java.
To install from a checkout of this repository, run:
python setup.py install
or add this line to your project's requirements.txt:
-e git+https://github.com/commoncrawl/python-hadoop.git@main#egg=hadoop
See the examples for how to read and write SequenceFiles and other file formats specific to Hadoop and MapReduce.
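Before passing a file to the reader, it can help to confirm that it really is a SequenceFile. The following is a minimal sketch that uses only the Python standard library, not this package's API. It relies on the fact that a Hadoop SequenceFile starts with the three bytes `SEQ` followed by a one-byte format version. The function name `sniff_sequence_file` is hypothetical and chosen for illustration only.

```python
import io
from typing import BinaryIO, Optional

# Every Hadoop SequenceFile begins with these three bytes,
# followed by a single byte holding the format version.
SEQ_MAGIC = b"SEQ"


def sniff_sequence_file(stream: BinaryIO) -> Optional[int]:
    """Return the format version if the stream starts with a
    SequenceFile header, otherwise None.

    Hypothetical helper; not part of this package.
    """
    header = stream.read(4)
    if len(header) < 4 or header[:3] != SEQ_MAGIC:
        return None
    return header[3]


if __name__ == "__main__":
    # In-memory stand-ins for real files: one with a SequenceFile
    # header (version 6), one with unrelated content.
    print(sniff_sequence_file(io.BytesIO(b"SEQ\x06payload")))  # 6
    print(sniff_sequence_file(io.BytesIO(b"not a sequence file")))  # None
```

For a file on disk, open it in binary mode (`open(path, "rb")`) and pass the file object to the function. The check only inspects the first four bytes, so it does not validate the rest of the header or the records.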
Author: Matteo Bertozzi theo.bertozzi@gmail.com (see the original repository)
Contributions to this fork:
- via bityon/python-hadoop
  - Brian Bloniarz brian.bloniarz@gmail.com
  - Alex Roper aroper@umich.edu
  - Jeremy G. Kahn jeremy@trochee.net
- Python 3 migration
  - Jing Conan Wang (@jingcwang: jingcwang/Hadoop)
  - Jie Tang (@jtang7: jtang7/Hadoop)
See the commit logs for a complete list of contributors.