
python-hadoop

Primary language: Python. License: Apache License 2.0.

Python Hadoop I/O Utilities

Pure Python SequenceFile Reader and Writer implementation that allows you to read and write Hadoop sequence files without using Java.

Installation

python setup.py install

or add the following line to your project's requirements.txt:

-e git+https://github.com/commoncrawl/python-hadoop.git@main#egg=hadoop

Usage

See the examples for how to read and write SequenceFiles and other file formats specific to Hadoop and MapReduce.
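Before handing a file to a reader, it can help to confirm that it really is a SequenceFile. The sketch below is independent of this package's API and uses only the Python 3 standard library. It relies on the fact that a SequenceFile starts with the three magic bytes `SEQ` followed by a one-byte format version. The helper name `read_seqfile_version` and the in-memory sample data are hypothetical and exist only for illustration.

```python
import io

SEQ_MAGIC = b"SEQ"  # a SequenceFile begins with these three bytes


def read_seqfile_version(stream):
    """Return the format version byte, or None if the magic bytes are missing.

    Hypothetical helper for illustration; it is not part of python-hadoop.
    """
    header = stream.read(4)
    if len(header) < 4 or header[:3] != SEQ_MAGIC:
        return None
    return header[3]  # indexing bytes yields an int in Python 3


# In-memory stand-ins for real files (assumed sample data)
version = read_seqfile_version(io.BytesIO(b"SEQ\x06"))
not_seq = read_seqfile_version(io.BytesIO(b"PK\x03\x04"))
```

With a real file, the same helper could be called on a handle opened in binary mode, for example `open(path, "rb")`.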

Credits

Author: Matteo Bertozzi <theo.bertozzi@gmail.com> (see the original repository)

Contributions to this fork:

See the commit logs for a complete list of contributors.