Python module for reading and writing the Apache ORC file format. It uses Apache ORC's Core C++ API under the hood and provides an interface similar to the csv module in the Python standard library.
Supports only Python 3.6 or newer and ORC 1.6.
- Reading ORC files.
- Writing ORC files.
- Working through Python's stream/file-like object IO interface.
That sums up the purpose of this project quite well.
Minimal example for reading an ORC file:
import pyorc

with open("./data.orc", "rb") as data:
    reader = pyorc.Reader(data)
    for row in reader:
        print(row)
And another for writing one:
import pyorc

with open("./new_data.orc", "wb") as data:
    with pyorc.Writer(data, "struct<col0:int,col1:string>") as writer:
        writer.write((1, "ORC from Python"))
Any contributions are welcome. If you would like to help with development, fork the repository or report an issue here on GitHub. You can also help by improving the documentation.