Python ORC metadata reader

ORC Metadata Reader

License: MIT

A library for reading ORC metadata in Python.

Install

python setup.py install

Usage

Read a local file.

from orc_metadata.reader import read_metadata

# Read metadata from local ORC file
result = read_metadata('path/to/file.orc', schema=True)

Read S3 files.

from orc_metadata.reader import read_metadata_s3

# Read metadata from ORC files in S3
for result in read_metadata_s3('s3_bucket', 'prefix/path/partition=foo/', fetch_size=8000):
    print(result)

Sample output can be found here.

Args

Name          Default  Meaning
schema        False    Get the ORC schema.
file_stats    False    Get ORC file statistics.
stripe_stats  False    Get ORC stripe statistics.
stripes       False    Get ORC stripe footer information. Requires a full file scan.
fetch_size    None     Fetch only the specified number of trailing bytes from each file. If None, fetch the entire file. Applies only to read_metadata_s3.

Note

Reading ORC metadata does not require reading the entire file unless stripes=True. With read_metadata_s3, you can pass fetch_size=N to fetch only the trailing N bytes of each file in S3.
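To illustrate why a trailing read can be enough: in the ORC format, the final byte of a file stores the length of the serialized postscript, and the postscript in turn records the footer and metadata lengths. The sketch below is not this library's implementation. The helper names read_tail, postscript_length, and tail_covers_postscript are hypothetical, and it works on a local file rather than S3. It only shows how a fetched tail can be checked for whether it reaches back far enough to contain the postscript.

```python
import os


def read_tail(path, fetch_size):
    """Return the last `fetch_size` bytes of a file (hypothetical helper).

    If the file is smaller than `fetch_size`, the whole file is returned.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(size - fetch_size, 0))
        return f.read()


def postscript_length(tail):
    """Return the postscript length stored in the final byte of an ORC file."""
    if not tail:
        raise ValueError("tail is empty")
    return tail[-1]


def tail_covers_postscript(tail):
    """Check that the tail holds the length byte plus the whole postscript."""
    return len(tail) >= 1 + postscript_length(tail)
```

If tail_covers_postscript returns False for a given fetch_size, the fetched bytes cannot be parsed and a larger value is needed. The footer and metadata sections sit before the postscript, so in practice the tail must be large enough to cover those as well.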

Supported compressions

  • zlib
  • Snappy
  • LZO
  • LZ4

Contributing

Linting

flake8 orc_metadata test --ignore=F401 --max-line-length=80

Testing

python test/runner.py

Code of Conduct

Please read the code of conduct.

Bugs

Please create an issue using this template.