Library for reading ORC metadata in Python.
python setup.py install
Read a local file.
from orc_metadata.reader import read_metadata
# Read metadata from local ORC file
result = read_metadata('path/to/file.orc', schema=True)
Read S3 files.
from orc_metadata.reader import read_metadata_s3
# Read metadata from ORC files in S3
for result in read_metadata_s3('s3_bucket', 'prefix/path/partition=foo/', fetch_size=8000):
    print(result)
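To illustrate what a trailing fetch such as fetch_size=8000 can look like on the wire, here is a minimal sketch that builds an HTTP suffix byte range (`bytes=-N`), which S3 accepts on object reads. The helper name `suffix_range_kwargs` is hypothetical, and this is an assumption about one way to do it, not a description of how read_metadata_s3 is implemented.

```python
def suffix_range_kwargs(fetch_size=None):
    """Build extra request arguments for a trailing-bytes fetch.

    Hypothetical helper, not part of orc_metadata.
    fetch_size=None means "fetch the whole object", so no Range is set.
    """
    if fetch_size is None:
        return {}
    if fetch_size <= 0:
        raise ValueError("fetch_size must be a positive number of bytes")
    # An HTTP suffix range asks for the last N bytes of the object.
    return {"Range": "bytes=-{}".format(fetch_size)}
```

The returned dictionary could, for example, be passed as keyword arguments to an S3 client's object read call, so that only the tail of each file is transferred.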
Sample output can be found here.
Name | Default | Meaning |
---|---|---|
schema | False | Get ORC schema. |
file_stats | False | Get ORC file statistics. |
stripe_stats | False | Get ORC stripe statistics. |
stripes | False | Get ORC stripe footer information. Requires a full file scan. |
fetch_size | None | Fetch the specified number of trailing bytes from each file. If None, the entire file is fetched. Applies only to read_metadata_s3. |
Reading ORC metadata does not require reading the entire file unless stripes=True. When using read_metadata_s3, you can specify fetch_size=N, which fetches only the trailing N bytes from each file in S3.
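Reading only the tail works because of the ORC layout: the final byte of an ORC file stores the length of the PostScript, which in turn describes the footer that precedes it. The sketch below reads the trailing bytes of a local file and checks whether they cover the PostScript. The function names are hypothetical and this is not the library's own code, only an illustration of the idea under that assumption.

```python
import os


def read_tail(path, fetch_size=None):
    """Return the trailing fetch_size bytes of a file (all of it if None)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if fetch_size is not None and fetch_size < size:
            # Skip everything except the last fetch_size bytes.
            f.seek(size - fetch_size)
        return f.read()


def postscript_length(tail):
    """The last byte of an ORC file holds the PostScript length."""
    if not tail:
        raise ValueError("tail is empty")
    return tail[-1]


def tail_covers_postscript(tail):
    """True if the tail holds the PostScript plus its one-byte length."""
    return len(tail) >= postscript_length(tail) + 1
```

If the fetched tail turns out to be too short for the PostScript and footer, a larger fetch_size (or a full read) would be needed.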
- zlib
- Snappy
- LZO
- LZ4
flake8 orc_metadata test --ignore=F401 --max-line-length=80
python test/runner.py
Please read the code of conduct.
Please create an issue using this template.