Utility to inspect Parquet files.
parquet-tools supports the following installation methods:
- Download pre-built binaries
- brew install on macOS
- Container image
- Install from source
- Pre-built packages
Once it is installed, refer to the usage page for details on how to use the tool.
This project is inspired by:
- parquet-go/parquet-tools: https://github.com/xitongsys/parquet-go/tree/master/tool/parquet-tools/
- Python parquet-tools: https://pypi.org/project/parquet-tools/
- Java parquet-tools: https://mvnrepository.com/artifact/org.apache.parquet/parquet-tools
- Makefile: https://github.com/cisco-sso/kdk/blob/master/Makefile
Some test cases are from:
- https://registry.opendata.aws/binding-db/
- https://github.com/xitongsys/parquet-go/tree/master/example/
- https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet
- https://azure.microsoft.com/en-us/services/open-datasets/catalog/
- https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
Tools used:
- https://golang.org/
- https://github.com/golangci/golangci-lint
- https://github.com/jstemmer/go-junit-report
- https://circleci.com/
The TODO list is tracked as enhancements in the project's issues.