Lightweight wrapper that adds Google Cloud Storage (GCS) support to common Hadoop tools, including avro-tools, parquet-tools, and proto-tools (for Scio's Protobuf-in-Avro files), so that they can be used from regular workstations or laptops, outside of a Google Compute Engine (GCE) instance.
It uses your existing OAuth2 credentials and allows authentication via a browser.
On macOS, you can install the tools from our Homebrew tap.
brew tap spotify/public
brew install gcs-avro-tools gcs-parquet-tools gcs-proto-tools
avro-tools tojson <GCS_PATH>
parquet-tools cat <GCS_PATH>
proto-tools tojson <GCS_PATH>
Or build them yourself.
sbt assembly
java -jar avro-tools/target/scala-2.12/avro-tools-1.8.2.jar tojson <GCS_PATH>
java -jar parquet-tools/target/scala-2.12/parquet-tools-1.10.1.jar cat <GCS_PATH>
java -jar proto-tools/target/scala-2.12/proto-tools-3.4.0.jar tojson <GCS_PATH>
To make avro-tools and parquet-tools work with GCS, we need:
- GCS connector and its dependencies
- GCS connector configuration
The GCS connector does not pick up your local gcloud configuration; instead, it expects its settings in core-site.xml.
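As a rough illustration of the kind of settings the connector reads, the sketch below writes a minimal core-site.xml into a scratch directory. It assumes the standard GCS connector property names (fs.gs.impl, fs.AbstractFileSystem.gs.impl, fs.gs.project.id). The directory and the project ID are placeholders, and authentication properties are left out because they differ between connector versions. It is not necessarily the file shipped with these tools.

```shell
# Scratch directory for the example; any directory on the tool's classpath would do.
conf_dir="$(mktemp -d)"

# Minimal core-site.xml: registers the gs:// scheme with Hadoop and names a project.
cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Map the gs:// scheme to the GCS connector's FileSystem implementation. -->
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  </property>
  <!-- Placeholder project ID (assumption): replace with your own. -->
  <property>
    <name>fs.gs.project.id</name>
    <value>your-gcp-project</value>
  </property>
</configuration>
EOF

echo "wrote $conf_dir/core-site.xml"
```

Keeping these settings in a file bundled with the tools means the connector is configured the same way no matter which gcloud configuration is active on the workstation.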