GCS support for avro-tools, parquet-tools and protobuf

Primary language: Java. License: Apache License 2.0.

GCS Tools

Raison d'être:

A lightweight wrapper that adds Google Cloud Storage (GCS) support to common Hadoop tools, including avro-tools, parquet-tools, and proto-tools (for Scio's Protobuf-in-Avro files), so that they can be used from regular workstations or laptops, outside of a Google Compute Engine (GCE) instance.

It uses your existing OAuth2 credentials and allows authentication via a browser.

Usage:

On macOS, you can install the tools via our Homebrew tap.

brew tap spotify/public
brew install gcs-avro-tools gcs-parquet-tools gcs-proto-tools
avro-tools tojson <GCS_PATH>
parquet-tools cat <GCS_PATH>
proto-tools tojson <GCS_PATH>

Or build them yourself.

sbt assembly
java -jar avro-tools/target/scala-2.12/avro-tools-1.8.2.jar tojson <GCS_PATH>
java -jar parquet-tools/target/scala-2.12/parquet-tools-1.10.1.jar cat <GCS_PATH>
java -jar proto-tools/target/scala-2.12/proto-tools-3.4.0.jar cat <GCS_PATH>

How it works:

To make avro-tools and parquet-tools work with GCS, we need the GCS connector and a configuration for it.

The GCS connector won't pick up your local gcloud configuration; it expects its settings in core-site.xml instead.
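
For illustration, a minimal core-site.xml for the GCS connector might look like the sketch below. The two class names are the file system implementations the GCS connector provides for the gs:// scheme. The project ID is a placeholder, and this is not necessarily the file shipped with these tools. Authentication properties are left out because their names differ between connector versions.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Map the gs:// scheme to the GCS connector's FileSystem implementation. -->
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  </property>
  <!-- Same mapping for Hadoop's AbstractFileSystem (FileContext) API. -->
  <property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  </property>
  <!-- Placeholder value (assumption): replace with your own GCP project ID. -->
  <property>
    <name>fs.gs.project.id</name>
    <value>my-gcp-project</value>
  </property>
</configuration>
```

Hadoop loads core-site.xml from the classpath, so a file like this has to be on the classpath of the tool that reads it (for example, bundled inside the assembled jar).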