
GCS connector batteries for pyspark

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

pyspark_gcs


Raison d'être

The pyspark package doesn't include GCS filesystem support, so a pyspark user has to install and configure the GCS connector jars manually. This package adds GCS batteries for pyspark. It is essentially a workaround for SPARK-33605.
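For context, the manual setup this package replaces looks roughly like the sketch below. It only builds the Spark configuration as a plain dict. The helper name `manual_gcs_conf` and the jar path are hypothetical. The property names are the ones used by 2.x releases of the GCS connector, so check them against the connector version you actually run. This is not necessarily how pyspark_gcs configures the session internally.

```python
def manual_gcs_conf(connector_jar_path, keyfile_path):
    """Return Spark settings for a hand-configured GCS connector.

    Hypothetical helper for illustration only; the keys below assume a
    2.x GCS connector jar that you downloaded yourself.
    """
    return {
        # Put the connector jar on the driver and executor classpath.
        "spark.jars": connector_jar_path,
        # Register the gs:// scheme with Hadoop.
        "spark.hadoop.fs.gs.impl":
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
        "spark.hadoop.fs.AbstractFileSystem.gs.impl":
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
        # Authenticate with a service account JSON key file.
        "spark.hadoop.google.cloud.auth.service.account.enable": "true",
        "spark.hadoop.google.cloud.auth.service.account.json.keyfile":
            keyfile_path,
    }


# Applying the settings by hand (requires pyspark and the jar on disk):
#
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder
#   for key, value in manual_gcs_conf("/path/to/gcs-connector.jar",
#                                     "gcp_key.json").items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```

With pyspark_gcs installed, none of this wiring should be needed.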

Install

pip install pyspark_gcs

Usage

from pyspark_gcs import get_spark_session

spark = get_spark_session(service_account_keyfile_path="gcp_key.json")

spark is a pyspark session with GCS FS support. Because the GCS connector doesn't yet support Application Default Credentials (hadoop-connectors#59), you need to either pass service_account_keyfile_path or set the GOOGLE_APPLICATION_CREDENTIALS environment variable.
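The two ways of supplying credentials can be combined in a small wrapper. The sketch below is an assumption about how a caller might do this, not the package's own logic. `resolve_keyfile_path` is a hypothetical helper that prefers an explicit path, falls back to GOOGLE_APPLICATION_CREDENTIALS, and fails early if neither is available.

```python
import os


def resolve_keyfile_path(service_account_keyfile_path=None):
    """Pick the service account key file to hand to get_spark_session.

    Hypothetical helper: an explicit argument wins, otherwise the
    GOOGLE_APPLICATION_CREDENTIALS environment variable is used.
    """
    if service_account_keyfile_path:
        return service_account_keyfile_path
    env_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if env_path:
        return env_path
    raise ValueError(
        "Pass service_account_keyfile_path or set "
        "GOOGLE_APPLICATION_CREDENTIALS"
    )


# Usage with this package (requires pyspark_gcs to be installed):
#
#   from pyspark_gcs import get_spark_session
#   spark = get_spark_session(
#       service_account_keyfile_path=resolve_keyfile_path()
#   )
```

Failing before the session is created gives a clearer error than a connector authentication failure on the first gs:// read.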