The `pyspark` package doesn't ship with GCS (Google Cloud Storage) filesystem support, so pyspark users have to manually install and configure the GCS connector jars. This package adds GCS batteries for pyspark. It is essentially a workaround for SPARK-33605.
```shell
pip install pyspark_gcs
```
```python
from pyspark_gcs import get_spark_session

spark = get_spark_session(service_account_keyfile_path="gcp_key.json")
```
`spark` is a PySpark session with GCS FS support. Because the GCS connector doesn't yet support Application Default Credentials (hadoop-connectors#59), you need to either provide `service_account_keyfile_path` or set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.
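The environment-variable route can be sketched as follows. This is illustrative: the key file path reuses `gcp_key.json` from the example above, and it assumes (unverified) that `get_spark_session` falls back to `GOOGLE_APPLICATION_CREDENTIALS` when no keyfile path is passed:

```python
import os

# Point Application Default Credentials at the service-account key file.
# "gcp_key.json" mirrors the path used above; substitute your own key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "gcp_key.json"

# With the variable set, the session could be created without an explicit
# path (assumption -- verify against the package docs):
# from pyspark_gcs import get_spark_session
# spark = get_spark_session()
```

Note that the variable must be set before the Spark session is created, since the connector reads credentials at session startup.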