Apache Spark on Cloud Dataproc powered by Compute Engine

This hands-on lab covers working with Apache Spark on Cloud Dataproc powered by Google Compute Engine. Its modules cover environment configuration and preparation, creating Cloud Dataproc clusters, running Spark jobs as batch jobs and from notebooks, and cleaning up the environment after the lab. The purpose of the modules is to get users unfamiliar with Cloud Dataproc up and running quickly, so they can focus on data engineering and machine learning with Apache Spark rather than on environment setup and provisioning nuances.

Audience

Google Customer Engineers

Environment

Argolis

Lab Modules

| Module | Resource |
| --- | --- |
| 1 | Foundational Setup |
| 2 | Create a Spark Cluster |
| 3 | Submit Spark batch jobs |
| 4 | Spark notebooks |
| 10 | Clean up |
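
As a preview of the kind of workload submitted in module 3, the sketch below is a minimal PySpark batch job that estimates Pi by Monte Carlo sampling. It is a hypothetical example, not one of the lab's own jobs; the file name and parameters are placeholders, and the actual batch jobs are defined in the module itself.

```python
# pi_estimate.py -- hypothetical example; the lab's own batch jobs live in module 3.
import random
from operator import add

from pyspark.sql import SparkSession


def inside(_):
    # Sample a point uniformly in the unit square and report whether it
    # falls inside the quarter circle of radius 1.
    x, y = random.random(), random.random()
    return 1 if x * x + y * y <= 1.0 else 0


if __name__ == "__main__":
    spark = SparkSession.builder.appName("pi-estimate").getOrCreate()

    samples = 1_000_000
    count = (
        spark.sparkContext.parallelize(range(samples), 10)  # 10 partitions
        .map(inside)
        .reduce(add)
    )
    print(f"Pi is roughly {4.0 * count / samples}")

    spark.stop()
```

Once the cluster from module 2 exists, a script like this could be submitted with `gcloud dataproc jobs submit pyspark pi_estimate.py --cluster=<your-cluster> --region=<your-region>` (cluster and region names are placeholders).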

Get started

Click on module 1 above.

Don't forget to

Shut down or delete resources as needed once you are done with the lab to avoid unnecessary charges.

Credits

This is a community effort by Google Cloud Data Analytics Specialist Engineers. Contributions are welcome.

| # | Contributor | Contribution | Team |
| --- | --- | --- | --- |
| 1 | Anagha Khanolkar | Primary Author | North America Technology Team |
| 2 | Jay O' Leary | Author | Sub-regional Technology Team |