This hands-on lab covers working with Apache Spark on Cloud Dataproc running on Google Compute Engine (GCE). Its modules cover environment configuration and preparation, creating Cloud Dataproc clusters, running Spark batch jobs, working with Spark notebooks, and cleaning up the environment after the lab. The modules aim to get users who are unfamiliar with Cloud Dataproc up and running quickly. This lets them stay productive and focus on data engineering and machine learning with Apache Spark rather than on environment and provisioning details.
Target audience: Google Customer Engineers
Lab environment: Argolis
| Module | Resource |
|---|---|
| 1 | Foundational Setup |
| 2 | Create a Spark Cluster |
| 3 | Submit Spark batch jobs |
| 4 | Spark notebooks |
| 5 | Clean up |
Get started by clicking on Module 1 above.
When you finish the lab, shut down or delete the resources you created, as needed.
This is a community effort by Google Cloud Data Analytics Specialist Engineers. Contributions are welcome.
| # | Contributor | Contribution | Team |
|---|---|---|---|
| 1 | Anagha Khanolkar | Primary Author | North America Technology Team |
| 2 | Jay O' Leary | Author | Sub-regional Technology Team |