/cloud-dataproc

Samples for Cloud Dataproc

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Google Cloud Dataproc

This repository contains code and documentation for use with Google Cloud Dataproc.

Samples in this Repository

  • codelabs/opencv-haarcascade provides the source code for the OpenCV Dataproc Codelab, which demonstrates a Spark job that adds facial detection to a set of images.
  • spark-tensorflow provides an example of using Spark as a preprocessing toolchain for TensorFlow jobs. Optionally, it demonstrates using the spark-tensorflow-connector to convert CSV files to TFRecords.

See each directory's README for more information.

Additional Dataproc Repositories

You can find more Dataproc resources in these GitHub repositories:

For more information

For more information, review the Dataproc documentation. You can also pose questions to the Stack Overflow community with the tag google-cloud-dataproc. See our other Google Cloud Platform GitHub repos for sample applications and scaffolding for other frameworks and use cases.

Contributing changes

Licensing