Google Cloud Dataproc
This repository contains code and documentation for use with Google Cloud Dataproc.
Samples in this Repository
codelabs/opencv-haarcascade
provides the source code for the OpenCV Dataproc Codelab, which demonstrates a Spark job that adds facial detection to a set of images.
spark-tensorflow
provides an example of using Spark as a preprocessing toolchain for TensorFlow jobs. Optionally, it demonstrates using the spark-tensorflow-connector to convert CSV files to TFRecords.
See each directory's README for more information.
Additional Dataproc Repositories
You can find more Dataproc resources in these GitHub repositories:
- Dataproc initialization actions
- Dataproc Python examples
- Dataproc Java Bigtable sample
- Dataproc Spark-Bigtable samples
For more information
For more information, review the Dataproc documentation. You can also pose questions to the Stack Overflow community with the tag google-cloud-dataproc.
See our other Google Cloud Platform GitHub repositories for sample applications and scaffolding for other frameworks and use cases.
Contributing changes
- See CONTRIBUTING.md
Licensing
- See LICENSE