Google Dataproc Client for Java

Java idiomatic client for Dataproc.

Quickstart

If you are using Maven with a BOM, add this to your pom.xml file:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>9.1.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-dataproc</artifactId>
  </dependency>
</dependencies>

If you are using Maven without BOM, add this to your dependencies:

<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-dataproc</artifactId>
  <version>1.0.0</version>
</dependency>

If you are using Gradle, add this to your dependencies:

compile 'com.google.cloud:google-cloud-dataproc:1.1.0'

If you are using SBT, add this to your dependencies:

libraryDependencies += "com.google.cloud" % "google-cloud-dataproc" % "1.1.0"

Authentication

See the Authentication section in the base directory's README.

Getting Started

Prerequisites

You will need a Google Cloud Platform Console project with the Dataproc API enabled, and billing must be enabled on that project to use Dataproc. Follow these instructions to get your project set up.

You will also need to set up your local development environment: install the Google Cloud SDK, then run gcloud auth login followed by gcloud config set project [YOUR PROJECT ID] from the command line.

Installation and setup

You'll need to obtain the google-cloud-dataproc library. See the Quickstart section to add google-cloud-dataproc as a dependency in your code.

About Dataproc

Dataproc is a faster, easier, more cost-effective way to run Apache Spark and Apache Hadoop.

See the Dataproc client library docs to learn how to use this Dataproc Client Library.

Samples

Samples are in the samples/ directory. The samples' README.md has instructions for running the samples.

| Sample | Source Code | Try it |
| --- | --- | --- |
| Create Cluster | source code | Open in Cloud Shell |
| Create Cluster With Autoscaling | source code | Open in Cloud Shell |
| Instantiate Inline Workflow Template | source code | Open in Cloud Shell |
| Quickstart | source code | Open in Cloud Shell |
| Submit Hadoop Fs Job | source code | Open in Cloud Shell |

Troubleshooting

To get help, follow the instructions in the shared Troubleshooting document.

Transport

Dataproc uses gRPC for the transport layer.
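
Because requests travel over gRPC, the client connects to a host:port endpoint. The sketch below builds a regional endpoint string, assuming the `<region>-dataproc.googleapis.com:443` pattern. `DataprocEndpoints` and `regionalEndpoint` are hypothetical names introduced here for illustration; they are not part of google-cloud-dataproc.

```java
import java.util.Objects;

/** Hypothetical helper for illustration; not part of the client library. */
final class DataprocEndpoints {
    private DataprocEndpoints() {}

    /**
     * Builds a host:port string for a regional Dataproc gRPC endpoint.
     * Assumes the "<region>-dataproc.googleapis.com:443" naming pattern.
     */
    static String regionalEndpoint(String region) {
        Objects.requireNonNull(region, "region");
        String trimmed = region.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("region must not be empty");
        }
        return String.format("%s-dataproc.googleapis.com:443", trimmed);
    }

    public static void main(String[] args) {
        // "us-central1" is only a placeholder default region.
        String region = args.length > 0 ? args[0] : "us-central1";
        System.out.println(regionalEndpoint(region));
    }
}
```

With the client library on the classpath, a string like this could be passed to `ClusterControllerSettings.newBuilder().setEndpoint(...)` before creating a `ClusterControllerClient`.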

Java Versions

Java 7 or above is required for using this client.
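
If you want to fail fast on an unsupported runtime, a check along these lines may help. It is a minimal sketch: `JavaVersionCheck` is a hypothetical class, and it assumes the `java.specification.version` system property reports either the legacy `1.x` form (Java 8 and earlier) or a plain major number (Java 9 and later).

```java
/** Hypothetical startup check for illustration; not part of the client library. */
final class JavaVersionCheck {
    private static final int MINIMUM_MAJOR = 7;

    private JavaVersionCheck() {}

    /** Extracts the major version from strings such as "1.7", "1.8", "9" or "11". */
    static int majorVersion(String specVersion) {
        String v = specVersion.trim();
        if (v.startsWith("1.")) {
            v = v.substring(2); // legacy scheme: "1.8" means Java 8
        }
        int dot = v.indexOf('.');
        if (dot >= 0) {
            v = v.substring(0, dot);
        }
        return Integer.parseInt(v);
    }

    /** Returns true when the given specification version is Java 7 or above. */
    static boolean meetsMinimum(String specVersion) {
        return majorVersion(specVersion) >= MINIMUM_MAJOR;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (!meetsMinimum(spec)) {
            throw new IllegalStateException("Java 7 or above is required, found " + spec);
        }
        System.out.println("Java runtime is supported: " + spec);
    }
}
```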

Versioning

This library follows Semantic Versioning.
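
As a rough illustration of what that means for upgrades, the sketch below treats a move between two versions as compatible when the major version is unchanged and nonzero and the target is not older. `SemVer` is a hypothetical class written for this example; it handles only plain `MAJOR.MINOR.PATCH` strings and rejects pre-release or build suffixes.

```java
/** Hypothetical MAJOR.MINOR.PATCH value type for illustration only. */
final class SemVer implements Comparable<SemVer> {
    final int major;
    final int minor;
    final int patch;

    SemVer(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    /** Parses "1.1.0"-style text; suffixes such as "-beta" are not supported. */
    static SemVer parse(String text) {
        String[] parts = text.trim().split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected MAJOR.MINOR.PATCH: " + text);
        }
        return new SemVer(
            Integer.parseInt(parts[0]),
            Integer.parseInt(parts[1]),
            Integer.parseInt(parts[2]));
    }

    @Override
    public int compareTo(SemVer other) {
        if (major != other.major) {
            return Integer.compare(major, other.major);
        }
        if (minor != other.minor) {
            return Integer.compare(minor, other.minor);
        }
        return Integer.compare(patch, other.patch);
    }

    /**
     * True when upgrading from this version to the target should not break
     * callers: same nonzero major version, and the target is not older.
     */
    boolean isCompatibleUpgradeTo(SemVer target) {
        return major > 0 && major == target.major && compareTo(target) <= 0;
    }
}
```

Under these rules, `SemVer.parse("1.0.0").isCompatibleUpgradeTo(SemVer.parse("1.1.0"))` is true, while a move to `2.0.0` is not, because a major version bump signals breaking changes.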

Contributing

Contributions to this library are always welcome and highly encouraged.

See CONTRIBUTING for more information on how to get started.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms. See Code of Conduct for more information.

License

Apache 2.0 - See LICENSE for more information.

CI Status

| Java Version | Status |
| --- | --- |
| Java 7 | Kokoro CI |
| Java 8 | Kokoro CI |
| Java 8 OSX | Kokoro CI |
| Java 8 Windows | Kokoro CI |
| Java 11 | Kokoro CI |