AWS Glue ETL Java

A simple proof of concept (PoC) that demonstrates how to use Java in AWS Glue ETL pipelines.

Examples

You can run these sample job scripts as AWS Glue ETL jobs, in a container, or in a local environment.

  • Clean and Process

    This sample ETL script shows how to use both Spark and AWS Glue features to clean and transform data for efficient analysis. The Java version is in DataCleaningJob.java.

  • Spark API

    This sample ETL script shows how to use the Spark DataFrame and Dataset APIs on AWS Glue.

Programming AWS Glue ETL Scripts in Java

The following sections describe how to use the AWS Glue Java library and the AWS Glue API in ETL scripts, and provide reference documentation for the library.

JavaDataSink

JavaDataSink encapsulates a destination and a format that a JavaDynamicFrame can be written to.

com.github.vitalibo.glue.api.java.JavaDataSink

JavaDataSource

JavaDataSource encapsulates a source and format that a JavaDynamicFrame can be produced from.

com.github.vitalibo.glue.api.java.JavaDataSource

JavaDynamicFrame

A JavaDynamicFrame is a distributed collection of self-describing DynamicRecord objects.

com.github.vitalibo.glue.api.java.JavaDynamicFrame
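
To make "self-describing" concrete, here is a minimal conceptual sketch in plain Java. It does not use the AWS Glue library or this project's JavaDynamicFrame; the class SelfDescribingRecords and its methods record and inferSchema are hypothetical names invented for illustration. Each record carries its own field names and value types, so a schema can be derived from the data afterwards, and a field may legitimately hold different types in different records.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Conceptual stand-in only: NOT the AWS Glue DynamicRecord or JavaDynamicFrame API.
final class SelfDescribingRecords {

    // Builds one record from alternating field names and values.
    // The record itself is the only schema information available.
    static Map<String, Object> record(Object... keyValues) {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            fields.put((String) keyValues[i], keyValues[i + 1]);
        }
        return fields;
    }

    // Collects, per field name, every value type observed across the records.
    static Map<String, Set<String>> inferSchema(List<Map<String, Object>> records) {
        Map<String, Set<String>> schema = new TreeMap<>();
        for (Map<String, Object> r : records) {
            for (Map.Entry<String, Object> e : r.entrySet()) {
                Object value = e.getValue();
                String type = value == null ? "null" : value.getClass().getSimpleName();
                schema.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).add(type);
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> records = List.of(
            record("id", 1, "price", 9.99),
            record("id", "2", "price", 5.0, "note", "id arrived as text"));

        // Prints: {id=[Integer, String], note=[String], price=[Double]}
        System.out.println(inferSchema(records));
    }
}
```

The id field ends up with two observed types because no schema was imposed up front. A fixed-schema DataFrame would have to reject or coerce one of those values at read time, whereas a collection of self-describing records can defer that decision to a later transformation step.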

JavaGlueContext

JavaGlueContext is the entry point for reading and writing a JavaDynamicFrame.

com.github.vitalibo.glue.api.java.JavaGlueContext