A simple proof of concept (PoC) that demonstrates how to use Java in AWS Glue ETL pipelines.
You can run these sample job scripts as AWS Glue ETL jobs, in a container, or in a local environment.
-
This sample ETL script shows how to take advantage of both Spark and AWS Glue features to clean and transform data for efficient analysis. The Java version is in the file DataCleaningJob.java.
-
This sample ETL script shows how to use DataFrame and Dataset on AWS Glue.
The following sections describe how to use the AWS Glue Java library and the AWS Glue API in ETL scripts, and provide reference documentation for the library.
JavaDataSink encapsulates a destination and a format that a JavaDynamicFrame can be written to.
com.github.vitalibo.glue.api.java.JavaDataSink
JavaDataSource encapsulates a source and format that a JavaDynamicFrame can be produced from.
com.github.vitalibo.glue.api.java.JavaDataSource
A JavaDynamicFrame is a distributed collection of self-describing DynamicRecord objects.
com.github.vitalibo.glue.api.java.JavaDynamicFrame
JavaGlueContext is the entry point for reading and writing a JavaDynamicFrame.
com.github.vitalibo.glue.api.java.JavaGlueContext
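The relationship between the four types described above can be sketched in plain Java. This is only a conceptual model under stated assumptions: `Frame`, `Source`, `Sink`, `Context`, and every method on them are hypothetical stand-ins invented for illustration, not the signatures of JavaDynamicFrame, JavaDataSource, JavaDataSink, or JavaGlueContext. It shows a context handing out a source and a sink, a frame being produced from the source, filtered, and written to the sink, with records that carry their own field names. For the real method names, consult the classes under com.github.vitalibo.glue.api.java.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-ins only: none of these types belong to the AWS Glue Java library.
class GlueFlowSketch {

    /** Stand-in for a dynamic frame: a list of records, each carrying its own field names. */
    static final class Frame {
        final List<Map<String, Object>> records;

        Frame(List<Map<String, Object>> records) {
            this.records = records;
        }

        /** Keeps only the records that satisfy the predicate. */
        Frame filter(Predicate<Map<String, Object>> keep) {
            List<Map<String, Object>> kept = new ArrayList<>();
            for (Map<String, Object> r : records) {
                if (keep.test(r)) {
                    kept.add(r);
                }
            }
            return new Frame(kept);
        }

        /** Derives the field names from the records themselves; no schema is declared up front. */
        Set<String> fieldNames() {
            Set<String> names = new LinkedHashSet<>();
            for (Map<String, Object> r : records) {
                names.addAll(r.keySet());
            }
            return names;
        }
    }

    /** Stand-in for a data source: something a frame can be produced from. */
    interface Source {
        Frame read();
    }

    /** Stand-in for a data sink: something a frame can be written to. */
    interface Sink {
        void write(Frame frame);
    }

    /** Stand-in for the context: the single entry point that hands out sources and sinks. */
    static final class Context {
        Source sourceOf(List<Map<String, Object>> rows) {
            return () -> new Frame(rows);
        }

        Sink sinkInto(List<Map<String, Object>> target) {
            return frame -> target.addAll(frame.records);
        }
    }

    /** Builds one record from alternating key/value arguments. */
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    /** Reads from an in-memory source, drops rows without an "id", writes to an in-memory sink. */
    static List<Map<String, Object>> run(List<Map<String, Object>> input) {
        Context ctx = new Context();
        Frame frame = ctx.sourceOf(input).read();
        Frame cleaned = frame.filter(r -> r.get("id") != null);
        List<Map<String, Object>> output = new ArrayList<>();
        ctx.sinkInto(output).write(cleaned);
        return output;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> input = new ArrayList<>();
        input.add(row("id", 1, "name", "a"));
        input.add(row("name", "b"));            // no id: dropped by the filter
        input.add(row("id", 3, "score", 0.5));  // different fields than the first row
        System.out.println(run(input));
    }
}
```

The in-memory lists stand in for real storage so the sketch runs without Spark or AWS credentials. In an actual Glue job the source and sink would point at catalog tables or files instead.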