SparkMLCustomLibrary
This is a demo library for Spark ML related projects.
The purpose of this library is to demonstrate how to:
- retrieve a CSV file from S3
- create metadata alongside the data file
- load the data into a DataFrame
- show a visualization/chart in Zeppelin
- combine with a Spark ML pipeline (TBD)
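For orientation, the first and third steps above (fetching a CSV file and turning it into a DataFrame) can be sketched with plain Spark as follows. This is a minimal sketch, not the library's own code: the object name `S3CsvSketch` and the method `readCsv` are hypothetical, the `header` and `inferSchema` options are assumptions about the file, and reading an `s3a://` path assumes the `hadoop-aws` connector and AWS credentials are configured.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper for illustration; not part of this library.
object S3CsvSketch {
  // Reads a CSV file into a DataFrame.
  // Assumes the first row is a header and lets Spark infer column types.
  def readCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)
}

// Example call (requires the s3a connector and credentials):
// val df = S3CsvSketch.readCsv(spark, "s3a://snowf0xrawdata/table.csv")
```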
Complete test cases are under /src/test
Build Instructions:
mvn clean install
Usage:
- quickly retrieve a CSV file by file name, using the default S3 bucket set in the code
val preparedData: DataFrame = PrepareDataFromS3().getFileAsDF("table.csv")
- retrieve a CSV file from a specific bucket, using the file name and bucket name
val preparedData: DataFrame = PrepareDataFromS3().setBucket("snowf0xrawdata").getFileAsDF("table.csv")
- retrieve a CSV file from S3 as a FilePackage, so new metadata can be applied
val filePackage: FilePackage = PrepareDataFromS3().setBucket("snowf0xrawdata").getFileAsPackage("table.csv")
- in Zeppelin
val filePackage: FilePackage = PrepareDataFromS3().setBucket("snowf0xrawdata").getFileAsPackage("table.csv")
filePackage.showZeppelinChart()
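
The calls above chain an optional `setBucket` onto a loader that otherwise falls back to a default bucket. One way such a fluent API could be structured is sketched below. This is an illustration only, not the actual implementation of `PrepareDataFromS3`: the names `CsvSource`, `DefaultBucket`, `location`, the placeholder bucket `my-default-bucket`, and the `s3a://` scheme are all assumptions.

```scala
// Hypothetical stand-in for a fluent loader; not the library's PrepareDataFromS3.
final case class CsvSource(bucket: String = CsvSource.DefaultBucket) {
  // Returns a new instance pointing at another bucket, leaving the original unchanged.
  def setBucket(name: String): CsvSource = copy(bucket = name)

  // Builds the storage location for a file name in the selected bucket.
  def location(fileName: String): String = s"s3a://$bucket/$fileName"
}

object CsvSource {
  // Placeholder default; the real default bucket is defined in the library code.
  val DefaultBucket: String = "my-default-bucket"
}
```

With this shape, `CsvSource().location("table.csv")` uses the default bucket, while `CsvSource().setBucket("snowf0xrawdata").location("table.csv")` overrides it. Returning a copy from `setBucket` keeps each instance immutable, so a shared loader cannot be reconfigured by accident.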