Topics in Big Data Analytics (Big Data Analytics and Deep Learning Systems)
BDDL Fall 2018 @SNU Course GitHub: https://github.com/swsnu/bd2018
- Resource management
  - YARN, Mesos, Borg
- Meta-framework
  - REEF
- Dataflow Processing Framework
  - MapReduce, Dryad, Spark
- High-level Data Processing
  - Hive, Pig, FlumeJava, DryadLINQ, Beam
- Stream Processing
  - Storm, Heron, Spark Streaming, Flink, MillWheel, Dataflow, Samza
- Machine Learning / Deep Learning Systems
  - TBA
Simple BeamSQL batch processing application for distributed query processing using Beam / Spark / Apache Nemo
- JRE 1.8
- Maven
- ND4J, DL4J
> ./download_datasets.sh
> ./run_spark
> ./run_nemo
fifa-ranking.csv
- FIFA Rankings by Rank Dates (1993-2018)
- Kaggle FIFA Ranking
wc2018-players.csv
- 2018 FIFA World Cup Players (32 countries × 23-man squads)
- Kaggle FIFA WORLD CUP 2018 Players
Simple queries
SELECT rank_num, country FROM RANKING WHERE rank_date = '2018-06-07'
SELECT rank_num, RANKING.COUNTRY, PLAYER.height, PLAYER.weight FROM PLAYER GROUP BY country
SELECT rank_num, RANKING.COUNTRY, PLAYER.height, PLAYER.weight, BMI(PLAYER.height, PLAYER.weight) FROM RANKING INNER JOIN PLAYER ON RANKING.country = PLAYER.country
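For reference, the join query with `BMI(...)` can be sketched in plain Java, without Beam, to show what each output row should contain. This is only an illustration under stated assumptions: the repo's actual `BMI` function is not shown here, so the sketch assumes the usual definition (weight in kg divided by squared height in m) and assumes height is stored in cm. The class names `BmiJoinSketch`, `Ranking`, and `Player` are hypothetical, and the sample rows are made up rather than taken from the datasets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class BmiJoinSketch {

    // Hypothetical row type for RANKING; the real CSV has more columns.
    public static final class Ranking {
        public final int rankNum;
        public final String country;

        public Ranking(int rankNum, String country) {
            this.rankNum = rankNum;
            this.country = country;
        }
    }

    // Hypothetical row type for PLAYER; units (cm, kg) are an assumption.
    public static final class Player {
        public final String country;
        public final double heightCm;
        public final double weightKg;

        public Player(String country, double heightCm, double weightKg) {
            this.country = country;
            this.heightCm = heightCm;
            this.weightKg = weightKg;
        }
    }

    // Assumed BMI definition: weight (kg) / height (m)^2.
    public static double bmi(double heightCm, double weightKg) {
        double heightM = heightCm / 100.0;
        return weightKg / (heightM * heightM);
    }

    // Nested-loop inner join on country: one output row per matching
    // (ranking, player) pair, formatted as rank,country,height,weight,bmi.
    public static List<String> joinWithBmi(List<Ranking> rankings, List<Player> players) {
        List<String> rows = new ArrayList<>();
        for (Ranking r : rankings) {
            for (Player p : players) {
                if (r.country.equals(p.country)) {
                    rows.add(String.format(Locale.ROOT, "%d,%s,%.0f,%.0f,%.1f",
                            r.rankNum, r.country, p.heightCm, p.weightKg,
                            bmi(p.heightCm, p.weightKg)));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not real ranking or player data.
        List<Ranking> rankings = Arrays.asList(
                new Ranking(1, "CountryA"), new Ranking(2, "CountryB"));
        List<Player> players = Arrays.asList(
                new Player("CountryA", 180, 81), new Player("CountryC", 170, 60));
        for (String row : joinWithBmi(rankings, players)) {
            System.out.println(row); // prints "1,CountryA,180,81,25.0"
        }
    }
}
```

Rows without a match on either side (CountryB and CountryC above) are dropped, which is the behavior expected from INNER JOIN. A distributed runner would not use a nested loop, so the sketch is only useful for checking results on small inputs.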
Going beyond simple queries requires a lot more of the Beam programming model:
- PipelineOptions
- Pipeline
- PCollection
- PTransform