Big-Data-Platform-Programming

Where does it come from

This repository follows the course taught by Prof. Lau Wing-Cheong. Many thanks to him for his hard work preparing the introductions to big data technologies, from the theory in papers to practical examples. These are really challenging courses, but worth it. ❤️

What is it about

This repository documents my learning path in big data platform programming, covering the Hadoop MapReduce stack, Spark, Spark Streaming, Pig, Hive, the setup steps for Hadoop and Spark on Kubernetes, and the use of Docker. I used database query languages, Python, Scala, and shell scripting to complete the work. 📚

The content and details of what I learned in this class:

  1. https://www.mubucm.com/doc/7oUDxRXRs58

Future Work

In the future, I will add more detailed theory notes 📑 (learned from papers) on the tools currently used in this repository, along with practical introductions to more stream processing tools, including Kafka, Spark Streaming, and Flink. I also want to look further behind the scenes; currently I am interested in distributed key-value systems such as etcd and its implementation of Raft. ❔