CS 167 covers the data management and systems aspects of big data platforms such as Hadoop, Spark, and AsterixDB. In this course, you will learn how data is stored in a distributed file system and how queries are executed in parallel. The course covers the following topics:
- An overview of big data management systems
- Distributed storage of big data
- Programming models in big data (e.g., MapReduce and RDD)
- Packages for big data analysis (e.g., SparkSQL, MLlib, and SparkR)
- An overview of key-value stores
- Big SQL systems (e.g., AsterixDB, Impala, and SparkSQL)