This assessment was for my Introduction to Big Data Module.
Learning Goals & Outcomes
Learn to model, cleanse, normalize, shard, map, query and analyze a substantial real-world dataset (230 MB+).
Understand the data cleansing, normalization and sharding processes by writing Python scripts that convert the data first to cleansed CSV and then to normalized SQL.
Design and implement a relational (MySQL) database, then write a Python script to pipe (import) the cleansed data into the appropriate tables, ensuring all integrity constraints are met.
Construct and implement a set of SQL queries that extract data using various filters and constraints.
Map (forward engineer) the data to a NoSQL database of your choice (MongoDB, BaseX, Couchbase, ArangoDB, etc.).
Write a short reflective report on the learning outcomes achieved.
Gain exposure to a range of data-oriented technologies (databases, Python and SQL) and learn to use them.
Learn and use the Markdown markup syntax.
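The two-stage conversion in the goals above (raw data to cleansed CSV, then cleansed CSV to normalized SQL) could look roughly like the sketch below. It is illustrative only: the input columns (`name`, `city`, `amount`) and the two-table schema (`city`, `record`) are hypothetical, since the real dataset and the module's schema are not described here. It uses only the standard library and emits SQL text rather than connecting to MySQL.

```python
import csv
import io

def cleanse_rows(reader):
    """Yield cleansed rows: trim whitespace, normalize city casing, drop rows
    missing required fields, and coerce `amount` to float (hypothetical columns)."""
    for row in reader:
        name = (row.get("name") or "").strip()
        city = (row.get("city") or "").strip().title()
        raw_amount = (row.get("amount") or "").strip()
        if not name or not city:
            continue  # drop incomplete rows
        try:
            amount = float(raw_amount)
        except ValueError:
            continue  # drop rows whose amount is not numeric
        yield {"name": name, "city": city, "amount": amount}

def sql_quote(value):
    """Quote a value as a SQL string literal by doubling single quotes."""
    return "'" + str(value).replace("'", "''") + "'"

def normalize_to_sql(rows):
    """Split flat rows into a `city` lookup table and a `record` table that
    references it by id; return the INSERT statements as a list."""
    city_ids = {}
    statements = []
    for row in rows:
        city = row["city"]
        if city not in city_ids:
            city_ids[city] = len(city_ids) + 1
            statements.append(
                f"INSERT INTO city (id, name) VALUES ({city_ids[city]}, {sql_quote(city)});"
            )
        statements.append(
            "INSERT INTO record (name, city_id, amount) VALUES "
            f"({sql_quote(row['name'])}, {city_ids[city]}, {row['amount']});"
        )
    return statements

# Tiny made-up sample standing in for the real raw data.
raw = io.StringIO(
    "name,city,amount\n"
    " Alice ,london,10.5\n"
    "Bob,,3\n"
    "O'Neil,LONDON,7\n"
    "Carol,Paris,n/a\n"
)
cleansed = list(cleanse_rows(csv.DictReader(raw)))

# Stage 1: write the cleansed rows to CSV (in memory here; a file in practice).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "city", "amount"])
writer.writeheader()
writer.writerows(cleansed)

# Stage 2: emit normalized SQL for the cleansed rows.
for stmt in normalize_to_sql(cleansed):
    print(stmt)
```

Generating SQL text keeps the two stages inspectable, but for the actual MySQL import a database driver with parameterized queries is the safer choice, because hand-quoting like `sql_quote` does not handle every escaping rule (MySQL also treats backslashes specially by default).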
The data is provided in a zipped file.
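Because the data arrives zipped and is fairly large, a script can stream it directly out of the archive instead of extracting it to disk first. The sketch below assumes the archive contains a UTF-8 CSV member; the archive and member names are hypothetical, and a small in-memory archive stands in for the real file. It uses the standard library's `zipfile`, `io.TextIOWrapper` and `csv` modules.

```python
import csv
import io
import zipfile

def iter_zipped_csv(zip_source, member_name, encoding="utf-8"):
    """Stream rows as dicts from a CSV member inside a zip archive
    without extracting the archive to disk."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # archive.open() yields bytes; wrap it so csv receives text.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            yield from csv.DictReader(text)

# Build a small in-memory archive as a stand-in for the real (hypothetical) data.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("data.csv", "id,value\n1,a\n2,b\n")
buffer.seek(0)

for row in iter_zipped_csv(buffer, "data.csv"):
    print(row)
```

With a real archive on disk, pass its path (for example a hypothetical `"data.zip"`) instead of the in-memory buffer; rows are read one at a time, so memory use stays small even for a 230 MB+ dataset.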