This assessment was for my Introduction to Big Data module.

Learning Goals & Outcomes

Learn to model, cleanse, normalize, shard, map, query and analyze substantial real-world data (230 MB+).

Understand the data cleansing, normalization and sharding processes by writing Python scripts that process the data and convert it first to cleansed CSV and then to normalized SQL.

Design and implement a relational (MySQL) database, then write a Python script to pipe (import) the cleansed data into the appropriate tables, ensuring all integrity constraints are met.

Construct and implement a set of SQL queries to extract data using various filters and constraints.

Map (forward engineer) the data to a NoSQL database of your choice (MongoDB, BaseX, Couchbase, ArangoDB, etc.).

Write a short, reflective report on the learning outcomes you have achieved.

Gain exposure to, and learn to use, a range of data-oriented technologies (databases, Python and SQL).

Learn and use the Markdown markup syntax.
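
To make the cleanse, normalize, import, query and map goals above more concrete, here is a minimal sketch of how those steps might fit together. It is not the assessment's actual solution: the column names (`site`, `reading_time`, `no2`) and the sample rows are invented for illustration, and the standard-library `sqlite3` module stands in for MySQL so the example runs without a database server. The final step only builds nested Python dictionaries of the kind a document store such as MongoDB could hold; it does not talk to any NoSQL database.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: column names and values are invented for illustration.
RAW_CSV = """site,reading_time,no2
 Harbour Road ,2019-01-01 00:00,41.5
Harbour Road,2019-01-01 01:00,
Mill Lane,2019-01-01 00:00,17.0
,2019-01-01 01:00,12.3
"""


def cleanse(raw_text):
    """Strip whitespace and drop rows that lack a site name or a reading."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        site = (row["site"] or "").strip()
        value = (row["no2"] or "").strip()
        if not site or not value:
            continue
        cleaned.append({
            "site": site,
            "reading_time": row["reading_time"].strip(),
            "no2": float(value),
        })
    return cleaned


def load(conn, rows):
    """Normalize rows into site and reading tables guarded by key constraints."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript("""
        CREATE TABLE site (
            site_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE reading (
            reading_id   INTEGER PRIMARY KEY,
            site_id      INTEGER NOT NULL REFERENCES site(site_id),
            reading_time TEXT NOT NULL,
            no2          REAL NOT NULL,
            UNIQUE (site_id, reading_time)
        );
    """)
    for row in rows:
        # Each site name is stored once; readings refer to it by key.
        conn.execute("INSERT OR IGNORE INTO site (name) VALUES (?)", (row["site"],))
        site_id = conn.execute(
            "SELECT site_id FROM site WHERE name = ?", (row["site"],)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO reading (site_id, reading_time, no2) VALUES (?, ?, ?)",
            (site_id, row["reading_time"], row["no2"]),
        )
    conn.commit()


def readings_above(conn, threshold):
    """Filtered query: readings over a threshold, highest first."""
    return conn.execute(
        "SELECT s.name, r.reading_time, r.no2 "
        "FROM reading r JOIN site s ON s.site_id = r.site_id "
        "WHERE r.no2 > ? ORDER BY r.no2 DESC",
        (threshold,),
    ).fetchall()


def to_documents(conn):
    """Map the relational rows to one nested document per site."""
    docs = {}
    query = (
        "SELECT s.name, r.reading_time, r.no2 "
        "FROM site s JOIN reading r ON r.site_id = s.site_id "
        "ORDER BY s.name, r.reading_time"
    )
    for name, reading_time, no2 in conn.execute(query):
        doc = docs.setdefault(name, {"name": name, "readings": []})
        doc["readings"].append({"time": reading_time, "no2": no2})
    return list(docs.values())


conn = sqlite3.connect(":memory:")
rows = cleanse(RAW_CSV)
load(conn, rows)
high = readings_above(conn, 20.0)
documents = to_documents(conn)
```

Splitting the site name into its own table is what makes this a normalized layout, and the `REFERENCES` clause is the integrity constraint that rejects a reading for a site that does not exist. With MySQL the same idea would use a MySQL driver and MySQL's own DDL syntax in place of `sqlite3`.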

The data is provided in a zipped file.
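
Because the data ships as a zip archive, one option is to stream the CSV straight out of it rather than unpacking it first. The sketch below shows that approach with the standard-library `zipfile` and `csv` modules. The member name `sample.csv`, the column names and the helper `iter_zipped_csv` are all hypothetical; the real archive's contents are not described here, so a tiny in-memory archive is built to keep the example runnable.

```python
import csv
import io
import zipfile


def iter_zipped_csv(archive, member):
    """Yield each row of a CSV member as a dict without unpacking to disk."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            # zf.open returns bytes, so wrap it to get text for the csv module.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.DictReader(text)


# Build a tiny in-memory archive so the sketch runs without the real data file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.csv", "site,no2\nHarbour Road,41.5\nMill Lane,17.0\n")
buffer.seek(0)

zipped_rows = list(iter_zipped_csv(buffer, "sample.csv"))
```

For a file of a few hundred megabytes, iterating over the generator row by row, instead of calling `list` on it as done here, keeps memory use low.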