
I successfully completed the 2-unit, 16-week Data Science module at WorldQuant University, including all 6 of its mini-projects. The mini-projects covered scientific computing, data wrangling, machine learning, and natural language processing with Python.

Primary language: Jupyter Notebook

WQU-data-science-challenges

To complete the 2-unit, 16-week Applied Data Science Module at WorldQuant University, students are required to complete 6 mini-projects in total. I completed all of them and maintained a cumulative average score of 90% or above. The mini-projects are as follows:

Applied Data Science Unit I - Scientific Computing and Python

In mini-project 1, students used Python to compute Mersenne numbers and applied the Lucas-Lehmer test to identify which of them are prime. They had to use Python data structures and core programming constructs such as for loops to implement their solution. They then implemented the Sieve of Eratosthenes as a faster way to check whether numbers are prime, learning about the importance of algorithmic time complexity.
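A minimal Python sketch of the two algorithms, assuming prime exponents p (the Lucas-Lehmer test is only valid for those). The function names are hypothetical and this is not the course's reference solution. The test starts from s = 4 and repeatedly replaces s with s*s - 2 modulo M_p = 2**p - 1; M_p is prime exactly when s is 0 after p - 2 steps.

```python
import math


def mersenne_number(p: int) -> int:
    """Return the Mersenne number M_p = 2**p - 1."""
    return 2 ** p - 1


def lucas_lehmer(p: int) -> bool:
    """Return True if M_p is prime. Assumes p itself is prime.

    p = 2 is special-cased: M_2 = 3 is prime, but the recurrence needs p > 2.
    """
    if p == 2:
        return True
    m = mersenne_number(p)
    s = 4
    # Apply s -> s^2 - 2 (mod M_p) exactly p - 2 times.
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def sieve_of_eratosthenes(n: int) -> list[int]:
    """Return all primes <= n by crossing off multiples of each prime."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            # Start at i*i: smaller multiples were already crossed off.
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


# Use the sieve to generate candidate exponents, then test each M_p.
primes = sieve_of_eratosthenes(31)
mersenne_primes = [mersenne_number(p) for p in primes if lucas_lehmer(p)]
print(mersenne_primes)
```

Running it prints the Mersenne primes with exponents up to 31 (p = 2, 3, 5, 7, 13, 17, 19 and 31). The sieve does O(n log log n) work in total, which is why it beats testing each candidate by trial division.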

In mini-project 2, students used object-oriented programming to create a class that represents a geometric point. They defined methods for common operations on points, such as adding two points together and finding the distance between two points. Finally, they wrote a K-means clustering algorithm that uses the previously defined point class.
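A minimal sketch of the idea, assuming 2-D points. The `Point` class, the helper names, and the farthest-first initialization are illustrative assumptions: random initialization is the more common textbook choice, and a deterministic one is used here only so the result is reproducible. The clustering loop is Lloyd's algorithm, built entirely on the point operations.

```python
import math


class Point:
    """A 2-D point supporting addition, division by a scalar, and distance."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __add__(self, other: "Point") -> "Point":
        return Point(self.x + other.x, self.y + other.y)

    def __truediv__(self, k: float) -> "Point":
        return Point(self.x / k, self.y / k)

    def distance(self, other: "Point") -> float:
        """Euclidean distance to another point."""
        return math.hypot(self.x - other.x, self.y - other.y)

    def __repr__(self) -> str:
        return f"Point({self.x}, {self.y})"


def centroid(points: list[Point]) -> Point:
    """Mean of a non-empty list of points, built from Point.__add__."""
    total = Point(0.0, 0.0)
    for p in points:
        total = total + p
    return total / len(points)


def farthest_first(points: list[Point], k: int) -> list[Point]:
    """Deterministic initialisation (an assumption; random choice is also common)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(p.distance(c) for c in centers)))
    return centers


def k_means(points: list[Point], k: int, iterations: int = 20) -> list[Point]:
    """Lloyd's algorithm: alternate assignment and update steps."""
    centers = farthest_first(points, k)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: p.distance(centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster's mean
        # (an empty cluster keeps its previous center).
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


pts = [Point(x, y) for x, y in [(0, 0), (0, 1), (1, 0), (1, 1),
                                (10, 10), (10, 11), (11, 10), (11, 11)]]
centers = k_means(pts, k=2)
print(centers)  # [Point(0.5, 0.5), Point(10.5, 10.5)]
```

Because the algorithm only needs addition, scalar division, and distance, any class providing those three operations could be clustered the same way.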

In mini-project 3, students used basic Python data structures, functions, and control flow to answer questions about prescription drug data from the British NHS. They applied fundamental data wrangling techniques such as joining data sets, splitting data into groups, and aggregating data into summary statistics.
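As a sketch of those three operations using only built-in data structures, the snippet below joins two record lists through a dictionary lookup, groups rows by practice, and sums two columns per group. The records and field names (`practice`, `items`, `act_cost`, `code`, `name`) are hypothetical placeholders, not the actual NHS schema.

```python
from collections import defaultdict

# Hypothetical toy records; field names are illustrative, not the NHS schema.
prescriptions = [
    {"practice": "P1", "drug": "Drug A", "items": 3, "act_cost": 12.5},
    {"practice": "P1", "drug": "Drug B", "items": 1, "act_cost": 4.0},
    {"practice": "P2", "drug": "Drug A", "items": 5, "act_cost": 20.0},
]
practices = [
    {"code": "P1", "name": "Practice One"},
    {"code": "P2", "name": "Practice Two"},
]


def summarize_by_practice(prescriptions, practices):
    """Join prescriptions to practice names, then sum items and cost per practice."""
    # Join: a dict keyed on practice code acts as a hash-join lookup table.
    names = {p["code"]: p["name"] for p in practices}
    # Split-apply-combine: accumulate totals per group in a defaultdict.
    totals = defaultdict(lambda: {"items": 0, "act_cost": 0.0})
    for row in prescriptions:
        key = names.get(row["practice"], "UNKNOWN")
        totals[key]["items"] += row["items"]
        totals[key]["act_cost"] += row["act_cost"]
    return dict(totals)


summary = summarize_by_practice(prescriptions, practices)
print(summary)
```

Building the lookup dictionary once makes the join linear in the number of rows, instead of scanning the practice list for every prescription.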

In mini-project 4, students used the Python package pandas to analyze a prescription drug data set from the British NHS. They answered questions such as which medical practices prescribe opioids at an unusually high rate and which practices prescribe substantially more rare drugs than the rest of the medical practices. They used statistical concepts such as the z-score to help identify these practices.
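A hedged pandas sketch of the z-score idea: compute each practice's share of opioid items, then express that share as a number of standard deviations from the mean share across practices. The toy data, column names, and function name are hypothetical, and the statistic used in the course may have differed.

```python
import pandas as pd

# Hypothetical toy data: one row per (practice, drug group) with an opioid flag.
rx = pd.DataFrame({
    "practice": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "is_opioid": [True, False, True, False, True, False, True, False],
    "items": [1, 9, 2, 8, 1, 9, 8, 2],
})


def opioid_z_scores(rx: pd.DataFrame) -> pd.Series:
    """z-score of each practice's opioid share, highest first."""
    total = rx.groupby("practice")["items"].sum()
    opioid = rx[rx["is_opioid"]].groupby("practice")["items"].sum()
    # Fraction of each practice's items that are opioids; index alignment
    # yields NaN for practices with no opioid rows, so fill those with 0.
    rate = (opioid / total).fillna(0.0)
    # Standardize: (value - mean) / std, with pandas' default sample std (ddof=1).
    z = (rate - rate.mean()) / rate.std()
    return z.sort_values(ascending=False)


z = opioid_z_scores(rx)
print(z)
```

Here practice D, with 80% opioid items against a mean share of 30%, comes out on top; in practice one would flag practices above some chosen z threshold.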

Applied Data Science Unit II - Machine Learning & Statistical Analysis

In mini-project 5, students worked with nursing home inspection data from the United States to predict which providers may be fined and for how much. They used the scikit-learn Python package to build progressively more complex machine learning models, which required imputing missing values, engineering features, and encoding categorical data.
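A minimal scikit-learn sketch of that preprocessing chain: median imputation and scaling for numeric columns, most-frequent imputation and one-hot encoding for categorical columns, feeding a ridge regression of the fine amount. The column names, toy rows, and the choice of `Ridge` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical columns standing in for the inspection data.
numeric_cols = ["num_deficiencies", "num_beds"]
categorical_cols = ["state", "ownership"]


def build_fine_model() -> Pipeline:
    """Impute + scale numerics, impute + one-hot categoricals, then regress."""
    numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    categorical = make_pipeline(
        SimpleImputer(strategy="most_frequent"),
        OneHotEncoder(handle_unknown="ignore"),  # unseen categories -> all zeros
    )
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([("preprocess", preprocess), ("model", Ridge(alpha=1.0))])


# Toy rows with missing values (np.nan) in both numeric and categorical columns.
df = pd.DataFrame({
    "num_deficiencies": [3, 10, np.nan, 7, 1, 12],
    "num_beds": [50, 120, 80, np.nan, 40, 150],
    "state": ["CA", "TX", "CA", np.nan, "NY", "TX"],
    "ownership": ["For profit", "Non profit", "For profit", "Government", np.nan, "For profit"],
})
fines = np.array([0.0, 25000.0, 5000.0, 12000.0, 0.0, 40000.0])

model = build_fine_model().fit(df, fines)
preds = model.predict(df)
print(preds)
```

Keeping imputation and encoding inside the `Pipeline` means they are learned only from training folds during cross-validation, which avoids leaking information from validation data.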

In mini-project 6, students used natural language processing to train various machine learning models that predict an Amazon review's rating from the text of the review. They then used one of the trained models to gain insight into the reviews by identifying highly polar words, that is, the words that most strongly influence the model's predictions.