WQU_data_science_lab

The WQU Applied Data Science Lab is a sequence of eight projects in which you solve real-world problems using data science tools. Each project consists of four lessons and one assignment. You work through the lessons at your own pace, following the code-along videos and exchanging ideas with other learners in the forum. Lessons are for experimentation (learning new concepts and practicing new coding techniques), so they aren't graded. In each assignment, you demonstrate that you can apply what you learned in the lessons to a new dataset.

I had the opportunity to work on the following projects:

  1. HOUSING IN MEXICO: This project involved using a dataset of 21,000 properties to determine if real estate prices are influenced more by property size or location. It entailed importing and cleaning data from a CSV file, building data visualizations, and examining the relationship between two variables using correlation.
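
     As a rough illustration of the correlation step, here is a minimal sketch with a hypothetical toy DataFrame (the real project used the full 21,000-property CSV; column names here are assumptions):

```python
import pandas as pd

# Hypothetical toy data standing in for the Mexico real-estate CSV
df = pd.DataFrame({
    "area_m2":   [70, 120, 55, 200, 90],
    "price_usd": [80_000, 150_000, 60_000, 260_000, 110_000],
})

# Pearson correlation between property size and price
corr = df["area_m2"].corr(df["price_usd"])
print(f"size-price correlation: {corr:.2f}")
```

A correlation near 1 suggests size matters a great deal; computing the same statistic within each region is one way to weigh size against location.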

  2. APARTMENT SALES IN BUENOS AIRES: Here we built a linear regression model to predict apartment prices in Argentina. We also created a data pipeline to impute missing values and encode categorical features, and improved model performance by reducing overfitting.
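
     A minimal sketch of such a pipeline, using hypothetical toy data (the column names and neighborhoods are illustrative, not the project's actual schema); the Ridge penalty stands in for the overfitting-reduction step:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy listings standing in for the Buenos Aires data
X = pd.DataFrame({
    "surface_m2":  [40, 65, np.nan, 80, 55, 100],
    "neighborhood": ["Palermo", "Recoleta", "Palermo",
                     "Belgrano", "Recoleta", "Palermo"],
})
y = pd.Series([90_000, 140_000, 95_000, 160_000, 120_000, 210_000])

model = Pipeline([
    ("prep", ColumnTransformer([
        # Impute missing numeric values with the column mean
        ("num", SimpleImputer(strategy="mean"), ["surface_m2"]),
        # One-hot encode the categorical feature
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["neighborhood"]),
    ])),
    ("reg", Ridge(alpha=1.0)),  # L2 regularization curbs overfitting
])
model.fit(X, y)
pred = model.predict(X.head(1))
```

Packing imputation and encoding into the pipeline means the same preprocessing is applied identically at fit and predict time.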

  3. AIR QUALITY IN NAIROBI: In this project I built an ARMA time-series model to predict particulate matter levels in Kenya. I extracted data from a MongoDB database using pymongo, and improved model performance through hyperparameter tuning.

  4. EARTHQUAKE DAMAGE IN NEPAL: This one was about building logistic regression and decision tree models to predict earthquake damage to buildings. It involved extracting data from a SQLite database and revealing the biases in data that can lead to discrimination.
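
     Here is a self-contained sketch of that workflow with an in-memory SQLite table (the table name, columns, and rows are made up for illustration):

```python
import sqlite3
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Hypothetical in-memory table standing in for the Nepal buildings database
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE buildings (age INTEGER, height_ft INTEGER, severe_damage INTEGER)"
)
rows = [(10, 12, 0), (40, 20, 1), (5, 10, 0), (60, 25, 1),
        (30, 18, 1), (8, 11, 0), (50, 22, 1), (12, 13, 0)]
con.executemany("INSERT INTO buildings VALUES (?, ?, ?)", rows)

# Extract with SQL, then train both model types on the same features
df = pd.read_sql("SELECT * FROM buildings", con)
X, y = df[["age", "height_ft"]], df["severe_damage"]
logreg = LogisticRegression().fit(X, y)
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
acc = tree.score(X, y)
```

Inspecting which features drive the predictions (e.g. decision tree feature importances) is one way the project surfaced potentially discriminatory patterns in the data.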

  5. BANKRUPTCY IN POLAND: This project was all about building random forest and gradient boosting models to predict whether a company will go bankrupt. We navigated the Linux command line, addressed imbalanced data through resampling, and considered the impact of the performance metrics precision and recall.
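
     A minimal sketch of the resampling idea on synthetic data (the features and class balance are invented; the real project used financial indicators for Polish firms):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.utils import resample

rng = np.random.default_rng(0)
# Imbalanced toy data: 95 solvent firms, 5 bankrupt ones
X = rng.normal(size=(100, 4))
y = np.array([0] * 95 + [1] * 5)
X[y == 1] += 3  # shift the minority class so it is learnable

# Oversample the minority class until the training set is balanced
X_min, y_min = resample(X[y == 1], y[y == 1], n_samples=95, random_state=0)
X_bal = np.vstack([X[y == 0], X_min])
y_bal = np.concatenate([y[y == 0], y_min])

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_bal, y_bal)
prec = precision_score(y, clf.predict(X))
rec = recall_score(y, clf.predict(X))
```

Without resampling, a classifier can score high accuracy by predicting "no bankruptcy" for everyone; recall on the minority class exposes that failure, which is why these metrics mattered in the project.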

  6. CUSTOMER SEGMENTATION IN THE US: In this project we built a k-means model to cluster US consumers into groups. We used principal component analysis (PCA) for data visualization and created an interactive dashboard with Plotly Dash.
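
     The clustering and projection steps can be sketched like this on synthetic data (the Plotly Dash dashboard is omitted; the two blobs stand in for the consumer survey features):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Two well-separated synthetic groups in a 5-dimensional feature space
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(6, 1, (50, 5))])

# Cluster in the full feature space...
km = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
# ...then project to 2-D with PCA so the clusters can be plotted
coords = PCA(n_components=2).fit_transform(X)
```

PCA is used only for visualization here: k-means works on all features, while the 2-D projection is what you would feed into a scatter plot (or a Dash figure).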

  7. A/B TESTING AT WORLDQUANT UNIVERSITY: Here we conducted a chi-square test to determine if sending an email can increase program enrollment at WQU. We built custom Python classes to implement an ETL process and created an interactive data application following a three-tiered design pattern.
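
     The hypothesis test at the heart of the project looks like this; the counts below are invented for illustration, not the experiment's actual results:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: email vs. no email, enrolled vs. not
table = [[38, 162],   # received email: 38 enrolled, 162 did not
         [22, 178]]   # no email:       22 enrolled, 178 did not

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

If p falls below the chosen significance level, the enrollment difference between the email and no-email groups is unlikely to be due to chance alone.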

  8. VOLATILITY FORECASTING IN INDIA: In this final project I created a GARCH time-series model to predict asset volatility. I acquired stock data through an API, cleaned and stored it in a SQLite database, and built my own API to serve model predictions.
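
     In practice a package such as `arch` would fit the model; as a sketch, here is the core GARCH(1,1) variance recursion with illustrative (not fitted) parameters and simulated returns:

```python
import numpy as np

# GARCH(1,1): sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]
# Parameter values below are assumptions chosen for illustration only.
omega, alpha, beta = 0.1, 0.1, 0.8

rng = np.random.default_rng(7)
r = rng.normal(0, 1, 250)          # hypothetical daily returns
sigma2 = np.empty_like(r)
sigma2[0] = r.var()                # initialize with the sample variance
for t in range(1, len(r)):
    sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]

# One-step-ahead conditional variance forecast
forecast_var = omega + alpha * r[-1] ** 2 + beta * sigma2[-1]
```

A prediction-serving API would return quantities like `forecast_var` (or its square root, the volatility) for each requested ticker.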