Data Science projects

This small repo is a sandbox for statistical programming, data science and machine learning. I mainly program in R, although my Python is a work in progress. A brief description of each project follows.

Wine Quality prediction

This project uses several econometric modelling techniques to predict the quality of white wines from their chemical components. The methods include linear and ridge regression, subset selection, LASSO and principal components regression.
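As a rough illustration of how ridge regression differs from ordinary least squares, here is a minimal sketch in Python (not R, and not the code in this repo). It runs on synthetic data rather than the wine dataset; the helper names `solve` and `ridge_fit`, the penalty value and the data-generating coefficients are all assumptions made for the example.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(X, y, lam):
    """Ridge coefficients without an intercept: solve (X'X + lam * I) beta = X'y.

    lam = 0 gives ordinary least squares. Real data would normally be
    centred and standardised first; the synthetic data below has mean zero.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def sq_norm(v):
    return sum(x * x for x in v)


# Synthetic stand-in for the data: three predictors, the third is pure noise.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * x[0] - 1.0 * x[1] + random.gauss(0, 0.5) for x in X]

beta_ols = ridge_fit(X, y, lam=0.0)
beta_ridge = ridge_fit(X, y, lam=50.0)  # penalty chosen arbitrarily for the demo

print("OLS coefficients:  ", [round(b, 3) for b in beta_ols])
print("Ridge coefficients:", [round(b, 3) for b in beta_ridge])
```

The ridge coefficients come out shrunk towards zero compared with the least squares ones. In practice the penalty would be chosen by cross-validation rather than fixed by hand.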

Credit card purchase fraud

The aim of this project is to classify whether a given credit card transaction is fraudulent. The features provided are principal components of the original variables, which preserves privacy in the data. Simple machine learning and statistical algorithms are used, such as logistic regression, linear and quadratic discriminant analysis and decision trees. Ensemble methods such as boosting, bagging and random forests are implemented as well.
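To give a flavour of the simplest classifier in that list, here is a minimal sketch of logistic regression fitted by gradient descent, written in Python rather than R. It is not the project's code: the two synthetic features only stand in for principal components, the classes are balanced for simplicity, and the names `fit_logistic` and `predict`, the learning rate and the epoch count are assumptions for the example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit weights and bias by batch gradient descent on the average log-loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label an observation 1 when its predicted probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
            for xi in X]


# Synthetic two-feature data: label 0 and label 1 come from shifted Gaussians.
random.seed(42)
X = [[random.gauss(-1.5, 1.0), random.gauss(-1.5, 1.0)] for _ in range(200)]
X += [[random.gauss(1.5, 1.0), random.gauss(1.5, 1.0)] for _ in range(200)]
y = [0] * 200 + [1] * 200

w, b = fit_logistic(X, y)
preds = predict(X, w, b)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"weights={[round(v, 3) for v in w]}, bias={b:.3f}, training accuracy={accuracy:.3f}")
```

The accuracy here is measured on the training data, so it is optimistic. A held-out test set would give a fairer comparison between this and the other classifiers.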

Bank marketing analytics

This project stems from a dataset of marketing campaign calls made by a Portuguese banking institution to persuade clients to subscribe to a deposit. The data can be used to fit prediction and classification models, as well as to carry out some interesting visual analysis.

Association rules and anomaly detection

This is an analysis of common combinations and association rules within a recipe book. There is also an application of association rules to anomaly detection tasks.
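For readers new to association rules, the sketch below shows how support, confidence and lift are computed for single-item rules, and one possible way to use such rules to flag unusual recipes. It is written in Python rather than R and is not the project's code: the ingredient lists are made up, and the functions `pair_rules` and `violated_rules` and the thresholds are assumptions for the example.

```python
from itertools import permutations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.3, min_confidence=0.8):
    """Return rules a -> b as tuples (a, b, support, confidence, lift)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b}, transactions)
        if s_ab < min_support:
            continue
        confidence = s_ab / support({a}, transactions)
        if confidence >= min_confidence:
            lift = confidence / support({b}, transactions)
            rules.append((a, b, s_ab, confidence, lift))
    return rules


def violated_rules(basket, rules):
    """Rules whose antecedent is in the basket but whose consequent is missing."""
    return [(a, b) for a, b, _, _, _ in rules if a in basket and b not in basket]


# Made-up recipes standing in for the recipe book.
recipes = [
    {"flour", "butter", "sugar"},
    {"flour", "butter", "eggs"},
    {"flour", "sugar", "eggs"},
    {"tomato", "basil", "garlic"},
    {"tomato", "garlic", "onion"},
    {"flour", "butter", "sugar", "eggs"},
]

rules = pair_rules(recipes)
for a, b, s, c, l in rules:
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")

# A recipe that breaks strong rules is a candidate anomaly.
odd_recipe = {"butter", "tomato", "salt"}
violations = violated_rules(odd_recipe, rules)
print("violated rules:", violations)
```

Counting violated high-confidence rules is only one of several ways to turn rules into an anomaly score. Libraries built for rule mining also handle antecedents with more than one item, which this sketch does not.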

Subgroup Discovery

Playground

The playground is where I host standalone scripts and analyses that lack the scope of a full project but are nonetheless useful for learning key concepts.