Portfolio

Data Science and ML Projects

Olist Database Analysis

An in-depth analysis of the Olist public e-commerce dataset of 100k orders using PostgreSQL, focused on the interaction between customers and sellers. The project creates a database of tables, joins these tables, and runs statistical queries against the database to extract the necessary insights. link

	Project Info
Type Of Problem	Data Wrangling
Data Format	RDBMS
Tools	PostgreSQL, PgAdmin4
Language	SQL
Keywords	wrangling, selection, aggregates
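
As a rough illustration of the join-and-aggregate style of query described above, the sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table names, columns, and rows are invented for the example and are not the actual Olist schema.

```python
import sqlite3


def revenue_per_seller(conn):
    """Join orders to their items and total the item prices per seller."""
    query = """
        SELECT oi.seller_id,
               COUNT(DISTINCT o.order_id) AS n_orders,
               SUM(oi.price)              AS revenue
        FROM orders AS o
        JOIN order_items AS oi ON oi.order_id = o.order_id
        WHERE o.status = 'delivered'      -- ignore canceled orders
        GROUP BY oi.seller_id
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


# Hypothetical toy data, only to make the query runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer_id TEXT, status TEXT);
    CREATE TABLE order_items (order_id TEXT, seller_id TEXT, price REAL);
    INSERT INTO orders VALUES
        ('o1', 'c1', 'delivered'), ('o2', 'c2', 'delivered'), ('o3', 'c1', 'canceled');
    INSERT INTO order_items VALUES
        ('o1', 's1', 10.0), ('o1', 's2', 5.0), ('o2', 's1', 20.0), ('o3', 's2', 99.0);
""")

rows = revenue_per_seller(conn)
for seller_id, n_orders, revenue in rows:
    print(seller_id, n_orders, revenue)
```

The same SELECT should run largely unchanged on PostgreSQL; only the connection setup would differ.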

Credit Card Fraud Prediction

The dataset has a highly imbalanced target class. Our goal is to recognize fraudulent credit card transactions so that customers are not charged for purchases they did not make. We use ensemble and boosting algorithms such as Random Forest, XGBoost, and LightGBM, then try to improve the predictions with a voting classifier and stacking to obtain a better AUC score. link

	Project Info
Type Of Problem	Classification
Data Format	csv, pickle, parquet
Tools	imblearn, sklearn, xgboost, lightgbm, optuna
Language	Python
Keywords	imbalance, ensemble, boosting, stacking
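
The idea behind soft voting and the AUC metric can be sketched in plain Python. This is a minimal illustration, not the project's code: the two "models" are just invented probability lists, and the project itself relies on library implementations rather than hand-written ones.

```python
def soft_vote(prob_lists, weights=None):
    """Combine per-model fraud probabilities into one weighted-average score per transaction."""
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, sample_probs)) / total
        for sample_probs in zip(*prob_lists)
    ]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = fraud, 0 = legitimate.
labels = [0, 0, 0, 0, 1, 1]
model_a = [0.1, 0.4, 0.2, 0.7, 0.6, 0.9]
model_b = [0.2, 0.1, 0.6, 0.3, 0.8, 0.5]

ensemble = soft_vote([model_a, model_b])
print(roc_auc(labels, model_a), roc_auc(labels, model_b), roc_auc(labels, ensemble))
```

In this toy data each model misranks a different transaction, so averaging their scores helps. Real models are often more correlated than this, and the gain from voting or stacking is usually smaller.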

NYC Yellow Taxi Fare Prediction

In this project we predict the total fare for NYC yellow taxi trips using data provided by the NYC Taxi and Limousine Commission (TLC). We implement regression models with the LightGBM boosting framework and a Keras deep neural network, and use feature engineering to add new features to the existing data. To assess the accuracy of our models we use root mean squared error (RMSE). link

	Project Info
Type Of Problem	Regression
Data Format	pickle, parquet
Tools	sklearn, lightgbm, keras
Language	Python
Keywords	feature engineering, boosting, neural networks
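
For reference, the evaluation metric mentioned above can be written in a few lines. This is a minimal sketch with invented fare values; in practice a library implementation would normally be used.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted fares."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical fares in dollars.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, which is worth keeping in mind when comparing models on data with outlier fares.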

Modern Portfolio Theory

The aim of this project is to provide an analytical perspective on Modern Portfolio Theory by applying statistical methods to the theory's assumptions and testing how well these assumptions hold in the modern financial system. We download data for stocks listed on the NYSE using the Twelve Data API. link

	Project Info
Type Of Problem	Timeseries
Data Format	json, api
Tools	pandas, seaborn, numpy
Language	Python
Keywords	statistics, aggregates, simulation, finance
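
The core quantities in Modern Portfolio Theory are a portfolio's expected return and its variance, w^T mu and w^T Sigma w. The sketch below computes both in plain Python; the tickers "AAA" and "BBB" and their returns are made up for illustration and do not come from the project's data.

```python
def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance of two equally long return series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def portfolio_stats(returns, weights):
    """Return (expected return, variance) for the given asset weights.

    returns: dict mapping ticker -> list of per-period returns
    weights: dict mapping ticker -> portfolio weight (should sum to 1)
    """
    tickers = list(weights)
    expected = sum(weights[t] * mean(returns[t]) for t in tickers)
    variance = sum(
        weights[a] * weights[b] * covariance(returns[a], returns[b])
        for a in tickers
        for b in tickers
    )
    return expected, variance


# Hypothetical per-period returns for two invented tickers.
returns = {
    "AAA": [0.01, 0.03, -0.01, 0.05],
    "BBB": [0.02, 0.00, 0.04, -0.02],
}
expected, variance = portfolio_stats(returns, {"AAA": 0.5, "BBB": 0.5})
print(expected, variance)
```

The two invented series are perfectly negatively correlated, so the equal-weight portfolio has essentially zero variance. That is an extreme case chosen to show the diversification effect; real assets are rarely this cooperative.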

A/B Testing KKBOX Churn Rate

In this project we implement A/B testing using both frequentist and Bayesian approaches to test whether the churn rate differs significantly between two groups of music listeners. We also implement a permutation test, which generates the null distribution by random shuffling. link

	Project Info
Type Of Problem	A/B testing
Data Format	csv
Tools	pandas, seaborn, numpy, scipy
Language	Python
Keywords	frequentist, bayesian, hypothesis testing, permutation-test
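
The permutation test mentioned above can be sketched as follows. This is a minimal two-sided version using only the standard library; the group sizes and churn counts are invented, and the function name and parameters are illustrative rather than taken from the project.

```python
import random


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in churn rate.

    group_a, group_b: lists of 0/1 flags (1 = churned).
    Returns (observed difference, p-value).
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_b) / len(group_b) - sum(group_a) / n_a

    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling breaks any link between group label and outcome,
        # which is exactly what the null hypothesis assumes.
        rng.shuffle(pooled)
        diff = sum(pooled[n_a:]) / len(group_b) - sum(pooled[:n_a]) / n_a
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical data: 30 of 100 users churned in A, 45 of 100 in B.
group_a = [1] * 30 + [0] * 70
group_b = [1] * 45 + [0] * 55
observed, p_value = permutation_test(group_a, group_b)
print(observed, p_value)
```

The p-value is the share of shuffles whose difference is at least as large as the observed one. Unlike a z-test, this makes no normality assumption, at the cost of more computation.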