Analyze the Olist e-commerce public dataset of 100k orders using PostgreSQL for an in-depth analysis of the interaction between customers and sellers. The project includes creating the database tables, joining them, and running statistical queries against the database to extract the necessary insights.
Project Info | |
---|---|
Type Of Problem | Data Wrangling |
Data Format | RDBMS |
Tools | PostgreSQL, PgAdmin4 |
Language | SQL |
Keywords | wrangling, selection, aggregates |
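
The join-then-aggregate pattern described above can be sketched as follows. This is only an illustration: an in-memory SQLite database stands in for PostgreSQL so the snippet runs without a server, and the table names, columns, and rows are invented for the example rather than taken from the project.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; schema and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sellers (seller_id TEXT PRIMARY KEY, seller_state TEXT);
    CREATE TABLE order_items (order_id TEXT, seller_id TEXT, price REAL);
    INSERT INTO sellers VALUES ('s1', 'SP'), ('s2', 'RJ');
    INSERT INTO order_items VALUES
        ('o1', 's1', 100.0), ('o2', 's1', 50.0), ('o3', 's2', 30.0);
""")

# Join order items to sellers, then aggregate orders and revenue per state.
query = """
    SELECT s.seller_state,
           COUNT(DISTINCT oi.order_id) AS n_orders,
           SUM(oi.price) AS revenue
    FROM order_items AS oi
    JOIN sellers AS s ON s.seller_id = oi.seller_id
    GROUP BY s.seller_state
    ORDER BY revenue DESC;
"""
rows = conn.execute(query).fetchall()
for state, n_orders, revenue in rows:
    print(state, n_orders, revenue)
```

The query text itself uses only standard SQL, so it should carry over to PostgreSQL unchanged once the real tables are in place.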
The dataset has a highly imbalanced target class. Our goal is to recognize fraudulent credit card transactions so that customers are not charged for purchases they did not make. We use ensemble and boosting algorithms such as Random Forest, XGBoost, and LightGBM, then try to improve our predictions further with a voting classifier and stacking to get a better AUC score.
Project Info | |
---|---|
Type Of Problem | Classification |
Data Format | csv, pickle, parquet |
Tools | imblearn, sklearn, xgboost, lightgbm, optuna |
Language | Python |
Keywords | imbalance, ensemble, boosting, stacking |
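
To make the voting and AUC ideas concrete, here is a minimal pure-Python sketch of soft voting (averaging predicted probabilities) and of AUC computed as a pairwise ranking statistic. The labels and model scores are made up for illustration, and the helper names `soft_vote` and `roc_auc` are hypothetical. The project itself lists sklearn, xgboost, and lightgbm as its tools.

```python
def soft_vote(prob_lists):
    """Average the positive-class probabilities of several models per sample."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced labels (two frauds in eight transactions) and made-up scores.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
model_a = [0.1, 0.2, 0.7, 0.1, 0.3, 0.2, 0.6, 0.9]
model_b = [0.2, 0.1, 0.2, 0.3, 0.8, 0.1, 0.7, 0.6]

ensemble = soft_vote([model_a, model_b])
for name, scores in [("A", model_a), ("B", model_b), ("vote", ensemble)]:
    print(name, round(roc_auc(y_true, scores), 3))
```

AUC depends only on how samples are ranked, not on a decision threshold, which is one reason it is a common choice when the positive class is rare. In this toy data the averaged scores happen to rank both frauds above every legitimate transaction; an ensemble is not guaranteed to help in general.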
In this project we predict the total fare for NYC yellow taxi trips using data provided by the NYC Taxi and Limousine Commission (TLC). We use the LightGBM boosting framework and a Keras deep neural network to build regression models, and feature engineering to add new features to the existing data. To assess our models we use root mean squared error (RMSE).
Project Info | |
---|---|
Type Of Problem | Regression |
Data Format | pickle, parquet |
Tools | sklearn, lightgbm, keras |
Language | Python |
Keywords | feature engineering, boosting, neural networks |
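
For reference, root mean squared error can be computed as in the short sketch below. The fares and predictions are invented numbers, and the `rmse` helper is a hypothetical name, not a function from the project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented fares (in dollars) and model predictions.
fares = [10.0, 20.0, 30.0]
preds = [12.0, 18.0, 33.0]
error = rmse(fares, preds)
print(round(error, 3))
```

Because the errors are squared before averaging, large misses are penalized more heavily than small ones, and the result is expressed in the same unit as the fare.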
The aim of this project is to provide an analytical perspective on Modern Portfolio Theory by applying statistical methods to various assumptions of the theory and testing whether these assumptions hold within the modern financial system. We download data for stocks listed on the NYSE using the Twelve Data API.
Project Info | |
---|---|
Type Of Problem | Timeseries |
Data Format | json, api |
Tools | pandas, seaborn, numpy |
Language | Python |
Keywords | statistics, aggregates, simulation, finance |
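
As a rough illustration of the simulation side of Modern Portfolio Theory, the sketch below draws random long-only portfolios and computes each one's expected return and volatility. The mean returns and covariance matrix are invented, the helper names are hypothetical, and a zero risk-free rate is assumed when ranking by return per unit of risk. None of this is taken from the project's data.

```python
import random


def portfolio_stats(weights, mean_returns, cov):
    """Return (expected return, volatility) for the given weights."""
    n = len(weights)
    expected = sum(w * m for w, m in zip(weights, mean_returns))
    variance = sum(
        weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    return expected, variance ** 0.5


def random_weights(n, rng):
    """Draw non-negative weights that sum to 1 (long-only portfolio)."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


# Invented annualized mean returns and covariance matrix for three assets.
mean_returns = [0.08, 0.12, 0.05]
cov = [
    [0.040, 0.006, 0.002],
    [0.006, 0.090, 0.003],
    [0.002, 0.003, 0.010],
]

rng = random.Random(0)  # fixed seed so the run is reproducible
portfolios = [
    portfolio_stats(random_weights(3, rng), mean_returns, cov)
    for _ in range(1000)
]

# Highest return per unit of risk, assuming a zero risk-free rate.
best_return, best_vol = max(portfolios, key=lambda p: p[0] / p[1])
print(round(best_return, 4), round(best_vol, 4))
```

Plotting volatility against expected return for the simulated portfolios is one common way to visualize the trade-off the theory describes.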
In this project we implement A/B testing using both frequentist and Bayesian approaches to test whether the churn rate differs significantly between two groups of music listeners. We also implement a permutation test, which generates the null distribution by random shuffling.
Project Info | |
---|---|
Type Of Problem | A/B testing |
Data Format | csv |
Tools | pandas, seaborn, numpy, scipy |
Language | Python |
Keywords | frequentist, bayesian, hypothesis testing, permutation-test |
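
The permutation test mentioned above can be sketched with the standard library alone. The churn data is invented (1 = churned, 0 = retained), and the function name, permutation count, and seed are illustrative assumptions rather than the project's settings.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns the observed absolute difference and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled labels simulates the null hypothesis that
        # group membership has no effect on churn.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Invented data: 30% churn in group A versus 10% churn in group B.
group_a = [1] * 30 + [0] * 70
group_b = [1] * 10 + [0] * 90

observed, p_value = permutation_test(group_a, group_b)
print(round(observed, 3), p_value)
```

The p-value is the share of shuffled splits whose difference is at least as large as the observed one, so no distributional assumption about churn is needed.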