DS_Portfolio

This repo contains a portfolio of my data science projects.

License: MIT

Portfolio

Data Science and ML Projects


Analysis of the public Olist e-commerce dataset of 100k orders in PostgreSQL, using complex queries for an in-depth study of the interaction between customers and sellers. The project covers creating the database tables, joining them, and running statistical queries against the database to extract the necessary insights. link

Project Info
Type of Problem: Data Wrangling
Data Format: RDBMS
Tools: PostgreSQL, pgAdmin 4
Language: SQL
Keywords: wrangling, selection, aggregates
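The join-and-aggregate workflow described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the project's code: it uses Python's built-in sqlite3 in-memory database as a stand-in for PostgreSQL, the table and column names (customers, sellers, orders, order_items, customer_state, seller_state, price, freight_value) are simplified assumptions modeled on the public Olist CSV files, and the rows are made-up toy data. The query itself uses only standard SQL (joins, GROUP BY, COUNT(DISTINCT ...), SUM, AVG), so it should run unchanged on PostgreSQL.

```python
import sqlite3

# Simplified, assumed versions of four Olist tables (real files have more columns).
SCHEMA = """
CREATE TABLE customers (customer_id TEXT PRIMARY KEY, customer_state TEXT);
CREATE TABLE sellers (seller_id TEXT PRIMARY KEY, seller_state TEXT);
CREATE TABLE orders (
    order_id TEXT PRIMARY KEY,
    customer_id TEXT REFERENCES customers (customer_id)
);
CREATE TABLE order_items (
    order_id TEXT REFERENCES orders (order_id),
    seller_id TEXT REFERENCES sellers (seller_id),
    price REAL,
    freight_value REAL
);
"""

# Orders, revenue and average freight for every (customer state, seller state) pair.
STATE_FLOWS = """
SELECT c.customer_state,
       s.seller_state,
       COUNT(DISTINCT o.order_id) AS n_orders,
       SUM(oi.price)              AS revenue,
       AVG(oi.freight_value)      AS avg_freight
FROM orders AS o
JOIN customers   AS c  ON c.customer_id = o.customer_id
JOIN order_items AS oi ON oi.order_id   = o.order_id
JOIN sellers     AS s  ON s.seller_id   = oi.seller_id
GROUP BY c.customer_state, s.seller_state
ORDER BY revenue DESC;
"""


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database filled with a handful of made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [("c1", "SP"), ("c2", "RJ"), ("c3", "SP")])
    conn.executemany("INSERT INTO sellers VALUES (?, ?)",
                     [("s1", "SP"), ("s2", "MG")])
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("o1", "c1"), ("o2", "c2"), ("o3", "c3"), ("o4", "c1")])
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?)", [
        ("o1", "s1", 100.0, 10.0),
        ("o2", "s1", 50.0, 20.0),
        ("o2", "s2", 30.0, 15.0),
        ("o3", "s2", 80.0, 25.0),
        ("o4", "s1", 40.0, 8.0),
    ])
    return conn


def state_flows(conn: sqlite3.Connection) -> list:
    """Run the aggregate query and return its rows as tuples."""
    return conn.execute(STATE_FLOWS).fetchall()


if __name__ == "__main__":
    for row in state_flows(build_toy_db()):
        print(row)
```

Grouping on the (customer_state, seller_state) pair is one simple way to see where buyers and sellers trade across state lines; on the real data the same pattern extends to reviews, payments or delivery times by adding the corresponding tables to the join.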

The dataset in this project has a highly imbalanced target class. Our goal is to recognize fraudulent credit card transactions so that customers are not charged for purchases they did not make. We apply ensemble methods, namely Random Forest (bagging) and the boosting algorithms XGBoost and LightGBM. We then try to improve the predictions with a voting classifier and a stacking ensemble to obtain a better AUC score. link

Project Info
Type of Problem: Classification
Data Format: csv, pickle, parquet
Tools: imblearn, sklearn, xgboost, lightgbm, optuna
Language: Python
Keywords: imbalance, ensemble, boosting, stacking
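To make the voting idea and the AUC metric concrete, here is a dependency-free sketch. It shows soft voting (averaging each model's predicted fraud probability per transaction) and computes ROC AUC through its rank interpretation: the probability that a randomly chosen fraud gets a higher score than a randomly chosen legitimate transaction. The probability lists are made-up numbers standing in for Random Forest and LightGBM outputs, not results from the project; in practice sklearn's VotingClassifier(voting="soft"), StackingClassifier and roc_auc_score would do this work. Stacking is not shown: instead of averaging, it trains a meta-model on the base models' out-of-fold predictions.

```python
from typing import List, Optional, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of random positive > score of random negative), ties = 1/2.

    O(P * N) pairwise comparison: fine for a sketch, too slow for millions of rows.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def soft_vote(prob_lists: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Soft voting: (weighted) average of each model's fraud probability per row."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_rows = len(prob_lists[0])
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
            for i in range(n_rows)]


if __name__ == "__main__":
    y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # 2 frauds in 10: imbalanced toy labels
    rf = [0.1, 0.2, 0.1, 0.7, 0.3, 0.2, 0.1, 0.2, 0.6, 0.8]    # made-up scores
    lgbm = [0.2, 0.1, 0.6, 0.1, 0.2, 0.3, 0.1, 0.1, 0.7, 0.5]  # made-up scores
    print(roc_auc(y, rf), roc_auc(y, lgbm), roc_auc(y, soft_vote([rf, lgbm])))
    # prints 0.9375 0.9375 1.0
```

In this toy case each model ranks one legitimate transaction above a fraud, but they make different mistakes, so averaging their probabilities separates the classes; that complementarity is the usual motivation for voting and stacking.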

In this project we predict the total fare of NYC yellow taxi trips using trip data published by the NYC Taxi and Limousine Commission (TLC). We implement regression models with the LightGBM boosting framework and a Keras deep neural network, and use feature engineering to add new features to the existing data to improve the models. To assess model accuracy we use root mean squared error (RMSE). link

Project Info
Type of Problem: Regression
Data Format: pickle, parquet
Tools: sklearn, lightgbm, keras
Language: Python
Keywords: feature engineering, boosting, neural networks
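The feature-engineering and evaluation steps can be illustrated with a small sketch that works on one trip record at a time. It assumes the TLC yellow-taxi column names tpep_pickup_datetime, tpep_dropoff_datetime and trip_distance (in miles), with timestamps formatted as "YYYY-MM-DD HH:MM:SS"; the derived features (duration, pickup hour, weekday, weekend flag, average speed) are illustrative choices, not necessarily the ones used in the project. The rmse function is the standard metric: the square root of the mean squared difference between actual and predicted fares.

```python
import math
from datetime import datetime

# Assumed timestamp layout of the TLC pickup/dropoff columns.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def add_features(trip: dict) -> dict:
    """Return a copy of one trip record with a few time and speed features added."""
    pickup = datetime.strptime(trip["tpep_pickup_datetime"], TS_FORMAT)
    dropoff = datetime.strptime(trip["tpep_dropoff_datetime"], TS_FORMAT)
    duration_min = (dropoff - pickup).total_seconds() / 60.0

    features = dict(trip)
    features["duration_min"] = duration_min
    features["pickup_hour"] = pickup.hour
    features["pickup_weekday"] = pickup.weekday()  # Monday == 0 ... Sunday == 6
    features["is_weekend"] = int(pickup.weekday() >= 5)
    # Average speed in miles per hour; 0.0 for zero or negative durations (bad records).
    features["avg_speed_mph"] = (
        trip["trip_distance"] / (duration_min / 60.0) if duration_min > 0 else 0.0
    )
    return features


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between actual and predicted fares."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    trip = {  # made-up record
        "tpep_pickup_datetime": "2023-01-07 08:00:00",
        "tpep_dropoff_datetime": "2023-01-07 08:30:00",
        "trip_distance": 5.0,
        "total_amount": 25.0,
    }
    print(add_features(trip))
    print(rmse([25.0, 12.5], [23.0, 14.0]))
```

On the full dataset these would be vectorized pandas column operations rather than a per-row function; the row-wise version only keeps the arithmetic visible. RMSE is reported in the same unit as the target (dollars), which makes it easy to interpret.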

The aim of this project is to provide an analytical perspective on Modern Portfolio Theory by applying statistical methods to various assumptions of the theory and testing how well these assumptions hold in the modern financial system. We download data for stocks listed on the NYSE using the twelvedata API. link

Project Info
Type of Problem: Time Series
Data Format: json, api
Tools: pandas, seaborn, numpy
Language: Python
Keywords: statistics, aggregates, simulation, finance
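One common way to explore Modern Portfolio Theory by simulation is to draw many random long-only weight vectors, compute each portfolio's annualized expected return w·μ and volatility √(wᵀΣw) from daily returns, and keep the portfolio with the highest Sharpe ratio. The sketch below does this with the standard library only. It assumes daily closing prices already parsed into plain lists of floats (fetching them from the twelvedata API is not shown), 252 trading days per year for annualization, and a zero risk-free rate by default. It illustrates the technique and is not the project's code.

```python
import math
import random

TRADING_DAYS = 252  # annualization factor; a common convention, assumed here


def simple_returns(prices):
    """Day-over-day simple returns from a list of closing prices."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def sample_cov(a, b):
    """Unbiased sample covariance of two equally long return series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def portfolio_stats(weights, mean_returns, cov):
    """Annualized expected return (w . mu) and volatility sqrt(w^T Sigma w)."""
    n = len(weights)
    mu = TRADING_DAYS * sum(w * m for w, m in zip(weights, mean_returns))
    var = TRADING_DAYS * sum(
        weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n)
    )
    return mu, math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


def simulate_portfolios(price_series, n_portfolios=5000, risk_free=0.0, seed=0):
    """Monte Carlo over random long-only weights.

    Returns (sharpe, weights, annual_return, annual_volatility) of the best
    portfolio found, or None if every simulated portfolio had zero volatility.
    """
    rets = [simple_returns(p) for p in price_series]
    means = [sum(r) / len(r) for r in rets]
    cov = [[sample_cov(a, b) for b in rets] for a in rets]
    rng = random.Random(seed)
    best = None
    for _ in range(n_portfolios):
        raw = [rng.random() for _ in rets]
        total = sum(raw)
        weights = [x / total for x in raw]  # non-negative and summing to 1
        mu, sigma = portfolio_stats(weights, means, cov)
        if sigma == 0:
            continue
        sharpe = (mu - risk_free) / sigma
        if best is None or sharpe > best[0]:
            best = (sharpe, weights, mu, sigma)
    return best


if __name__ == "__main__":
    # Made-up daily closes for two tickers, purely for illustration.
    prices_a = [100, 101, 102, 101, 103, 104, 103, 105]
    prices_b = [50, 49.5, 50.5, 51, 50.8, 51.5, 52, 51.7]
    sharpe, weights, mu, sigma = simulate_portfolios([prices_a, prices_b], 2000)
    print([round(w, 3) for w in weights], f"{mu:.2%}", f"{sigma:.2%}", f"{sharpe:.2f}")
```

Plotting every simulated (volatility, return) pair instead of keeping only the best one traces out the familiar efficient-frontier cloud, which is a natural starting point for testing the theory's assumptions (for example, normality of returns) statistically.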

In this project we implement A/B testing with both a frequentist and a Bayesian approach to test whether the churn rate differs significantly between two groups of music listeners. We also implement a permutation test, which generates the null distribution by randomly shuffling the group labels. link

Project Info
Type of Problem: A/B Testing
Data Format: csv
Tools: pandas, seaborn, numpy, scipy
Language: Python
Keywords: frequentist, bayesian, hypothesis testing, permutation test
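The three approaches mentioned above can be sketched side by side with the standard library: a pooled two-proportion z-test (frequentist), a permutation test that builds the null distribution by shuffling group labels, and a Beta-Binomial model with a uniform Beta(1, 1) prior that estimates the posterior probability that group B churns more than group A. The churn counts in the usage block are made-up, and the prior, test statistic and number of permutations are assumptions rather than the project's exact setup.

```python
import math
import random


def two_proportion_ztest(churn_a, n_a, churn_b, n_b):
    """Frequentist: pooled two-proportion z-test, two-sided p-value."""
    p_a, p_b = churn_a / n_a, churn_b / n_b
    pooled = (churn_a + churn_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in churn rates.

    group_a, group_b: lists of 0/1 churn flags. Shuffling the pooled flags and
    re-splitting them simulates the null hypothesis of no group effect.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_b) / len(group_b) - sum(group_a) / n_a
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[n_a:]) / (len(pooled) - n_a) - sum(pooled[:n_a]) / n_a
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)  # +1 avoids a p-value of exactly 0


def prob_b_churns_more(churn_a, n_a, churn_b, n_b, draws=20_000, seed=0):
    """Bayesian: Beta(1, 1) prior gives a Beta(1 + churn, 1 + n - churn) posterior.

    Monte Carlo estimate of P(p_B > p_A | data).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        pa = rng.betavariate(1 + churn_a, 1 + n_a - churn_a)
        pb = rng.betavariate(1 + churn_b, 1 + n_b - churn_b)
        wins += pb > pa
    return wins / draws


if __name__ == "__main__":
    # Made-up counts: 120 of 1000 listeners churn in group A, 160 of 1000 in group B.
    z, p = two_proportion_ztest(120, 1000, 160, 1000)
    a = [1] * 120 + [0] * 880
    b = [1] * 160 + [0] * 840
    obs, p_perm = permutation_test(a, b, n_perm=2000)
    print(f"z = {z:.3f}, p = {p:.4f}")
    print(f"observed diff = {obs:.3f}, permutation p = {p_perm:.4f}")
    print(f"P(p_B > p_A | data) = {prob_b_churns_more(120, 1000, 160, 1000):.3f}")
```

The frequentist and permutation tests answer "how surprising is this difference if there is no real effect?", while the Bayesian posterior answers "how likely is it that B churns more?", which is often the more direct question for a product decision.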