
Primary language: Jupyter Notebook. License: MIT

Data Science Journey

🔗 LinkedIn: https://www.linkedin.com/in/ankit-kothari-510a9623

📧 Email: ankit256@gmail.com

Data Science Must

Tools: GitHub, Docker, PySpark, pandas, plotly

Data Exploration, Analysis and Visualization

1. H1B Data Analysis

The raw data was downloaded from the USCIS website, which provides a separate CSV file for each year. It includes data on Employers, Initial Approvals, Continuing Approvals, Initial Denials, Continuing Denials, and demographics. The goal of this analysis is to examine H1B visa trends across employers and states.
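A minimal sketch of how per-year files like these might be combined and summarized with pandas. The column names (`Employer`, `State`, `Initial Approvals`, `Initial Denials`) and the inline sample rows are assumptions for illustration only, not the actual USCIS schema or data.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the per-year CSV files (made-up rows).
yearly_csvs = {
    2018: "Employer,State,Initial Approvals,Initial Denials\nAcme,CA,10,2\nGlobex,NY,5,1\n",
    2019: "Employer,State,Initial Approvals,Initial Denials\nAcme,CA,12,3\nGlobex,NY,4,4\n",
}

frames = []
for year, text in yearly_csvs.items():
    df = pd.read_csv(io.StringIO(text))
    df["Year"] = year  # remember which yearly file each row came from
    frames.append(df)

# Stack all years into one table.
h1b = pd.concat(frames, ignore_index=True)

# Total initial approvals and denials per state, plus a denial rate.
by_state = h1b.groupby("State")[["Initial Approvals", "Initial Denials"]].sum()
by_state["Denial Rate"] = by_state["Initial Denials"] / (
    by_state["Initial Approvals"] + by_state["Initial Denials"]
)
print(by_state)
```

With real files, `pd.read_csv` would take each file path instead of an in-memory string; the grouping step stays the same.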

2. INR-USD Trend Analysis 2000-2020

This project visualizes how the INR-USD exchange rate changed from 2000 to 2020 under three different Prime Ministers of India.

3. Identifying the customer segments that would increase sales the most and targeting them with ads on social media.

4. Identify Undervalued apps to Improve Revenue

Tools: pandas, sqlite3, plotly, mapbox, data optimization, DASH, Heroku

Probability and Statistics

Theory: Hypothesis Testing, A/B Testing, Data Distributions, Parametric and Non-Parametric Tests

Tools: Python, Pandas, scipy, plotly, statsmodels
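As one illustration of the topics above, here is a minimal sketch of a non-parametric A/B test: a two-sided permutation test on the difference in means, written with the standard library only. The function name `permutation_test`, its parameters, and the conversion data are hypothetical and do not come from the notebooks in this repository.

```python
import random


def permutation_test(a, b, n_resamples=5_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by permutation."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle the pooled data and split it again.
        rng.shuffle(pooled)
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


# Made-up conversion outcomes (1 = converted) for two page variants.
variant_a = [1] * 30 + [0] * 170  # 15% conversion
variant_b = [1] * 50 + [0] * 150  # 25% conversion
p_value = permutation_test(variant_a, variant_b)
print(f"p-value: {p_value:.4f}")
```

A permutation test makes no assumption about the shape of the underlying distribution, which is why it is a common non-parametric alternative to a t-test.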

Machine Learning

  • Bike Rental Prediction: Comparing Decision Tree models and Ensemble Methods using Random Forest to predict bike rentals at a given hour of the day
  • Credit Risk Analysis: Comparing models and exploring hyperparameters to tune Logistic Regression, XGBoost and an Artificial Neural Network to predict whether a borrower will pay their loan back. Uses publicly available data from LendingClub.com

ML Algorithms: Linear Regression, Logistic Regression, Decision Tree Model, Random Forest, XGBoost, ANN, Ensemble Models

Feature Extractions: Data Cleaning, Normalizing/Scaling of the data, Binning, Sampling, Correlation Matrix, Hyperparameter Tuning

Tools: Python, Pandas, sklearn, keras
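Two of the feature steps listed above, normalizing/scaling and binning, can be sketched in plain Python as follows. The helper names `min_max_scale` and `equal_width_bins` and the sample values are hypothetical; in practice a library such as sklearn or pandas would usually handle this.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1."""
    scaled = min_max_scale(values)
    # The maximum scales to exactly 1.0, so clamp it into the last bin.
    return [min(int(s * n_bins), n_bins - 1) for s in scaled]


hours = [0, 6, 12, 18, 23]  # made-up hour-of-day feature
print(min_max_scale(hours))
print(equal_width_bins(hours, 4))
```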

Natural Language Processing

Deep Learning Algorithms: DistilBERT, BERT, LSTM, BiLSTM, 1D-CNN, GRU, Word Embeddings, Sentence Encoders, TF-IDF, LDA, NMF

Text Analysis: Text Cleaning using spacy, NER, POS, Text Classification, Chatbots, Topic Modeling

Tools: Python, Pandas, TF2.0, keras, PyTorch, spacy, pyspark, Slack RTM API, seaborn, plotly

Machine Vision and OpenCV

Deep Learning Algorithms: CNN, OpenCV, Keras

Image Analysis: Blurring, Thresholding, Edge Detection, Morphological transformations, Contour detection, Affine Transformation, Transfer Learning, VGG19

Tools: Python, Pandas, TF2.0, keras, PyTorch, spacy, pyspark, OpenCV