This is part of the competition where you're challenged to build a machine learning model that predicts which Tweets are about real disasters and which ones aren't.
Libraries Used: TensorFlow, Pandas, NumPy, Matplotlib, scikit-learn, NLTK
My Contributions:
- Exploratory Data Analysis
- Preprocessing
- Word clouds of common words in real and non-real disaster tweets
- Built a DNN model with a Bidirectional GRU layer, reaching a public score of 0.71437 (a model sketch follows this list)
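The core of the model can be summarized with a short sketch. This is a minimal, hedged reconstruction using the Keras API; the vocabulary size, sequence length, and layer widths below are illustrative placeholders rather than the notebook's exact hyperparameters.

```python
# A minimal sketch of a binary tweet classifier built around a Bidirectional
# GRU. Vocabulary size, sequence length, and layer sizes are illustrative
# assumptions, not the notebook's exact hyperparameters.
import tensorflow as tf

VOCAB_SIZE = 20_000  # assumed tokenizer vocabulary size
MAX_LEN = 50         # assumed padded tweet length
EMBED_DIM = 128

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(64)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # real vs. not-real disaster
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```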
This is part of the competition where you're challenged to predict the final price of each home using 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa.
Libraries Used: Pandas, NumPy, Matplotlib, scikit-learn
My Contributions in Notebook-EDA:
- Relationship of 'SalePrice' with numerical and categorical variables
- Heatmap and scatterplot of 'SalePrice' and correlated variables
- Outlier identification with bivariate analysis
- Check for 'normality', 'skewness' and 'homoscedasticity'
- Log Transform data to attain normality
- Convert categorical variables into dummy variables (a short EDA sketch follows this list)
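The EDA steps above can be illustrated with a short sketch. It assumes the competition's `train.csv` is available locally and uses only Pandas, NumPy, and Matplotlib; the specific columns shown (`GrLivArea`, `SalePrice`) come from the Ames data dictionary, while the plot details are illustrative.

```python
# A minimal EDA sketch, assuming the competition's train.csv is available
# locally; column names follow the Ames data dictionary.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

train = pd.read_csv("train.csv")

# Correlation heatmap of the numeric features (SalePrice included)
corr = train.select_dtypes(include=[np.number]).corr()
plt.figure(figsize=(10, 8))
plt.imshow(corr, cmap="coolwarm")
plt.colorbar()
plt.title("Correlation matrix of numeric features")
plt.show()

# Bivariate scatterplot used to spot outliers, e.g. GrLivArea vs. SalePrice
train.plot.scatter(x="GrLivArea", y="SalePrice")
plt.show()

# Log-transform the skewed target so it is closer to a normal distribution
train["SalePrice"] = np.log1p(train["SalePrice"])

# Convert categorical variables into dummy (one-hot) columns
train = pd.get_dummies(train)
```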
My Contributions in Notebook-Modeling:
- Data Preprocessing
  - Identify and remove outliers
  - Log-transform data to attain normality
- Feature Engineering (a short sketch follows this list)
  - Identify and impute missing data
  - Transform some numerical variables that are really categorical
  - Label-encode categorical variables whose ordering may carry information
  - Box-Cox transformation of (highly) skewed features
- Modeling
  - LASSO, Elastic Net, Kernel Ridge, Gradient Boosting, XGBoost, LightGBM
  - Stacking models (a stacking sketch follows this list)
    - Averaging base models
    - Adding a meta-model on top of the averaged base models, trained on their out-of-fold predictions
  - Ensembling StackedRegressor, XGBoost, and LightGBM

The submission is in the top 6%.
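The feature-engineering steps can be sketched as follows. The specific columns touched, the 0.75 skewness threshold, and the fixed Box-Cox lambda are illustrative assumptions rather than the notebook's exact choices.

```python
# A sketch of the feature-engineering steps: imputation, label encoding of
# ordinal categoricals, and Box-Cox of skewed features. Columns, the 0.75
# skewness threshold, and the fixed lambda are illustrative choices.
import numpy as np
import pandas as pd
from scipy.stats import skew
from scipy.special import boxcox1p
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("train.csv")

# Some numeric codes are really categories, e.g. MSSubClass
df["MSSubClass"] = df["MSSubClass"].astype(str)

# Impute missing data: 'None'/0 where absence is meaningful, median otherwise
df["PoolQC"] = df["PoolQC"].fillna("None")
df["GarageArea"] = df["GarageArea"].fillna(0)
df["LotFrontage"] = df["LotFrontage"].fillna(df["LotFrontage"].median())

# Label-encode categoricals whose ordering carries information (quality scales)
for col in ["ExterQual", "KitchenQual", "PoolQC"]:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))

# Box-Cox transform the (highly) skewed numeric features
numeric_cols = df.select_dtypes(include=[np.number]).columns.drop("SalePrice")
skewness = df[numeric_cols].apply(lambda s: skew(s.dropna()))
for col in skewness[skewness.abs() > 0.75].index:
    df[col] = boxcox1p(df[col], 0.15)  # lambda = 0.15, a common fixed choice
```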
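The stacking step can likewise be sketched. The hand-rolled out-of-fold loop below illustrates training a meta-model on base-model predictions; the stand-in data, base-model hyperparameters, and final blend weights are assumptions, not the notebook's exact configuration.

```python
# A sketch of the stacking approach: base regressors are trained per fold and
# a meta-model (Lasso here, as an illustrative choice) learns from their
# out-of-fold predictions. Data, models, and weights are assumptions.
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold
from sklearn.linear_model import Lasso, ElasticNet
from sklearn.kernel_ridge import KernelRidge
from sklearn.ensemble import GradientBoostingRegressor

def out_of_fold_predictions(models, X, y, n_splits=5):
    """Collect each base model's out-of-fold predictions for the meta-model."""
    oof = np.zeros((X.shape[0], len(models)))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    for j, model in enumerate(models):
        for train_idx, val_idx in kf.split(X):
            m = clone(model)
            m.fit(X[train_idx], y[train_idx])
            oof[val_idx, j] = m.predict(X[val_idx])
    return oof

# Stand-in data so the sketch runs end to end; in the notebook this would be
# the preprocessed Ames feature matrix and the log-transformed SalePrice.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
y_train = 2.0 * X_train[:, 0] + rng.normal(scale=0.1, size=200)

base_models = [
    Lasso(alpha=0.0005, max_iter=10_000),
    ElasticNet(alpha=0.0005, l1_ratio=0.9, max_iter=10_000),
    KernelRidge(alpha=0.6, kernel="polynomial", degree=2, coef0=2.5),
    GradientBoostingRegressor(n_estimators=300, learning_rate=0.05),
]
meta_model = Lasso(alpha=0.0005, max_iter=10_000)

oof = out_of_fold_predictions(base_models, X_train, y_train)
meta_model.fit(oof, y_train)

# At test time: refit the base models on all training data, stack their test
# predictions the same way, pass them through the meta-model, then blend the
# result with separately trained XGBoost / LightGBM predictions, e.g.
# final = 0.70 * stacked_pred + 0.15 * xgb_pred + 0.15 * lgb_pred
```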