Analysis and predictive modelling of Zomato Bangalore data, including customer reviews. The folder Python_scripts_and_notebooks contains .py and .ipynb files for my 3-part analysis.
This project was run on Kaggle. The Zomato Bangalore dataset is publicly available at https://www.kaggle.com/himanshupoddar/zomato-bangalore-restaurants. The pretrained Word2Vec embeddings used in Part 3 are available at https://www.kaggle.com/sandreds/googlenewsvectorsnegative300.

Part 1: Data cleaning, EDA and regression

- Data cleaning (identifying and dropping duplicates, reformatting features)
- Exploratory data analysis and observations
- Data visualizations
- Preprocessing and prediction with regression models
- Model evaluation (MSE, MAPE, R^2)
- Results summary

Part 2: Classification

- Target transformation from numeric to categorical
- Preprocessing and prediction with Decision Tree, Random Forest and XGBoost
- Model evaluation (accuracy, Cohen's kappa, F1 score, precision, recall)
- Feature importance visualization
- Results summary

Part 3: Review text analysis

- Text mining and insights (unigrams, bigrams, trigrams and FreqDist plots)
- Text processing (regex, tokenizing, stopword removal, lemmatizing, vectorizing with Word2Vec)
- Building an LSTM neural network
- Model evaluation
- Results summary
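
As a rough illustration of the text-mining steps listed for the review analysis (regex cleaning, tokenizing, stopword removal and n-gram counting), here is a minimal sketch that uses only the Python standard library. It is not the code from the notebooks: the helper names `clean_and_tokenize` and `ngram_counts`, the tiny stopword set and the two sample reviews are all made up for this example, and a `Counter` stands in for a frequency-distribution object such as FreqDist. Lemmatizing and Word2Vec vectorizing are left out.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (an assumption; a real run would use a fuller list).
STOPWORDS = {"the", "a", "an", "and", "was", "is", "it", "to", "of", "but", "very"}


def clean_and_tokenize(text):
    """Lowercase, replace non-letters with spaces, split on whitespace, drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


def ngram_counts(token_lists, n):
    """Count n-grams (stored as tuples) across all tokenized reviews."""
    counts = Counter()
    for tokens in token_lists:
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Made-up reviews, for illustration only (not taken from the dataset).
reviews = [
    "The butter chicken was great, and the service was great!",
    "Great butter chicken but very slow service.",
]

tokens = [clean_and_tokenize(r) for r in reviews]
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

The same `ngram_counts` call with `n=3` gives trigram counts, and the `most_common` output can be passed to any plotting library to draw a frequency chart.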