/ml-disasters-on-social-media

My capstone project for Udacity Machine Learning Engineer nanodegree

Primary LanguageJupyter Notebook

Disasters on Social Media

This is my capstone project for Udacity Machine Learning nanodegree.

I picked one of Kaggle’s “Getting Started” competitions on Natural Language Processing for my capstone project. The competition is ​Real or Not? NLP with Disaster Tweets​, and can be found on Kaggle’s website at: https://www.kaggle.com/c/nlp-getting-started/overview/evaluation

The goal of this project is to build a machine learning model that predicts which Tweets are about real disasters and which ones aren’t.

Watch a video of it working: https://www.youtube.com/watch?v=lzGzzTPXokE

Software and libraries used

I implemented this solution on Amazon Sagemaker. No special libraries have been used that are not included in Amazon Sagemaker.

Description of notebooks

cp00-data-analysis.ipynb: This notebook is used to analyze the dataset and generate the information included in the related section of the report.

cp01-pytorch-bagofwords.ipynb: This notebook contains the code for the first model I built, based on PyTorch and bag-of-words technique with single words.

cp02-pytorch-2grams.ipỳnb: Similarly to the previous one, this notebook contains the code for the second model, that uses 2-grams instead of single words.

cp03-blazingtext.ipynb: This notebook uses BlazingText algorithm to solve the problem.

cp99-leaderboard-analysis.ipynb: Analysis of the leaderboard studying the performance of the model built.

Folders

input: Contains all datasets needed to run the notebooks.

train: Contains code needed to create the PyTorch model.

data: Temporary folder to store model resources.

Other files

website.html: Website to make predictions using the model. The endpoint is not online and it cannot be tested without setting it up.

proposal.pdf: Project proposal.

report.pdf: Project report.