Disaster Response NLP Pipeline

End-to-end machine learning pipeline to classify disaster messages into 36 categories, with a web app to deploy the trained model.

Table of Contents

  1. Dependencies
  2. Project Introduction
  3. Instructions for running the scripts
  4. Project Structure
  5. File Descriptions
  6. Results
  7. Licensing, Authors, and Acknowledgements

Dependencies

The code should run with no issues using Python 3. The other libraries used in this project are:

  • numpy
  • pandas
  • flask
  • nltk
  • pickle
  • matplotlib
  • scikit-learn
  • sqlalchemy
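
Of these, pickle ships with the Python standard library, so only the rest need installing (e.g. pip install numpy pandas flask nltk matplotlib scikit-learn sqlalchemy). The NLP steps also rely on a few NLTK resources being downloaded once; the exact set below is an assumption based on the tokenization and stopword features described later:

```python
# One-time NLTK downloads (assumed minimal set; adjust to whatever
# resources train_classifier.py actually imports).
import nltk

nltk.download("punkt")      # tokenizer models
nltk.download("stopwords")  # stopword list, used for the non-stopword count
nltk.download("wordnet")    # lemmatizer data
```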

Project Introduction

The task of this project is to analyze disaster messages from Figure Eight and build a Machine Learning model that classifies disaster messages. The data set contains real messages that were sent during disaster events. A machine learning pipeline is created to categorize these events so that one can send the messages to an appropriate disaster relief agency. The project also includes a web app where an emergency worker can input a new message and get classification results in several categories.

Instructions for running the scripts

  1. Run the following commands in the project's root directory to set up your database and model:

    • To run the ETL pipeline that cleans the data and stores it in a database:
      python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
    • To run the ML pipeline that trains the classifier and saves it as a pickle file:
      python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
  2. Run the following command in the app's directory to start the web app:
     python run.py

  3. Go to http://0.0.0.0:3001/
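
Once step 1 has produced models/classifier.pkl, the model can also be sanity-checked outside the web app. A minimal sketch, assuming the pickle holds a scikit-learn pipeline whose predict accepts raw message strings:

```python
import pickle

# Load the trained pipeline saved by train_classifier.py
# (path and interface are assumptions about the saved object).
with open("models/classifier.pkl", "rb") as f:
    model = pickle.load(f)

# Multi-label prediction: one row with 36 binary category outputs.
message = "We need water and medical supplies after the storm"
print(model.predict([message]))
```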

Project Structure

  1. app folder

    1. templates folder
      1. go.html
      2. master.html
    2. run.py
  2. data folder

    1. disaster_categories.csv
    2. disaster_messages.csv
    3. DisasterResponse.db
    4. process_data.py
  3. models folder

    1. classifier.pkl
    2. train_classifier.py
  4. jupyter notebooks folder

    1. categories.csv
    2. messages.csv
    3. ETL Pipeline Preparation.ipynb
    4. ML Pipeline Preparation.ipynb
  5. sample_images folder

  6. README.md

File Descriptions

  1. The app folder contains the files needed to run the web app. The templates folder holds two HTML files: go.html renders the training-data visualizations as bar graphs along with the classification results across the 36 categories, while master.html renders the main web page. The run.py file starts the Flask web app; a schematic sketch of run.py follows this list.

  2. The data folder contains two CSV files (disaster_messages.csv holds the disaster messages, and disaster_categories.csv holds the 36 categories into which messages can be classified) and an SQLite database, DisasterResponse.db, with the cleaned and processed messages used to train the classification model. The process_data.py script merges the two CSV files, cleans the disaster messages, and stores the result in the database; a condensed sketch of these ETL steps follows this list.

  3. The train_classifier.py script inside the models folder loads the cleaned disaster messages from the database, creates new features (the number of words, characters, and non-stopwords in each message), builds a machine learning pipeline, runs GridSearchCV to find the best hyperparameters for the classification model, evaluates the trained model on a test set, and saves the trained model as a pickle file for deployment in the web app. The classifier.pkl file is that saved model; an illustrative sketch of such a pipeline follows this list.

  4. The jupyter notebooks folder contains two Jupyter notebooks. ETL Pipeline Preparation.ipynb performs the extract, transform, and load steps on the messages and categories CSV files after merging them; the process_data.py script was developed from this notebook. ML Pipeline Preparation.ipynb builds the machine learning pipeline that classifies disaster messages into the 36 categories; the train_classifier.py script was developed from this notebook.

  5. The sample_images folder contains images of the visualizations from the ETL notebook and of the working web app, for quick demonstration in the Results section below.
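
For orientation, here is a schematic of what run.py typically looks like in such an app; the route names and template wiring are assumptions, not the project's exact code:

```python
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/")
def index():
    # master.html renders the main page (assumed wiring).
    return render_template("master.html")

@app.route("/go")
def go():
    # go.html displays the classification of the query message (assumed).
    query = request.args.get("query", "")
    return render_template("go.html", query=query)

if __name__ == "__main__":
    # Matches the address given in the run instructions above.
    app.run(host="0.0.0.0", port=3001)
```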
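A condensed sketch of the kind of steps process_data.py performs; the column names and the table name are assumptions based on the standard Figure Eight dataset layout:

```python
import pandas as pd
from sqlalchemy import create_engine

# Merge messages with their labels (the CSVs are assumed to share an "id" column).
messages = pd.read_csv("data/disaster_messages.csv")
categories = pd.read_csv("data/disaster_categories.csv")
df = messages.merge(categories, on="id")

# Expand the single "categories" string ("related-1;request-0;...") into
# 36 binary columns, one per category.
cats = df["categories"].str.split(";", expand=True)
cats.columns = [value.split("-")[0] for value in cats.iloc[0]]
for col in cats.columns:
    cats[col] = cats[col].str[-1].astype(int)
df = pd.concat([df.drop(columns="categories"), cats], axis=1)

# Drop duplicates and persist to the SQLite database the trainer reads
# (the table name "DisasterResponse" is an assumption).
df = df.drop_duplicates()
engine = create_engine("sqlite:///data/DisasterResponse.db")
df.to_sql("DisasterResponse", engine, index=False, if_exists="replace")
```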
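The sketch below illustrates the pipeline shape described for train_classifier.py: hand-crafted text statistics combined with standard text features, a multi-output classifier covering all 36 categories, and a GridSearchCV wrapper. Class and parameter names are illustrative assumptions, not the script's exact code:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import FeatureUnion, Pipeline


class TextStats(BaseEstimator, TransformerMixin):
    """Hypothetical transformer for the hand-crafted features described
    above: word count and character count per message."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.array([[len(text.split()), len(text)] for text in X])


# TF-IDF text features combined with the numeric statistics, feeding a
# multi-output classifier so all 36 categories are predicted at once.
pipeline = Pipeline([
    ("features", FeatureUnion([
        ("tfidf", TfidfVectorizer(stop_words="english")),
        ("stats", TextStats()),
    ])),
    ("clf", MultiOutputClassifier(RandomForestClassifier())),
])

# A small illustrative grid for the GridSearchCV step;
# search.fit(X_train, y_train) then selects the best hyperparameters.
param_grid = {"clf__estimator__n_estimators": [50, 100]}
search = GridSearchCV(pipeline, param_grid, cv=3)
```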

Results

Some visualizations and screenshots from this project (images are in the sample_images folder):

  • Number of messages in each genre

  • Number of messages in each category

  • Web app interface

  • Directing a message to the web app for classification

  • Classification result for the above message

  • Statistics of word and character counts of messages in the training data

Licensing, Authors, and Acknowledgements

Credit is due to Udacity for the data and the Python 3 notebooks.