Disaster Response Pipeline Project
Summary
This repository contains a classification engine, served through a web app, that identifies which of 36 disaster-response categories a message relates to.
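For reference, here is a minimal sketch of using the trained model outside the web app. It assumes models/classifier.pkl holds a scikit-learn pipeline (text vectorizer plus classifier) produced by the training step described below; adjust to match the actual pickle contents.

```python
# Minimal sketch: load the pickled model and classify one message.
# Assumes the pickle is a scikit-learn pipeline that accepts raw text.
import pickle

with open("models/classifier.pkl", "rb") as f:
    model = pickle.load(f)

message = "We need water and medical supplies after the earthquake"
prediction = model.predict([message])[0]  # one row of 36 binary category flags
print(prediction)
```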
Data sets
There are two datasets in this repository.
- messages.csv contains the messages that we want to classify into 36 different categories.
- categories.csv contains the flagged categories that serve as the labels for training the model.
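The two files are joined before cleaning. A minimal sketch, assuming the file paths used in the instructions below and a shared `id` column in both files:

```python
# Illustrative only: load the two datasets and join them on "id".
# Paths and the join column are assumptions based on the instructions below.
import pandas as pd

messages = pd.read_csv("data/disaster_messages.csv")
categories = pd.read_csv("data/disaster_categories.csv")

df = messages.merge(categories, on="id")
print(df.shape)
```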
Cleaning the data
We need to go through a few steps to make our data set useful.
The categories column is a string with this structure: "related-0;request-0;offer-0;aid_related-0;medi..." Therefore we need to go through a couple of steps:
- Split the string on ";" and create a separate column for each resulting substring
- Column names are the substring without its last two characters
- The value is only the last character of the substring

Finally, we merge everything back together and drop the duplicates.
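A minimal pandas sketch of these cleaning steps, assuming a merged DataFrame `df` with a `categories` column as described above:

```python
# Sketch of the cleaning steps; `df` is assumed to be the merged DataFrame.
import pandas as pd

# Split the single string into one column per category
categories = df["categories"].str.split(";", expand=True)

# Column names: the substring without its last two characters (e.g. "related-0" -> "related")
first_row = categories.iloc[0]
categories.columns = first_row.apply(lambda value: value[:-2])

# Values: keep only the last character of each substring
for column in categories.columns:
    categories[column] = categories[column].str[-1].astype(int)

# Merge everything back together and drop duplicates
df = pd.concat([df.drop(columns=["categories"]), categories], axis=1)
df = df.drop_duplicates()
```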
Instructions:
- Run the following commands in the project's root directory to set up your database and model.
    - To run the ETL pipeline that cleans the data and stores it in a database (an optional check of the result is sketched after these instructions):
      python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
    - To run the ML pipeline that trains the classifier and saves it:
      python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
- Run the following command in the app's directory to run your web app.
      python run.py
- Go to http://0.0.0.0:3001/
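As an optional check (not part of the project commands), the SQLite database written by the ETL step can be inspected with pandas. The table name is looked up from the database rather than assumed:

```python
# Optional: inspect the database produced by process_data.py.
import sqlite3

import pandas as pd

conn = sqlite3.connect("data/DisasterResponse.db")
tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", conn)
print(tables)  # list the stored table(s)

df = pd.read_sql("SELECT * FROM " + tables["name"].iloc[0], conn)
print(df.head())
conn.close()
```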