Disaster Response Pipeline

Description

This project is part of the Data Science Nanodegree Program by Udacity, in collaboration with Figure Eight. The dataset contains pre-labelled tweets and messages from real-life disaster events. The project aims to build a Natural Language Processing (NLP) model that categorizes these messages.
The project is divided into the following key sections:
1. ETL pipeline - Extracts and processes data from the source and saves it to a SQLite database
2. ML pipeline - Trains a machine learning model on the data to classify text messages into various categories
3. Web App - Generates and displays model predictions for user-entered messages in real time

To clone the git repository:

git clone https://github.com/jeena72/disaster-response-pipeline.git

Run the following commands in the project's directory to set up the database, then train and save the model:
- To run the ETL pipeline, which cleans the data and stores the processed data in the database:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline, which loads data from the database, trains the classifier, and saves it as a pickle file:
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
Run the following command from inside the "app/" directory to start the web app:
python run.py
Then go to http://127.0.0.1:3001/ in a browser.

app/templates/*: templates/html files for web app

data/process_data.py: Extract, Transform, Load (ETL) script for data cleaning, feature extraction, and storing data in a SQLite database
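
The ETL step can be sketched roughly as follows, using only the Python standard library. This is an illustration, not the contents of process_data.py: the column names (id, message, genre, categories), the "name-0/1" category encoding, the table name "messages", and the helper names parse_categories and run_etl are all assumptions, and the sample rows are made up.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows; the column names and the "name-0/1" category
# encoding are assumptions made for illustration only.
MESSAGES_CSV = """id,message,genre
1,We need water and food,direct
2,Road blocked by fallen trees,news
2,Road blocked by fallen trees,news
"""

CATEGORIES_CSV = """id,categories
1,water-1;food-1;transport-0
2,water-0;food-0;transport-1
2,water-0;food-0;transport-1
"""


def parse_categories(raw):
    """Turn 'water-1;food-0' into {'water': 1, 'food': 0}."""
    pairs = (item.rsplit("-", 1) for item in raw.split(";"))
    return {name: int(flag) for name, flag in pairs}


def run_etl(messages_file, categories_file, conn):
    """Extract both CSVs, merge on id, drop duplicates, load into SQLite."""
    categories = {row["id"]: parse_categories(row["categories"])
                  for row in csv.DictReader(categories_file)}
    rows, seen = [], set()
    for row in csv.DictReader(messages_file):
        # Skip duplicate ids and messages without a category row.
        if row["id"] in seen or row["id"] not in categories:
            continue
        seen.add(row["id"])
        rows.append({"id": int(row["id"]), "message": row["message"],
                     "genre": row["genre"], **categories[row["id"]]})

    # One column per field: id, message, genre, then one per category.
    columns = list(rows[0])
    conn.execute("CREATE TABLE messages ({})".format(", ".join(columns)))
    placeholders = ", ".join("?" * len(columns))
    conn.executemany(
        "INSERT INTO messages VALUES ({})".format(placeholders),
        [tuple(r[c] for c in columns) for r in rows])
    conn.commit()
    return len(rows)


# In-memory database for the sketch; the real script writes to a .db file.
conn = sqlite3.connect(":memory:")
n_rows = run_etl(io.StringIO(MESSAGES_CSV), io.StringIO(CATEGORIES_CSV), conn)
stored = conn.execute(
    "SELECT id, message, water, transport FROM messages ORDER BY id").fetchall()
```

Splitting the category string into one 0/1 column per category gives the classifier one binary target per category, which is what a multi-label model needs.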

models/train_classifier.py: Model fitting script that loads data, trains a model, and saves it as a .pkl file
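
A minimal sketch of what such a training script might do, assuming scikit-learn and NumPy are installed. The choice of TF-IDF features with a random forest wrapped in MultiOutputClassifier is an assumption for illustration, not necessarily the model used in train_classifier.py, and the toy messages and the two label columns are made up.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy data: each message has one 0/1 flag per category.
messages = [
    "we need drinking water",
    "no water in our village",
    "our house collapsed, we need shelter",
    "families are sleeping outside without tents",
]
labels = np.array([  # columns: water, shelter (assumed category names)
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
])

# Text features followed by one classifier per category (multi-label setup).
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(
        RandomForestClassifier(n_estimators=10, random_state=0))),
])
model.fit(messages, labels)

# Round-trip through pickle in memory; the real script writes a .pkl file.
restored = pickle.loads(pickle.dumps(model))
predictions = restored.predict(["we have no water"])
```

Pickling the whole pipeline, rather than only the classifier, keeps the fitted vectorizer together with the model, so the web app can pass raw message text straight to predict.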

app/run.py: Script for launching the Flask web app

An example of a message categorization (predicted categories highlighted in green)