
WebApp to identify labels of real-world distress messages | NLP, ETL | Udacity - Data Scientist Nanodegree


Disaster Response Pipeline

Intro Pic

Table of Contents

  1. Description
  2. Getting Started
    1. Dependencies
    2. Installing
    3. Executing Program
  3. Important Files
  4. Acknowledgements
  5. Screenshots

Description

  • This project is part of the Data Science Nanodegree Program by Udacity, in collaboration with Figure Eight. The dataset contains pre-labelled tweets and messages from real-life disaster events. The project aims to build a Natural Language Processing (NLP) model that categorizes these messages.

  • The project is divided into the following key sections:

    1. ETL pipeline - Extracts and processes data from the source and saves it to a SQLite database
    2. ML pipeline - Builds a machine learning pipeline that trains on the data to classify text messages into various categories
    3. Web App - Generates and displays model predictions for a user-entered message in real time

Getting Started

Dependencies

  • Python 3.9+
  • ML libraries: NumPy, pandas, scikit-learn, NLTK
  • Other libraries: sqlalchemy, joblib, flask, plotly
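
Before running anything, it can help to confirm that the environment matches the list above. The following is a minimal sketch, not part of the repository: the helper names `missing_modules` and `python_is_supported` are hypothetical, and the module names are the usual import names for the listed libraries (scikit-learn is imported as `sklearn`).

```python
import importlib.util
import sys

# Import names for the libraries listed above.
# Note that scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "numpy", "pandas", "sklearn", "nltk",
    "sqlalchemy", "joblib", "flask", "plotly",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3, 9)):
    """Check the interpreter against the minimum version listed above."""
    return sys.version_info >= minimum


missing = missing_modules(REQUIRED_MODULES)
print("Python 3.9+:", python_is_supported())
print("Missing modules:", ", ".join(missing) if missing else "none")
```

Any name reported as missing can then be installed with the package manager of your choice.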

Installing

To clone the git repository:

git clone https://github.com/jeena72/disaster-response-pipeline.git

Executing Program

  1. Run the following commands from the project's directory to set up the database, then train and save the model

    • To run the ETL pipeline, which cleans the data and stores the processed data in the database:
      python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
    • To run the ML pipeline, which loads data from the database, trains the classifier and saves it as a pickle file:
      python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
  2. Run the following command from inside the "app/" directory to start the web app
    python run.py

  3. Go to http://127.0.0.1:3001/
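
The two pipeline commands from step 1 can also be chained from a small script. This is only a sketch under the assumption that it is run from the project's directory; `pipeline_commands` and `run_pipeline` are hypothetical helper names, while the script paths and arguments are the ones listed above. It defaults to a dry run that prints the commands without executing them.

```python
import subprocess
import sys


def pipeline_commands(python=sys.executable):
    """Return the ETL and training commands as argument lists."""
    return [
        [python, "data/process_data.py",
         "data/disaster_messages.csv",
         "data/disaster_categories.csv",
         "data/DisasterResponse.db"],
        [python, "models/train_classifier.py",
         "data/DisasterResponse.db",
         "models/classifier.pkl"],
    ]


def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in pipeline_commands():
        print(" ".join(command))
        if not dry_run:
            # check=True stops the chain if the ETL step fails,
            # so training never runs against a missing database.
            subprocess.run(command, check=True)


run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` would execute both steps in order.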

Important Files

app/templates/*: HTML templates for the web app

data/process_data.py: Extract, Transform, Load (ETL) script for data cleaning, feature extraction, and storing data in a SQLite database
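
To illustrate the extract, transform and load steps, here is a minimal sketch that uses only the Python standard library so it runs anywhere. It is not the code in `data/process_data.py`. The column names (`id`, `message`, `categories`), the `name-flag;name-flag` label format, the table name `messages` and the sample rows are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the real CSV layout may differ.
MESSAGES_CSV = (
    "id,message\n"
    "1,We need water and food\n"
    "2,Is the road to the hospital open?\n"
)
CATEGORIES_CSV = (
    "id,categories\n"
    "1,water-1;food-1;medical_help-0\n"
    "2,water-0;food-0;medical_help-1\n"
)


def parse_categories(raw):
    """Turn 'water-1;food-0' into {'water': 1, 'food': 0}."""
    pairs = (item.rsplit("-", 1) for item in raw.split(";"))
    return {name: int(flag) for name, flag in pairs}


def run_etl(messages_file, categories_file, conn, table="messages"):
    """Join both files on id and store one row per message in SQLite."""
    # Extract: read both CSV sources.
    messages = {r["id"]: r["message"] for r in csv.DictReader(messages_file)}
    categories = {
        r["id"]: parse_categories(r["categories"])
        for r in csv.DictReader(categories_file)
    }
    # Transform: keep only ids present in both files.
    shared_ids = sorted(set(messages) & set(categories), key=int)
    if not shared_ids:
        return 0
    names = list(categories[shared_ids[0]])
    # Load: one integer column per category.
    columns = ", ".join(f'"{name}" INTEGER' for name in names)
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" '
        f"(id INTEGER PRIMARY KEY, message TEXT, {columns})"
    )
    placeholders = ", ".join("?" for _ in range(len(names) + 2))
    rows = [
        (int(i), messages[i], *(categories[i][n] for n in names))
        for i in shared_ids
    ]
    conn.executemany(
        f'INSERT OR REPLACE INTO "{table}" VALUES ({placeholders})', rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = run_etl(io.StringIO(MESSAGES_CSV), io.StringIO(CATEGORIES_CSV), conn)
print("rows loaded:", loaded)
```

The project itself lists pandas and sqlalchemy as dependencies, so the actual script most likely performs these steps with DataFrames rather than plain dictionaries.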

models/train_classifier.py: Model fitting script that loads data, trains a model, and saves it as a .pkl file
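
A multi-label text classifier of this kind can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the contents of `models/train_classifier.py`: the choice of `TfidfVectorizer` with a `MultiOutputClassifier` around a random forest, the toy messages, the two category names and the temporary output path are all hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy data: one row of 0/1 flags per message.
category_names = ["water", "medical_help"]
messages = [
    "we need drinking water",
    "my brother is injured and needs a doctor",
    "no water in our village",
    "people are hurt, send medical help",
]
labels = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])


def build_pipeline():
    """Text features followed by one classifier per category."""
    return Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", MultiOutputClassifier(
            RandomForestClassifier(n_estimators=10, random_state=0))),
    ])


model = build_pipeline()
model.fit(messages, labels)

# predict returns one row per message and one column per category.
prediction = model.predict(["is there any water nearby"])
print(dict(zip(category_names, prediction[0])))

# Save the fitted pipeline as a pickle file and load it back.
model_path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")
joblib.dump(model, model_path)
reloaded = joblib.load(model_path)
```

Wrapping the vectorizer and the classifier in one `Pipeline` means the saved file contains everything needed to classify raw text, which is convenient for a web app that loads the model at startup.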

app/run.py: Script for launching the Flask web app
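
The general shape of a Flask app that classifies a user-entered message might look like the sketch below. It is not the repository's `run.py`: the `/classify` route, the JSON response and the keyword-based `classify` function (a stand-in for the trained model) are hypothetical. Only the host and port match the address given under Executing Program.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for the trained classifier: a keyword lookup.
CATEGORY_KEYWORDS = {
    "water": ("water", "thirsty"),
    "medical_help": ("doctor", "injured", "hospital"),
}


def classify(message):
    """Return a 0/1 flag per category for the given message."""
    words = message.lower().split()
    return {
        category: int(any(keyword in words for keyword in keywords))
        for category, keywords in CATEGORY_KEYWORDS.items()
    }


@app.route("/classify")
def classify_route():
    # Read the message from the query string, e.g. /classify?query=...
    query = request.args.get("query", "")
    return jsonify({"query": query, "categories": classify(query)})


def serve():
    """Start the development server on the address used in this README."""
    app.run(host="127.0.0.1", port=3001)
```

Calling `serve()` starts the server; in a real app, `classify` would instead load the saved pickle file and call the model's `predict` method.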

Acknowledgements

  • Udacity for the Data Science Nanodegree Program
  • Figure Eight for providing the relevant dataset to train the model

Screenshots

  1. An example of a message categorization (predicted categories highlighted in green)

Sample Input

  2. Web app home page with some visualizations of distributions in the dataset

Main Page

  3. Sample run of train_classifier.py with precision, recall, etc. for each category

Train Classifier with Category Level Precision Recall