
Disaster Response Pipeline Project

To analyze disaster data from Figure Eight to build a model for an API that classifies disaster messages.

A Udacity Data Scientist Nanodegree Project

Table of Contents

  1. Installation
  2. Execution
  3. Project Motivation
  4. File Descriptions
  5. Results
  6. Licensing, Authors, and Acknowledgements

Installation

  • Python 3.5+ (I used Python 3.6)
  • Machine Learning Libraries: NumPy, SciPy, Pandas, Scikit-Learn
  • Natural Language Processing Libraries: NLTK
  • SQLite Database Libraries: SQLAlchemy
  • Web App and Data Visualization: Flask, Plotly
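
A typical way to install the libraries is with pip (adjust for your environment, or use conda):

    pip install numpy scipy pandas scikit-learn nltk sqlalchemy flask plotly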

Execution

  1. Run the following commands in the project's root directory to set up your database and model.

    • To run the ETL pipeline that cleans the data and stores it in a database:
      python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
    • To run the ML pipeline that trains the classifier and saves it as a pickle file:
      python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl

  2. Run the following command in the app's directory to run your web app:
      python run.py

  3. Go to http://0.0.0.0:3001/

Project Motivation

There are three components to this project; a minimal code sketch of each follows the list below.

  1. ETL Pipeline
  • In a Python script, process_data.py, write a data cleaning pipeline that:
  • Loads the messages and categories datasets
  • Merges the two datasets
  • Cleans the data
  • Stores it in a SQLite database
  2. ML Pipeline
  • In a Python script, train_classifier.py, write a machine learning pipeline that:
  • Loads data from the SQLite database
  • Splits the dataset into training and test sets
  • Builds a text processing and machine learning pipeline
  • Trains and tunes a model using GridSearchCV
  • Outputs results on the test set
  • Exports the final model as a pickle file
  3. Flask Web App
  • Adds data visualizations using Plotly to the web app.
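
A minimal sketch of the ETL steps in process_data.py. The column names (id, message, categories) and the semicolon-separated category string (e.g. "related-1;request-0;...") follow the Figure Eight CSV layout; treat them as assumptions if your copy of the data differs.

```python
# Sketch of process_data.py (column names assumed, see above).
import sys

import pandas as pd
from sqlalchemy import create_engine


def load_and_clean(messages_filepath, categories_filepath):
    messages = pd.read_csv(messages_filepath)
    categories = pd.read_csv(categories_filepath)
    df = messages.merge(categories, on='id')

    # Expand the single 'categories' string into one binary column per category.
    cats = df['categories'].str.split(';', expand=True)
    cats.columns = cats.iloc[0].str.rsplit('-', n=1).str[0]
    cats = cats.apply(lambda col: col.str.rsplit('-', n=1).str[1].astype(int))
    cats = cats.clip(upper=1)  # clip any stray values above 1 to keep labels binary

    df = pd.concat([df.drop(columns='categories'), cats], axis=1)
    return df.drop_duplicates()


def save(df, database_filepath):
    engine = create_engine(f'sqlite:///{database_filepath}')
    df.to_sql('DisasterResponse', engine, index=False, if_exists='replace')


if __name__ == '__main__':
    messages_fp, categories_fp, db_fp = sys.argv[1:4]
    save(load_and_clean(messages_fp, categories_fp), db_fp)
```

A minimal sketch of the ML pipeline in train_classifier.py. The DisasterResponse table name matches the ETL sketch above, and the non-category columns (id, message, original, genre) are an assumption about the dataset; a random forest inside a MultiOutputClassifier is one reasonable choice, not necessarily the tuned model the notebook settles on.

```python
# Sketch of train_classifier.py (table and column names assumed, see above).
import pickle
import sys

import nltk
import pandas as pd
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sqlalchemy import create_engine

nltk.download(['punkt', 'wordnet'], quiet=True)


def tokenize(text):
    # Normalize, tokenize, and lemmatize the message text.
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(tok.lower().strip()) for tok in word_tokenize(text)]


if __name__ == '__main__':
    database_fp, model_fp = sys.argv[1:3]
    df = pd.read_sql_table('DisasterResponse', create_engine(f'sqlite:///{database_fp}'))
    X = df['message']
    Y = df.drop(columns=['id', 'message', 'original', 'genre'])  # 36 category columns

    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

    pipeline = Pipeline([
        ('vect', CountVectorizer(tokenizer=tokenize)),
        ('tfidf', TfidfTransformer()),
        ('clf', MultiOutputClassifier(RandomForestClassifier())),
    ])

    # Tune a deliberately small grid; expand it as compute allows.
    cv = GridSearchCV(pipeline, param_grid={'clf__estimator__n_estimators': [50, 100]}, cv=3)
    cv.fit(X_train, Y_train)

    print(classification_report(Y_test, cv.predict(X_test), target_names=list(Y.columns)))

    with open(model_fp, 'wb') as f:
        pickle.dump(cv.best_estimator_, f)
```

A minimal sketch of how run.py could serve a Plotly figure from the database. The master.html template name and the genre bar chart are illustrative assumptions; the template is expected to render graphJSON with the Plotly JavaScript library.

```python
# Sketch of app/run.py (template name and chart assumed, see above).
import json

import pandas as pd
from flask import Flask, render_template
from plotly.graph_objs import Bar
from plotly.utils import PlotlyJSONEncoder
from sqlalchemy import create_engine

app = Flask(__name__)
engine = create_engine('sqlite:///data/DisasterResponse.db')
df = pd.read_sql_table('DisasterResponse', engine)


@app.route('/')
def index():
    # One example figure: how many messages arrived through each genre.
    genre_counts = df.groupby('genre').count()['message']
    graphs = [{
        'data': [Bar(x=list(genre_counts.index), y=genre_counts.values)],
        'layout': {'title': 'Distribution of Message Genres'},
    }]
    # Serialize the figures for the Plotly JavaScript library on the page.
    graph_json = json.dumps(graphs, cls=PlotlyJSONEncoder)
    return render_template('master.html', graphJSON=graph_json)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=3001, debug=True)
```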

File Descriptions

  • ETL Pipeline Preparation.ipynb : This notebook walks through the initial thought process behind the ETL part of this project and how different procedures were tried and tested. The final outcome of this notebook is used to create the process_data.py file.
  • ML Pipeline Preparation.ipynb : This Jupyter notebook helps to understand why and how a particular machine learning algorithm was chosen, and represents the initial thought process for creating the model pipeline. The final outcome of this notebook is used to create the train_classifier.py file.

Markdown cells were used to assist in walking through the thought process for individual steps.

Results

  • The outcome of this project is an end-to-end workflow: starting from scratch with a dataset, building an ETL pipeline for the data engineering work, and building a machine learning pipeline to train a model that can read text data and predict 36 classification categories.
  • The trained and tuned ML model is then used to predict which disaster category any new message fits.
  • A front-end application built with Flask showcases the visualizations and the model's disaster category predictions on a webpage.

Licensing, Authors, and Acknowledgements

Credit must be given to Figure Eight for the data. You can find the licensing for the data and other descriptive information at the link available here. Otherwise, feel free to use the code here as you would like! Thanks to the Udacity team for helping me learn all these new things.