This project is part of the Udacity Data Science Nano Degree Programand supported by Appen. This project will analyze a data set containing real messages that were sent during disaster events. Those messages are sent from social media or from disaster response organizations. This project will build a ETL pipeline to load and process data, and a machine learning pipeline to classify those messages so as to send them to an appropriate disaster relief agency.
There are three components in the project.
- Loads the message.csv and categories.csv files
- merges two datasets
- clean data
- stores it in a SQLite database
- Load cleaned data from database
- Build a test processing and maching learning pipline
- Used different models and evaluate accuracy
- Apply feature union to improve model
- Train and tunes a model using GridSearchCV
There is a web app where an emergency worker can input a new message and get classification results in several categories. The web app will also display visualizations of the data.
- README.md: read me file
- ETL Pipeline Preparation.ipynb: ETL pipeline preparation code
- ML Pipeline Preparation.ipynb: ML pipeline preparation code
- \app
- run.py: flask file to run the app
- \templates
- master.html: main page of the web application
- go.html: result web page
- \data
- disaster_categories.csv: categories dataset
- disaster_messages.csv: messages dataset
- DisasterResponse.db: disaster response database
- process_data.py: ETL process to clean up data
- \model
- train_classifier.py: classification code
- custom_extractor.py: python package that build a class to extract disaster related words
- classifier.pkl: model saved as a pickle file
git clone https://github.com/petitblue/disaster-response-pipeline.git
While in the project's root directory disaster-response-pipeline run the ETL pipeline that cleans and stores data in database.
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
Next, run the ML pipeline that trains the classifier and save it.
python model/train_classifier.py data/DisasterResponse.db model/classifier.pkl
Next, change directory into the app directory and run the Python file run.py.
cd app
python run.py
Finally, go to http://0.0.0.0:3001/ or http://localhost:3001/ in your web-browser. Type a message input box and click on the Classify Message button to see the various categories that your message falls into.