To run the code in this project, install the following libraries:
- Pandas
- NumPy
- scikit-learn
- Flask
- SQLAlchemy
- Plotly
- NLTK
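As a quick sanity check before running anything, the sketch below reports which of the libraries listed above cannot be imported. The import names (`sklearn` for scikit-learn, `sqlalchemy` for SQLAlchemy) are the standard ones for these packages; the helper name `missing_modules` is hypothetical and not part of this project.

```python
import importlib.util

# Import names for the libraries listed above.
REQUIRED = ["pandas", "numpy", "sklearn", "flask", "sqlalchemy", "plotly", "nltk"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

Any library reported as missing can be installed with pip before running the pipelines.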
This project was completed as part of Udacity's Data Scientist Nanodegree. Using text data from Figure-8, a company specializing in data analytics and machine learning, the goal is to classify messages sent during a disaster into 36 categories to support aid efforts.
The project is divided into three folders: one for data and data processing, one for building the machine learning pipeline, and one for the web app. There are also three screenshots of the final web app.
- Messages data: disaster_messages.csv
- Categories data: disaster_categories.csv
- SQL Database: DisasterResponse.db
- Jupyter notebook for building ETL pipeline: ETL Pipeline Preparation.ipynb
- Python script for processing the data: process_data.py
- Jupyter notebook for building a machine learning pipeline: ML Pipeline Preparation.ipynb
- Python script for training the classifier: train_classifier.py
- A pickle file that contains the trained model: classifier.pkl
- Python script for running the web app: run.py
- templates folder containing the two HTML files for the app front end: go.html and master.html
Run the following commands in the project's root directory to set up your database and model.
- To run the ETL pipeline that cleans the data and stores it in the database:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline that trains the classifier and saves it:
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
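To give a feel for what the ETL step does, here is a minimal sketch using only the standard library. It is not the contents of process_data.py, which may well use pandas and SQLAlchemy instead. It assumes each row of the categories file stores its labels as a single string such as `related-1;request-0`; that format, the table name `messages`, and both function names are assumptions made for illustration.

```python
import sqlite3


def parse_categories(raw):
    """Turn 'related-1;request-0' into {'related': 1, 'request': 0}.

    The 'name-value' layout separated by ';' is an assumed format.
    """
    labels = {}
    for item in raw.split(";"):
        name, _, value = item.rpartition("-")
        labels[name] = int(value)
    return labels


def save_messages(conn, records):
    """Store (message, labels) pairs in a hypothetical 'messages' table.

    All label dicts are expected to share the same keys.
    """
    columns = list(records[0][1])
    col_defs = ", ".join(f'"{c}" INTEGER' for c in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS messages (message TEXT, {col_defs})")
    placeholders = ", ".join("?" for _ in range(len(columns) + 1))
    conn.executemany(
        f"INSERT INTO messages VALUES ({placeholders})",
        [(msg, *(labels[c] for c in columns)) for msg, labels in records],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    rows = [("We need water", parse_categories("related-1;request-1;water-1"))]
    save_messages(conn, rows)
    print(conn.execute("SELECT * FROM messages").fetchall())
```

Storing one integer column per category keeps the table directly usable as a label matrix when the classifier is trained.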
Run the following command in the app's directory to run your web app:
python run.py
Then open http://0.0.0.0:3001/ in your browser.
The final output of the project is an interactive web app that takes a message from the user as an input and then classifies it.
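Because a message can belong to several of the 36 categories at once, the classifier's output for one message can be thought of as a vector of 0/1 flags, one per category. The sketch below shows one way such a vector could be turned into readable labels for display. The function name and the category names in the example are hypothetical and are not taken from run.py.

```python
def active_categories(category_names, prediction):
    """Return the category names whose predicted flag is 1."""
    return [name for name, flag in zip(category_names, prediction) if flag == 1]


if __name__ == "__main__":
    # Hypothetical category names and prediction for a single message.
    names = ["related", "water", "food"]
    print(active_categories(names, [1, 0, 1]))
```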
Thanks to Udacity for providing guidance to complete the project, and thanks to Figure-8 for providing the data.