- Project Motivation
- Requirements
- Installation Instructions
- Files Descriptions
- Results
- Licensing, Authors, and Acknowledgements
This project is part of the Data Science Nanodegree Program by Udacity in collaboration with Figure Eight. The dataset contains pre-labelled tweets and messages from real-life disaster events. The aim is to build a model that classifies each message into one or more of 36 pre-defined categories, so that it can be sent to the appropriate disaster relief agency.
The code should run with no issues on Python 3 with the following libraries:
- Machine Learning: NumPy, SciPy, pandas, scikit-learn
- Natural Language Processing: NLTK
- SQLite Database: SQLAlchemy
- Model Loading and Saving: pickle
- Web App and Data Visualization: Flask, Plotly
- Run the following commands in the project's root directory to set up your database and model.
- To run the ETL pipeline, which cleans the data and stores it in the database:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline, which trains the classifier and saves it:
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
- Run the following command in the app's directory to start the web app.
python run.py
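To make the ETL step above more concrete, here is a minimal sketch of the kind of cleaning it performs. It is not the code in process_data.py: it uses only the standard library (csv, sqlite3) instead of pandas and SQLAlchemy so that it runs without extra dependencies. The column names (`id`, `message`, `categories`), the `name-0;name-1` category format, and the table name `messages` are assumptions made for illustration.

```python
import csv
import io
import sqlite3


def parse_categories(raw):
    """Turn 'water-1;roads-0' into {'water': 1, 'roads': 0} (assumed format)."""
    labels = {}
    for item in raw.split(";"):
        name, _, value = item.rpartition("-")
        labels[name] = int(value)
    return labels


def run_etl(messages_file, categories_file, conn):
    """Join messages and categories on `id`, drop duplicate ids, store in SQLite."""
    messages = {row["id"]: row["message"] for row in csv.DictReader(messages_file)}

    rows = {}  # keyed by id, so repeated ids are stored only once
    for row in csv.DictReader(categories_file):
        if row["id"] in messages:  # inner join on id
            rows[row["id"]] = (messages[row["id"]], parse_categories(row["categories"]))
    if not rows:
        return 0

    # One INTEGER column per category, taken from the first parsed row.
    names = list(next(iter(rows.values()))[1])
    columns = ", ".join(f'"{name}" INTEGER' for name in names)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS messages "
        f"(id INTEGER PRIMARY KEY, message TEXT, {columns})"
    )
    placeholders = ", ".join("?" for _ in range(len(names) + 2))
    conn.executemany(
        f"INSERT OR REPLACE INTO messages VALUES ({placeholders})",
        [
            (int(msg_id), text, *(labels[name] for name in names))
            for msg_id, (text, labels) in rows.items()
        ],
    )
    conn.commit()
    return len(rows)


# Tiny in-memory example with one duplicated row (made-up data).
messages_csv = io.StringIO(
    "id,message\n1,We need water\n2,Road is blocked\n2,Road is blocked\n"
)
categories_csv = io.StringIO(
    "id,categories\n1,water-1;roads-0\n2,water-0;roads-1\n2,water-0;roads-1\n"
)
conn = sqlite3.connect(":memory:")
stored = run_etl(messages_csv, categories_csv, conn)
print(stored, "rows stored")
```

In the real pipeline the same steps would read the two CSV paths given on the command line and write to the database path passed as the third argument.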
- Data
- disaster_categories.csv + disaster_messages.csv - Datasets with all the necessary information
- process_data.py - Code that reads and cleans the CSV files and stores the result in a SQLite database.
- db_disaster_messages.db - Database created from the transformed and cleaned data from the disaster CSV files.
- Models
- train_classifier.py - Code that loads the data and trains the machine learning model. It creates a pickle file at the end (classifier.pkl).
- App
- run.py - Flask app and the user interface used to predict results and display them.
- templates - Folder containing the HTML template files
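The file list above mentions that training ends by writing classifier.pkl, which the web app can then load. A minimal sketch of that save/load round trip with the standard pickle module is shown below. The helper names `save_model` and `load_model` are hypothetical, and a plain dictionary stands in for the trained classifier.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize the model object to `path` (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a previously pickled model from `path` (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A placeholder object stands in for the trained classifier.
model = {"name": "placeholder-classifier", "n_categories": 36}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "classifier.pkl"
    save_model(model, model_path)
    restored = load_model(model_path)

print(restored == model)
```

Only unpickle files you trust: loading a pickle can execute arbitrary code.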
This is the expected front page of the web app:
After entering a message, you should see the predicted categories:
The ML Pipeline Preparation.ipynb notebook contains other options for the pipeline. Feel free to change the build_model() function in train_classifier.py (in the models folder).
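As a starting point for experimenting, here is one possible shape for build_model(). It is a sketch under assumptions, not necessarily the pipeline used in train_classifier.py or in the notebook: a bag-of-words vectorizer, a TF-IDF transform, and a random forest wrapped in scikit-learn's MultiOutputClassifier so that one classifier is fitted per category. The toy messages and the two label columns are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def build_model():
    """Return an unfitted text-classification pipeline (illustrative choice of steps)."""
    return Pipeline([
        ("vect", CountVectorizer()),    # raw text -> token counts
        ("tfidf", TfidfTransformer()),  # counts -> TF-IDF weights
        ("clf", MultiOutputClassifier(  # one classifier per category column
            RandomForestClassifier(n_estimators=10, random_state=0)
        )),
    ])


# Toy data: two label columns instead of the project's 36 categories.
X = [
    "we need water and food",
    "the bridge collapsed after the flood",
    "please send drinking water",
    "the flood destroyed several houses",
]
Y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])

model = build_model()
model.fit(X, Y)
predictions = model.predict(["we need water"])
print(predictions.shape)
```

Swapping the estimator or adding a custom tokenizer only requires changing the corresponding pipeline step, which is why build_model() is a convenient place to try the alternatives.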
Credit goes to Figure Eight for the data. Thanks also to the Stack Overflow community and to Udacity for the training! Otherwise, feel free to use the code here as you would like!