The Python script process_data.py:
- Loads the messages and categories datasets
- Merges the two datasets
- Cleans the data
- Stores it in a SQLite database
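The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of process_data.py: it uses only the standard library (the real script may well use pandas), and the `id` and `message` column names, the `messages` table name, and the cleaning rules (drop empty messages and exact duplicates) are all assumptions.

```python
import csv
import sqlite3


def load_and_merge(messages_path, categories_path):
    """Read both CSV files and inner-join them on an assumed shared 'id' column."""
    with open(messages_path, newline="", encoding="utf-8") as f:
        messages = list(csv.DictReader(f))
    with open(categories_path, newline="", encoding="utf-8") as f:
        categories = {row["id"]: row for row in csv.DictReader(f)}
    # Keep only messages whose id also appears in the categories file.
    return [{**m, **categories[m["id"]]} for m in messages if m["id"] in categories]


def clean(rows):
    """Drop rows with an empty 'message' field and exact duplicate rows."""
    seen, cleaned = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if (row.get("message") or "").strip() and key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned


def save(rows, db_path, table="messages"):
    """Write rows to a SQLite table as TEXT columns, replacing any previous copy."""
    if not rows:
        return
    columns = list(rows[0])
    col_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(f'DROP TABLE IF EXISTS "{table}"')
            conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})',
                [tuple(r[c] for c in columns) for r in rows],
            )
    finally:
        conn.close()
```

Under these assumptions, the whole flow is `save(clean(load_and_merge(messages_csv, categories_csv)), db_path)`.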
The Python script train_classifier.py:
- Loads data from the SQLite database
- Splits the dataset into training and test sets
- Builds a text processing and machine learning pipeline
- Trains and tunes a model using GridSearchCV
- Outputs results on the test set
- Exports the final model as a pickle file
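A rough sketch of these steps with scikit-learn is shown below. It is an illustration under stated assumptions rather than the actual train_classifier.py: the table and column names (`messages`, `message`, `label`), the single-label setup, the TF-IDF plus logistic regression pipeline, and the parameter grid are all hypothetical choices made to keep the example short.

```python
import pickle
import sqlite3

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline


def load_data(db_path, table="messages"):
    """Load texts and labels from SQLite (table and column names are assumed)."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(f'SELECT message, label FROM "{table}"').fetchall()
    finally:
        conn.close()
    return [r[0] for r in rows], [r[1] for r in rows]


def build_model(cv=5):
    """Wrap a text-processing + classifier pipeline in a grid search."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    # Hypothetical grid; the real script may tune different parameters.
    param_grid = {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__C": [0.1, 1.0, 10.0],
    }
    return GridSearchCV(pipeline, param_grid, cv=cv)


def train_and_export(texts, labels, model_path, cv=5):
    """Split, tune, evaluate on the held-out set, and pickle the best pipeline."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, random_state=42, stratify=labels
    )
    search = build_model(cv=cv)
    search.fit(X_train, y_train)
    report = classification_report(y_test, search.predict(X_test), zero_division=0)
    with open(model_path, "wb") as f:
        pickle.dump(search.best_estimator_, f)
    return search, report
```

Calling `train_and_export(*load_data(db_path), model_path)` returns the fitted search object and a text report for the test set; printing the report covers the "outputs results" step.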
In this project, I apply the skills I learned in the Data Engineering section to analyze disaster data from Figure Eight and build a model for an API that classifies disaster messages.
Run run.py directly if DisasterResponse.db and classifier.pkl already exist.
Otherwise, run the following commands in the project's root directory to set up your database and model:
- python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
- python run.py
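To decide whether the first two commands are needed, a small check like the one below could be run from the project's root directory. It is only a sketch: the helper name `missing_artifacts` is hypothetical, and the two paths are taken from the commands above.

```python
from pathlib import Path

# Paths taken from the commands above, relative to the project root.
REQUIRED_ARTIFACTS = ("data/DisasterResponse.db", "models/classifier.pkl")


def missing_artifacts(root="."):
    """Return the required files that do not yet exist under `root`."""
    return [p for p in REQUIRED_ARTIFACTS if not (Path(root) / p).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Build these first:", ", ".join(missing))
    else:
        print("Database and model found; run.py can be started directly.")
```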