/capstone

Forest fire prediction model based on data published by P. Cortez and A. Morais in "A Data Mining Approach to Predict Forest Fires Using Meteorological Data"

Primary language: Jupyter Notebook

Wildfires are a contemporary issue with high visibility and cost around the world. This paper describes modeling conducted on a small dataset collected in Portugal to determine whether wildfires can be predicted using machine learning. Unfortunately, at only 517 entries, the dataset was too small to produce a reliable model. Several different Scikit-learn models were built, but overfitting was an issue and no model produced test set accuracy over 55%. The distribution of area burned in the dataset was also a problem: 247 of the entries, or roughly 48%, have an area burned of 0.0 hectares, and the median (50th percentile) is 0.52 hectares, while the maximum is 1090.84 hectares. Overall the project was unsuccessful, but a larger dataset, or potentially removing the data points that fall outside of the fire season, may result in better models.
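The skew described above can be checked with a few summary statistics. The following is a minimal sketch using only the Python standard library; the helper `summarize_area` and the sample values are hypothetical and are not taken from the dataset. The last line recomputes the zero-area share from the counts given above.

```python
from statistics import median


def summarize_area(areas):
    """Return basic skew diagnostics for a list of burned areas in hectares."""
    n = len(areas)
    zero_count = sum(1 for a in areas if a == 0.0)
    return {
        "n": n,
        "zero_count": zero_count,
        "zero_share": zero_count / n,
        "median": median(areas),
        "max": max(areas),
    }


# Made-up values for illustration only; NOT rows from the real dataset.
sample = [0.0, 0.0, 0.0, 0.4, 0.9, 2.5, 10.0, 250.0]
stats = summarize_area(sample)
print(stats)

# Share of zero-area entries from the counts quoted in the text (247 of 517).
zero_share_percent = round(247 / 517 * 100, 1)
print(zero_share_percent)
```

A median far below the maximum, combined with a large share of zeros, is the pattern that makes this target hard to model.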

The current models are based upon the data published at https://www.kaggle.com/datasets/sumitm004/forest-fire-area and the report can be found at https://www.overleaf.com/read/ywfyfsvwdgkh#ddbd65. Models were built using five different Scikit-learn classifiers: DecisionTreeClassifier, RandomForestClassifier, SVC (Support Vector Classification), MLPClassifier (multi-layer perceptron), and AdaBoostClassifier.
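As a rough illustration of how the five classifiers can be compared for overfitting, the sketch below fits each one and reports training and test accuracy side by side. It is not the project's notebook code: the data is a synthetic stand-in generated with `make_classification`, and the split ratio, `random_state` values, and `max_iter` setting are assumptions chosen only to make the example reproducible.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 517 rows to mirror the dataset size.
X, y = make_classification(n_samples=517, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "SVC": SVC(),
    "MLPClassifier": MLPClassifier(max_iter=500, random_state=0),
    "AdaBoostClassifier": AdaBoostClassifier(random_state=0),
}

# Store (train accuracy, test accuracy) per model.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train={results[name][0]:.2f} test={results[name][1]:.2f}")
```

A large gap between the training and test scores for a model is the usual sign of overfitting.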

The model accounts for variables such as location, month, day, temperature, and relative humidity, along with the fine fuel moisture code, duff moisture code, drought code, and initial spread index as defined in the Fire Weather Index System (https://www.nwcg.gov/publications/pms437/cffdrs/fire-weather-index-system). The intent is to eventually improve the model so that it can account for data related to fire history at a given location.
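One possible way to turn a record with these variables into a numeric feature vector is sketched below. The field names (`X`, `Y`, `month`, `temp`, `RH`, `FFMC`, `DMC`, `DC`, `ISI`), the three-letter month format, and the helper `encode_row` are assumptions made for illustration, and the example values are invented. The day field is left out to keep the sketch short.

```python
# Three-letter month abbreviations (assumed format of the month field).
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

# Assumed names for the numeric fields: location, weather, and FWI codes.
NUMERIC_FIELDS = ["X", "Y", "temp", "RH", "FFMC", "DMC", "DC", "ISI"]


def encode_row(row):
    """Convert one record (a dict) into a flat list of floats."""
    month_number = MONTHS.index(row["month"].lower()) + 1  # jan -> 1, dec -> 12
    return [float(month_number)] + [float(row[field]) for field in NUMERIC_FIELDS]


# Invented record for illustration only; NOT a row from the real dataset.
example = {
    "X": 7, "Y": 5, "month": "aug", "temp": 22.0, "RH": 40,
    "FFMC": 90.0, "DMC": 100.0, "DC": 600.0, "ISI": 8.0,
}
features = encode_row(example)
print(features)
```

Encoding the month as an integer is the simplest choice; one-hot encoding is a common alternative when the model should not assume an ordering between months.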