ECMWFCode4Earth/challenges_2020

Challenge #26 - Forecasting wildfire danger using deep learning

EsperanzaCuartero opened this issue · 21 comments

Stream 2 - Machine-Learning and Artificial Intelligence

IMPORTANT: this challenge is eligible to apply for cloud credits from WEkEO. Please specify the cloud resources you think will be needed in your proposal.

Goal

The project aims to explore whether a deep learning model could be used to predict wildfire danger at various lead times.

Mentors and skills

  • Mentors: @cvitolo @tianranZH
  • Skills required
    • Experience with Deep Learning
    • Understanding of wildfire danger

Challenge description

The Global ECMWF Fire Forecasting (GEFF) system uses Numerical Weather Predictions to drive a number of empirical models that predict forest fire danger indices up to 10/15 days ahead. The current system, written in Fortran, works in both reanalysis and forecast (deterministic and probabilistic) modes. The most widely used fire danger indices were originally designed by Canadian fire experts and later calibrated to better predict fire regimes and patterns occurring in Europe. As a result, the performance of the forecasts varies widely across the world, and they work well only in regions like North America and Europe. The GEFF system may ignore or poorly capture some underlying factors and phenomena, or rely on outdated inputs, which prevents the fire danger forecasts from performing equally well in other regions of the globe.

A machine learning (deep learning) approach (e.g. U-Net) could be used to explore the relationship between weather information and the expected fire danger, and to gain valuable insights.

This project will focus on the following:

  1. Build a model to forecast fire danger using the same inputs used by GEFF
  2. Explore possible improvements to the model by including additional inputs, e.g. SMOS (soil moisture and vegetation), other relevant remote sensing data, etc
  3. Explore the possibility of extending predictions to longer time scales (e.g. seasonal)
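
To make "predicting at various lead times" concrete, here is a minimal sketch of how training pairs could be built for one lead time. It is a simplification under stated assumptions: the function name `make_lead_time_pairs`, the `window` and `lead` parameters, and the sample values are hypothetical, and a single one-dimensional daily series stands in for the gridded weather fields a real model would use.

```python
from typing import List, Sequence, Tuple


def make_lead_time_pairs(
    series: Sequence[float], window: int, lead: int
) -> List[Tuple[List[float], float]]:
    """Pair each `window`-day history with the value `lead` days after it.

    lead=1 means the day immediately following the history window.
    """
    if window < 1 or lead < 1:
        raise ValueError("window and lead must be positive")
    pairs = []
    # The last usable start index leaves room for the window plus the lead.
    for start in range(len(series) - window - lead + 1):
        history = list(series[start : start + window])
        target = series[start + window + lead - 1]
        pairs.append((history, target))
    return pairs


# Hypothetical daily fire danger values for one grid cell.
daily_fwi = [3.0, 4.5, 6.1, 9.8, 12.4, 15.0, 11.2, 7.9]
pairs = make_lead_time_pairs(daily_fwi, window=3, lead=2)
for history, target in pairs:
    print(history, "->", target)
```

Training one model per lead time, or feeding the lead time as an extra input, are both possible designs; the sketch does not take a position on which suits this challenge.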

We would like interested developers to provide a clear implementation plan, including a description of the model to be used and a validation strategy.

Suggested deliverables and milestones:

  1. Build a DL model using the inputs and outputs of GEFF, and report its accuracy and training speed.
  2. Use external validation data (ground-based meteorological observations, satellite-based products [GFAS]) to evaluate the performance of the DL model and GEFF, and propose changes to the DL model that improve its accuracy against the external validation data.
  3. Update the DL model with the solution agreed in deliverable 2 and explore the possibility of extending the updated model to longer time scales.
  4. Final report of established DL model.
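
The challenge asks for a validation strategy but does not prescribe one. One common option for time-dependent data, shown here purely as an assumption, is a temporal hold-out: train on earlier years and validate on later ones, so that the reported accuracy is not inflated by samples from the same season appearing in both sets. The function name, the dictionary layout, and the sample identifiers below are hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_year(
    samples_by_year: Dict[int, List[str]], first_validation_year: int
) -> Tuple[List[str], List[str]]:
    """Put years before `first_validation_year` in train, the rest in validation."""
    train: List[str] = []
    validation: List[str] = []
    for year in sorted(samples_by_year):
        bucket = validation if year >= first_validation_year else train
        bucket.extend(samples_by_year[year])
    return train, validation


# Hypothetical sample identifiers grouped by year.
samples = {
    2017: ["2017-07-01", "2017-07-02"],
    2018: ["2018-07-01"],
    2019: ["2019-07-01", "2019-07-02"],
}
train, validation = split_by_year(samples, first_validation_year=2019)
print(len(train), "training samples,", len(validation), "validation samples")
```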

We value proposals that are:

  • clearly described, including a timeline for deliverables and milestones,
  • technically feasible within 4 months,
  • proposing a scientifically-sound approach,
  • applicable to any place on Earth,
  • open source, well documented and easy to maintain.

Will historical data be provided? If yes, which data, and how long a record will it cover? Furthermore, how many fire events (targets for ML training) will be provided?

Hi @anton-stgt ,
Depending on the objectives, various historical datasets could be used.
If the goal is to predict (probability of) fire occurrence, one option is to use CAMS fire radiative power, but there are many others (see also this Kaggle dataset that provides almost 2M events).
If the goal is to predict (potential) fire danger, then we could use FWI-ERA5.

Hi, if I have understood correctly, the target variable could be chosen between CAMS fire radiative power and FWI-ERA5. As for the features of interest, could you please indicate the reference dataset and the list of main weather variables? Thanks

Hi @saferplaces
In terms of training input, you can have a look at Table 1 in this paper: https://journals.ametsoc.org/doi/pdf/10.1175/JAMC-D-15-0297.1
Other than the inputs listed there, you are welcome to explore any other open data from the Copernicus Climate Change Service (https://cds.climate.copernicus.eu/#!/home) and Atmosphere Monitoring Service (https://atmosphere.copernicus.eu/data). Thanks.

Thanks a lot for your prompt reply. Could you share the historical daily input dataset used by the GEFF system?
Thanks a lot

Hi @saferplaces, the inputs for GEFF are in the ERA5 dataset available from the Copernicus Climate Data Store.
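
For readers who have not used the Climate Data Store before, the following is a hedged sketch of how a request for ERA5 fields might be assembled with the `cdsapi` Python client. Several things here are assumptions and not confirmed in this thread: the variable subset in `ASSUMED_VARIABLES`, the choice of 12:00 as the hour, the helper names, and the request keys, which follow the CDS download form as commonly used and may differ on the current CDS. The actual GEFF input list should be taken from the paper linked above.

```python
import calendar
from typing import Dict, List

# Assumed subset of ERA5 single-level variables, for illustration only.
ASSUMED_VARIABLES: List[str] = [
    "2m_temperature",
    "2m_dewpoint_temperature",
    "10m_u_component_of_wind",
    "10m_v_component_of_wind",
    "total_precipitation",
]


def build_era5_request(year: int, month: int, hour: str = "12:00") -> Dict[str, object]:
    """Build a request dictionary for one month of ERA5 fields at one hour."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    days_in_month = calendar.monthrange(year, month)[1]
    return {
        "product_type": "reanalysis",
        "variable": list(ASSUMED_VARIABLES),
        "year": str(year),
        "month": f"{month:02d}",
        "day": [f"{day:02d}" for day in range(1, days_in_month + 1)],
        "time": [hour],
        "format": "netcdf",
    }


def download(request: Dict[str, object], target: str) -> None:
    """Submit the request; needs the third-party cdsapi package and CDS credentials."""
    import cdsapi  # imported lazily so the request builder works without it

    client = cdsapi.Client()
    client.retrieve("reanalysis-era5-single-levels", request, target)


request = build_era5_request(2019, 7)
print(request["month"], len(request["day"]), "days")
```

Calling `download(request, "era5_2019_07.nc")` would then fetch the file, provided a CDS account and API key are configured; the output file name is only an example.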

Dear cvitolo, thanks for your reply.
Looking at the paper, it lists the following variables:

  • Vegetation Cover
  • Vegetation Stage
  • Fuel Model
  • etc

Are those variables available in ERA5?

Whatever is not available from the Climate Data Store, will be made available to the successful candidates at the beginning of the coding period.

Join us for our LIVE @ecmwf Summer of Weather Code Ask Me Anything session on 1 April 2020 at 2 pm (CET) (tomorrow).

Get information first-hand from the #ESoWC2020 organisers, mentors and former #ESoWC participants.
➡️Sign up

What is the ground truth we are supposed to use for this project/ where can it be found?

Hi @RamaniLachyan
We expect this project to have two stages:

  1. Use GEFF model output as labels for training. This evaluates the ability of the DL model to reproduce GEFF.
  2. Use satellite observation data as labels for training, for example the output from GFAS. This is a more challenging step, since the output of GEFF (fire danger, i.e. the chance that a fire occurs) is not directly comparable to the output of GFAS (observed fire intensity). Creating a method to convert GFAS output into useful information for training will therefore be essential, and challenging as well.
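
As one illustration of the conversion mentioned in the second stage, the sketch below turns a grid of fire radiative power (FRP) values into binary "fire observed" labels. This is only one possible approach and not the mentors' method: the threshold, the function names, and the sample grid are hypothetical, and real GFAS fields would also need handling of missing values and regridding to the model grid.

```python
from typing import List

Grid = List[List[float]]


def frp_to_labels(frp: Grid, threshold: float = 0.0) -> List[List[int]]:
    """Mark a cell as 1 (fire observed) when its FRP exceeds `threshold`."""
    return [[1 if value > threshold else 0 for value in row] for row in frp]


def fire_fraction(labels: List[List[int]]) -> float:
    """Share of grid cells labelled as fire, useful for spotting class imbalance."""
    cells = [cell for row in labels for cell in row]
    return sum(cells) / len(cells) if cells else 0.0


# Hypothetical 2 x 3 grid of FRP values for one day.
frp = [
    [0.0, 0.0, 1.2],
    [0.0, 5.6, 0.0],
]
labels = frp_to_labels(frp)
print(labels, fire_fraction(labels))
```

Fire cells are usually a small minority of all cells, so whichever conversion is chosen, the resulting class imbalance is likely to need attention during training.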

You can find more information on GFAS here:
https://atmosphere.copernicus.eu/global-fire-emissions
https://apps.ecmwf.int/datasets/data/cams-gfas/

Please also keep in mind that this is only our suggestion for the proposal. We very much welcome other good ideas for improving fire danger forecasting.

Hope it helps.

Depending on the spatial extent of this challenge, one might consider combining historical data with satellite imagery. For example the BC government publishes spatial data on historical fires (accessed here through the bcdata R package):

library(bcdata)
library(dplyr)

fires <- bcdc_query_geodata('fire-perimeters-historical') %>% 
  filter(FIRE_YEAR == 2019) %>% 
  collect()
fires
#> Simple feature collection with 156 features and 17 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 524349.6 ymin: 388236.1 xmax: 1794982 ymax: 1670523
#> CRS:            3005
#> # A tibble: 156 x 18
#>    id    FIRE_NUMBER VERSION_NUMBER FIRE_YEAR FIRE_CAUSE FIRE_LABEL
#>  * <chr> <chr>                <int>     <int> <chr>      <chr>     
#>  1 WHSE~ C10085          2019042501      2019 Person     2019-C100~
#>  2 WHSE~ C10092          2019081501      2019 Person     2019-C100~
#>  3 WHSE~ C10094          2019050701      2019 Person     2019-C100~
#>  4 WHSE~ C10193          2019051201      2019 Person     2019-C101~
#>  5 WHSE~ C10205          2019080902      2019 Person     2019-C102~
#>  6 WHSE~ C20025          2019040501      2019 Person     2019-C200~
#>  7 WHSE~ C21414          2019081501      2019 Person     2019-C214~
#>  8 WHSE~ C40192          2019050801      2019 Person     2019-C401~
#>  9 WHSE~ C40408          2019053001      2019 Lightning  2019-C404~
#> 10 WHSE~ C40589          2019061501      2019 Lightning  2019-C405~
#> # ... with 146 more rows, and 12 more variables: FIRE_SIZE_HECTARES <dbl>,
#> #   SOURCE <chr>, GPS_TRACK_DATE <chr>, LOAD_DATE <chr>, FIRE_DATE <chr>,
#> #   CREATION_METHOD <chr>, FEATURE_CODE <chr>, OBJECTID <int>,
#> #   SE_ANNO_CAD_DATA <chr>, FEATURE_AREA_SQM <dbl>, FEATURE_LENGTH_M <dbl>,
#> #   geometry <MULTIPOLYGON [m]>

Created on 2020-04-06 by the reprex package (v0.3.0)

Just stumbled across this issue and thought I'd offer this given the interest in ground truth data.

Thank you. Are we expected to generate our own GEFF outputs? I cannot seem to find any public GEFF reanalysis dataset (https://confluence.ecmwf.int/pages/viewpage.action?pageId=73017108).

Thanks for the suggestion @boshek, that's very valuable!

Hi @RamaniLachyan, GEFF reanalysis data was recently migrated to the Copernicus Climate Data Store: https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-fire-historical?tab=overview

Hi
For the DL model outputs, do we need to model the full list of variables described here: http://datastore.copernicus-climate.eu/c3s/published-forms/c3sprod/cems-fire-historical/Fire_In_CDS.pdf
or can we concentrate on the following main outputs:

FWI Model

Fire Weather Index - FWI
Fire Severity
Danger Rating

Mc-Arthur 5

Fire Danger Index

NFDRS

Burning Index

thank you

Hi @fadouaeddounia !
In the first instance, I would suggest focusing on the FWI system. If time allows, other systems can be explored.

Only 4 days left to apply to be part of ECMWF Summer of Weather Code 2020.
Application deadline: Wednesday, 22 April 2020 at 23:59 (BST).
Submit your proposal here.

Do we have an idea about the ballpark accuracy levels of the existing FORTRAN implementation of GEFF with the ground truth? Could be for any time scales or any geography. Just trying to understand what kind of benchmarks we need to reproduce with the DL models.

Dear all, we have a doubt concerning the real value of the target variable to be predicted by the ML/DL model. For example, if we would like to train a model for the FWI target variable, which are the "real" or observed values of FWI to be used for training? I was thinking of using the GEFF reanalysis FWI values; is that correct? Are other FWI observations available from satellites or other surveys? Thanks a lot. Regards, Stefano

@lazyoracle we expect to see at least 80% accuracy in Europe
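
The thread does not say how this accuracy would be measured. One possible reading, offered here only as an assumption, is the share of cases in which the predicted and reference FWI fall into the same danger class. The class boundaries below are EFFIS-style thresholds chosen for illustration and are not taken from this thread; the function names and sample values are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence

# Assumed class boundaries (EFFIS-style FWI thresholds), for illustration only.
ASSUMED_BOUNDS = [5.2, 11.2, 21.3, 38.0, 50.0]
CLASS_NAMES = ["very low", "low", "moderate", "high", "very high", "extreme"]


def danger_class(fwi: float) -> int:
    """Return the index of the danger class that contains `fwi`."""
    return bisect_right(ASSUMED_BOUNDS, fwi)


def class_accuracy(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Fraction of samples whose predicted and reference classes agree."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        danger_class(p) == danger_class(r) for p, r in zip(predicted, reference)
    )
    return hits / len(predicted)


# Hypothetical reference (e.g. GEFF reanalysis) and model-predicted FWI values.
reference = [2.0, 8.0, 15.0, 30.0, 55.0]
predicted = [3.1, 12.0, 14.2, 29.0, 49.0]
print(f"class accuracy: {class_accuracy(predicted, reference):.2f}")
```

A continuous metric such as RMSE on the raw index could be reported alongside, since class agreement hides how far off a misclassified prediction is.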

@saferplaces FWI is not observed; it is an index of 'potential danger' derived from meteorological observations. In terms of ground truth you can use: 1) GEFF reanalysis FWI or 2) FWI calculated from ground-based meteorological observations.
If time allows, it would be interesting to see how the basic DL model can be adapted to predict 'real danger'; in that case the ground truth can be observed burned areas or fire radiative power (from satellites).

We value proposals that are: clearly described, including a timeline for deliverables and milestones; technically feasible within 4 months; proposing a scientifically-sound approach; applicable to any place on Earth; open source, well documented and easy to maintain

What does an "open source" proposal imply?