OCHA-DAP/ds-raster-pipelines

FloodScan - Do we want HDX rasters processed in this repo?


Do we want the HDX FloodScan rasters processed in this repository?

The daily rasters pushed to HDX will follow a different process than what has been done so far in the repo.

The final raster product pushed to HDX will be a zip file containing 90 daily COGs, covering the tif from the latest run and the previous 90 days. Each daily raster will contain 2 bands: SFED and SFED_BASELINE.
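As a rough illustration (the file name below is hypothetical), each daily COG inside that zip would expose the two bands, which can be checked with rasterio:

```python
import rasterio

# Hypothetical file name for one of the daily COGs extracted from the zip
with rasterio.open("aer_floodscan_sfed_2024-01-01.tif") as src:
    print(src.count)         # expect 2 bands
    print(src.descriptions)  # expect ("SFED", "SFED_BASELINE") if descriptions are set
    sfed, baseline = src.read(1), src.read(2)
```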

Process Summary

Calculate the baseline raster band(s)

This only needs to be updated once per year.

  1. Read in the SFED band from the COGs for the last 10 years (not including the current year)
  2. Smooth the SFED values with a rolling centered mean using a 10-day window (+/- 5 days)
  3. Take the composite mean of the smoothed values per day of year (DOY) over the entire 10-year stack - this results in 365 rasters or 365 raster bands/layers, depending on the raster model (see the sketch after this list)
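A minimal sketch of the baseline calculation, assuming the 10-year SFED stack has already been loaded as an xarray DataArray with dims `(time, y, x)` (the function and variable names here are hypothetical):

```python
import xarray as xr


def calc_sfed_baseline(sfed: xr.DataArray) -> xr.DataArray:
    """Composite DOY baseline from a multi-year daily SFED stack."""
    # Step 2: centered rolling mean along time with a 10-day window (+/- 5 days)
    smoothed = sfed.rolling(time=10, center=True, min_periods=1).mean()
    # Step 3: composite mean per day of year across the whole stack, giving
    # one layer per DOY (365, or 366 if leap days are kept)
    baseline = smoothed.groupby("time.dayofyear").mean("time")
    return baseline.rename("SFED_BASELINE")
```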

Merge the baseline onto the near-real-time (NRT) rasters.

  1. Read in the last 90 days of tifs produced by the internal pipeline in this repo
  2. Merge the SFED baseline band onto those tifs based on DOY
  3. Export: zip and send to HDX (see the sketch below)
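A hedged sketch of the merge and export steps, assuming `baseline` is the DOY-indexed DataArray from the step above, the last 90 days of SFED tifs sit in a local directory, and the baseline is already on the same grid as the NRT tifs (paths and file-name patterns are hypothetical):

```python
import zipfile
from pathlib import Path

import pandas as pd
import rioxarray  # noqa: F401 - registers the .rio accessor
import xarray as xr

nrt_dir = Path("nrt_tifs")   # daily SFED tifs from the internal pipeline
out_dir = Path("hdx_cogs")
out_dir.mkdir(exist_ok=True)

for tif in sorted(nrt_dir.glob("*.tif")):
    # Date parsed from a hypothetical file-name pattern like aer_sfed_20240101.tif
    date = pd.to_datetime(tif.stem.split("_")[-1])
    sfed = rioxarray.open_rasterio(tif).squeeze("band", drop=True)

    # Select the baseline layer matching this date's day of year
    doy_baseline = baseline.sel(dayofyear=date.dayofyear)

    # Stack SFED + SFED_BASELINE into a two-band raster and write it as a COG
    merged = xr.concat([sfed, doy_baseline], dim="band").assign_coords(band=[1, 2])
    merged = merged.rio.write_crs(sfed.rio.crs)
    merged.rio.to_raster(out_dir / tif.name, driver="COG")

# Zip the merged COGs for upload to HDX
with zipfile.ZipFile("aer_floodscan_sfed_90d.zip", "w") as zf:
    for cog in sorted(out_dir.glob("*.tif")):
        zf.write(cog, arcname=cog.name)
```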

More info

  • A lot of the above, along with additional details, is on the Confluence spec page for FloodScan
  • This process has already been done in R in the ds-floodscan-ingest repository. The creation of the baseline was done here, and the merging/exporting of COGs is illustrated in this notebook
    • It's worth noting that I had to write some more complex code to batch various processing steps of the raster baseline calculations, as I was running it locally. I think with the extra power we get from Databricks we will be able to remove a lot of this complexity.

I'm asking about the repo because I'd be happy to take a stab at transforming my R code into Python pipeline code.

I'd vote to create a separate repo for FloodScan HDX outputs (both the rasters and the zonal stats). It seems to me that these HDX products extend the outputs from our existing pipelines and are differentiated enough that they won't generalize well with what we already have.

Closing as we decided to do HDX processing in https://github.com/OCHA-DAP/hdx-floodscan