nasaharvest/cropharvest

Improved crop-mask integration

gabrieltseng opened this issue · 1 comments

We want to improve the integration between cropharvest and crop-mask.

On the CropHarvest side this consists of:

  • Renaming the tif files to follow a <location>_<date> naming convention instead of coupling them to the labels.geojson
  • Rewriting the Engineer to handle variable-length tifs, instead of expecting 12-month inputs
  • Storing tifs in a Google Cloud bucket

This has the advantage of not requiring tif files to be downloaded before updating the dataset, which should make it easier to contribute new datasets.
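
As a rough illustration of the <location>_<date> convention, here is a minimal sketch of building and parsing such a filename. The ISO date format, the `.tif` suffix, and the helper names (`build_tif_name`, `parse_tif_name`, `TifName`) are assumptions made for this example and are not part of the current cropharvest code.

```python
from datetime import date
from typing import NamedTuple


class TifName(NamedTuple):
    """Hypothetical parsed form of a <location>_<date> tif filename."""

    location: str
    start_date: date


def build_tif_name(location: str, start_date: date) -> str:
    # Assumption: the date is written in ISO format (YYYY-MM-DD).
    return f"{location}_{start_date.isoformat()}.tif"


def parse_tif_name(filename: str) -> TifName:
    # Drop the extension if present.
    stem = filename[:-4] if filename.endswith(".tif") else filename
    # Split on the LAST underscore so the location itself may contain underscores.
    location, _, date_str = stem.rpartition("_")
    if not location:
        raise ValueError(f"Expected <location>_<date>, got {filename!r}")
    return TifName(location, date.fromisoformat(date_str))
```

With a name like this, a tif can be matched to a label by location and date alone, without looking anything up in labels.geojson.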

cc @ivanzvonkov

You probably already know this, but the Engineer can already handle variable timesteps. It does so here: https://github.com/nasaharvest/crop-mask/blob/951b14621838d70eb95f284bf92984aaf35d4cb1/src/ETL/dataset.py#L109

I suppose the real question may be: should we re-export all the unexported tifs in the 24-month timestep style, starting from January, to make it easy to run models from September to September, for example?
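
To make the idea concrete, here is a minimal sketch of pulling a 12-month window out of a 24-month series that starts in January. It assumes one timestep per month along the first axis; `select_window` and its parameters are hypothetical names for this example, not existing Engineer code.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def select_window(
    timesteps: Sequence[T],
    start_month: int,
    num_months: int = 12,
    first_month: int = 1,
) -> Sequence[T]:
    """Return `num_months` consecutive monthly timesteps starting at `start_month`.

    Assumption: `timesteps` holds one entry per month, and its first entry
    corresponds to calendar month `first_month` (1 = January).
    """
    if not 1 <= start_month <= 12:
        raise ValueError(f"start_month must be in 1..12, got {start_month}")
    offset = start_month - first_month
    if offset < 0 or offset + num_months > len(timesteps):
        raise ValueError("Requested window falls outside the exported timesteps")
    # Plain slicing, so this also works on a numpy array along its first axis.
    return timesteps[offset : offset + num_months]
```

For a 24-month export starting in January, `select_window(series, start_month=9)` would return the 12 timesteps from September of the first year through August of the second year.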