Generalizing exporter
ivanzvonkov opened this issue · 3 comments
ivanzvonkov commented
Exporting labels
- (1) labels should be made an input to the export_for_labels() function
- Why: crop-mask needs the ability to use the exporter for files outside the public labels.geojson
- (2) labels should be checked for "start_date" and "end_date", and these should be used if they are found
- Why: The way the start and end dates are computed may vary by dataset; storing this information in the labels prior to exporting leaves the exporter agnostic to this
- (3) The exporter class should have a dest_bucket parameter where the user can specify the destination GCP bucket
- Why: Storing tifs on Google Cloud is easier than Google Drive
- (4) Exported tifs should have a canonical name
- Suggested: f"min_lat={min_lat}_min_lon={min_lon}_max_lat={max_lat}_max_lon={max_lon}_dates={start_date}_{end_date}_all" (where the all suffix indicates all bands are being exported, not just Sentinel 2)
- Why: Making the tifs agnostic to the datasets they are derived from makes it possible to change the underlying dataset without having to reexport all tifs (example use case: partially labeled CEO project csv -> fully labeled CEO project csv)
- (5) export_for_labels should have a check_gcp option which would check if the tif about to be exported already exists on Google Cloud
- Why: There's no need to reexport a file that has already been exported. (This is like checkpoint but for cloud storage)
- (6) export_for_labels should have a check_ee option which would check if the tif about to be exported is currently in the Earth Engine queue
- Why: No need to export tifs already in the Earth Engine queue
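To make items (2), (4), (5) and (6) concrete, here is a rough sketch of how the pieces could fit together. The helper names (canonical_tif_name, resolve_dates, should_export) are hypothetical and not part of the existing exporter. The sets of names already in the bucket and in the Earth Engine queue are assumed to be fetched elsewhere, and falling back to defaults unless both dates are present is also an assumption.

```python
from datetime import date
from typing import AbstractSet, Mapping, Tuple


def canonical_tif_name(min_lat, min_lon, max_lat, max_lon, start_date, end_date) -> str:
    """Item (4): build the dataset-agnostic name suggested above."""
    return (
        f"min_lat={min_lat}_min_lon={min_lon}_max_lat={max_lat}_max_lon={max_lon}"
        f"_dates={start_date}_{end_date}_all"
    )


def resolve_dates(label: Mapping, default_start: date, default_end: date) -> Tuple[date, date]:
    """Item (2): use the dates stored in the label when both are present.

    Assumption: if either key is missing, fall back to the caller's defaults.
    """
    if "start_date" in label and "end_date" in label:
        return label["start_date"], label["end_date"]
    return default_start, default_end


def should_export(
    name: str,
    in_bucket: AbstractSet[str],
    in_ee_queue: AbstractSet[str],
    check_gcp: bool = True,
    check_ee: bool = True,
) -> bool:
    """Items (5) and (6): skip tifs already stored or already queued.

    in_bucket and in_ee_queue are plain sets of tif names; how they are
    obtained (bucket listing, Earth Engine task list) is out of scope here.
    """
    if check_gcp and name in in_bucket:
        return False
    if check_ee and name in in_ee_queue:
        return False
    return True


# Example with made-up coordinates and dates.
label = {"start_date": date(2020, 1, 1), "end_date": date(2020, 12, 31)}
start, end = resolve_dates(label, date(2019, 1, 1), date(2019, 12, 31))
name = canonical_tif_name(1.0, 2.0, 3.0, 4.0, start, end)
```

Because the name depends only on the bounding box and dates, the same string can serve as the lookup key for both the bucket check and the queue check.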
Exporting region
- (1) Ability to export any region for any start and end date should be available (generally following this method signature: https://github.com/nasaharvest/crop-mask/blob/831d020d38a794afb07f8d270ab79ed6ad603232/src/ETL/ee_exporter.py#L143)
- Why: This is what's used for generating the data for a crop mask (generalized version of export test)
- (2) credentials should be made an input to the region exporter
- Why: crop-mask uses RegionExporter inside Google Cloud, where it uses a special Earth Engine account for which the credentials must be passed manually
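As a rough illustration of the two region-export requirements, the inputs could be bundled into one request object. RegionExportRequest and its fields are hypothetical, not the existing RegionExporter signature; the credentials field is assumed to be an opaque string (for example a key file path), and how it is handed to Earth Engine is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RegionExportRequest:
    """Hypothetical container for a region export: any bounding box, any dates."""

    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float
    start_date: date
    end_date: date
    # Assumption: None means "use whatever Earth Engine auth is already set up".
    credentials: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: regions crossing the antimeridian are not supported here.
        if self.min_lat >= self.max_lat or self.min_lon >= self.max_lon:
            raise ValueError("bounding box minimums must be smaller than maximums")
        if self.start_date >= self.end_date:
            raise ValueError("start_date must be before end_date")


# Example with made-up values.
request = RegionExportRequest(
    min_lat=0.0, min_lon=30.0, max_lat=1.0, max_lon=31.0,
    start_date=date(2020, 1, 1), end_date=date(2021, 1, 1),
)
```

Validating up front means a bad bounding box or date range fails before anything is submitted to the export queue.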
ivanzvonkov commented
I can work on 3,5,6
gabrieltseng commented
I'll work on removing the concept of a data_folder, as per #56 (comment)
ivanzvonkov commented
> I'll work on removing the concept of a data_folder, as per #56 (comment)
Here's the "label management" code in crop-mask for reference: https://github.com/nasaharvest/crop-mask/blob/96b56c50e238e836cab00699e522bb28812469a0/src/ETL/dataset.py#L47