sentinel-hub/eo-grow

[BUG] Issues running the batch_to_eopatch pipeline


Question

I have successfully run the batch download pipeline and would like to convert the batch tiles to EOPatches. After locally fixing #12, I managed to run the batch_to_eopatch pipeline, but the logs show the following exception:

Summary of exceptions

    LoadUserDataTask (LoadUserDataTask-29825b248e7b11ecbc3b-f57730fc0853):
        14 times:

        TypeError: execute() missing 1 required positional argument: 'eopatch'

This is odd, because LoadUserDataTask is the first task in the workflow, so it should not expect an eopatch argument.

Here is my config:

{
  "pipeline": "eogrow.pipelines.batch_to_eopatch.BatchToEOPatchPipeline",
  "folder_key": "data",
  "mapping": [
    {"batch_files": ["B01.tif"], "feature_type": "data", "feature_name": "B01", "multiply_factor": 1e-4},
    {"batch_files": ["B02.tif"], "feature_type": "data", "feature_name": "B02", "multiply_factor": 1e-4},
    {"batch_files": ["B03.tif"], "feature_type": "data", "feature_name": "B03", "multiply_factor": 1e-4},
    {"batch_files": ["B04.tif"], "feature_type": "data", "feature_name": "B04", "multiply_factor": 1e-4},
    {"batch_files": ["B05.tif"], "feature_type": "data", "feature_name": "B05", "multiply_factor": 1e-4},
    {"batch_files": ["B06.tif"], "feature_type": "data", "feature_name": "B06", "multiply_factor": 1e-4},
    {"batch_files": ["B07.tif"], "feature_type": "data", "feature_name": "B07", "multiply_factor": 1e-4},
    {"batch_files": ["B08.tif"], "feature_type": "data", "feature_name": "B08", "multiply_factor": 1e-4},
    {"batch_files": ["B8A.tif"], "feature_type": "data", "feature_name": "B8A", "multiply_factor": 1e-4},
    {"batch_files": ["B09.tif"], "feature_type": "data", "feature_name": "B09", "multiply_factor": 1e-4},
    {"batch_files": ["B10.tif"], "feature_type": "data", "feature_name": "B10", "multiply_factor": 1e-4},
    {"batch_files": ["B11.tif"], "feature_type": "data", "feature_name": "B11", "multiply_factor": 1e-4},
    {"batch_files": ["B12.tif"], "feature_type": "data", "feature_name": "B12", "multiply_factor": 1e-4},
    {"batch_files": ["CLP.tif"], "feature_type": "data", "feature_name": "CLP", "multiply_factor": 0.00392156862745098},
    {"batch_files": ["CLM.tif"], "feature_type": "mask", "feature_name": "CLM"},
    {"batch_files": ["dataMask.tif"], "feature_type": "mask", "feature_name": "dataMask"}
  ],
  "userdata_feature_name": "BATCH_INFO",
  "userdata_timestamp_reader": "eogrow.utils.batch.read_timestamps_from_orbits",
  "**global_settings": "${config_path}/sentinel2_l1c_batch_config.json"
}
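For readers wondering about the `multiply_factor` values in the config above, here is a minimal sketch of what they appear to encode. It assumes the factor is simply multiplied into the raw integer values read from each TIFF; the `rescale` helper is hypothetical and is not part of eo-grow.

```python
import math

# Factors copied from the config above.
BAND_FACTOR = 1e-4
CLP_FACTOR = 0.00392156862745098


def rescale(raw_values, factor):
    """Multiply each raw value by the factor (assumed behaviour, hypothetical helper)."""
    return [value * factor for value in raw_values]


# The CLP factor matches 1/255, which maps 8-bit values (0..255) onto 0..1.
clp_is_one_over_255 = math.isclose(CLP_FACTOR, 1 / 255)

# 1e-4 maps band values stored as integers in 0..10000 onto 0..1.
band_scaled = rescale([0, 5000, 10000], BAND_FACTOR)
clp_scaled = rescale([0, 128, 255], CLP_FACTOR)

print(clp_is_one_over_255, band_scaled, clp_scaled)
```

The mask entries (CLM, dataMask) carry no factor, which is consistent with them being kept as raw values.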

Let me know if you need to see what sentinel2_l1c_batch_config.json looks like.

The data is there (screenshot not reproduced here).

Ah, the eopatch is Optional[EOPatch] but apparently we forgot to add a default value.
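A minimal sketch of why the annotation alone is not enough: `Optional[EOPatch]` only says that `None` is an accepted value, while the argument stays required until it is given a default. The classes below are hypothetical stand-ins, not the actual eo-grow or eo-learn code.

```python
from typing import Optional


class EOPatch:
    """Hypothetical stand-in for eolearn.core.EOPatch."""


class LoadTaskWithoutDefault:
    """Hypothetical task: annotated as Optional, but the argument is still required."""

    def execute(self, eopatch: Optional[EOPatch]) -> EOPatch:
        return eopatch if eopatch is not None else EOPatch()


class LoadTaskWithDefault:
    """Hypothetical task: the default value makes the argument truly optional."""

    def execute(self, eopatch: Optional[EOPatch] = None) -> EOPatch:
        return eopatch if eopatch is not None else EOPatch()


# Calling the first task with no input fails, as in the reported exception.
try:
    LoadTaskWithoutDefault().execute()
    message = ""
except TypeError as error:
    message = str(error)
print(message)

# With a default of None, the task can be the first node of a workflow.
patch = LoadTaskWithDefault().execute()
print(type(patch).__name__)
```

The exact wording of the `TypeError` varies slightly between Python versions (newer versions prefix the class name), but the missing-argument part is the same.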

Thanks for the hint. I tried setting the default to None and got a new error:

❯ eogrow 01_batch_to_eopatch.json
INFO eogrow.core.pipeline:216: Running BatchToEOPatchPipeline
INFO eogrow.core.area.base:176: Loading grid from cache/grid_test_area_BatchAreaManager_0.2_0.004_1_10.0_0.gpkg
INFO eogrow.core.pipeline:159: Searching for Ray cluster
INFO eogrow.core.pipeline:164: No cluster found, pipeline will not use Ray.
INFO eogrow.core.pipeline:174: Starting EOExecutor for 14 EOPatches
  0%|                                                                                                             | 0/14 [00:00<?, ?it/s]
Warning 1: TIFFReadDirectory:Sum of Photometric type-related color channels and ExtraSamples doesn't match SamplesPerPixel. Defining non-color channels as ExtraSamples.
Warning 1: TIFFReadDirectory:Sum of Photometric type-related color channels and ExtraSamples doesn't match SamplesPerPixel. Defining non-color channels as ExtraSamples.
Warning 1: TIFFReadDirectory:Sum of Photometric type-related color channels and ExtraSamples doesn't match SamplesPerPixel. Defining non-color channels as ExtraSamples.
  0%|                                                                                                             | 0/14 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "/Users/mlubej/.pyenv/versions/surs/bin/eogrow", line 33, in <module>
    sys.exit(load_entry_point('eo-grow', 'console_scripts', 'eogrow')())
  File "/Users/mlubej/.pyenv/versions/3.8.7/envs/surs/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/mlubej/.pyenv/versions/3.8.7/envs/surs/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/mlubej/.pyenv/versions/3.8.7/envs/surs/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/mlubej/.pyenv/versions/3.8.7/envs/surs/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/mlubej/work/projects/sh-project/eo-grow/eogrow/cli.py", line 80, in main
    pipeline.run()
  File "/Users/mlubej/work/projects/sh-project/eo-grow/eogrow/core/pipeline.py", line 220, in run
    finished, failed = self.run_procedure()
  File "/Users/mlubej/work/projects/sh-project/eo-grow/eogrow/core/pipeline.py", line 263, in run_procedure
    finished, failed, _ = self.run_execution(workflow, exec_args)
  File "/Users/mlubej/work/projects/sh-project/eo-grow/eogrow/core/pipeline.py", line 185, in run_execution
    execution_results = executor.run(**executor_run_params)
  File "/Users/mlubej/work/projects/sh-project/eo-learn/core/eolearn/core/eoexecution.py", line 187, in run
    full_execution_results = self._run_execution(processing_args, workers, processing_type)
  File "/Users/mlubej/work/projects/sh-project/eo-learn/core/eolearn/core/eoexecution.py", line 219, in _run_execution
    return submit_and_monitor_execution(process_executor, self._execute_workflow, processing_args)
  File "/Users/mlubej/work/projects/sh-project/eo-learn/core/eolearn/core/eoexecution.py", line 398, in submit_and_monitor_execution
    results[future_order[future]] = future.result()
  File "/Users/mlubej/.pyenv/versions/3.8.7/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/Users/mlubej/.pyenv/versions/3.8.7/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/Users/mlubej/.pyenv/versions/3.8.7/lib/python3.8/logging/__init__.py", line 2123, in shutdown
    h.close()
  File "/Users/mlubej/work/projects/sh-project/eo-grow/eogrow/core/logging.py", line 253, in close
    self.local_file.close()
  File "/Users/mlubej/work/projects/sh-project/eo-grow/eogrow/utils/fs.py", line 90, in close
    self.copy_to_remote()
  File "/Users/mlubej/work/projects/sh-project/eo-grow/eogrow/utils/fs.py", line 103, in copy_to_remote
    fs.copy.copy_file(self._filesystem, self._local_path, self._remote_filesystem, self._remote_path)
  File "/Users/mlubej/.pyenv/versions/3.8.7/envs/surs/lib/python3.8/site-packages/fs/copy.py", line 142, in copy_file
    copy_file_if(
  File "/Users/mlubej/.pyenv/versions/3.8.7/envs/surs/lib/python3.8/site-packages/fs/copy.py", line 221, in copy_file_if
    copy_file_internal(
  File "/Users/mlubej/.pyenv/versions/3.8.7/envs/surs/lib/python3.8/site-packages/fs/copy.py", line 277, in copy_file_internal
    _copy_locked()
  File "/Users/mlubej/.pyenv/versions/3.8.7/envs/surs/lib/python3.8/site-packages/fs/copy.py", line 270, in _copy_locked
    dst_fs.upload(dst_path, read_file)
  File "/Users/mlubej/.pyenv/versions/3.8.7/envs/surs/lib/python3.8/site-packages/fs_s3fs/_s3fs.py", line 774, in upload
    self.client.upload_fileobj(
  File "/Users/mlubej/.pyenv/versions/3.8.7/envs/surs/lib/python3.8/site-packages/boto3/s3/inject.py", line 537, in upload_fileobj
    future = manager.upload(
  File "/Users/mlubej/.pyenv/versions/3.8.7/envs/surs/lib/python3.8/site-packages/s3transfer/manager.py", line 329, in upload
    return self._submit_transfer(
  File "/Users/mlubej/.pyenv/versions/3.8.7/envs/surs/lib/python3.8/site-packages/s3transfer/manager.py", line 524, in _submit_transfer
    self._submission_executor.submit(
  File "/Users/mlubej/.pyenv/versions/3.8.7/envs/surs/lib/python3.8/site-packages/s3transfer/futures.py", line 474, in submit
    future = ExecutorFuture(self._executor.submit(task))
  File "/Users/mlubej/.pyenv/versions/3.8.7/lib/python3.8/concurrent/futures/thread.py", line 181, in submit
    raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown

We discovered that the issue is not in multithreading but in reading TIFFs with ImportFromTiffTask. We are investigating further.

Fixed in #15