stac-utils/stac-asset

Multiple error messages when failing fast

Closed this issue · 10 comments

          @gadomski the only concern I have is that, although it prints only one stack trace, the number of ERROR: lines can vary, and it's unclear why they might sometimes be the same. Here is the output I got with the fix:
$ stac-asset -v DEBUG download --fail-fast --path-template '${satl:outcome_id}' --http-timeout 5 --max-concurrent-downloads 5
ERROR: SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                                                                
ERROR: SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                                                                
ERROR: SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                                                                
ERROR: SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                                                                
10/13:   0%|▌                                                                                                                                              | 2.02M/468M [00:01<00:54, 8.99MB/s, 4 errors, 23 skips]Traceback (most recent call last):
  File "/home/denis/git/stac-asset/env/bin/stac-asset", line 8, in <module>
    sys.exit(cli())
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/denis/git/stac-asset/src/stac_asset/_cli.py", line 188, in download
    asyncio.run(
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/denis/git/stac-asset/src/stac_asset/_cli.py", line 298, in download_async
    output = await download()
  File "/home/denis/git/stac-asset/src/stac_asset/_cli.py", line 281, in download
    return await _functions.download_item_collection(
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 355, in download_item_collection
    await downloads.download(messages)
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 152, in download
    raise error
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 145, in download
    results = await asyncio.gather(*tasks)
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 176, in download_with_lock
    return await download.download(messages=messages)
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 59, in download
    raise error
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 49, in download
    await download_asset(
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 431, in download_asset
    raise error
  File "/home/denis/git/stac-asset/src/stac_asset/_functions.py", line 419, in download_asset
    await client.download_href(
  File "/home/denis/git/stac-asset/src/stac_asset/client.py", line 120, in download_href
    raise err
  File "/home/denis/git/stac-asset/src/stac_asset/client.py", line 102, in download_href
    async for chunk in self.open_href(
  File "/home/denis/git/stac-asset/src/stac_asset/client.py", line 78, in open_href
    async for chunk in self.open_url(
  File "/home/denis/git/stac-asset/src/stac_asset/http_client.py", line 165, in open_url
    async for chunk, _ in response.content.iter_chunks():
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/aiohttp/streams.py", line 66, in __anext__
    rv = await self._stream.readchunk()
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/aiohttp/streams.py", line 441, in readchunk
    await self._wait("readchunk")
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/aiohttp/streams.py", line 311, in _wait
    with self._timer:
  File "/home/denis/git/stac-asset/env/lib/python3.10/site-packages/aiohttp/helpers.py", line 735, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError
10/13:   0%|▌                                                                                                                                              | 2.02M/468M [00:01<04:01, 2.02MB/s, 4 errors, 23 skips]

Originally posted by @drnextgis in #204 (comment)

@drnextgis this one's trickier to reproduce — I was using the following to reproduce #201:

$ stac-asset download --fail-fast --max-concurrent-downloads 5 https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l2alb-ta/items/LC09_L2SP_070011_20240717_20240719_02_A1_TOA 
ERROR: LC09_L2SP_070011_20240717_20240719_02_A1_TOA[MTL.json] - ContentTypeError: the actual content type does not match the expected: actual=text/html, expected=application/json 
2/7: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.96k/1.96k [00:00<00:00, 8.46kB/s, 1 errors, 0 skips]Traceback (most recent call last):
-- >8 --

This only produces one error message. Do you have an example that hits a public API that I can use as a reproducer?

Looks like throwing the --http-timeout 5 in there, means that all tasks fail near simultaneously.

@drnextgis I still can't reproduce, even after using --http-timeout to trigger an error:

$ stac search https://planetarycomputer.microsoft.com/api/stac/v1 -c sentinel-2-l2a --max-items 1 | stac-asset download --http-timeout 1 --max-concurrent-downloads 5 --fail-fast - > /dev/null
ERROR: S2B_MSIL2A_20240723T204839_R057_T27XVJ_20240723T230733[B01] - TimeoutError:                                                                                                                                                                                                                            
ERROR: S2B_MSIL2A_20240723T204839_R057_T27XVJ_20240723T230733[AOT] - TimeoutError:                                                                                                                                                                                                                            
ERROR: S2B_MSIL2A_20240723T204839_R057_T27XVJ_20240723T230733[B02] - TimeoutError:                                                                                                                                                                                                                            
ERROR: S2B_MSIL2A_20240723T204839_R057_T27XVJ_20240723T230733[B04] - TimeoutError:                                                                                                                                                                                                                            
4/6:   3%|██████▊                                                                                                                                                                                                                                      | 668k/22.7M [00:00<00:12, 1.82MB/s, 4 errors, 0 skips]Traceback (most recent call last):
  File "/Users/gadomski/Code/stac-asset/.venv/bin/stac-asset", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_cli.py", line 188, in download
    asyncio.run(
  File "/usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_cli.py", line 298, in download_async
    output = await download()
             ^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_cli.py", line 281, in download
    return await _functions.download_item_collection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 355, in download_item_collection
    await downloads.download(messages)
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 152, in download
    raise error
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 145, in download
    results = await asyncio.gather(*tasks)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 176, in download_with_lock
    return await download.download(messages=messages)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 59, in download
    raise error
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 49, in download
    await download_asset(
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 431, in download_asset
    raise error
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/_functions.py", line 419, in download_asset
    await client.download_href(
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/client.py", line 120, in download_href
    raise err
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/client.py", line 102, in download_href
    async for chunk in self.open_href(
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/client.py", line 78, in open_href
    async for chunk in self.open_url(
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/planetary_computer_client.py", line 96, in open_url
    async for chunk in super().open_url(
  File "/Users/gadomski/Code/stac-asset/src/stac_asset/http_client.py", line 165, in open_url
    async for chunk, _ in response.content.iter_chunks():
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/aiohttp/streams.py", line 66, in __anext__
    rv = await self._stream.readchunk()
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/aiohttp/streams.py", line 441, in readchunk
    await self._wait("readchunk")
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/aiohttp/streams.py", line 311, in _wait
    with self._timer:
  File "/Users/gadomski/Code/stac-asset/.venv/lib/python3.12/site-packages/aiohttp/helpers.py", line 735, in __exit__
    raise asyncio.TimeoutError from None
TimeoutError
4/6:   3%|██████▊

There are a couple of ERROR: lines but I think that's fine, it's not suprising to me that more than one task got to fire off it's error message to the queue before they get cancelled:

if messages:
await messages.put(
ErrorAssetDownload(key=key, href=href, path=path, error=error)
)
raise error

The only piece I'm not sure about is the duplicate asset keys in your error messages — I can't reproduce those, and it's strange to me.

I'll take a closer look at this issue today.

I've added some progress_bar.write statements to _cli.py and config.py:

diff --git a/src/stac_asset/_cli.py b/src/stac_asset/_cli.py
index 69e196d..3e10d04 100644
--- a/src/stac_asset/_cli.py
+++ b/src/stac_asset/_cli.py
@@ -380,6 +380,8 @@ async def report_progress(messages: Optional[MessageQueue]) -> None:
                 name = f"{owners[message.key]}[{message.key}]"
             else:
                 name = f"[{message.key}]"
+            progress_bar.write(json.dumps(owners))
+            progress_bar.write(str(message.path).split("/")[-1])
             progress_bar.write(
                 f"ERROR: {name} - {type(message.error).__name__}: {message.error}",
                 file=sys.stderr,
diff --git a/src/stac_asset/config.py b/src/stac_asset/config.py
index 9a85fae..d19775c 100644
--- a/src/stac_asset/config.py
+++ b/src/stac_asset/config.py
@@ -12,7 +12,7 @@ DEFAULT_S3_REGION_NAME = "us-west-2"
 DEFAULT_S3_RETRY_MODE = "adaptive"
 DEFAULT_S3_MAX_ATTEMPTS = 10
 DEFAULT_HTTP_CLIENT_TIMEOUT = 300
-DEFAULT_HTTP_MAX_ATTEMPTS = 10
+DEFAULT_HTTP_MAX_ATTEMPTS = 1
 
 
 @dataclass

Here's the output:

{"preview": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "analytic": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "metadata": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8", "thumbnail": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8"}
SL-newsat26-20240714T082256950303Z-micro-L0-v0.5.8_analytic.tiff                                                                                                             
ERROR: SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                          
{"preview": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "analytic": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "metadata": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8", "thumbnail": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8"}
SL-newsat26-20240714T082257077301Z-micro-L0-v0.5.8_analytic.tiff                                                                                                             
ERROR: SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                          
{"preview": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "analytic": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "metadata": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8", "thumbnail": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8"}
SL-newsat26-20240714T082257091447Z-micro-L0-v0.5.8_analytic.tiff                                                                                                             
ERROR: SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8[analytic] - TimeoutError:                                                                                          
{"preview": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "analytic": "SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8", "metadata": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8", "thumbnail": "SL-newsat26-20240714T082256583786Z-micro-L0-v0.5.8"}
SL-newsat26-20240714T082256964448Z-micro-L0-v0.5.8_analytic.tiff                                                                                                             
ERROR: SL-newsat26-20240714T082256569641Z-micro-L0-v0.5.8[analytic] - TimeoutError: 

The assets reported as failing by the CLI do not match the actual assets that are failing. It seems like the issue might be with the owners value, which remains consistent across all failures.

@gadomski does it ring any bells?

Here is the command I use:

$ stac-asset -v DEBUG download --fail-fast --path-template '${satl:outcome_id}' --http-timeout 5 --max-concurrent-downloads 10

From what I understand, the owners dictionary tracks information about the most recently started downloads (StartAssetDownload). At the time of a failure, owners contains an item with the same key as the failed asset (since all the items I’m downloading have identical set of asset name), but its value points to a different STAC item's ID.

One straightforward approach to address the issue would be to:

  • Introduce an owner_id attribute to ErrorAssetDownload
  • Eliminate the owners attribute and use something like:
    name = f"{message.owner_id}[{message.key}]"
    instead of the current implementation found here.

@drnextgis thanks for digging in, I think I've got a fix in #206 ... can you check on your side?

It did the trick, thank you @gadomski! Looking forward for the new release.