Hawaii jobs for INSAR_ISCE_TEST are failing
jhkennedy opened this issue · 8 comments
These jobs are failing with what look like memory or disk errors, and JPL would like us to investigate.
Previously, Hawaii jobs failed with disk-space issues, so we restricted them to c6id.xlarge instances (dropping c5d.xlarge) to ensure more disk space was available.
[
{
"job_id": "fa5bea5e-cfb4-407c-b2a5-44f74b7bb0b1",
"job_type": "INSAR_ISCE_TEST",
"request_time": "2023-02-28T18:16:36+00:00",
"status_code": "FAILED",
"user_id": "cmarshak",
"name": "Hawaii_124_beta_GMAO",
"job_parameters": {
"estimate_ionosphere_delay": true,
"frame_id": 19224,
"granules": [
"S1A_IW_SLC__1SDV_20230205T043052_20230205T043122_047096_05A65E_2987",
"S1A_IW_SLC__1SDV_20230205T043120_20230205T043148_047096_05A65E_5D6F"
],
"secondary_granules": [
"S1A_IW_SLC__1SDV_20230112T043053_20230112T043123_046746_059A9B_C858",
"S1A_IW_SLC__1SDV_20230112T043120_20230112T043148_046746_059A9B_7AB8"
],
"weather_model": "GMAO"
},
"logs": [
"https://hyp3-a19-jpl-contentbucket-1wfnatpznlg8b.s3.us-west-2.amazonaws.com/fa5bea5e-cfb4-407c-b2a5-44f74b7bb0b1/fa5bea5e-cfb4-407c-b2a5-44f74b7bb0b1.log"
],
"expiration_time": "2023-08-28T00:00:00+00:00",
"processing_times": [
2473.484
]
},
{
"job_id": "2278d437-c706-42cd-8a2b-bdbc13e77909",
"job_type": "INSAR_ISCE_TEST",
"request_time": "2023-02-28T18:16:36+00:00",
"status_code": "FAILED",
"user_id": "cmarshak",
"name": "Hawaii_124_beta_GMAO",
"job_parameters": {
"estimate_ionosphere_delay": true,
"frame_id": 19223,
"granules": [
"S1A_IW_SLC__1SDV_20230217T043052_20230217T043122_047271_05AC44_293F"
],
"secondary_granules": [
"S1A_IW_SLC__1SDV_20230124T043053_20230124T043123_046921_05A089_864A"
],
"weather_model": "GMAO"
},
"logs": [
"https://hyp3-a19-jpl-contentbucket-1wfnatpznlg8b.s3.us-west-2.amazonaws.com/2278d437-c706-42cd-8a2b-bdbc13e77909/2278d437-c706-42cd-8a2b-bdbc13e77909.log"
],
"expiration_time": "2023-08-28T00:00:00+00:00",
"processing_times": [
2527.366
]
},
{
"job_id": "bb26de21-14bd-4332-a11b-c3a00417b274",
"job_type": "INSAR_ISCE_TEST",
"request_time": "2023-02-28T18:16:36+00:00",
"status_code": "FAILED",
"user_id": "cmarshak",
"name": "Hawaii_124_beta_GMAO",
"job_parameters": {
"estimate_ionosphere_delay": true,
"frame_id": 19223,
"granules": [
"S1A_IW_SLC__1SDV_20230217T043052_20230217T043122_047271_05AC44_293F"
],
"secondary_granules": [
"S1A_IW_SLC__1SDV_20230205T043052_20230205T043122_047096_05A65E_2987"
],
"weather_model": "GMAO"
},
"logs": [
"https://hyp3-a19-jpl-contentbucket-1wfnatpznlg8b.s3.us-west-2.amazonaws.com/bb26de21-14bd-4332-a11b-c3a00417b274/bb26de21-14bd-4332-a11b-c3a00417b274.log"
],
"expiration_time": "2023-08-28T00:00:00+00:00",
"processing_times": [
2522.194
]
},
{
"job_id": "4af0661f-b73c-4e41-97e2-602c6ac2190b",
"job_type": "INSAR_ISCE_TEST",
"request_time": "2023-02-28T18:16:36+00:00",
"status_code": "FAILED",
"user_id": "cmarshak",
"name": "Hawaii_124_beta_GMAO",
"job_parameters": {
"estimate_ionosphere_delay": true,
"frame_id": 19224,
"granules": [
"S1A_IW_SLC__1SDV_20230217T043052_20230217T043122_047271_05AC44_293F",
"S1A_IW_SLC__1SDV_20230217T043119_20230217T043147_047271_05AC44_56B5"
],
"secondary_granules": [
"S1A_IW_SLC__1SDV_20230205T043052_20230205T043122_047096_05A65E_2987",
"S1A_IW_SLC__1SDV_20230205T043120_20230205T043148_047096_05A65E_5D6F"
],
"weather_model": "GMAO"
},
"logs": [
"https://hyp3-a19-jpl-contentbucket-1wfnatpznlg8b.s3.us-west-2.amazonaws.com/4af0661f-b73c-4e41-97e2-602c6ac2190b/4af0661f-b73c-4e41-97e2-602c6ac2190b.log"
],
"expiration_time": "2023-08-28T00:00:00+00:00",
"processing_times": [
2522.408
]
},
{
"job_id": "7f9e744d-f1ba-40cd-af4a-104536ce2a42",
"job_type": "INSAR_ISCE_TEST",
"request_time": "2023-02-28T18:16:36+00:00",
"status_code": "FAILED",
"user_id": "cmarshak",
"name": "Hawaii_124_beta_GMAO",
"job_parameters": {
"estimate_ionosphere_delay": true,
"frame_id": 19224,
"granules": [
"S1A_IW_SLC__1SDV_20230124T043053_20230124T043123_046921_05A089_864A",
"S1A_IW_SLC__1SDV_20230124T043121_20230124T043148_046921_05A089_74AF"
],
"secondary_granules": [
"S1A_IW_SLC__1SDV_20230112T043053_20230112T043123_046746_059A9B_C858",
"S1A_IW_SLC__1SDV_20230112T043120_20230112T043148_046746_059A9B_7AB8"
],
"weather_model": "GMAO"
},
"logs": [
"https://hyp3-a19-jpl-contentbucket-1wfnatpznlg8b.s3.us-west-2.amazonaws.com/7f9e744d-f1ba-40cd-af4a-104536ce2a42/7f9e744d-f1ba-40cd-af4a-104536ce2a42.log"
],
"expiration_time": "2023-08-28T00:00:00+00:00",
"processing_times": [
491.125
]
},
{
"job_id": "5c6d99a0-43c6-4a05-9a4a-d22e65ddf7e0",
"job_type": "INSAR_ISCE_TEST",
"request_time": "2023-02-28T18:16:36+00:00",
"status_code": "FAILED",
"user_id": "cmarshak",
"name": "Hawaii_124_beta_GMAO",
"job_parameters": {
"estimate_ionosphere_delay": true,
"frame_id": 19223,
"granules": [
"S1A_IW_SLC__1SDV_20230205T043052_20230205T043122_047096_05A65E_2987"
],
"secondary_granules": [
"S1A_IW_SLC__1SDV_20230112T043053_20230112T043123_046746_059A9B_C858"
],
"weather_model": "GMAO"
},
"logs": [
"https://hyp3-a19-jpl-contentbucket-1wfnatpznlg8b.s3.us-west-2.amazonaws.com/5c6d99a0-43c6-4a05-9a4a-d22e65ddf7e0/5c6d99a0-43c6-4a05-9a4a-d22e65ddf7e0.log"
],
"expiration_time": "2023-08-28T00:00:00+00:00",
"processing_times": [
2522.276
]
},
{
"job_id": "bebc172b-710a-4407-8bb9-89f8b000a7ad",
"job_type": "INSAR_ISCE_TEST",
"request_time": "2023-02-28T18:16:36+00:00",
"status_code": "FAILED",
"user_id": "cmarshak",
"name": "Hawaii_124_beta_GMAO",
"job_parameters": {
"estimate_ionosphere_delay": true,
"frame_id": 19223,
"granules": [
"S1A_IW_SLC__1SDV_20230124T043053_20230124T043123_046921_05A089_864A"
],
"secondary_granules": [
"S1A_IW_SLC__1SDV_20230112T043053_20230112T043123_046746_059A9B_C858"
],
"weather_model": "GMAO"
},
"logs": [
"https://hyp3-a19-jpl-contentbucket-1wfnatpznlg8b.s3.us-west-2.amazonaws.com/bebc172b-710a-4407-8bb9-89f8b000a7ad/bebc172b-710a-4407-8bb9-89f8b000a7ad.log"
],
"expiration_time": "2023-08-28T00:00:00+00:00",
"processing_times": [
2522.679
]
},
{
"job_id": "58a5889b-bc79-428d-b523-3ad847b434e5",
"job_type": "INSAR_ISCE_TEST",
"request_time": "2023-02-28T18:16:36+00:00",
"status_code": "FAILED",
"user_id": "cmarshak",
"name": "Hawaii_124_beta_GMAO",
"job_parameters": {
"estimate_ionosphere_delay": true,
"frame_id": 19223,
"granules": [
"S1A_IW_SLC__1SDV_20230205T043052_20230205T043122_047096_05A65E_2987"
],
"secondary_granules": [
"S1A_IW_SLC__1SDV_20230124T043053_20230124T043123_046921_05A089_864A"
],
"weather_model": "GMAO"
},
"logs": [
"https://hyp3-a19-jpl-contentbucket-1wfnatpznlg8b.s3.us-west-2.amazonaws.com/58a5889b-bc79-428d-b523-3ad847b434e5/58a5889b-bc79-428d-b523-3ad847b434e5.log"
],
"expiration_time": "2023-08-28T00:00:00+00:00",
"processing_times": [
2528.036
]
},
{
"job_id": "33b345e8-dbcf-47e9-8e56-c5c7d4ad65bb",
"job_type": "INSAR_ISCE_TEST",
"request_time": "2023-02-28T18:16:36+00:00",
"status_code": "FAILED",
"user_id": "cmarshak",
"name": "Hawaii_124_beta_GMAO",
"job_parameters": {
"estimate_ionosphere_delay": true,
"frame_id": 19224,
"granules": [
"S1A_IW_SLC__1SDV_20230205T043052_20230205T043122_047096_05A65E_2987",
"S1A_IW_SLC__1SDV_20230205T043120_20230205T043148_047096_05A65E_5D6F"
],
"secondary_granules": [
"S1A_IW_SLC__1SDV_20230124T043053_20230124T043123_046921_05A089_864A",
"S1A_IW_SLC__1SDV_20230124T043121_20230124T043148_046921_05A089_74AF"
],
"weather_model": "GMAO"
},
"logs": [
"https://hyp3-a19-jpl-contentbucket-1wfnatpznlg8b.s3.us-west-2.amazonaws.com/33b345e8-dbcf-47e9-8e56-c5c7d4ad65bb/33b345e8-dbcf-47e9-8e56-c5c7d4ad65bb.log"
],
"expiration_time": "2023-08-28T00:00:00+00:00",
"processing_times": [
2527.33
]
}
]
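To spot patterns in a failure list like the one above, a small script can group the records by frame_id and flag unusually short runs. This is a minimal sketch, not part of HyP3: the helper names summarize_by_frame and short_runs are hypothetical, it assumes each processing_times entry is one step's wall time (so the entries are summed), and the 0.5 × median cutoff is an arbitrary choice. The sample records are trimmed copies of three jobs from the list above.

```python
from collections import defaultdict
from statistics import median


def summarize_by_frame(jobs):
    """Group job records by frame_id -> list of (job_id, status_code, total_seconds)."""
    summary = defaultdict(list)
    for job in jobs:
        frame = job.get("job_parameters", {}).get("frame_id")
        # Assumption: each processing_times entry is one step's wall time, so sum them.
        total = sum(job.get("processing_times") or [])
        summary[frame].append((job["job_id"], job["status_code"], total))
    return dict(summary)


def short_runs(jobs, fraction=0.5):
    """Return job_ids whose total time is below `fraction` of the median total (hypothetical heuristic)."""
    totals = {j["job_id"]: sum(j.get("processing_times") or []) for j in jobs}
    cutoff = fraction * median(totals.values())
    return [job_id for job_id, t in totals.items() if t < cutoff]


# Three records trimmed from the job list above (other fields omitted).
SAMPLE_JOBS = [
    {"job_id": "fa5bea5e-cfb4-407c-b2a5-44f74b7bb0b1", "status_code": "FAILED",
     "job_parameters": {"frame_id": 19224}, "processing_times": [2473.484]},
    {"job_id": "2278d437-c706-42cd-8a2b-bdbc13e77909", "status_code": "FAILED",
     "job_parameters": {"frame_id": 19223}, "processing_times": [2527.366]},
    {"job_id": "7f9e744d-f1ba-40cd-af4a-104536ce2a42", "status_code": "FAILED",
     "job_parameters": {"frame_id": 19224}, "processing_times": [491.125]},
]

if __name__ == "__main__":
    # Print rounded total run times per frame, then any unusually short runs.
    for frame, rows in sorted(summarize_by_frame(SAMPLE_JOBS).items()):
        print(frame, [round(t) for _, _, t in rows])
    print(short_runs(SAMPLE_JOBS))
```

On these three records it reports frame 19224 twice and flags 7f9e744d as the outlier: it failed after about 491 s, while the others ran for roughly 2,500 s.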
Re-run jobs:
https://hyp3-a19-jpl.asf.alaska.edu/jobs/a6853106-61a8-443a-bebe-591004ff0b47
https://hyp3-a19-jpl.asf.alaska.edu/jobs/5ffdb30c-82e4-4a52-a416-7911c5ec6b27
https://hyp3-a19-jpl.asf.alaska.edu/jobs/49bb3d2e-a925-4576-8fd1-3d7cb0b9b444
https://hyp3-a19-jpl.asf.alaska.edu/jobs/f84dc37a-1ced-4bd3-81f1-6138f5d1a3be
https://hyp3-a19-jpl.asf.alaska.edu/jobs/2d47a40e-514b-4d16-b314-998e05e75932
https://hyp3-a19-jpl.asf.alaska.edu/jobs/ceff9a0f-0938-4195-88e5-ade6d144b348
https://hyp3-a19-jpl.asf.alaska.edu/jobs/3bd036c6-fe7c-4a23-a95e-b2d3e681524a
https://hyp3-a19-jpl.asf.alaska.edu/jobs/7836e441-e46d-4eba-8d69-19378513bf65
https://hyp3-a19-jpl.asf.alaska.edu/jobs/5c2c8d14-c757-4d98-b91c-c5f6578fbb3e
All those jobs failed again, as expected.
The first two jobs failed with "Host EC2 instance terminated." after 21600 seconds. Is 21600 s our cutoff? Checking on the rest...
Yep, they all failed with this same error, and 21600 s (6 hours) is the timeout for the INSAR_ISCE_TEST troposphere step.
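One quick way to tell timeout kills apart from other failures is to compare a step's elapsed time with the 21600 s limit (21600 / 3600 = 6 hours). The sketch below is hypothetical: hit_timeout and its 60 s slack tolerance are illustrative assumptions, not HyP3 settings. By this check, the original runs in the JSON list, which failed after roughly 2,500 s, would not count as timeouts.

```python
# Assumption: 21600 s is the INSAR_ISCE_TEST troposphere-step timeout quoted in this thread.
TROPO_STEP_TIMEOUT_S = 21600


def hit_timeout(elapsed_s, timeout_s=TROPO_STEP_TIMEOUT_S, slack_s=60):
    """Return True if a step ran for (roughly) the full timeout before it was killed.

    slack_s is a hypothetical tolerance for scheduling/termination overhead.
    """
    return elapsed_s >= timeout_s - slack_s


if __name__ == "__main__":
    print(TROPO_STEP_TIMEOUT_S / 3600)  # timeout expressed in hours
    print(hit_timeout(21600))           # re-run jobs: killed at the limit
    print(hit_timeout(2522.194))        # an original run from the JSON list
```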
It looks like a RAiDER error. Here is the log for the RAiDER step:
This line is throwing the error:
https://github.com/dbekaert/RAiDER/blob/b2d98bee9ad92f470993d17ff54fb9d15476e5f5/tools/RAiDER/aria/prepFromGUNW.py#L248
The failure path is the same for all of these jobs.
Since this is a RAiDER issue, I'm going to close this as done.