Duchy mill writing output blob error should be transient.
Closed · 3 comments
renjiezh commented
Describe the bug
An error that occurs while the mill writes an output blob to storage is categorized as a permanent error and therefore fails the computation. However, the cause may be temporary instability of cloud storage, in which case a retry could resolve it.
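To make the requested behavior concrete, here is a minimal sketch of classifying a storage write failure by HTTP status. It is not the Duchy's actual code: `FakeStorageException`, `TRANSIENT_HTTP_CODES`, `WriteFailure`, and `classify` are hypothetical names, and the set of status codes treated as retryable is an assumption that would need to be checked against the Cloud Storage retry guidance.

```kotlin
// Hypothetical stand-in for the storage client's exception type, defined here
// only so that the sketch compiles on its own. It carries the HTTP status code.
class FakeStorageException(val code: Int, message: String) : RuntimeException(message)

// Assumption: these HTTP statuses indicate a temporary server-side condition.
val TRANSIENT_HTTP_CODES: Set<Int> = setOf(408, 429, 500, 502, 503, 504)

// Hypothetical failure categories the mill could act on.
sealed class WriteFailure(cause: Throwable) : Exception(cause.message, cause)

class TransientWriteFailure(cause: Throwable) : WriteFailure(cause)

class PermanentWriteFailure(cause: Throwable) : WriteFailure(cause)

// Maps a storage error to a transient or permanent failure based on its status code.
fun classify(e: FakeStorageException): WriteFailure =
  if (e.code in TRANSIENT_HTTP_CODES) TransientWriteFailure(e) else PermanentWriteFailure(e)

fun main() {
  // A 503 like the one reported here would be retried rather than failing the computation.
  println(classify(FakeStorageException(503, "Service Unavailable")) is TransientWriteFailure) // true
  // A 403 would still be treated as permanent.
  println(classify(FakeStorageException(403, "Forbidden")) is TransientWriteFailure) // false
}
```

With a split like this, only the permanent category would need to fail the computation; the transient category could be handed back for another attempt.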
Steps to reproduce
Run a stress test; the error reproduces intermittently.
Component(s) affected
Duchy
Version
v0.5.5
Environment
halo-cmm-qa
Additional context
externalComputationId= 491229066843608164
"COMPUTATION_PARTICIPANT_FAILED","message":"Computation Participant failed. We encountered an internal error. Please try again.
T8bECdEZ2y8@aggregator-liquid-legions-v2-mill-daemon-deployment-86dfc7sv8gx: We encountered an internal error. Please try again.
com.google.cloud.storage.StorageException: We encountered an internal error. Please try again.
at com.google.cloud.storage.StorageException.translate(StorageException.java:170)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:329)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:409)
at com.google.cloud.storage.StorageImpl.lambda$internalCreate$2(StorageImpl.java:213)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
at com.google.cloud.storage.Retrying.run(Retrying.java:65)
at com.google.cloud.storage.StorageImpl.run(StorageImpl.java:1524)
at com.google.cloud.storage.StorageImpl.internalCreate(StorageImpl.java:210)
at com.google.cloud.storage.StorageImpl.create(StorageImpl.java:142)
at org.wfanet.measurement.gcloud.gcs.GcsStorageClient$writeBlob$2.invokeSuspend(GcsStorageClient.kt:56)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?projection=full&uploadType=multipart
{
"code" : 503,
"errors" : [ {
"domain" : "global",
"message" : "We encountered an internal error. Please try again.",
"reason" : "backendError"
} ],
"message" : "We encountered an internal error. Please try again."
}
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:570)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:493)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:603)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:406)
... 17 more
renjiezh commented
An instance of a storage error with code 503:
EsxcQyBbxQ8@worker1-liquid-legions-v2-mill-daemon-deployment-75b95f7bcngdpf: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
com.google.cloud.storage.StorageException: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
at com.google.cloud.storage.StorageException.translate(StorageException.java:170)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:329)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.open(HttpStorageRpc.java:1062)
at com.google.cloud.storage.ResumableMedia.lambda$startUploadForBlobInfo$0(ResumableMedia.java:40)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
at com.google.cloud.storage.Retrying.run(Retrying.java:65)
at com.google.cloud.storage.ResumableMedia.lambda$startUploadForBlobInfo$1(ResumableMedia.java:34)
at com.google.cloud.storage.StorageImpl.writer(StorageImpl.java:683)
at com.google.cloud.storage.StorageImpl.writer(StorageImpl.java:95)
at com.google.cloud.storage.Blob.writer(Blob.java:1027)
at org.wfanet.measurement.gcloud.gcs.GcsStorageClient$writeBlob$2.invokeSuspend(GcsStorageClient.kt:58)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
Caused by: com.google.api.client.http.HttpResponseException: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
at com.google.api.client.http.HttpResponseException$Builder.build(HttpResponseException.java:293)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1118)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.open(HttpStorageRpc.java:1055)
... 18 more
{
errorGroups: [1]
insertId: "lwaxv6u25774qnrl"
labels: {3}
logName: "projects/halo-cmm-qa/logs/stderr"
receiveTimestamp: "2024-06-05T04:13:56.564922711Z"
resource: {2}
severity: "ERROR"
sourceLocation: {1}
textPayload: "EsxcQyBbxQ8@worker1-liquid-legions-v2-mill-daemon-deployment-75b95f7bcngdpf: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
com.google.cloud.storage.StorageException: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
at com.google.cloud.storage.StorageException.translate(StorageException.java:170)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:329)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.open(HttpStorageRpc.java:1062)
at com.google.cloud.storage.ResumableMedia.lambda$startUploadForBlobInfo$0(ResumableMedia.java:40)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
at com.google.cloud.storage.Retrying.run(Retrying.java:65)
at com.google.cloud.storage.ResumableMedia.lambda$startUploadForBlobInfo$1(ResumableMedia.java:34)
at com.google.cloud.storage.StorageImpl.writer(StorageImpl.java:683)
at com.google.cloud.storage.StorageImpl.writer(StorageImpl.java:95)
at com.google.cloud.storage.Blob.writer(Blob.java:1027)
at org.wfanet.measurement.gcloud.gcs.GcsStorageClient$writeBlob$2.invokeSuspend(GcsStorageClient.kt:58)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
Caused by: com.google.api.client.http.HttpResponseException: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
at com.google.api.client.http.HttpResponseException$Builder.build(HttpResponseException.java:293)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1118)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.open(HttpStorageRpc.java:1055)
... 18 more
"
timestamp: "2024-06-05T04:13:53.368Z"
}
renjiezh commented
An instance of a storage error with code 502:
B7zWvx_2Ipk@worker1-liquid-legions-v2-mill-daemon-deployment-75b95f7bclgs2x: 502 Bad Gateway
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?projection=full&uploadType=multipart
<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
<title>Error 502 (Server Error)!!1</title>
<style>
*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
</style>
<a href=//www.google.com/><span id=logo aria-label=Google></span></a>
<p><b>502.</b> <ins>That’s an error.</ins>
<p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds. <ins>That’s all we know.</ins>
com.google.cloud.storage.StorageException: 502 Bad Gateway
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?projection=full&uploadType=multipart
at com.google.cloud.storage.StorageException.translate(StorageException.java:170)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:329)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:409)
at com.google.cloud.storage.StorageImpl.lambda$internalCreate$2(StorageImpl.java:213)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
at com.google.cloud.storage.Retrying.run(Retrying.java:65)
at com.google.cloud.storage.StorageImpl.run(StorageImpl.java:1524)
at com.google.cloud.storage.StorageImpl.internalCreate(StorageImpl.java:210)
at com.google.cloud.storage.StorageImpl.create(StorageImpl.java:142)
at org.wfanet.measurement.gcloud.gcs.GcsStorageClient$writeBlob$2.invokeSuspend(GcsStorageClient.kt:56)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 502 Bad Gateway
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?projection=full&uploadType=multipart
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:570)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:493)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:603)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:406)
... 17 more
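Both the 503 and the 502 responses above ask the caller to try again, so one possible shape for the fix is a bounded retry with exponential backoff around the blob write. The following is only a sketch under stated assumptions: `TransientStorageException` and `retryOnTransient` are hypothetical names, the attempt count and delays are placeholder values, and a real mill would probably use a suspending delay instead of the blocking `Thread.sleep` default used here to keep the example dependency-free.

```kotlin
// Hypothetical marker for failures that are considered safe to retry (e.g. HTTP 502/503).
class TransientStorageException(message: String) : RuntimeException(message)

// Runs `block`, retrying on TransientStorageException up to `maxAttempts` times in total.
// The wait doubles after each failed attempt; any other exception propagates immediately.
fun <T> retryOnTransient(
  maxAttempts: Int = 4,
  initialDelayMillis: Long = 100,
  sleep: (Long) -> Unit = { Thread.sleep(it) },
  block: () -> T,
): T {
  require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
  var delayMillis = initialDelayMillis
  var lastError: TransientStorageException? = null
  for (attempt in 1..maxAttempts) {
    try {
      return block()
    } catch (e: TransientStorageException) {
      lastError = e
      if (attempt < maxAttempts) {
        sleep(delayMillis)
        delayMillis *= 2
      }
    }
  }
  // All attempts failed with a transient error; surface the last one to the caller.
  throw checkNotNull(lastError)
}

fun main() {
  var calls = 0
  // Simulated write that fails twice with a 503 and then succeeds.
  val result = retryOnTransient(sleep = {}) {
    calls++
    if (calls < 3) throw TransientStorageException("503 Service Unavailable")
    "blob written"
  }
  println("$result after $calls attempts") // blob written after 3 attempts
}
```

Keeping the attempt count bounded means a persistent outage still surfaces as a failure, just later, while a brief blip like the ones logged above no longer fails the whole computation.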