world-federation-of-advertisers/cross-media-measurement

Duchy mill: errors writing output blobs should be treated as transient

Describe the bug
When the mill fails to write an output blob to storage, the error is categorized as permanent, which fails the computation. However, the cause can be temporary instability of cloud storage, in which case a retry could resolve it.
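
To make the requested behavior concrete, here is a minimal sketch of classifying a write failure by its HTTP status code. Every name in it (`BlobWriteException`, `TransientWriteException`, `PermanentWriteException`, `RETRYABLE_HTTP_CODES`, `classifyWriteErrors`) is hypothetical and chosen for illustration; it is not the Duchy's actual code, and the set of retryable codes is an assumption beyond the 502 and 503 responses reported in this issue.

```kotlin
// Hypothetical stand-in for a storage client error that carries an HTTP status code.
class BlobWriteException(val httpCode: Int, message: String) : Exception(message)

// Hypothetical marker types a caller could use to choose between retrying and failing.
class TransientWriteException(message: String, cause: Throwable) : Exception(message, cause)

class PermanentWriteException(message: String, cause: Throwable) : Exception(message, cause)

// Assumption: these statuses are worth retrying. Only 502 and 503 appear in this issue.
val RETRYABLE_HTTP_CODES: Set<Int> = setOf(408, 429, 500, 502, 503, 504)

fun isTransient(e: BlobWriteException): Boolean = e.httpCode in RETRYABLE_HTTP_CODES

/** Runs [write] and rethrows storage failures tagged as transient or permanent. */
fun <T> classifyWriteErrors(write: () -> T): T =
  try {
    write()
  } catch (e: BlobWriteException) {
    if (isTransient(e)) {
      throw TransientWriteException("Transient error writing output blob", e)
    }
    throw PermanentWriteException("Permanent error writing output blob", e)
  }
```

With this split, a caller would fail the computation only on `PermanentWriteException` and could reschedule the work on `TransientWriteException`.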

Steps to reproduce
Run a stress test. The error occurs intermittently, so it may take several runs to reproduce.

Component(s) affected
Duchy

Version
v0.5.5

Environment
halo-cmm-qa

Additional context

externalComputationId=491229066843608164

"COMPUTATION_PARTICIPANT_FAILED","message":"Computation Participant failed. We encountered an internal error. Please try again.
T8bECdEZ2y8@aggregator-liquid-legions-v2-mill-daemon-deployment-86dfc7sv8gx: We encountered an internal error. Please try again.
com.google.cloud.storage.StorageException: We encountered an internal error. Please try again.
	at com.google.cloud.storage.StorageException.translate(StorageException.java:170)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:329)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:409)
	at com.google.cloud.storage.StorageImpl.lambda$internalCreate$2(StorageImpl.java:213)
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.storage.Retrying.run(Retrying.java:65)
	at com.google.cloud.storage.StorageImpl.run(StorageImpl.java:1524)
	at com.google.cloud.storage.StorageImpl.internalCreate(StorageImpl.java:210)
	at com.google.cloud.storage.StorageImpl.create(StorageImpl.java:142)
	at org.wfanet.measurement.gcloud.gcs.GcsStorageClient$writeBlob$2.invokeSuspend(GcsStorageClient.kt:56)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?projection=full&uploadType=multipart
{
  "code" : 503,
  "errors" : [ {
    "domain" : "global",
    "message" : "We encountered an internal error. Please try again.",
    "reason" : "backendError"
  } ],
  "message" : "We encountered an internal error. Please try again."
}
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:570)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:493)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:603)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:406)
	... 17 more

An instance of a storage error with code 503:

EsxcQyBbxQ8@worker1-liquid-legions-v2-mill-daemon-deployment-75b95f7bcngdpf: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
com.google.cloud.storage.StorageException: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
	at com.google.cloud.storage.StorageException.translate(StorageException.java:170)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:329)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.open(HttpStorageRpc.java:1062)
	at com.google.cloud.storage.ResumableMedia.lambda$startUploadForBlobInfo$0(ResumableMedia.java:40)
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.storage.Retrying.run(Retrying.java:65)
	at com.google.cloud.storage.ResumableMedia.lambda$startUploadForBlobInfo$1(ResumableMedia.java:34)
	at com.google.cloud.storage.StorageImpl.writer(StorageImpl.java:683)
	at com.google.cloud.storage.StorageImpl.writer(StorageImpl.java:95)
	at com.google.cloud.storage.Blob.writer(Blob.java:1027)
	at org.wfanet.measurement.gcloud.gcs.GcsStorageClient$writeBlob$2.invokeSuspend(GcsStorageClient.kt:58)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
Caused by: com.google.api.client.http.HttpResponseException: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
	at com.google.api.client.http.HttpResponseException$Builder.build(HttpResponseException.java:293)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1118)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.open(HttpStorageRpc.java:1055)
	... 18 more
{
errorGroups: [1]
insertId: "lwaxv6u25774qnrl"
labels: {3}
logName: "projects/halo-cmm-qa/logs/stderr"
receiveTimestamp: "2024-06-05T04:13:56.564922711Z"
resource: {2}
severity: "ERROR"
sourceLocation: {1}
textPayload: "EsxcQyBbxQ8@worker1-liquid-legions-v2-mill-daemon-deployment-75b95f7bcngdpf: 503 Service Unavailable
"
timestamp: "2024-06-05T04:13:53.368Z"

An instance of a storage error with code 502:

B7zWvx_2Ipk@worker1-liquid-legions-v2-mill-daemon-deployment-75b95f7bclgs2x: 502 Bad Gateway
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?projection=full&uploadType=multipart
<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 502 (Server Error)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>502.</b> <ins>That’s an error.</ins>
  <p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.  <ins>That’s all we know.</ins>

com.google.cloud.storage.StorageException: 502 Bad Gateway
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?projection=full&uploadType=multipart
<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 502 (Server Error)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>502.</b> <ins>That’s an error.</ins>
  <p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.  <ins>That’s all we know.</ins>

	at com.google.cloud.storage.StorageException.translate(StorageException.java:170)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:329)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:409)
	at com.google.cloud.storage.StorageImpl.lambda$internalCreate$2(StorageImpl.java:213)
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.storage.Retrying.run(Retrying.java:65)
	at com.google.cloud.storage.StorageImpl.run(StorageImpl.java:1524)
	at com.google.cloud.storage.StorageImpl.internalCreate(StorageImpl.java:210)
	at com.google.cloud.storage.StorageImpl.create(StorageImpl.java:142)
	at org.wfanet.measurement.gcloud.gcs.GcsStorageClient$writeBlob$2.invokeSuspend(GcsStorageClient.kt:56)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 502 Bad Gateway
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?projection=full&uploadType=multipart
<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 502 (Server Error)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>502.</b> <ins>That’s an error.</ins>
  <p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.  <ins>That’s all we know.</ins>

	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:570)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:493)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:603)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:406)
	... 17 more
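
Both responses above ask the client to try again, so one possible handling is a bounded retry with exponential backoff around the write. The sketch below only illustrates that idea and is not taken from the referenced fix: `StorageHttpException`, `writeWithRetry`, and all parameter names and default values are hypothetical. It blocks with `Thread.sleep` to stay self-contained, whereas coroutine code such as the `writeBlob` frames in the traces would presumably suspend instead.

```kotlin
// Hypothetical stand-in for a storage client error that carries an HTTP status code.
class StorageHttpException(val httpCode: Int, message: String) : Exception(message)

/**
 * Calls [write] up to [maxAttempts] times. After a 502 or 503 failure it waits,
 * doubling the delay each time, and tries again. Other failures, and the failure
 * of the last attempt, propagate to the caller.
 */
fun <T> writeWithRetry(
  maxAttempts: Int = 3,
  initialDelayMillis: Long = 1_000,
  sleep: (Long) -> Unit = { Thread.sleep(it) },
  write: () -> T
): T {
  require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
  var delayMillis = initialDelayMillis
  for (attempt in 1..maxAttempts) {
    try {
      return write()
    } catch (e: StorageHttpException) {
      val retryable = e.httpCode == 502 || e.httpCode == 503
      if (!retryable || attempt == maxAttempts) throw e
    }
    sleep(delayMillis)
    delayMillis *= 2
  }
  error("unreachable: the loop either returns or throws")
}
```

The traces show the client library's own retry helper (`RetryHelper.runWithRetries`) in the call path, so a wrapper like this would be an additional outer layer; whether that is preferable to rescheduling the whole mill stage is a design choice this sketch does not settle.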

Fixed by PR #1731