googleapis/java-storage

Google Cloud Storage: unexpected exception after successfully writing to a Google Cloud Storage object

danking opened this issue · 2 comments

Environment details

From build.gradle:

implementation(group: 'com.google.cloud', name: 'google-cloud-storage', version: '2.26.1')
  • OS: Ubuntu 22.04
  • JVM: OpenJDK 1.8.0_382

Steps to reproduce

Our groups run thousands of jobs a day generating millions of requests to Google Cloud Storage. Among these jobs, we have now twice observed (ergo it is very very rare) this error. In at least one of these cases, the file we were uploading was, nonetheless fully uploaded with its correct contents even though we received this error message client-side.

Caused by: is.hail.relocated.com.google.cloud.storage.StorageException: Missing Range header in response
	|> PUT https://storage.googleapis.com/upload/storage/v1/b/1-day/o?name=parallelizeAndComputeWithIndex/al3OJfYZMMNoi9F2jvcAf0jBirVTayRVqro03dnIa1g%3D/result.7028&uploadType=resumable&upload_id=ADPycdv7y3A6GjTh6Kv7vrUu2ap2Kv0peLVWsVTAghIs7RCZk9X3fI1BDkeHag1cd9g3etP2sS4f13bN6iJPU_sbnRnyRE91VPtjUpuYLPqmOq13
	|> content-range: bytes */*
	|  
	|< HTTP/1.1 308 Resume Incomplete
	|< content-length: 0
	|< content-type: text/plain; charset=utf-8
	|< x-guploader-uploadid: ADPycdv7y3A6GjTh6Kv7vrUu2ap2Kv0peLVWsVTAghIs7RCZk9X3fI1BDkeHag1cd9g3etP2sS4f13bN6iJPU_sbnRnyRE91VPtjUpuYLPqmOq13
	|  
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSessionFailureScenario.toStorageException(JsonResumableSessionFailureScenario.java:185) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSessionFailureScenario.toStorageException(JsonResumableSessionFailureScenario.java:117) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSessionFailureScenario.toStorageException(JsonResumableSessionFailureScenario.java:98) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSessionQueryTask.call(JsonResumableSessionQueryTask.java:100) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSession.query(JsonResumableSession.java:57) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSession.lambda$put$0(JsonResumableSession.java:73) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.Retrying.lambda$run$0(Retrying.java:102) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.RetryHelper.run(RetryHelper.java:76) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.Retrying.run(Retrying.java:99) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSession.put(JsonResumableSession.java:68) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.ApiaryUnbufferedWritableByteChannel.internalWrite(ApiaryUnbufferedWritableByteChannel.java:114) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.ApiaryUnbufferedWritableByteChannel.writeAndClose(ApiaryUnbufferedWritableByteChannel.java:65) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.UnbufferedWritableByteChannelSession$UnbufferedWritableByteChannel.writeAndClose(UnbufferedWritableByteChannelSession.java:40) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.DefaultBufferedWritableByteChannel.close(DefaultBufferedWritableByteChannel.java:166) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.StorageByteChannels$SynchronizedBufferedWritableByteChannel.close(StorageByteChannels.java:119) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.StorageException.wrapIOException(StorageException.java:179) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.BaseStorageWriteChannel.close(BaseStorageWriteChannel.java:84) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.io.fs.GoogleStorageFS$$anon$2.$anonfun$close$2(GoogleStorageFS.scala:310) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
...

Code example

I cannot give you a simple code example because this is deeply integrated into our system. We're effectively doing the following:

val writer = storage.writer(blobInfo, options:_*)
// ... in a loop:
writer.write(bb)
writer.close()

Stack trace

is.hail.relocated.com.google.cloud.storage.StorageException: Missing Range header in response
	|> PUT https://storage.googleapis.com/upload/storage/v1/b/1-day/o?name=parallelizeAndComputeWithIndex/al3OJfYZMMNoi9F2jvcAf0jBirVTayRVqro03dnIa1g%3D/result.7028&uploadType=resumable&upload_id=ADPycdv7y3A6GjTh6Kv7vrUu2ap2Kv0peLVWsVTAghIs7RCZk9X3fI1BDkeHag1cd9g3etP2sS4f13bN6iJPU_sbnRnyRE91VPtjUpuYLPqmOq13
	|> content-range: bytes */*
	|  
	|< HTTP/1.1 308 Resume Incomplete
	|< content-length: 0
	|< content-type: text/plain; charset=utf-8
	|< x-guploader-uploadid: ADPycdv7y3A6GjTh6Kv7vrUu2ap2Kv0peLVWsVTAghIs7RCZk9X3fI1BDkeHag1cd9g3etP2sS4f13bN6iJPU_sbnRnyRE91VPtjUpuYLPqmOq13
	|  
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSessionFailureScenario.toStorageException(JsonResumableSessionFailureScenario.java:185) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSessionFailureScenario.toStorageException(JsonResumableSessionFailureScenario.java:117) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSessionFailureScenario.toStorageException(JsonResumableSessionFailureScenario.java:98) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSessionQueryTask.call(JsonResumableSessionQueryTask.java:100) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSession.query(JsonResumableSession.java:57) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSession.lambda$put$0(JsonResumableSession.java:73) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.Retrying.lambda$run$0(Retrying.java:102) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.RetryHelper.run(RetryHelper.java:76) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.Retrying.run(Retrying.java:99) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.JsonResumableSession.put(JsonResumableSession.java:68) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.ApiaryUnbufferedWritableByteChannel.internalWrite(ApiaryUnbufferedWritableByteChannel.java:114) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.ApiaryUnbufferedWritableByteChannel.writeAndClose(ApiaryUnbufferedWritableByteChannel.java:65) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.UnbufferedWritableByteChannelSession$UnbufferedWritableByteChannel.writeAndClose(UnbufferedWritableByteChannelSession.java:40) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.DefaultBufferedWritableByteChannel.close(DefaultBufferedWritableByteChannel.java:166) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.StorageByteChannels$SynchronizedBufferedWritableByteChannel.close(StorageByteChannels.java:119) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.StorageException.wrapIOException(StorageException.java:179) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.relocated.com.google.cloud.storage.BaseStorageWriteChannel.close(BaseStorageWriteChannel.java:84) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.io.fs.GoogleStorageFS$$anon$2.$anonfun$close$2(GoogleStorageFS.scala:310) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.io.fs.GoogleStorageFS$$anon$2.doHandlingRequesterPays(GoogleStorageFS.scala:280) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.io.fs.GoogleStorageFS$$anon$2.$anonfun$close$1(GoogleStorageFS.scala:310) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?]
	at is.hail.services.package$.retryTransientErrors(package.scala:182) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.io.fs.GoogleStorageFS$$anon$2.close(GoogleStorageFS.scala:308) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at java.io.FilterOutputStream.close(FilterOutputStream.java:159) ~[?:1.8.0_382]
	at is.hail.utils.package$.using(package.scala:677) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.io.fs.FS.writePDOS(FS.scala:441) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.io.fs.FS.writePDOS$(FS.scala:440) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.io.fs.RouterFS.writePDOS(RouterFS.scala:3) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.backend.service.Worker$.$anonfun$main$4(Worker.scala:124) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.backend.service.Worker$.$anonfun$main$4$adapted(Worker.scala:124) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.backend.service.Worker$.$anonfun$main$13(Worker.scala:178) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?]
	at is.hail.services.package$.retryTransientErrors(package.scala:182) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.backend.service.Worker$.main(Worker.scala:177) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.backend.service.Main$.main(Main.scala:14) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	at is.hail.backend.service.Main.main(Main.scala) ~[gs:__hail-test-ezlis_dking_jars_ch4g3zvqceyo_09526a168d57dac1a26f8caa4ab49593931ed2ef.jar.jar:0.0.1-SNAPSHOT]
	... 12 more

Any additional information below

I realize that without a simple reproduction you can't diagnose and solve this issue. I have a much simpler ask: what does this error mean? What, broadly, triggers this error? Does this error indicate some sort of internal server error at Google? Is it safe to retry calling close or would I need to open a new writer, write all the bytes, and close again?

The bug that caused this error to be raised was fixed in v2.27.0, I recommend upgrading to the latest version to get the fix.

As to the question of what does this error mean:
This error happens when attempting to finalize the object in GCS a request failed, before retrying the request in the library the library will query GCS to determine if the previous request succeeded and we just didn't get the response or how many bytes GCS did receive. Once we know the state the upload is in from GCS' perspective the client will reconcile and attempt to finalize again. Unfortunately, there was a bug in the reconciliation code that affected v2.25.0, v2.26.0 and v2.26.1 that has since been fixed.

To your question of how might you recover from this error if you needed to yourself?
When an exception happens during close() the WriteChannel will be in a terminal state and will not accept any more write calls and calling close again will be a no-op. Under the covers, the operations attempting to finalize the object will be retried in accordance with the RetrySettings provided to the StorageOptions builder, however if that retry budget (defaults of the budget are documented in https://cloud.google.com/storage/docs/retry-strategy#java) is exhausted there is nothing more the client itself could do.

To recover from an exception during close() you would first want to attempt to get the object and see if the object had in fact been created, if not attempting to create the object again.

Hey @BenWhitehead ! Thanks for the detailed and instructive response. We'll update to 2.27.1 immediately. I'm glad this ultimately only needed an update to our google storage client library!

Thank you kindly!
Dan