GaxError Exception occurred in retry method
Hi,
While using GAX for google-cloud-language development, both my local test runs and our Travis CI build sometimes fail with the following error. Retrying the call usually succeeds.
The backtrace pasted below is from Job #1433, which should contain all details about the environment.
Thank you!
Google::Gax::RetryError: GaxError Exception occurred in retry method that was not classified as transient, caused by 14:{"created":"@1472169812.007530609","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1472169812.007501620","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]}
/home/travis/.rvm/gems/ruby-2.2.5/gems/google-gax-0.4.4/lib/google/gax/api_callable.rb:352:in `rescue in block (2 levels) in retryable'
/home/travis/.rvm/gems/ruby-2.2.5/gems/google-gax-0.4.4/lib/google/gax/api_callable.rb:346:in `block (2 levels) in retryable'
/home/travis/.rvm/gems/ruby-2.2.5/gems/google-gax-0.4.4/lib/google/gax/api_callable.rb:345:in `loop'
/home/travis/.rvm/gems/ruby-2.2.5/gems/google-gax-0.4.4/lib/google/gax/api_callable.rb:345:in `block in retryable'
/home/travis/.rvm/gems/ruby-2.2.5/gems/google-gax-0.4.4/lib/google/gax/api_callable.rb:264:in `call'
/home/travis/.rvm/gems/ruby-2.2.5/gems/google-gax-0.4.4/lib/google/gax/api_callable.rb:264:in `block in catch_errors'
/home/travis/.rvm/gems/ruby-2.2.5/gems/google-gax-0.4.4/lib/google/gax/api_callable.rb:226:in `call'
/home/travis/.rvm/gems/ruby-2.2.5/gems/google-gax-0.4.4/lib/google/gax/api_callable.rb:226:in `block in create_api_call'
/home/travis/.rvm/gems/ruby-2.2.5/gems/google-gax-0.4.4/lib/google/gax/api_callable.rb:252:in `call'
/home/travis/.rvm/gems/ruby-2.2.5/gems/google-gax-0.4.4/lib/google/gax/api_callable.rb:252:in `block in create_api_call'
/home/travis/build/GoogleCloudPlatform/google-cloud-ruby/google-cloud-language/lib/google/cloud/language/v1beta1/language_service_api.rb:198:in `call'
/home/travis/build/GoogleCloudPlatform/google-cloud-ruby/google-cloud-language/lib/google/cloud/language/v1beta1/language_service_api.rb:198:in `annotate_text'
/home/travis/build/GoogleCloudPlatform/google-cloud-ruby/google-cloud-language/lib/google/cloud/language/service.rb:80:in `block in annotate'
/home/travis/build/GoogleCloudPlatform/google-cloud-ruby/google-cloud-language/lib/google/cloud/language/service.rb:105:in `execute'
/home/travis/build/GoogleCloudPlatform/google-cloud-ruby/google-cloud-language/lib/google/cloud/language/service.rb:80:in `annotate'
/home/travis/build/GoogleCloudPlatform/google-cloud-ruby/google-cloud-language/lib/google/cloud/language/project.rb:176:in `annotate'
/home/travis/build/GoogleCloudPlatform/google-cloud-ruby/google-cloud-language/acceptance/language/text_test.rb:60:in `block (3 levels) in <top (required)>'
Are errors like this to be expected? Do clients need to do anything to handle these errors?
Hmm. It looks like a gRPC error occurred there (and GAX thinks it's not a retryable error, so it raises an exception).
I've never seen this error when testing locally, so I'm not sure why it happens for you or on Travis. The "description":"Secure read failed" part looks odd; it seems to occur only when some exceptional network failure happens.
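On the question of whether clients need to handle this: a minimal caller-side sketch of retrying with exponential backoff, assuming the application decides that gRPC status 14 (UNAVAILABLE) is safe to retry for an idempotent call such as annotate_text. The names `with_retries`, `grpc_code`, `FakeBadStatus` and `WrapperError` are hypothetical; the two error classes are stand-ins so the sketch runs without the grpc or google-gax gems. With the real libraries, the assumption is that the wrapped gRPC error (a `GRPC::BadStatus`, which exposes `#code`) is reachable through Ruby's `Exception#cause`; verify that against the GAX version in use.

```ruby
# Hypothetical stand-in for a gRPC error that carries a numeric status code.
class FakeBadStatus < StandardError
  attr_reader :code

  def initialize(code, details)
    @code = code
    super("#{code}:#{details}")
  end
end

# Hypothetical stand-in for a wrapping error such as Google::Gax::RetryError.
class WrapperError < StandardError; end

UNAVAILABLE = 14 # gRPC status code for UNAVAILABLE

# Walks the Exception#cause chain and returns the first status code found, or nil.
def grpc_code(err)
  while err
    return err.code if err.respond_to?(:code)
    err = err.cause
  end
  nil
end

# Runs the block, retrying with exponential backoff while `retryable` returns true.
# Re-raises the error once max_attempts is reached or the error is not retryable.
def with_retries(max_attempts: 3, base_delay: 0.1, retryable: ->(_e) { false })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError => e
    raise if attempt >= max_attempts || !retryable.call(e)
    sleep(base_delay * (2**(attempt - 1))) # base_delay, then 2x, 4x, ...
    retry
  end
end

# Usage: the first two attempts fail with a wrapped status-14 error, the third succeeds.
transient = ->(e) { grpc_code(e) == UNAVAILABLE }
calls = 0
result = with_retries(max_attempts: 3, base_delay: 0.01, retryable: transient) do |attempt|
  calls += 1
  if attempt < 3
    begin
      raise FakeBadStatus.new(UNAVAILABLE, "Secure read failed")
    rescue FakeBadStatus
      # Raising inside rescue makes the FakeBadStatus the WrapperError's #cause.
      raise WrapperError, "not classified as transient"
    end
  end
  "annotated"
end
puts result # prints "annotated"
```

Errors whose cause chain carries no status 14 (for example an ArgumentError from bad input) are re-raised on the first attempt, so only the failure mode seen in this thread is retried.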
Looking into the message a bit more, the root cause of the failure seems to be "description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235, which appears at the end of the first line of the backtrace. That error is raised when recvmsg (in C) returns 0.
According to http://pubs.opengroup.org/onlinepubs/009695399/functions/recvmsg.html,
If no messages are available to be received and the peer has performed an orderly shutdown, recvmsg() shall return 0.
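To illustrate the rule quoted above: a small Ruby sketch showing that once the peer closes its end of a stream socket, a read returns zero bytes, which is the condition this thread traces the gRPC "EOF" error to. It assumes a local UNIX-domain stream socket pair as a stand-in for the TCP connection to the Google service; the helper `read_or_eof` is hypothetical, and the sketch shows only the socket semantics, not gRPC internals.

```ruby
require "socket"

# Reads from a stream socket and returns :eof when the peer has performed an
# orderly shutdown (the underlying read returned 0 bytes). Depending on the Ruby
# version, BasicSocket#recv reports that 0-byte read as "" or nil, so both are checked.
def read_or_eof(sock, maxlen = 1024)
  data = sock.recv(maxlen)
  (data.nil? || data.empty?) ? :eof : data
end

local, peer = UNIXSocket.pair  # connected SOCK_STREAM pair
peer.write("hello")
first = read_or_eof(local)     # data is available, so this returns "hello"
peer.close                     # orderly shutdown from the peer's side
second = read_or_eof(local)    # 0-byte read, reported as :eof
local.close

p first   # prints "hello"
p second  # prints :eof
```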
I'm not sure why the connection was closed by the peer (i.e. the Google service), but I suspect a network problem on the machine.
Network trouble on the Google service? Or the client?
I'm thinking of client-side trouble.
In the case of Travis, I believe network connections from a Travis instance are managed by Travis itself, and a connection could be forcibly closed for some reason (for example, because it takes too long).
Ugh, I've heard that there were some problems on Google's service side very recently, and that might have caused this issue. They have fixed the code, so please double-check whether it still reproduces.
I can no longer reproduce, and I haven't seen it on Travis CI today either. Thanks for your help!
I'm seeing this error in both error_reporting and logging when running locally and on GAE. The error isn't consistent, and it doesn't seem to prevent my gRPC requests from going through successfully.