cloudfoundry-attic/dea_ng

DEA getting stuck draining

Issue closed · 4 comments

CF 231

I see the messages below repeated endlessly.

A warden process is running, but it is not logging anything.

monit shows every job as "not monitored".

==> drain.log <==
I, [2016-03-17T21:24:00.597506 #7966]  INFO -- : Drain script invoked with job_check_status hash_unchanged
I, [2016-03-17T21:24:00.598481 #7966]  INFO -- : Sending signal USR2 to DEA.
I, [2016-03-17T21:24:00.598543 #7966]  INFO -- : Hey BOSH, call me back in 5s.

==> dea_next.log <==
{"timestamp":1458249840.5994282,"message":"caught SIGUSR2","log_level":"warn","source":"Dea::Bootstrap","data":{},"thread_id":47110787463560,"fiber_id":47110803950780,"process_id":4459,"file":"/var/vcap/packages/dea_next/lib/dea/lifecycle/signal_handler.rb","lineno":25,"method":"block (3 levels) in setup"}
{"timestamp":1458249840.6003687,"message":"Evacuating (first time: false; can shutdown: true)","log_level":"info","source":"EvacuationHandler","data":{},"thread_id":47110787463560,"fiber_id":47110803950780,"process_id":4459,"file":"/var/vcap/packages/dea_next/lib/dea/lifecycle/evacuation_handler.rb","lineno":15,"method":"evacuate!"}
{"timestamp":1458249842.9788294,"message":"stat-collector.info-retrieval.failed","log_level":"error","source":"Dea::StatCollector","data":{"handle":null,"error":"Warden::Protocol::ProtocolError","backtrace":["/var/vcap/packages/dea_next/vendor/cache/warden-634ebb21eb01/warden-protocol/lib/warden/protocol/base.rb:86:in `rescue in safe'","/var/vcap/packages/dea_next/vendor/cache/warden-634ebb21eb01/warden-protocol/lib/warden/protocol/base.rb:82:in `safe'","/var/vcap/packages/dea_next/vendor/cache/warden-634ebb21eb01/warden-protocol/lib/warden/protocol/base.rb:96:in `wrap'","/var/vcap/packages/dea_next/vendor/cache/warden-634ebb21eb01/warden-protocol/lib/warden/protocol/buffer.rb:14:in `request_to_wire'","/var/vcap/packages/dea_next/vendor/cache/warden-634ebb21eb01/em-warden-client/lib/em/warden/client/connection.rb:83:in `call'","/var/vcap/packages/dea_next/vendor/cache/warden-634ebb21eb01/em-warden-client/lib/em/warden/client.rb:40:in `call'","/var/vcap/packages/dea_next/lib/container/container.rb:193:in `call'","/var/vcap/packages/dea_next/lib/container/container.rb:171:in `info'","/var/vcap/packages/dea_next/lib/dea/stat_collector.rb:26:in `retrieve_stats'","/var/vcap/packages/dea_next/lib/dea/stat_collector.rb:69:in `block in promise_retrieve_stats'","/var/vcap/packages/dea_next/lib/dea/promise.rb:92:in `call'","/var/vcap/packages/dea_next/lib/dea/promise.rb:92:in `block in run'"]},"thread_id":47110787463560,"fiber_id":47110802638620,"process_id":4459,"file":"/var/vcap/packages/dea_next/lib/dea/stat_collector.rb","lineno":28,"method":"rescue in retrieve_stats"}

We have created an issue in Pivotal Tracker to manage this. You can view the current status of your issue at: https://www.pivotaltracker.com/story/show/115900135.

Also, I have evacuation_bail_out_time_in_seconds: 1 set in my deployment manifest.
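
For context, a bail-out time is meant to cap how long an evacuation waits before the process is allowed to shut down. A minimal Python sketch of such a check follows. The names `can_shutdown` and `remaining_instances` are hypothetical, and this is an assumption about the general idea rather than a copy of dea_ng's EvacuationHandler.

```python
def can_shutdown(started_at: float, now: float,
                 bail_out_seconds: float, remaining_instances: int) -> bool:
    """Hypothetical evacuation check, for illustration only.

    Shutdown is allowed once no instances remain, or once the
    bail-out window since the first evacuation request has elapsed.
    """
    timed_out = now - started_at >= bail_out_seconds
    return remaining_instances == 0 or timed_out
```

With a bail-out time of 1 second, a check like this turns true one second after the first USR2. That would be consistent with the "can shutdown: true" in the log above, so the timeout itself does not appear to be what keeps the drain looping.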

Thanks, @fraenkel. Sorry, I missed that commit.