IIIF/website

AWS Monitoring emails


Just noting this to have a record of the investigation. I received a number of AWS Elastic Beanstalk warnings on Sunday (24/11/2019) reporting issues at around 09:00 and 10:00. Looking through the access logs, there weren't enough error messages to explain the warnings. It looked more like an issue with the Elastic Beanstalk network.

Specifically, the Elastic Beanstalk console was showing:

Environment health has transitioned from Ok to Severe. None of the instances are sending data

Looking into the healthd daemon.log you can see the following:

W, [2019-11-24T09:27:37.182397 #3207]  WARN -- : discarding statistic item after validation error (Invalid timestamp): {:id=>"0", :namespace=>"application", :timestamp=>1574587510, :data=>"{\"duration\":10,\"latency_histogram\":[[0.012,1],[0.019,1]],\"http_counters\":{\"status_200\":2,\"request_count\":2}}"}
W, [2019-11-24T09:27:37.182538 #3207]  WARN -- : discarding statistic item after validation error (Invalid timestamp): {:id=>"1", :namespace=>"application", :timestamp=>1574587520, :data=>"{\"duration\":10,\"latency_histogram\":[[0.014,1],[0.015,1]],\"http_counters\":{\"status_200\":2,\"request_count\":2}}"}
W, [2019-11-24T09:27:37.182571 #3207]  WARN -- : discarding statistic item after validation error (Invalid timestamp): {:id=>"2", :namespace=>"application", :timestamp=>1574587530, :data=>"{\"duration\":10,\"latency_histogram\":[[0.023,1],[0.024,1]],\"http_counters\":{\"status_200\":2,\"request_count\":2}}"}

and also:

W, [2019-11-24T09:31:07.787781 #3207]  WARN -- : sending message(s) failed: (Seahorse::Client::NetworkingError) Net::ReadTimeout
W, [2019-11-24T09:31:28.117712 #3207]  WARN -- : sending message(s) failed: (Seahorse::Client::NetworkingError) Net::ReadTimeout
W, [2019-11-24T09:31:48.445648 #3207]  WARN -- : sending message(s) failed: (Seahorse::Client::NetworkingError) Net::ReadTimeout
W, [2019-11-24T09:32:08.773082 #3207]  WARN -- : sending message(s) failed: (Seahorse::Client::NetworkingError) Net::ReadTimeout

which roughly correspond to the time of the issue. So I'm chalking this up to a temporary network error between the instance and the Elastic Beanstalk health service.
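For anyone repeating this check, here is a rough Python sketch of how the "Invalid timestamp" warnings could be compared against the time they were logged. It assumes the log prefix is in UTC (not confirmed above), and the names `parse_line` and `statistic_lag_seconds` are made up for illustration; they are not part of healthd.

```python
import re
from datetime import datetime, timezone

# Ruby Logger style prefix: "W, [2019-11-24T09:27:37.182397 #3207]  WARN -- : message"
LINE_RE = re.compile(
    r"^\w, \[(?P<ts>[\d\-T:.]+) #\d+\]\s+(?P<level>\w+) -- : (?P<msg>.*)$"
)
# Epoch seconds embedded in the discarded statistic item.
STAT_TS_RE = re.compile(r":timestamp=>(\d+)")


def parse_line(line):
    """Return (logged_at, message), or None if the line does not match the prefix."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Assumption: the log prefix is UTC; adjust tzinfo if the instance logs local time.
    logged_at = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f").replace(
        tzinfo=timezone.utc
    )
    return logged_at, m["msg"]


def statistic_lag_seconds(line):
    """Seconds between the statistic's own timestamp and the time the warning was logged."""
    parsed = parse_line(line)
    if parsed is None:
        return None
    logged_at, msg = parsed
    m = STAT_TS_RE.search(msg)
    if m is None:
        return None
    return logged_at.timestamp() - int(m.group(1))


# First warning from the log above, copied verbatim.
SAMPLE = r'W, [2019-11-24T09:27:37.182397 #3207]  WARN -- : discarding statistic item after validation error (Invalid timestamp): {:id=>"0", :namespace=>"application", :timestamp=>1574587510, :data=>"{\"duration\":10,\"latency_histogram\":[[0.012,1],[0.019,1]],\"http_counters\":{\"status_200\":2,\"request_count\":2}}"}'

if __name__ == "__main__":
    print(round(statistic_lag_seconds(SAMPLE)))  # 147
```

Under the UTC assumption, the first discarded item is dated 09:25:10 and was logged roughly 147 seconds later. That would fit with statistics queuing up while the network was unavailable and then being rejected as stale, although I haven't confirmed what healthd's validation actually checks.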

The only other unusual thing is a large number of requests (around 5k) at around 05:02 UTC, which appear to be a hacking attempt, e.g.:

x.x.x.x - - [24/Nov/2019:05:17:49 +0000] "GET /news/2016/09/16/812NSbkU';select%20pg_sleep(9);%20--%20/ HTTP/1.1" 404 399 "-" "Amazon CloudFront" "x.x.x.x , x.x.x.x "

but these all seemed to fail as expected and, other than slightly increased load, don't seem to have had any effect.
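To double-check that the probes all failed, something like the following Python sketch could tally response codes for requests that look like SQL injection attempts. The pattern list is a guess based on the single example above and is certainly not exhaustive, and `probe_statuses` is a made-up helper name.

```python
import re
from collections import Counter
from urllib.parse import unquote

# Pulls method, path and status out of a combined-format access log line.
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) '
)
# Assumption: these fragments are enough to flag the probes seen here.
SQLI_RE = re.compile(r"pg_sleep\s*\(|;\s*select\b|union\s+select", re.IGNORECASE)


def probe_statuses(lines):
    """Count HTTP status codes for requests whose decoded path looks like SQL injection."""
    counts = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m is None:
            continue
        # Decode %20 etc. so the patterns see the payload as the server would.
        path = unquote(m["path"])
        if SQLI_RE.search(path):
            counts[int(m["status"])] += 1
    return counts


# Example request from the access log above, copied verbatim.
SAMPLE = r'''x.x.x.x - - [24/Nov/2019:05:17:49 +0000] "GET /news/2016/09/16/812NSbkU';select%20pg_sleep(9);%20--%20/ HTTP/1.1" 404 399 "-" "Amazon CloudFront" "x.x.x.x , x.x.x.x "'''

if __name__ == "__main__":
    print(dict(probe_statuses([SAMPLE])))  # {404: 1}
```

Run over the full access log, anything other than 4xx in the result would be worth a closer look.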