CottageLabs/OpenArticleGauge

Site is down

Closed this issue · 8 comments

egh commented

I am not sure where to report this, so I am reporting it here. The howopenisit.org site is down.

What the... you're right. I'm not sure why we weren't notified. Thanks, I'll fix it, and the alerting for it too.

Site is back up

Cause of downtime: Celery was holding too many open sockets to Elasticsearch and Redis. If we're not pooling connections, now would be a fabulous time to start doing that. #115
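For anyone following along, here is a rough sketch of what "pooling connections" means in practice: cap the number of open sockets and reuse them instead of opening a new one per task. This is not the project's code. `BoundedPool`, `factory` and `max_size` are hypothetical names made up for illustration, and the "connection" is whatever the factory returns. Real clients ship their own pools (redis-py has `redis.ConnectionPool`, for instance), which would normally be the better choice.

```python
import queue
import threading
from contextlib import contextmanager


class BoundedPool:
    """Hypothetical pool: never more than max_size connections open at once."""

    def __init__(self, factory, max_size):
        self._factory = factory  # callable that opens one new connection
        self._lock = threading.Lock()
        self.opened = 0  # how many connections were ever created
        # LIFO so the most recently returned connection is reused first.
        self._slots = queue.LifoQueue(maxsize=max_size)
        for _ in range(max_size):
            self._slots.put(None)  # None marks a slot with no connection yet

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (or raises queue.Empty after `timeout`) when every slot is
        # in use, instead of opening yet another socket.
        conn = self._slots.get(timeout=timeout)
        try:
            if conn is None:
                conn = self._factory()
                with self._lock:
                    self.opened += 1
            yield conn
        finally:
            # Always give the slot back, even if the caller raised.
            self._slots.put(conn)


if __name__ == "__main__":
    pool = BoundedPool(factory=object, max_size=2)
    for _ in range(1000):
        with pool.connection():
            pass  # a task would talk to the datastore here
    print(pool.opened)  # connections created across 1000 sequential uses
```

The point of the design is that a busy worker waits for a free slot rather than exhausting file descriptors; the right `max_size` depends on the worker concurrency and the server's connection limits.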

Also, the Flower monitor for Celery is not showing anything sensible even though Celery is obviously up and crunching away. #113

More monitoring of the app itself, with a push notification to moi. #114
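A minimal sketch of the kind of app-level check meant here, assuming nothing about how it will actually be built. It counts the site as healthy only on a 2xx response, so a server that is reachable but returning 500s still triggers a notification. `check`, `monitor_once`, `notify` and `opener` are hypothetical names; `notify` stands in for whatever push mechanism ends up being used.

```python
import urllib.error
import urllib.request


def check(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return (ok, detail); ok is True only for a 2xx response."""
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx responses.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, refused connections, DNS failures and timeouts.
        return False, f"unreachable: {exc}"
    return 200 <= status < 300, f"HTTP {status}"


def monitor_once(url, notify, opener=urllib.request.urlopen):
    """Run one check and call notify(message) if the site is not healthy."""
    ok, detail = check(url, opener=opener)
    if not ok:
        notify(f"{url} is failing: {detail}")
    return ok
```

Something like `monitor_once(site_url, notify=send_push)` could then run from cron every few minutes, where `site_url` and `send_push` are placeholders to fill in.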

@egh Thanks again for taking the time to report this, this is very much the right place to do so!

Redis is full specifically because celery beat does not work. #116

egh commented

Thanks! I'll be sure to report any downtimes in the future. We have been hitting the API pretty hard for our www.richcitations.org project.

egh commented

The site is up, but it doesn't seem to be actually working.

Yeah, #116... it gives you 500s too, I assume. Will fix it ASAP, but that might end up being tomorrow (8:40pm GMT here). We recently had to do a minor-version Celery upgrade to fix a bad memory leak in Celery itself. I suspect that has caused the regular task component (celery beat) to fail, which in turn let the temporary datastore (Redis) fill up incredibly quickly.

Ah well, heavy hitting will shine the spotlight on issues like that. It was a prototype but we'd very much like it to move on from that :).

egh commented

Thanks for looking into it! I'll hold off on processing until I hear back.