opendata-stuttgart/feinstaub-map-v2

Service dead? Or overload?

michapr opened this issue · 8 comments

Hi,

I only ever get the map, without any data (or sensors).
What is wrong? I have tried different browsers (Firefox, Chrome).

Thanks,
Michael

We are working on this. As the traffic has increased massively in some data centers, the internal networks are sometimes overloaded.
For the moment it seems we have solved this for the registration page, but not for our main database.

Is the service still affected?
Upon closer inspection, the requests to https://maps.sensor.community/data/v2/data.24h.json return a 200 but are empty.

Edit: service came back
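
For anyone hitting the same symptom, here is a minimal sketch of how a client could tell an empty 200 apart from a real error. The function name and the category labels are made up for illustration and are not part of any sensor.community API; the function only looks at a status code and body that you have already fetched.

```python
import json


def classify_response(status: int, body: bytes) -> str:
    """Classify an already-fetched response from the data endpoint (hypothetical helper)."""
    if status >= 500:
        return "server_error"
    if status != 200:
        return "unexpected_status"
    if not body.strip():
        # The case described above: 200 OK, but nothing in the body.
        return "empty_body"
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers both malformed JSON and undecodable bytes.
        return "invalid_json"
    if not payload:
        # Valid JSON, but an empty list or object.
        return "no_records"
    return "ok"
```

With a check like this, a monitoring script could report "empty_body" instead of silently treating the 200 as success.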

Database crash. As the data centers are somewhat overloaded at the moment, the recovery took much longer than usual.

Maybe you guys could react to database crashes differently :) ?
Returning a 500, or better a 503, would indicate that the service is currently disrupted because another service is down. That might also help people with their debugging :)
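
To make the suggestion concrete, here is a rough sketch of a plain WSGI app that answers 503 with a `Retry-After` header while the database is down. Everything in it is an assumption for illustration: `database_is_available`, `make_app`, and the placeholder body are hypothetical and say nothing about how the actual backend is built. It also only helps as long as the web service itself is still running.

```python
from http import HTTPStatus


def database_is_available() -> bool:
    """Hypothetical health check; a real one would ping the database."""
    return False


def make_app(db_check=database_is_available, retry_after_seconds=120):
    """Build a WSGI app that returns 503 while db_check() reports failure."""

    def app(environ, start_response):
        if not db_check():
            unavailable = HTTPStatus.SERVICE_UNAVAILABLE
            body = b'{"error": "database unavailable"}'
            start_response(
                f"{unavailable.value} {unavailable.phrase}",
                [
                    ("Content-Type", "application/json"),
                    # Tells clients how long to wait before retrying.
                    ("Retry-After", str(retry_after_seconds)),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Placeholder for the normal data response.
        body = b"[]"
        start_response(
            "200 OK",
            [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]

    return app
```

A client that sees the 503 then knows the problem is on the server side and can back off instead of assuming there is simply no data.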

We also had to disable the web services, as otherwise the recreation of the database would have taken much longer. So there was no way to return a status code.
And before someone asks: inserting a load balancer would keep each connection open longer (during normal operation). This would result in more parallel connections, which in turn would cause other problems.

Hi,
what number of parallel connections are we talking about here?
And what is the "normal" connection time for a request?
Thank you for your support!
Michael

The average:
11,500 sensors sending 2 measurements every 145 seconds: about 158 new connections/second
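
The arithmetic behind that average can be checked in a few lines. This assumes, as the figure above implies, that each measurement is sent over its own connection; the variable names are only for illustration.

```python
sensors = 11_500
measurements_per_cycle = 2   # each assumed to use its own connection
cycle_seconds = 145

# Average rate of new connections if sends are spread evenly over the cycle.
connections_per_second = sensors * measurements_per_cycle / cycle_seconds

print(f"{connections_per_second:.1f} new connections/second")  # prints 158.6 new connections/second
```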

But in reality this can go up to around 1000 parallel connections (e.g. when all Raspi systems send exactly on the full minute, NTP-synced ...)

Normal processing time is under 100 milliseconds. It also depends on the connection speed (TCP handshake, TLS handshake if enabled).
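
As a back-of-the-envelope illustration of why connection time matters, Little's law (average concurrency = arrival rate × time in system) can be applied to the numbers above. This is a generic queueing estimate, not something measured on the real servers, and the 0.2 s value is a purely hypothetical example of a longer connection time, e.g. behind an extra proxy hop.

```python
def average_parallel_connections(rate_per_second: float, seconds_open: float) -> float:
    """Little's law: mean number of open connections = arrival rate * time each stays open."""
    return rate_per_second * seconds_open


rate = 11_500 * 2 / 145  # about 158.6 new connections/second, from the figures above

# Upper bound from the thread: each connection is open for under 100 ms.
baseline = average_parallel_connections(rate, 0.1)

# Hypothetical: the same traffic if each connection stayed open twice as long.
doubled = average_parallel_connections(rate, 0.2)

print(f"baseline: {baseline:.1f}, with doubled connection time: {doubled:.1f}")
```

The average scales linearly with the time a connection stays open, so bursts that already reach around 1000 parallel connections would grow in the same proportion.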

Got it! Thank you for the clarification and efforts!
Cheers,

Max