cvmfs-contrib/cvmfs-servermon

critical geo error: HTTP Error 400: Bad Request

Closed this issue · 11 comments

@DrDaveD any ideas about this? It started happening recently, and I'm not sure what changed. Upgrading to cvmfs-servermon-1.23-1.24.obs.el7.noarch and running sudo cvmfs_server update-geodb did not help.

[Tue Apr 11 12:10:24.508467 2023] [mpm_prefork:notice] [pid 2789] AH00163: Apache/2.4.6 (CentOS) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations
[Tue Apr 11 12:10:24.508490 2023] [core:notice] [pid 2789] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Tue Apr 11 12:10:32.701243 2023] [:error] [pid 20927] processed /etc/cvmfsmon/api.conf
[Tue Apr 11 12:10:32.701294 2023] [:error] [pid 20927] aliases: {'local': '127.0.0.1'}
[Tue Apr 11 12:10:32.701302 2023] [:error] [pid 20927] excludes: []
[Tue Apr 11 12:10:32.701313 2023] [:error] [pid 20927] limits: {'whitelist-warning': 48, 'updated-warning': 4, 'gc-warning': 2, 'gc-critical': 7, 'updated-critical': 18}
[Tue Apr 11 12:10:32.701320 2023] [:error] [pid 20927] subdirectories: {}
[Tue Apr 11 12:10:32.755886 2023] [:error] [pid 20926] cvmfs_geo: opened /var/lib/cvmfs-server/geo/GeoLite2-City.mmdb
[Tue Apr 11 12:19:40.528038 2023] [:error] [pid 20926] cvmfs_geo: opened /var/lib/cvmfs-server/geo/GeoLite2-City.mmdb
[Tue Apr 11 12:25:40.610251 2023] [:error] [pid 20926] cvmfs_geo: closed old /var/lib/cvmfs-server/geo/GeoLite2-City.mmdb
[Tue Apr 11 12:57:21.347786 2023] [:error] [pid 20926] [remote 162.158.95.90:132] mod_wsgi (pid=20926): Exception occurred processing WSGI script '/var/www/wsgi-scripts/cvmfs-server/cvmfs-api.wsgi'.
[Tue Apr 11 12:57:21.347861 2023] [:error] [pid 20926] [remote 162.158.95.90:132] Traceback (most recent call last):
[Tue Apr 11 12:57:21.347896 2023] [:error] [pid 20926] [remote 162.158.95.90:132]   File "/var/www/wsgi-scripts/cvmfs-server/cvmfs-api.wsgi", line 18, in application
[Tue Apr 11 12:57:21.348006 2023] [:error] [pid 20926] [remote 162.158.95.90:132]     return cvmfs_api.dispatch(api_func, path_info, repo_name, version, start_response, environ)
[Tue Apr 11 12:57:21.348039 2023] [:error] [pid 20926] [remote 162.158.95.90:132]   File "/usr/share/cvmfs-server/webapi/cvmfs_api.py", line 23, in dispatch
[Tue Apr 11 12:57:21.348315 2023] [:error] [pid 20926] [remote 162.158.95.90:132]     return cvmfs_geo.api(path_info, repo_name, version, start_response, environ)
[Tue Apr 11 12:57:21.348339 2023] [:error] [pid 20926] [remote 162.158.95.90:132]   File "/usr/share/cvmfs-server/webapi/cvmfs_geo.py", line 313, in api
[Tue Apr 11 12:57:21.348541 2023] [:error] [pid 20926] [remote 162.158.95.90:132]     onegood, indexes = geosort_servers(now, gir_rem, servers, trycdn)
[Tue Apr 11 12:57:21.348560 2023] [:error] [pid 20926] [remote 162.158.95.90:132]   File "/usr/share/cvmfs-server/webapi/cvmfs_geo.py", line 205, in geosort_servers
[Tue Apr 11 12:57:21.348585 2023] [:error] [pid 20926] [remote 162.158.95.90:132]     gir_server = name_geoinfo(now, server)
[Tue Apr 11 12:57:21.348598 2023] [:error] [pid 20926] [remote 162.158.95.90:132]   File "/usr/share/cvmfs-server/webapi/cvmfs_geo.py", line 173, in name_geoinfo
[Tue Apr 11 12:57:21.348617 2023] [:error] [pid 20926] [remote 162.158.95.90:132]     gir = gir['location']
[Tue Apr 11 12:57:21.348646 2023] [:error] [pid 20926] [remote 162.158.95.90:132] KeyError: 'location'
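
The failing line is gir = gir['location'] in name_geoinfo, so presumably some record in GeoLite2-City.mmdb exists but carries no 'location' block. A minimal sketch of how that raises the same KeyError, assuming cvmfs_geo reads the database with the maxminddb Python module (the address here is just the Cloudflare one from the log, for illustration):

import maxminddb

reader = maxminddb.open_database('/var/lib/cvmfs-server/geo/GeoLite2-City.mmdb')
record = reader.get('162.158.95.90')  # returns a plain dict, or None if the address is absent
location = record['location']         # KeyError: 'location' if the entry has no location block
reader.close()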

A geodb lookup on the server seems to work fine:

$ mmdblookup --file /var/lib/cvmfs-server/geo/GeoLite2-City.mmdb --ip <public IP of server>  location

  {
    "accuracy_radius": 
      1000 <uint16>
    "latitude": 
      43.631900 <double>
    "longitude": 
      -79.371600 <double>
    "time_zone": 
      "America/Toronto" <utf8_string>
  }

Also, the Python socket lookup seems to work fine:

$ python
Python 2.7.5 (default, Nov 16 2020, 22:23:17) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> ai = socket.getaddrinfo("<hostname>",80,0,0,socket.IPPROTO_TCP)
>>> print(ai)
[(2, 1, 6, '', ('<public IP>', 80))]

This is a cvmfs error, not a cvmfs-servermon one, as far as I can tell. The messages at 12:10:32 are from cvmfs-servermon, but the errors at 12:57:21 are not.

I noticed an error this week looking up many addresses at CERN; they had been removed from the MaxMind DB. I submitted a correction which is supposed to be released on Friday. I wonder if this is related. Probably not: more likely it is related to this fix that went into cvmfs-server-2.10.1 at the last minute, which allowed the location to be empty. That was in response to cvmfs/cvmfs#3186, which was about missing location info on a Cloudflare IP address. The address showing in your stack trace is also a Cloudflare address.
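
I haven't gone back to the exact patch, but the gist of that 2.10.1 change is presumably a guard along these lines (a sketch only, not the literal cvmfs_geo code, and the function name is made up), treating a record without a location block as "no geo info" instead of raising:

def name_geoinfo_safe(reader, addr):
    # Sketch of the 2.10.1-style behavior, not the actual cvmfs_geo code.
    # Tolerate database entries that exist but carry no 'location' block.
    record = reader.get(addr)
    if record is None or 'location' not in record:
        return None   # caller treats the address as having no geo info
    return record['location']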

Oh I misinterpreted the output.

Thanks for the tip, that should fix it, I'll give it a try.

But that being the case, if the error comes from a cvmfs-server geo lookup for a client with a Cloudflare IP, I'm not sure why servermon would flag a critical geo error:

$ curl "http://localhost/cvmfsmon/api/v1.0/all&format=details"
{"CRITICAL": {"geo": [{"msg": "HTTP Error 400: Bad Request",
                       "repo": "cvmfs-config.computecanada.ca"}]},
...

I guess the Python error was a red herring; the real issue I was trying to address was:

$ curl "http://localhost/cvmfsmon/api/v1.0/all&format=details"
{"CRITICAL": {"geo": [{"msg": "HTTP Error 400: Bad Request",
                       "repo": "containers.computecanada.ca"}]},

Updating to 2.10.1 fixed the other issue, so now I don't see any httpd errors at all and I'm not sure what the problem is.

[Tue Apr 11 14:55:19.060111 2023] [mpm_prefork:notice] [pid 2789] AH00171: Graceful restart requested, doing restart
[Tue Apr 11 14:55:19.131302 2023] [lbmethod_heartbeat:notice] [pid 2789] AH02282: No slotmem from mod_heartmonitor
[Tue Apr 11 14:55:19.132634 2023] [mpm_prefork:notice] [pid 2789] AH00163: Apache/2.4.6 (CentOS) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations
[Tue Apr 11 14:55:19.132653 2023] [core:notice] [pid 2789] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Tue Apr 11 14:55:46.260873 2023] [:error] [pid 19679] processed /etc/cvmfsmon/api.conf
[Tue Apr 11 14:55:46.260927 2023] [:error] [pid 19679] aliases: {'local': '127.0.0.1'}
[Tue Apr 11 14:55:46.260936 2023] [:error] [pid 19679] excludes: []
[Tue Apr 11 14:55:46.260947 2023] [:error] [pid 19679] limits: {'whitelist-warning': 48, 'updated-warning': 4, 'gc-warning': 2, 'gc-critical': 7, 'updated-critical': 18}
[Tue Apr 11 14:55:46.260955 2023] [:error] [pid 19679] subdirectories: {}
[Tue Apr 11 14:55:46.324773 2023] [:error] [pid 19678] cvmfs_geo: opened /var/lib/cvmfs-server/geo/GeoLite2-City.mmdb
[Tue Apr 11 14:56:57.531524 2023] [:error] [pid 19679] processed /etc/cvmfsmon/api.conf
[Tue Apr 11 14:56:57.531579 2023] [:error] [pid 19679] aliases: {'local': '127.0.0.1'}
[Tue Apr 11 14:56:57.531592 2023] [:error] [pid 19679] excludes: []
[Tue Apr 11 14:56:57.531611 2023] [:error] [pid 19679] limits: {'whitelist-warning': 96, 'updated-warning': 4, 'gc-warning': 2, 'gc-critical': 7, 'updated-critical': 18}
[Tue Apr 11 14:56:57.531622 2023] [:error] [pid 19679] subdirectories: {}

The actual CVMFS geo API seems to be working okay:

$ curl http://localhost/cvmfs/containers.computecanada.ca/api/v1.0/geo/`hostname -f`/cvmfs-s1fnal.opensciencegrid.org,cvmfs-stratum-one.cern.ch,cvmfs-stratum-one.ihep.ac.cn,cvmfs-s1-arbutus.computecanada.ca,cvmfs-s1-beluga.computecanada.ca,`hostname -f`
6,5,1,4,3,2

Oh, I think I see the problem. The probe that the cvmfs-servermon geo test uses happens to have cvmfs-stratum-one.cern.ch in its test query, and that's in the range of addresses that disappeared from the MaxMind DB. That was probably a bad choice. If you replace your `hostname -f` with cvmfs-stratum-one.cern.ch, do you see the failure? Can you find an older copy of /var/lib/cvmfs-server/geo/GeoLite2-City.mmdb to install on your stratum 1 until the fixed one comes out on Friday?

You had cvmfs-stratum-one.cern.ch in the list of servers to sort, and in that case the geo API just assumes it is far away (that's why 2 is at the end of your list), but the cvmfs-servermon geo probe also uses it as the reference address (right after the `/geo/`, before the list of stratum 1s), and the reference address is required to be in the database.
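
To illustrate that asymmetry (a rough sketch, not the actual cvmfs_geo code; the planar distance is just a stand-in for the real great-circle calculation):

import math

def geo_sort(reference_loc, server_locs):
    # reference_loc: (lat, lon) of the reference address, or None if it is not in the DB
    # server_locs:   list of (name, (lat, lon) or None) entries for the stratum 1s
    if reference_loc is None:
        # The reference address has to be in the database, otherwise the
        # API answers 400 "remote addr not found in database".
        raise ValueError('Bad Request: remote addr not found in database')
    def dist(loc):
        if loc is None:
            return float('inf')   # a server missing from the DB is just assumed to be far away
        return math.hypot(loc[0] - reference_loc[0], loc[1] - reference_loc[1])
    return sorted(server_locs, key=lambda entry: dist(entry[1]))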

Oh yeah:

$ curl http://localhost/cvmfs/containers.computecanada.ca/api/v1.0/geo/cvmfs-stratum-one.cern.ch/cvmfs-s1fnal.opensciencegrid.org,cvmfs-stratum-one.cern.ch,cvmfs-stratum-one.ihep.ac.cn,cvmfs-s1-arbutus.computecanada.ca,cvmfs-s1-beluga.computecanada.ca,`hostname -f`
Bad Request: remote addr not found in database

$ curl http://localhost/cvmfs/containers.computecanada.ca/api/v1.0/geo/`hostname -f`/cvmfs-s1fnal.opensciencegrid.org,cvmfs-stratum-one.cern.ch,cvmfs-stratum-one.ihep.ac.cn,cvmfs-s1-arbutus.computecanada.ca,cvmfs-s1-beluga.computecanada.ca,`hostname -f`
6,5,1,4,3,2

This was made more confusing by cvmfs/cvmfs#3255, since normally I would just test with an arbitrary proxy string, but that no longer works.

I don't have an older copy of the mmdb, but we can wait until Friday. For some reason only one s1 server is affected, and it seems to be fine aside from the geo API monitoring.

The other s1 doesn't have an older copy of the mmdb?

It must have cleared up with a geodb update.