prometheus/client_python

start_http_server incapable to listen on IPv6 sockets

zajdee opened this issue · 23 comments

This is a very similar issue to #132 and #69.

#69 was closed as a bug in cPython. That bug was resolved in cPython 3.8, however client_python has since migrated to WSGI.

This is how it all behaves with the latest Python (3.8.5):

# python3.8
Python 3.8.5 (default, Jul 23 2020, 07:58:41)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import prometheus_client
>>> prometheus_client.start_http_server(port=9097, addr='::')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/prometheus_client/exposition.py", line 78, in start_wsgi_server
    httpd = make_server(addr, port, app, ThreadingWSGIServer, handler_class=_SilentHandler)
  File "/usr/local/lib/python3.8/wsgiref/simple_server.py", line 154, in make_server
    server = server_class((host, port), handler_class)
  File "/usr/local/lib/python3.8/socketserver.py", line 452, in __init__
    self.server_bind()
  File "/usr/local/lib/python3.8/wsgiref/simple_server.py", line 50, in server_bind
    HTTPServer.server_bind(self)
  File "/usr/local/lib/python3.8/http/server.py", line 138, in server_bind
    socketserver.TCPServer.server_bind(self)
  File "/usr/local/lib/python3.8/socketserver.py", line 466, in server_bind
    self.socket.bind(self.server_address)
socket.gaierror: [Errno -9] Address family for hostname not supported

The issue lies with the ThreadingWSGIServer class, which defaults to AF_INET address family.
A workaround similar to bottlepy/bottle@af547f4 can be implemented in exposition.py:

# diff -u /usr/local/lib/python3.8/site-packages/prometheus_client/exposition.py.broken /usr/local/lib/python3.8/site-packages/prometheus_client/exposition.py
--- /usr/local/lib/python3.8/site-packages/prometheus_client/exposition.py.broken	2020-07-23 11:28:12.198472090 +0000
+++ /usr/local/lib/python3.8/site-packages/prometheus_client/exposition.py	2020-07-23 11:29:07.506269752 +0000
@@ -75,7 +75,12 @@
 def start_wsgi_server(port, addr='', registry=REGISTRY):
     """Starts a WSGI server for prometheus metrics as a daemon thread."""
     app = make_wsgi_app(registry)
-    httpd = make_server(addr, port, app, ThreadingWSGIServer, handler_class=_SilentHandler)
+    server_cls = ThreadingWSGIServer
+    if ':' in addr: # Fix wsgiref for IPv6 addresses.
+        if getattr(server_cls, 'address_family') == socket.AF_INET:
+            class server_cls(server_cls):
+                address_family = socket.AF_INET6
+    httpd = make_server(addr, port, app, server_cls, handler_class=_SilentHandler)
     t = threading.Thread(target=httpd.serve_forever)
     t.daemon = True
     t.start()

And voila, now it works as expected:

# python3.8
Python 3.8.5 (default, Jul 23 2020, 07:58:41)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import prometheus_client
>>> prometheus_client.start_http_server(port=9097, addr='::')
>>> prometheus_client.start_http_server(port=9098)
>>>
# netstat -ltnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
(...)
tcp        0      0 0.0.0.0:9098            0.0.0.0:*               LISTEN      20262/python3.8
(...)
tcp6       0      0 :::9097                 :::*                    LISTEN      20262/python3.8
(...)

This is a non-invasive fix which doesn't break any existing code while at the same time provides those of us who wish our exporters listened on IPv6 sockets.

Is it possible to adjust exposition.py this way, please? Thank you.

This code doesn't appear to be correct, a colon in a hostname does not mean that it is an IPv6 address. Nor do I think we should have hacks like this to workaround issues in libraries below us.

As with the previous issue this appears to be a bug in cPython, and should be fixed there.

DNS hostnames cannot contain the colon sign (:), see for example the following resources:

The only valid case where the address would contain colon is the IPv6 address, e.g. :: or ::1.

It would be great if you were proposing solutions to actual problems people see, such as the start_http_server, which is widely used, being incapable of listening on IPv6, not just blaming cPython or stating that others' bugfixes code are hacks.

DNS is intended to 8-bit clean, technically any character (including null) can appear in a label.

If we're passing a v6 address in to a function which doesn't mention only working on v4, then that's a bug in that function. The issue here appears to be that WSGIServer has broken v6 somehow. This repo has no v4 or v6 specific code, we're entirely relying on cPython to do the right thing.

DNS is intended to 8-bit clean, technically any character (including null) can appear in a label.

In theory, there is no difference between theory and practice. But, in practice, there is.
And in practice, there is no colon in any valid DNS label.

we're entirely relying on cPython to do the right thing.

And the users of this repo are entirely relying on your code doing the right thing. Like binding to valid addresses. Which doesn't work now.

Questions for you:

  • Do you prefer to have a pure clean code or a working code?
  • Why not to implement a workaround for a bug in WSGIServer code, if such a possibility exists, until the moment WSGIServer gets fixed?
  • How can I, as a user, avoid using WSGIServer? Apparently this library used python's http.server before, which, if it was kept, would have now worked even with IPv6. With the move to WSGI, it's broken (again).

Thank you for answering these questions.

How can I, as a user, avoid using WSGIServer?

Noone is forcing you to use it. In general start_http_server is meant to get you up and running quickly, in most serious applications I'd expect exposition to be via whatever existing http server the application uses.

Why not to implement a workaround for a bug in WSGIServer code, if such a possibility exists, until the moment WSGIServer gets fixed?

I prefer to see problems fixed at their source, rather than workarounds layered throughout the stack. Is there a bug tracking this on the cPython side?

Is there a bug tracking this on the cPython side?

Yes. https://bugs.python.org/issue20215

That doesn't appear to be related to WSGI, are you sure that's the right issue?

zloo commented

That doesn't appear to be related to WSGI, are you sure that's the right issue?

It is the correct issue. WSGI does the same thing as prometheus_client - blindly passes whatever it has down to HTTPServer, which then passes everything to TCPServer, which defaults to AF_INET unless subclassed and changed.

So the link @zajdee provided is indeed the correct issue, unless you claim that fixing it at client_python is wrong but fixing it in wsgi is good, which would be weird. :)

Well, I'm thinking about this all over and over again.
The root of the issue is within the TCPSocket code, which is obviously incapable of listening on both protocols.

In some operating systems and with a specific configuration, an IPv6 socket can listen for both IPv4 and IPv6 connections, IF the configuration allows so. The disadvantage is that the application sees that IPv4 connection as if it was initiated from the IPv6 address prefix ::ffff:0:0:/96 where client's IPv4 address is mapped in the last 32 bits of that prefix.

An alternative to using just an IPv6 socket to handle both protocol connections is to use two sockets, based on current system's configuration, and multiplex connection processing.

As for this specific problem, the workaround provided in my original post would help for the case where an IPv6 address would be provided (e.g. ::, ::1, 2001:db8:dead::beef). Only in the case of binding to :: together with the OS-specific configuration (as mentioned above) would the server accept IPv4 connections on that socket too.

For hostnames resolving to address families of both protocols, the workaround wouldn't help at all.

This is definitely something to be fixed in the code of TCPSocket, as both client_python and WSGIServer just pass the address down the stack. But it isn't.

@brian-brazil: there is no light at the end of the tunnel to have this fixed in TCPSocket any time soon. Considering all of the above, including your statement that there is currently no IPv4/IPv6 specific code, would you be willing to make any change to the repo to at least enable listening on ::?

Thank you.

It is the correct issue.

I'm a bit confused then here. What exactly was fixed in cPython 3.8?

http.server was fixed to allow binds on IPv6 addresses, e.g. ::.
TCPSocket is still broken.

# python3 -V
Python 3.7.3
# python3 -m http.server --bind ::
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.7/http/server.py", line 1262, in <module>
    test(HandlerClass=handler_class, port=args.port, bind=args.bind)
  File "/usr/lib/python3.7/http/server.py", line 1230, in test
    with ServerClass(server_address, HandlerClass) as httpd:
  File "/usr/lib/python3.7/socketserver.py", line 452, in __init__
    self.server_bind()
  File "/usr/lib/python3.7/http/server.py", line 137, in server_bind
    socketserver.TCPServer.server_bind(self)
  File "/usr/lib/python3.7/socketserver.py", line 466, in server_bind
    self.socket.bind(self.server_address)
socket.gaierror: [Errno -9] Address family for hostname not supported
# python3.8 -V
Python 3.8.5
# python3.8 -m http.server --bind ::
Serving HTTP on :: port 8000 (http://[::]:8000/) ...
^C
Keyboard interrupt received, exiting.

Do you have a link to how they fixed it? I'd rather not reinvent the wheel.

Well. Yes, I have a link, but you will probably not like the patch. (It has the same disadvantages as the workaround proposed in this bug report.)

https://bugs.python.org/file39390/ipv6.patch
https://bugs.python.org/issue24209
https://docs.python.org/3.8/library/http.server.html
https://stackoverflow.com/questions/25817848/python-3-does-http-server-support-ipv6

As far as I see, we could also add an extra parameter to explicitely allow binding on IPv6.

As far as I see, we could also add an extra parameter to explicitely allow binding on IPv6.

start_http_server is meant to be very very simple to get someone exposing metrics quickly, in principle it shouldn't even be allowing an address to be specified. If you've more complex needs, it's better to use your own http server rather than growing start_http_server into a does-everything solution.

Well. Yes, I have a link, but you will probably not like the patch.

Hmm, that doesn't look like the patch that was used.
python/cpython@f289084 seems to be it, which was rejected as an idea in https://bugs.python.org/issue20215. In fact it looks like a https://bugs.python.org/file39390/ipv6.patch like approach was removed from the code.

This seems a bit messy on the Python end, so I'd like to get a better handle on how cPython intendeds to approach this.

There seem to be two generations of the http.server IPv6-related patches.
python/cpython#10595
python/cpython#11767

And yes, it's a bit messy. Dual-stack support is hard.

I'm going to need to think a bit on this I think.

May I ask why you want to listen only on v6 and not v4?

Well, I'd prefer to listen on both protocols where applicable, e.g. when you specify a hostname which resolves to addresses of multiple protocols (e.g. localhost resolves to 127.0.0.1 and ::1).

If no bind address is specified, I prefer to listen on both protocols. This can be achieved two ways:

  • In case of bindv6only=0 sysctl value on Linux, one can bind to :: (IPv6 ANY address) and the socket will process both IPv4 and IPv6 connections
  • A more general approach is to concurrently listen to IPv4 ANY (0.0.0.0) and IPv6 ANY (::) with the IPv6 ANY having the socket option of IPV6_V6ONLY set (on Linux) and having each of the sockets process connections of their respective protocol families.

This seems way too low level for this library to me though and you're right this should all be handled elsewhere.

If we at least enable listening on IPv6 :: (ANY) while having the bindv6only OS-wide sysctl option set to 0, the process using start_http_server(..., addr='::') will be able to accept and process connections of both protocol families.

This seems way too low level for this library to me though and you're right this should all be handled elsewhere.

Yeah, that's too low level for this library in general (before considering that start_http_server is meant to be simplistic) and it's also Linux specific.

The challenge is that Python has changed things around with no explanation, and in contradiction of their discussions, so I don't have anything to really work from here.

As this is the first time this has come up I'd suggest using a http server that does what you need and serving metrics from there, rather than trying to extend start_http_server.

So it looks like there is still no solution to easily export some metrics using this client over IPv6 without relying on a separate http server ?

For what it's worth, while some comments here say this should be fixed in python 3.8 I've tried 3.8 3.9 and 3.10(.0a5) to no avail (same address family not supported error), so this isn't quite over yet...

EDIT: nevermind, it's the former implemetation that would have been fixed if it had been kept up to now, not wsgi's ; I read that too fast. Sorry for the noise.

So python fixed it in http.server only in the test/interactive function like this:

def _get_best_family(*address):
    infos = socket.getaddrinfo(
        *address,
        type=socket.SOCK_STREAM,
        flags=socket.AI_PASSIVE,
    )
    family, type, proto, canonname, sockaddr = next(iter(infos))
    return family, sockaddr

def test(HandlerClass=BaseHTTPRequestHandler,
         ServerClass=ThreadingHTTPServer,
         protocol="HTTP/1.0", port=8000, bind=None):
    ...
    ServerClass.address_family, addr = _get_best_family(bind, port)
    ...
    with ServerClass(addr, HandlerClass) as httpd:

Instanciating a WSGIServer, HTTPServer or even TCPServer would not benefit from the fix so it's a misinterpretation to think that the original code using HTTPServer directly would have been fixed on newer versions - it's apparently up to applications to pick the address family and set it for their instance of the class (?!)

The python issue tracking it https://bugs.python.org/issue24209 was apparently really just about python -m http.server and not about users of the library...

I've opened PR #657 with a similar kludgy implementation; feel free to just close everything if you don't care /shrug