idlesign/systemd-logging

Format-specifiers in input cause segfault

Closed this issue · 11 comments

This issue is related to #3, but for different inputs. There might be more cases in which the library is called with wrong inputs.

For the input below adding a c format specifier to the exception message causes a segfault when logging that exception.
This is harder to reproduce with format specifiers like %d, but seems to always happen for %s.

Does it happen with the latest version? Yes (1.0.0).

How to reproduce:

#!/usr/bin/env python3
import logging
from systemdlogging.toolbox import SystemdFormatter, SystemdHandler

systemd_handler = SystemdHandler()
systemd_handler.syslog_id = "test"
systemd_handler.setFormatter(SystemdFormatter())
logging.getLogger().addHandler(systemd_handler)

try:
    raise Exception("Hi %s")  # <- the %s causes the crash
except:
    logging.exception("hi")

print("Exception was handled correctly.")

Actual output: Segmentation fault (core dumped)

Expected output: Exception was handled correctly.

Thank you for the report.

Will try to look into it in a week.

Quick fix is in master. Feel free to try it out.

Thanks!

I'm wondering if it is possible and makes sense to escape the percent signs instead of stripping them (c uses %% to escape percent signs).

I'm wondering if it is possible and makes sense to escape the percent signs instead of stripping them (c uses %% to escape percent signs).

It's possible offcourse but that'd take more resources than brutal stripping.

I would assume, given that this behavior has only been reported now, that the common case is no percentage signs in the the message. In that case the replacement takes the same amount of time. If there are percentage signs, of course replacing it with the empty string is faster, but we are talking about nanoseconds here.

$ python3 -m timeit '"Hi %f there %d %s %p %% %x".replace("%", "").encode()'
2000000 loops, best of 5: 154 nsec per loop
$ python3 -m timeit '"Hi %f there %d %s %p %% %x".replace("%", "%%").encode()'
2000000 loops, best of 5: 176 nsec per loop
$ python3 -m timeit '"Hi nf there nd ns np nn nx".replace("%", "").encode()'
5000000 loops, best of 5: 79.2 nsec per loop
$ python3 -m timeit '"Hi nf there nd ns np nn nx".replace("%", "%%").encode()'
5000000 loops, best of 5: 79 nsec per loop

I was thinking that in some cases removing the % in the error message could cause confusion. For instance if we do a remainder by zero and we raise the (fictional) error ValueError("Cannot compute iterations % foo"), then we really want to have the % in the message.

Alternatively, may I suggest:

args = {
    "MESSAGE": msg,
    'PRIORITY': PRIORITY_MAP.get(level, level),
}

args.update(context or {})

return send(
    *[arg for key, val in args.items() for arg in (b"%s=%s", str(key).encode(), str(val).encode())],
    None
) == 0

You would have to do profiling to see if this is more efficient, but not doing string interpolation might be faster.

Ah, I thought you're taking about escaping certain specifiers like %s, %d with regexp.
Sure we can do %%. I'll return to it a little bit later.

In master.

Thanks! This works for me.

1.0.1 is out. Closing this.