systemd/python-systemd

systemd.journal.seek_tail() doesn't seem to

iam-TJ opened this issue · 4 comments

Whilst building a simple script to monitor a Distributed Replicated Block Device (DRBD) connection so we can trigger actions based on the link dropping and reconnecting we believe we've found a bug.
We use a select.poll() to monitor the journal waiting for a specific static kernel log message from the drbd module. poll() returns but when we call systemd.journal.get_next() it returns an empty dict object.
Added basic debug print into systemd/journal.py::get_next()

        r = super(Reader, self)._next(skip)
        print('Reader._next() returned ', r)
        if r:
            entry = super(Reader, self)._get_all()
            if entry:
                entry['__REALTIME_TIMESTAMP'] = self._get_realtime()
                entry['__MONOTONIC_TIMESTAMP'] = self._get_monotonic()
                entry['__CURSOR'] = self._get_cursor()
                return self._convert_entry(entry)
        return dict()

and we see:

(3, 1)]
Reader._next() returned  False
{}

We're using the example code from https://www.freedesktop.org/software/systemd/python-systemd/journal.html#example-polling-for-journal-events

Our code is:

#!/usr/bin/env python3
import sys
import select
from systemd import journal
j = journal.Reader()
j.seek_tail()
j.add_match('MESSAGE=' + sys.argv[1])
p = select.poll()
p.register(j, j.get_events())
print(p.poll(None))
print(j.get_next())

Called with:

# /etc/drbd.d/python3-monitor-connection.py "peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )"

Further investigation seems to show that journal.seek_tail() isn't seeking to the end of the journal at all. If we remove the journal.add_match(MESSAGE='...') but have _TRANSPORT='kernel', on repeated invocations we see the same messages from 5 days ago being reported.

Another user has used our script to independently confirm this same problem.

Appears this may be a long-standing issue in libsystemd.

systemd/systemd#17662

And mentioned in this example gist code:

https://gist.github.com/portante/ff7fb429c6f973aab377f7bb77b0ffdb#file-example_journal_reader-c-L22

...
  // For some reason, fetching previous then next avoids getting an initial
  // huge chunk of records.  If one calls sd_journal_seek_tail() and then
  // calls sd_journal_next() we receive the huge chunk.
  r = sd_journal_previous(j);
  if (r < 0) {
    fprintf(stderr, "Failed to get previous journal entry: %s\n", strerror(-r));
    return 1;
  }

  r = sd_journal_next(j);
...

Yeah, python-systemd is just a wrapper around the libsystemd code. It does do some extra stuff, e.g. conversions, but it doesn't change fundamental things like how seeking and entry retrieval works. So if there's something to fix here, it needs to be handled on the libsystemd side. Let's track this in the other bug.