systemd/systemd

PID1 getting stuck printing "systemd[1]: Time has been changed" continuously

utezduyar opened this issue · 60 comments

By setting the date to sometime in 2038, we can make systemd getting stuck printing systemd[1]: Time has been changed over and over again.

Seems like timerfd the manager creates has always an event for us to execute. I am not sure if this is a problem specific to our 32 bit setup (kernel / libc / systemd) or a generic problem. Would be nice if someone else can test it on their 32 bit setup.

[root@axis-00408cecff49 /mnt/flash/root]2394# timedatectl
      Local time: Mon 2015-01-19 04:15:27 
  Universal time: Mon 2015-01-19 04:15:27 UTC
        RTC time: Mon 2015-01-19 04:15:27
       Time zone: n/a (, +0000)
 Network time on: n/a
NTP synchronized: no
 RTC in local TZ: no
[root@axis-00408cecff49 /mnt/flash/root]2394# timedatectl set-time '2038-01-19 03:14'
[root@axis-00408cecff49 /mnt/flash/root]2394# timedatectl
      Local time: Tue 2038-01-19 03:14:02 
  Universal time: Tue 2038-01-19 03:14:02 UTC
        RTC time: Tue 2038-01-19 03:14:03
       Time zone: n/a (, +0000)
 Network time on: n/a
NTP synchronized: no
 RTC in local TZ: no
[root@axis-00408cecff49 /mnt/flash/root]2394# timedatectl
      Local time: Tue 2038-01-19 03:14:05 
  Universal time: Tue 2038-01-19 03:14:05 UTC
        RTC time: Tue 2038-01-19 03:14:06
       Time zone: n/a (, +0000)
 Network time on: n/a
NTP synchronized: no
 RTC in local TZ: no
[root@axis-00408cecff49 /mnt/flash/root]2394# timedatectl
      Local time: Tue 1965-07-20 06:36:29 
  Universal time: Tue 1965-07-20 06:36:29 UTC
        RTC time: Tue 1965-07-20 06:36:28
       Time zone: n/a (, +0000)
 Network time on: n/a
NTP synchronized: no
 RTC in local TZ: no

Here PID1 prints systemd[1]: Time has been changed over and over again.

[ 3215.290409] systemd[1]: Time has been changed
[ 3215.293361] systemd[1]: Time has been changed
[ 3215.297541] systemd[1]: Time has been changed
[ 3215.306810] systemd[1]: Time has been changed
[ 3215.309365] systemd[1]: Time has been changed
[ 3215.314226] systemd[1]: Time has been changed
[ 3215.317302] systemd[1]: Time has been changed
[ 3215.319286] systemd[1]: Time has been changed
[ 3215.322474] systemd[1]: Time has been changed
[ 3215.325552] systemd[1]: Time has been changed
[ 3215.329174] systemd[1]: Time has been changed

Well, on Linux 32bit, time_t is 32bit iirc, hence this cannot work afaics...

internally, systemd always calculates with 64bit values, but the kernel APIs might use something else.

Did you check what sizeof(time_t) prints on your arch?

sizeof(time_t) prints 4

IRC with Lennart:

poettering
1:21 umut: see my comment ob the issue
1:22 umut: note that the x32 arch changes time_t to be 64bit, only the old i386 still uses time_t as 32bit
1:22 umut: but you use "cris" or so, iirc?
1:22 umut: better check what time_t is defined to there
umut
1:23 time_t is 4 bytes
poettering
1:23 umut: https://lwn.net/Articles/643234/
1:24 umut: yeah, then you have a problem... you first need to fix your kernel
poettering
1:24 actually your whole abi...
umut
1:24 poettering, ok, so bottom line is time_t needs to be 8 bytes?
woky_
poettering
1:43 umut: yeah, i guess, if you want your dates to go > 2038, then yeah, time_t needs to grow to 64bit

Note that this bug also hits on 32bit MIPS and 32bit ARM...

Tjo!

I've found a workaround for this issue, but it requires patches in both kernel and systemd:
The systemd patch is to use relative time for the timerfd, and the kernel patch allows a TFD_TIMER_CANCEL_ON_SET on a relative timeout.

diff --git a/fs/timerfd.c b/fs/timerfd.c
index b94fa6c..8ec3aeb 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -134,7 +134,7 @@ static void timerfd_setup_cancel(struct timerfd_ctx *ctx, int flags)
 {
        if ((ctx->clockid == CLOCK_REALTIME ||
             ctx->clockid == CLOCK_REALTIME_ALARM) &&
-           (flags & TFD_TIMER_ABSTIME) && (flags & TFD_TIMER_CANCEL_ON_SET)) {
+           (flags & TFD_TIMER_CANCEL_ON_SET)) {
                if (!ctx->might_cancel) {
                        ctx->might_cancel = true;
                        spin_lock(&cancel_lock);

And in systemd:

diff --git a/src/core/manager.c b/src/core/manager.c
index a1f37bb..d444539 100644
--- a/src/core/manager.c
+++ b/src/core/manager.c
@@ -343,7 +343,7 @@ static int manager_setup_time_change(Manager *m) {
         if (m->time_change_fd < 0)
                 return log_error_errno(errno, "Failed to create timerfd: %m");

-        if (timerfd_settime(m->time_change_fd, TFD_TIMER_ABSTIME|TFD_TIMER_CANCEL_ON_SET, &its, NULL) < 0) {
+        if (timerfd_settime(m->time_change_fd, TFD_TIMER_CANCEL_ON_SET, &its, NULL) < 0) {
                 log_debug_errno(errno, "Failed to set up TFD_TIMER_CANCEL_ON_SET, ignoring: %m");
                 m->time_change_fd = safe_close(m->time_change_fd);
                 return 0;

I think it shouldn't be the case that once we reach the magic day, user space stops booting.

Especially as the kernel since 3.19 actually allows dates past 2038 from a battery-backed RTC, which could have a bad date due to not being initialized or by being maliciously set by an attacker to DOS the system.

We cannot work around a broken ABI in userspace. To fix the 2038 problem for good you need to update the ABI of your arch to provide a 64bit time_t. I will close this now, as I see nothing to fix here in systemd: when the ABI (in the kernel and in glibc) is updated to provide 64bit time_t then systemd should start working after a simple recompile (possibly adding a new define, a la -D_FILE_OFFSET_BITS=64, that is however currently not defined by glibc). Either way, systemd is fine with either 64bit or 32bit time_t, and the fixes really need to happen outside of systemd.

True, you may not be able to work around this issue, but systemd could do better than completely locking up printing error messages when it happens. The bug is that PID 1 completely locks up, not that the time is incorrect. I have the exact issue that @antijn describes with a misbehaving RTC, and it makes my system unbootable.

Could this be reopened with an emphasis on fixing the lockup, even though the time can't be fixed?

I've sent the patch for the kernel to Tomas Gleixner who's the maintainer, no response yet though. If he's at LinuxCon Europe in Dublin next week, I'm going to ask him in person. My plan was to resubmit my workaround for systemd when upstream Linux has queued the kernel change. Perhaps that can also be handled at LinuxCon if Lennart is there. Much easier to explain some issues in person...

@antijn, thank you. That is a good approach.

I am not at LinuxCon. Also, Gleixner is pretty aggressive towards me, so I am pretty sure I will not do anything on this issue.

Too bad, but anyway, since any solution depends on the relative timerfd support getting upstream in the kernel first, I'll keep on plugging at that front, and come back to you when the patch in systemd actually has a chance of working.

Hmmm, hit this bug while booting a previously working cubox-i (arm). I'm thinking the battery for the RTC went bad, the RTC battery in the cubox-i has a historically short life. I've replaced two out of three, this will make three out of three.

So yeah, I'd say this is of significant importance, if a RTC battery fail can cause a non-booting system.

I applied both patches and it resolved my rtc/fail-to-boot scenario.

@cbxbiker61 hmm? i doubt that a failing battery would put the rtc into the year 2038...

Hard to say, all I know is that:

  1. It consistently failed to boot, spamming the Time has been changed
  2. I applied the patches
  3. It booted fine and there was an error reading the /dev/rtc
  4. After the system time had stabilized i did a hwclock --systohc
  5. Next reboot the system seemed to read the rtc ok on boot

Either way, this is something to fix in the kernel. We will not question the time information returned by the kernel. If it is not correct and a driver returns rubbish, then the kernel should be fixed, not systemd. Also, time_t being 32bit is a kernel issue too, it's nothing we can work around in userspace.

rzr commented

If it helps community, let me share my experience on this annoying behavior, my ARMv5 NAS was stuck in this alert loop and then refused to boot (I got the logs using netconsole) because of faulty RTC.
To escape this situation I had to boot debian wheezy (without systemd) to be able to use ntpdate and hwclock -w and then I was able to boot jessie again.

Unfortunately, discussion on this has petered out on LKML with no clear solution, my patch is still in limbo. It is a real problem, and yes, it is a kernel problem, and it will only be completely solved when 32bit userspace is gone.

But it also a fact that the number of RTC models that fail in this fashion due to no/bad battery is increasing. There's probably also going to be more problems if manufacturers move towards using super-caps instead of batteries...

I just ran into this on a Solidrun Ltd I4P-300-D (Freescale i.MX6 Quad ARMv7). I'd appreciate something being done to ensure I don't need to manually intervene (occasionally) after a reboot occurs.

I didn't set my clock forward manually, so either:

  • the same message spam can be triggered due to another issue, Or
  • occasionally some automated process causes the time to jump forward.

To summarize: this isn't just something that affects people who manually set their clocks forward, it appears to affect normal users as well.

Can you try to reproduce that with a kernel that has RTC_HCTOSYS not set? My understanding of the issue is that the RTC subsystem properly sets a system time that is 64 bits then, because userspace and other part of the kernel are not ready to handle 64 bit time. Having userspace instead of the kernel setting the system time can solve that issue. If that is not the case, we have discussed adding an option to make sure the RTC subsystem doesn't set a time that doesn't fit on 32bits.

This just bit me hard after my CMOS battery died on an Asus x86 machine. I had to work around it with a variant of this: armbian/build#111 (comment)

Please, don't break the boot this badly on x86 machines! Even booting a LiveUSB and resetting the clock from under that environment didn't fix the issue.

I just hit this issue as well.

# date
Thu Nov 25 13:48:15 EWT 1943
# date +%s
-823673430

Is there really no way to detect and recover from a clearly bogus RTC date?

rzr commented

One workaround would be to force the date to be after time of the released software running on system... There werent any systemd in 1943 as far as know ;)

@amoskvin sounds like an undetected overflow bug in the kernel... Really, there's nothing to fix here in systemd, the kernel should really validate the time after reading it out of the RTC and makes sure the time doesn't overflow and is unreasonably low.

While the year 2038 issue certainly needs to be fixed in the kernel and glibc on 32 bit systems, maybe it would be useful to have systemd agree with the rest of the system on the size of time_t. This would probably solve this issue.
I also think that not setting RTC_HCTOSYS is sufficient enough but I didn't try to reproduce that particular issue.

@alexandrebelloni hmm? we use time_t as glibc defines it... systemd is not doing anything special there...

Again, because this appears hard to understand: there's nothing to fix in systemd: we use the same time_t as everybody else on the system, the way libc defines it. If your libc/kernel define it to 32bit, then you run into the 2038 problem, with systemd, and with everything else on the system.

@poettering The issue isn't that systemd should somehow divine the correct time when the RTC is wrong or support times past 2038 with a 32-bit time_t. It's that systemd completely locks up when it doesn't like the time it gets.

@fazjaxton nope. It's the kernel that is confused, not systemd. It continuously sends us notifications that the clock changed via our TFD_TIMER_CANCEL_ON_SET timerfd, as soon as the clock overflows. We hence start spinning processing these events.

This really needs to be fixed in the kernel, so that it doesn't continously notify us on clock overruns...

@poettering Ah, sorry. I guess both the kernel and I are confused. :) Thanks for the clarification.

@poettering yeah, I was confused by your first comment on the issue. This definitively needs fixing in the kernel, hopefully even before the 2038 issue is solved.

BTW, the two work-around patches @antijn posted above won't solve the issue in a way we could support out-of-the-box in systemd. Whether the kernel is patched or not cannot be detected, and thus we wouldn't know when to follow the old code paths in systemd, and when to follow the new ones. The new ones will only work with a patched kernel after all, and break things entirely on older kernels...

@poettering the problem is that systemd overtakes its role. When a whole system can boot and work reliably with a bad date, even to reach the point where it can ask for the proper date over the network, it should be allowed to do this. And systems have been reliably working this way for decades. You may not remember MS-DOS, but in its early versions the user was asked to enter the date at end of boot. And yes, a broken RTC can cause any invalid date to be reported and with regular systems it's not an issue. I'm switching to another distro because a debian 8 cannot boot on my board for this precise reason. Thank you for deliberately ignoring issues your software cause to end users.

@wtarreau again, the key of the issue is that when the clock overrun takes place timerfd with TFD_TIMER_CANCEL_ON_SET keeps firing continously and hence systemd busy loops. Its a kernel issue that this keeps firinG continously and needs to be fixed there. Systemd cares in no way if the clock is bogus, we just end up busy looping because of a fucked kernel interface.

So the root cause of this clearly is that the init system should not need to use timerfd in the first place. I don't know why systemd needs this while legacy init hasn't needed it for decades, but if there was a way to disable this dependency on such a recent (and possibly still broken) feature, it would definitely solve this issue.

rzr commented

If it's a kernel bug as @poettering suggested , if it's not done already can someone report and then link the report to this one, thanks.

Also downstream bugs are welcome too.

@rzr kernels have always and will always have bugs. It's important to be able to workaround them in userspace. As a lot of systems can boot regardless of this bug it seems reasonable to expect systemd to simply disable the non-working feature when it detects that it loops and only emit a message "kernel bug detected, fix your system". Anyway it's not my problem anymore I've switched away from Debian 8 and have no problems now.

@wtarreau well, we expect bugs to be fixed where they are, and not hacked around in systemd. Sorry.

Sure, let's never check any syscall's return code nor errno since the lower side may never fail and is expected to be perfect. Now I get a clearer view how this whole thing is designed, that's enough to scare me away from it. For me that's the difference between a quick proof-of-concept and production code.

@wtarreau hmm? we check all return codes and errnos here and pretty much everywhere else. Except, that in this case the kernel reports no error, it just keeps firing the timerfd. But anyway, I think this discussion is pointless. You apparently have no idea what you are talking about, but just want to vent frustration. Let's just stop this then here, and please find a different place to vent that, this is a bug tracker.

Just like the kernel disables spurious IRQs, systemd should disable spurious timerfd. There's no blackmagic here. Basically manager_clock_watch() could start with something more or less like this
(assuming it's the one responsible for this endless loop) :

   gettimeofday(&now, NULL);
   if (m->last_change == now.tv_sec) {
      m->change_count++;
   }
   else {
      if (m->change_count > 1)
         log("Time changed %d times in the last second\n", m->change_count);
      if (m->change_count >= 100) {
         log("Timerfd is broken on this system, disabling time change detection\n");
         manager_listen_stop(m);
         return 0;
      }
      m->change_count = 0;
      m->last_change = now.tv_sec;
   }

And yes it's a workaround for a kernel bug, just like kernel works around hardware bugs. But it allows end users who didn't chose systemd not the kernel version to boot a system with a dead battery.

I just ran into this problem on a brand new ARM based board embedded system with an RTC that had it's clock way out at year 2128 . The system was fine other than systemd keep spewing this message and wouldn't allow any user logins.
I totally have to agree that an otherwise perfectly operating system shouldn't be rendered useless just because of something so silly, wtarreau's suggestion sounds reasonable.

Can someone affected by the issue try to compile a kernel with that patch:
alexandrebelloni/linux@b7048bf
and activate that option.

I believe this will solve the issue however, I can't test as I'm not running systemd on 32bit platforms.

Note that I'm still not sure this is the correct way to go as unsetting RTC_HCTOSYS should have the same effect and I'm pretty sure no distribution actually need it but still activates it.

@alexandrebelloni, unless I'm mistaken, this will only fix the issue at boot but will not prevent the system from going down just afterwards, so it can even be worse in some cases. For example if your system boots at 1/1/2038, it will suddenly die in this endless loop 18 days later. The real issue is the loop itself. I agree that timerfd should not trigger in loops, but the fact that it's a known design limitation should be taken into account by the caller, especially since this code is only there to detect time changes. Another solution based on your patch could be to tell systemd to stop using timerfd if the time is close to this limit. You may also make timerfd return an error if the current time is past this limit, but I believe that in this case systemd will loop on timerfd errors (and that doesn't solve the issue of running userland on existing boards shipping with their own kernels working fine with other distros).

@wtarreau: for systemd, this will indeed only solve the issue at boot but this also solve other issues with other parts of userspace so I will probably queue it for v4.6. Anyway in the current state, 32bit platforms will fail on 19/1/2038. At least, this solves the current issue.

I'll switch the default to n once libc and critical applications have switched to a 64bit time.

Note that I've tried to reproduce the issue and I can't reproduce the infinite loop. It loops for a while but the boot process seems to continue.

On Sat, Feb 20, 2016 at 05:14:03AM -0800, Alexandre Belloni wrote:

@wtarreau: for systemd, this will indeed only solve the issue at boot but
this also solve other issues with other parts of userspace so I will probably
queue it for v4.6.

I agree.

Anyway in the current state, 32bit platforms will fail on 19/1/2038.

That's precisely the problem I was discussing. I think it doesn't leave
enough room to ensure reliable operations for a while. In the worst case
the system will fail 5 minutes after booting. Maybe you should instead
introduce the offset as a config argument so that users can choose how
far from the end of the world they want to be. For example some may
decide that anything 2037 and above is an error so that it grants them
at least one year of undisturbed uptime.

At least, this solves the current issue.

It depends if my system boots more than 5 minutes before wrapping or not :-)
If it's only 10 minutes before wrapping, your patch will not catch it and
my system will die 10 minutes after boot.

I'll switch the default to n once libc and critical applications have
switched to a 64bit time.

Note that I've tried to reproduce the issue and I can't reproduce the
infinite loop. It loops for a while but the boot process seems to continue.

Interesting. In my case the loop happens on the serial port, so the messages
are printed slowly and that takes ages. Also Lennart suggested that timerfd
keeps ringing (which I think is true). Maybe something fixes the system time
at some point on your system (ntp etc) ?

Willy

On 20/02/2016 at 05:46:43 -0800, Willy Tarreau wrote :

Anyway in the current state, 32bit platforms will fail on 19/1/2038.

That's precisely the problem I was discussing. I think it doesn't leave
enough room to ensure reliable operations for a while. In the worst case
the system will fail 5 minutes after booting. Maybe you should instead
introduce the offset as a config argument so that users can choose how
far from the end of the world they want to be. For example some may
decide that anything 2037 and above is an error so that it grants them
at least one year of undisturbed uptime.

I don't believe any serious person will let its system run one year with
a bogus time. I'm not trying to solve the case when we are really in
2038.

At least, this solves the current issue.

It depends if my system boots more than 5 minutes before wrapping or not :-)
If it's only 10 minutes before wrapping, your patch will not catch it and
my system will die 10 minutes after boot.

Yeah but it will have 10 minutes to get the real date and time. Which
will hopefully be enough for NTP. I can make that one day but it may
hide that issue during development. I don't think being able to
configure the offset adds any value to that workaround.

Interesting. In my case the loop happens on the serial port, so the messages
are printed slowly and that takes ages. Also Lennart suggested that timerfd
keeps ringing (which I think is true). Maybe something fixes the system time
at some point on your system (ntp etc) ?

I'm pretty sure it was not ntp as the board didn't have any network
connectivity. I used a debootstrapped debian stretch. I'm not sure if they
have another mechanism to set the time (I know that poky will set the time to
the build date). I'll send the patch so that we can discuss this on the
mailing list.

Alexandre Belloni

Again, please fix the kernel to handle this less awfully. The kernel is borked here, systemd is not. If the option is to fix it properly in the kernel or work around in systemd, then the former is the right solution. Both are Open Source projects, hence there's really no reason to work around borkedness in other open source software.

If you have hardware that comes up with the clock set to 2038 on an empty battery, then that too sounds like something the kernel should fix...

@poettering I agree that the kernel needs fixing and I'm trying to work on that.
@wtarreau What is your failing board? What RTC does it have?

From the reports I see on this bug, this is mainly the pcf8523 (present on the cubox-i) and I've spotted an issue in its driver. Can people affected tell which RTC they are using?

I'd like to get more reports and I would prefer those reports to be on the rtc mailing list:
https://groups.google.com/forum/#!forum/rtc-linux

rzr commented

My lsproduo ships rs5c372a, I'll may write a longer report to list. once I am able to hack on it.

On Sat, Feb 20, 2016 at 03:40:43PM -0800, Alexandre Belloni wrote:

@poettering I agree that the kernel needs fixing and I'm trying to work on that.
@wtarreau What is your failing board? What RTC does it have?

A clearfog A1. I suspect the RTC is not yet well supported as it reports
year 2100 something. I have not investigated much since, I've switched
to another distro in the mean time. I still don't know what RTC runs on
this board as I really have no care at all for the time being correct
on a test platform, the RTC being among the things I usually disable
for a faster boot time.

Regarding your point about 5-10mn being enough for an NTP request, yes
that's a good point indeed. It can be a bit short if there's a large
fsck to run. Probably that a huge warning message should be printed
when this happens so that the admin has a chance to see it if the
system takes time to finish to boot.

Willy

The one instance I've seen this is on a custom board with a PCF8523 and it wasn't a dead battery. It appears that the first time the part is turned on the date and time is random. In this case the year register was set to 0x80.

Matt S.

I have an old EeePC T91MT 32bit laptop with no battery that sets the date to 2060 for whatever reason.

At least now I know I just need to set the clock in the bios and stop yanking out the power cable.

On Tue, Feb 23, 2016 at 02:50:30AM -0800, David C. Bishop wrote:

I have an old EeePC T91MT 32bit laptop with no battery that sets the date to 2060 for whatever reason.

At least now I know I just need to set the clock in the bios and stop yanking out the power cable.

Please also report the bug to your distro vendor, otherwise whatever the
final fix, there's no chance it will get backported if nobody reports the
issue or if it happens to affect too few users.

Willy

Seems to be resolved in Debian Testing's kernel 4.6 for the OLinuXino A20 Micro. So, save off your data and upgrade your kernel, if you're running Debian.

On my A20 hardware, a previous install of FBX-0.7 failed to boot with the "Time has been changed" error, while the current FBX-0.9 succeeds.

I didn't push it upstream but I don't know what vendors are doing with their kernel.

If it is not correct and a driver returns rubbish, then the kernel should be fixed, not systemd

https://en.wikipedia.org/wiki/Defensive_programming

Difference is that old school init is able to bring up the system to usable state, while systemd doesn't but ends up in a limbo.