dotnet/reactive

Failed to start monitoring system clock changes

ButchersBoy opened this issue · 8 comments

Occasionally see exception message "Failed to start monitoring system clock changes" and application termination on desktop PCs making heavy utilization of RX.

I really don't want to write this up again; full explanation is here:

https://dragablz.net/2015/06/09/dealing-with-rx-exception-failed-to-start-monitoring-system-clock-changes/

In summary PeriodicTimerSystemClockMonitor may occasionally "time out" and throw the error.

It seems with RX v3 the work around described (and successfully running in our production instances) will no longer be possible.

So can the code in PeriodicTimerSystemClockMonitor be re-assessed?

At the very least, it seems like we need to provide a mechanism that exposes the GetService<T> to consumers to enable overrides. The mechanism shouldn't require any overrides -- perhaps a RegisterService<T> call instead?

The ability to set PlatformEnglightenmentProvider.Current has been restored with an Obsolete attribute attached. It is a strong goal of vNext to remove this mechanism all together, so I'm keeping this issue open to track the underlying need.

Thank you.

Would love to see the underlying issue fixed, it is a big problem that using Observable.Timer under heavy load causes the whole service to crash because of the InvalidOperationException being thrown in PeriodicTimerSystemClockMonitor.NewTimer after 100 retries.

@baltie do you think the changes mentioned in this blog post are sufficient?

The only reservation I have is this comment from @ButchersBoy in the above post:

The solution requires a message pump (which apparently is why RX doesn’t use this method by default).

We are currently talking about the scope for v4, and changing the default behaviour of the internals (especially around timing) might be possible.

We are using Rx extensively throughout our system, also in some Windows services that doesn't include a message pump, so no, it wouldn't be sufficient.

We've made our own, custom, PeriodicTimerSystemClockMonitor that basically removes the retry mechanism and just informs what the difference were no matter the size when a new timer is launched. It works for our purposes and it's better than having the system die on us because of heavy load.

Moving to vNext so we can get 4.0 out asap.

The PeriodicTimerSystemClockMonitor is pretty convoluted and possibly inconsistent/racy:

  • It restarts the only timer upon a new event listener arrives to SystemClockChanged.
  • It cancels the only timer if any of the listener gets removed.
  • Has field _lastTime accessed by NewTimer and TimerChanged possibly concurrently, overwriting each other's values.
  • Does an event in general run add/remove under a lock, mutually excluding NewTimer with each other (but not with TimerChanged)?
  • Why is there an eager retry and not some backoff strategy?