Is it better to sync time asynchronously?
Closed this issue · 4 comments
bosh-agent/platform/linux_platform.go
Lines 548 to 550 in de93f41
Would it be better if bosh-agent synced time asynchronously and continued with the rest of its setup?
Historically, a primary rationale for this behavior is to try and ensure the system is starting from a known and shared point in time.
A long while ago, there were some instances where VMs started with incorrect times from their hosts, which then caused issues with director and TLS communications. Best-effort synchronous behavior helps avoid that, and also helps avoid time changes occurring during or after job processes have started.
Are you seeing any particular issues with this approach that we can help with?
Thanks @dpb587-pivotal. From my understanding, best-effort synchronous behavior means that bosh-agent doesn't care if sync-time fails. If an incorrect time causes issues with director and TLS communications, then sync-time should not be best effort. Please forgive me if my understanding is not correct.
The below is the issue I hit:
Background:
If the VM is associated with an Azure Standard Load Balancer, a load balancing rule for UDP (any port) is needed to trigger the UDP SNAT programming. (See: https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-outbound-connections#preallocatedports)
Example 1 (ubuntu-trusty):
Without the LB rule, sync-time (aka ntpdate in ubuntu-trusty) will fail to connect to the NTP server, so VMs may start with incorrect times. Since ntpdate doesn't retry, the failure doesn't block bosh-agent, and bosh-agent continues with its other setup steps. This behavior is fine for me.
Example 2 (ubuntu-xenial):
Without the LB rule, sync-time (aka chronyc in ubuntu-xenial) also fails to connect to the NTP server. But chronyc retries 10 times, which costs 100 seconds and slows down bosh-agent's boot. If bosh-agent synced time asynchronously, it could continue with its other setup steps in the meantime.
Of course, adding LB rules for the UDP port makes sync-time work and avoids the 100s delay. But those who don't want to add these rules, or forget to, will suffer an extra 100s of bosh-agent boot time. That's why I raised this issue. From your comments, it seems that synchronous behavior is preferred and that the LB rules should be added so that sync-time works and potential issues are avoided.
I see - thank you for the additional details.
In theory, it might be nice to sync time only if we detect problems or can guess there may be time issues. But in practice, it is a bit difficult to catch all possible errors in the various components where they can show up. So, currently it is best effort.
I agree the 100 second delay is not optimal. If possible, it would be preferable to ensure UDP traffic from the VM can be used. It sounds like load balancer rules can be adjusted somewhere to make that possible?
Yes, load balancer rules for UDP can make that possible.