wowczarek/ptpd

Changing active slave in bond causes high cpu usage

Closed this issue · 6 comments

Hi Wojciech,

Please take a look if you get a chance. I tested this fork with and without HW support. HW works fine, but without HW, when I change the active slave in bond0, ptpd gets stuck using high CPU on the selected core.
OS: Debian 9.2, kernel: 4.9.0-4-amd64

cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: pri0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: pri0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:b6:fd:87:2d:1f
Slave queue ID: 0

Slave Interface: sec0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:b6:fd:87:2d:20
Slave queue ID: 0

2017-12-04 05:00:19.124103 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.124115 ptpd[2970].bond0 (debug1) (slv) netRecvEvent: Flushed errqueue
2017-12-04 05:00:19.126336 ptpd[2970].bond0 (debug1) (slv) Error: message shorter than header length
2017-12-04 05:00:19.126349 ptpd[2970].bond0 (debug1) (slv) ==> DelayResp message received, sequence 309
2017-12-04 05:00:19.126359 ptpd[2970].bond0 (debug1) (slv) ==> Handle DelayResp (309)
2017-12-04 05:00:19.126370 ptpd[2970].bond0 (debug1) (slv) Ignored DelayResp sequence 309 - wasn't waiting for one
2017-12-04 05:00:19.142836 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.142850 ptpd[2970].bond0 (debug1) (slv) ==> Sync message received, sequence 26307
2017-12-04 05:00:19.142862 ptpd[2970].bond0 (debug1) (slv) UpdateOffset: max delay hit: 0
2017-12-04 05:00:19.157694 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.157710 ptpd[2970].bond0 (debug1) (slv) Ignored message Sync received from 0 domain
2017-12-04 05:00:19.174081 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.174097 ptpd[2970].bond0 (debug1) (slv) ==> Sync message received, sequence 26308
2017-12-04 05:00:19.174109 ptpd[2970].bond0 (debug1) (slv) UpdateOffset: max delay hit: 0
2017-12-04 05:00:19.174120 ptpd[2970].bond0 (debug1) (slv) tuning outlier filter: delayMS
2017-12-04 05:00:19.174129 ptpd[2970].bond0 (debug1) (slv) delayMS filter autotune: counter 10, samples 10, outliers, 0 percentage 0, new threshold 1.50
2017-12-04 05:00:19.189049 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.189065 ptpd[2970].bond0 (debug1) (slv) Ignored message Sync received from 0 domain
2017-12-04 05:00:19.205329 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.205343 ptpd[2970].bond0 (debug1) (slv) ==> Sync message received, sequence 26309
2017-12-04 05:00:19.205355 ptpd[2970].bond0 (debug1) (slv) UpdateOffset: max delay hit: 0
2017-12-04 05:00:19.209316 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.209332 ptpd[2970].bond0 (debug1) (slv) ==> DelayReq message received, sequence 49305
2017-12-04 05:00:19.209342 ptpd[2970].bond0 (debug1) (slv) delayReq message received :
2017-12-04 05:00:19.209352 ptpd[2970].bond0 (debug1) (slv) ==> DelayResp message received, sequence 49305
2017-12-04 05:00:19.209362 ptpd[2970].bond0 (debug1) (slv) HandledelayResp : delayResp doesn't match with the delayReq.
2017-12-04 05:00:19.220408 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.220423 ptpd[2970].bond0 (debug1) (slv) Ignored message Sync received from 0 domain

echo 'sec0' > /sys/class/net/bond0/bonding/active_slave

2017-12-04 05:00:19.227626 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.227639 ptpd[2970].bond0 (debug1) (slv) Error: message shorter than header length
2017-12-04 05:00:19.227651 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.227661 ptpd[2970].bond0 (debug1) (slv) Error: message shorter than header length
2017-12-04 05:00:19.227674 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.227690 ptpd[2970].bond0 (debug1) (slv) Error: message shorter than header length
2017-12-04 05:00:19.227702 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.227712 ptpd[2970].bond0 (debug1) (slv) Error: message shorter than header length
2017-12-04 05:00:19.227724 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.227734 ptpd[2970].bond0 (debug1) (slv) Error: message shorter than header length
2017-12-04 05:00:19.227745 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.227755 ptpd[2970].bond0 (debug1) (slv) Error: message shorter than header length
2017-12-04 05:00:19.227766 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.227776 ptpd[2970].bond0 (debug1) (slv) Error: message shorter than header length
2017-12-04 05:00:19.227787 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.227797 ptpd[2970].bond0 (debug1) (slv) Error: message shorter than header length
2017-12-04 05:00:19.227808 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.227818 ptpd[2970].bond0 (debug1) (slv) Error: message shorter than header length
2017-12-04 05:00:19.227829 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.227839 ptpd[2970].bond0 (debug1) (slv) Error: message shorter than header length
2017-12-04 05:00:19.227850 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.227860 ptpd[2970].bond0 (debug1) (slv) Error: message shorter than header length
2017-12-04 05:00:19.227871 ptpd[2970].bond0 (debug1) (slv) handle: something
2017-12-04 05:00:19.227881 ptpd[2970].bond0 (debug1) (slv) Error: message shorter than header length

Hi,

This could be related to Linux software TX timestamping, with a TX timestamp being stuck in the socket error queue. The last commit from today is a potential fix.
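
For context, this is roughly how Linux delivers software TX timestamps: the kernel queues them on the socket's error queue, and the application reads them back with recvmsg() and MSG_ERRQUEUE. Below is a minimal illustrative sketch (not ptpd's actual code; the helper name is made up). If the timestamp never arrives, e.g. across a bond failover, a loop waiting for it can end up spinning:

#include <string.h>
#include <time.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/errqueue.h>

/* Hedged sketch: read one software TX timestamp back from the socket
 * error queue. MSG_DONTWAIT keeps a missing timestamp from blocking
 * or busy-looping the caller.
 * Returns 0 and fills *ts on success, -1 if no timestamp was queued. */
static int read_tx_timestamp(int fd, struct timespec *ts)
{
    char data[256], control[512];
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = control, .msg_controllen = sizeof(control),
    };

    if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
        return -1; /* nothing queued (or a real error) */

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_TIMESTAMPING) {
            struct scm_timestamping *tss =
                (struct scm_timestamping *)CMSG_DATA(c);
            *ts = tss->ts[0]; /* ts[0] = software, ts[2] = hardware */
            return 0;
        }
    }
    return -1;
}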

You could alternatively force it to use the standard socket transport, since you are using software timestamping:

ptpengine:transport_implementation=socket_udpv4

To see all available transports, use:

ptpd -e ptpengine:transport_implementation

You could also try the libpcap transport (pcap_udpv4) if you have a libpcap development package installed (libpcap-dev, pcap-devel, etc.).

Hi,

Thanks for the answer. I tried the fix, but it said "No full SO_TIMESTAMPING software timestamp support" and fell back to the UDP socket transport, which of course works.

Does this mean that the following combination is not supported?
ptpengine:interface = bond0
ptpengine:transport_implementation = linuxts_udpv4
ptpengine:hardware_timestamping = n

2018-03-12 10:43:57.740095 ptpd[32609].(info) transport: detectTTransport(bond0): probing for usable IPv4 transports
2018-03-12 10:43:57.740171 ptpd[32609].(error) testLinuxTimestamping(bond0): No full SO_TIMESTAMPING software timestamp support
2018-03-12 10:43:57.740188 ptpd[32609].(info) transport: detectTTransport(bond0): selected 'socket_udpv4'
2018-03-12 10:43:57.740901 ptpd[32609].(notice) ttransport.socket_udpv4: Transport 'Event' (bond0) started
2018-03-12 10:43:57.741011 ptpd[32609].(notice) ttransport.socket_udpv4: Transport 'General' (bond0) started

Thanks,
Sz.

Hi,

Looks like this configuration is not supported in your case; likely the NIC driver does not fully implement SO_TIMESTAMPING in software. The only difference between this and the regular sockets is transmission timestamps. It just so happens that the linuxts transports handle both hardware and software timestamps with SO_TIMESTAMPING; the hardware_timestamping=n option selects which of the two is used. But since software timestamps are what you are after, you can just use the socket_udpv4 transport (it uses the "best" socket timestamp option available, i.e. SO_TIMESTAMP or SO_TIMESTAMPNS). If you are using multicast, the TX timestamps are acquired by looping sent multicast back into the socket (it's a socket option). For unicast, ptpd manually "loops" the packets by sending them to itself to estimate the TX timestamp. Not great, but it works. Otherwise, the libpcap-based transports are a good way to go about it.
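
For illustration, here is a minimal sketch of the multicast loop-back approach described above (assumed helper name, not ptpd's code): enable kernel RX timestamps plus multicast loopback, and the RX timestamp on the looped-back copy of your own packet then serves as the TX timestamp estimate:

#include <netinet/in.h>
#include <sys/socket.h>

/* Sketch: after this, every packet received on fd - including our own
 * looped-back multicast transmissions - carries a kernel timestamp in
 * an SCM_TIMESTAMPNS control message, readable via recvmsg(). */
static int setup_loopback_timestamps(int fd)
{
    int on = 1;
    unsigned char loop = 1;

    /* Kernel receive timestamps as struct timespec. */
    if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof(on)) < 0)
        return -1;
    /* Loop our own multicast transmissions back into the socket. */
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop)) < 0)
        return -1;
    return 0;
}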

Thanks,
Wojciech

Hi Wojciech,

First of all, thanks for spending so long on my case. The card I used for testing and reproducing this issue is quite a widespread one (Intel X520). According to ethtool, it supports SW timestamping.

04:00.0 Ethernet controller: Intel Corporation 82599 10 Gigabit Dual Port Backplane Connection (rev 01)

ethtool -T pri0
Time stamping parameters for pri0:
Capabilities:
hardware-transmit (SOF_TIMESTAMPING_TX_HARDWARE)
software-transmit (SOF_TIMESTAMPING_TX_SOFTWARE)
hardware-receive (SOF_TIMESTAMPING_RX_HARDWARE)
software-receive (SOF_TIMESTAMPING_RX_SOFTWARE)
software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
hardware-raw-clock (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
off (HWTSTAMP_TX_OFF)
on (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
none (HWTSTAMP_FILTER_NONE)
ptpv1-l4-sync (HWTSTAMP_FILTER_PTP_V1_L4_SYNC)
ptpv1-l4-delay-req (HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ)
ptpv2-l4-event (HWTSTAMP_FILTER_PTP_V2_L4_EVENT)
ptpv2-l4-sync (HWTSTAMP_FILTER_PTP_V2_L4_SYNC)
ptpv2-l4-delay-req (HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ)
ptpv2-l2-event (HWTSTAMP_FILTER_PTP_V2_L2_EVENT)
ptpv2-l2-sync (HWTSTAMP_FILTER_PTP_V2_L2_SYNC)
ptpv2-l2-delay-req (HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ)
ptpv2-event (HWTSTAMP_FILTER_PTP_V2_EVENT)
ptpv2-sync (HWTSTAMP_FILTER_PTP_V2_SYNC)
ptpv2-delay-req (HWTSTAMP_FILTER_PTP_V2_DELAY_REQ)

The IXGBE driver version is 5.2.4.

I ran a quick experiment without bonding, just to see:
ptpengine:interface = pri0
ptpengine:transport_implementation = linuxts_udpv4
ptpengine:hardware_timestamping = n

With the combination above, "detectTTransport" does detect SOF_TIMESTAMPING support, but after startup ptpd continuously receives empty PTP messages.

2018-03-13 11:30:55.049693 ptpd[2652].(info) transport: detectTTransport(pri0): selected 'linuxts_udpv4'
2018-03-13 11:30:55.049894 ptpd[2652].(info) initTimestamping(pri0): Linux software timestamping enabled
2018-03-13 11:30:55.050186 ptpd[2652].(notice) ttransport.linuxts_udpv4: Transport 'Event' (pri0) started
2018-03-13 11:30:55.050307 ptpd[2652].(notice) ttransport.linuxts_udpv4: Transport 'General' (pri0) started
2018-03-13 11:30:55.050325 ptpd[2652].(debug1) clock: Clock driver syst configured
2018-03-13 11:30:55.050345 ptpd[2652].(debug1) clock: Clock driver syst configured
2018-03-13 11:30:55.050355 ptpd[2652].(debug1) clock: Clock driver syst configured
2018-03-13 11:30:55.050365 ptpd[2652].(debug1) state PTP_INITIALIZING
2018-03-13 11:30:55.050372 ptpd[2652].(debug1) manufacturerOUI: ff:ff:ff
2018-03-13 11:30:55.050378 ptpd[2652].(debug1) initData
2018-03-13 11:30:55.050387 ptpd[2652].(debug1) setPtpClockIdentity() called
2018-03-13 11:30:55.050394 ptpd[2652].(debug1) initClock
2018-03-13 11:30:55.050550 ptpd[2652].(debug1) State change request from PTP_POWERUP to PTP_INITIALIZING
2018-03-13 11:30:55.050569 ptpd[2652].(notice) Now in state: PTP_INITIALIZING
2018-03-13 11:30:55.261294 ptpd[2652].(debug1) state PTP_LISTENING
2018-03-13 11:30:55.261327 ptpd[2652].(debug1) Timer ANNOUNCE_RECEIPT started at 12.000
2018-03-13 11:30:55.261337 ptpd[2652].(debug1) State change request from PTP_POWERUP to PTP_LISTENING
2018-03-13 11:30:55.261358 ptpd[2652].(notice) Now in state: PTP_LISTENING
2018-03-13 11:30:56.076815 ptpd[2652].(debug1) Received an empty PTP message (this should not happen) - ignoring
2018-03-13 11:30:56.080547 ptpd[2652].(debug1) Timer STATUSFILE_UPDATE expired
2018-03-13 11:30:56.080588 ptpd[2652].(debug1) Timer STATUSFILE_UPDATE started at 1.000
2018-03-13 11:30:56.107662 ptpd[2652].(debug1) Timer ALARM_UPDATE expired
2018-03-13 11:30:56.107674 ptpd[2652].(debug1) updateAlarms()
2018-03-13 11:30:56.458920 ptpd[2652].(debug1) Received an empty PTP message (this should not happen) - ignoring
2018-03-13 11:30:56.458940 ptpd[2652].(debug1) Ignored message DelayResp received from 1 domain
2018-03-13 11:30:56.468275 ptpd[2652].(debug1) Received an empty PTP message (this should not happen) - ignoring
2018-03-13 11:30:56.471253 ptpd[2652].(debug1) Received an empty PTP message (this should not happen) - ignoring
2018-03-13 11:30:56.471268 ptpd[2652].(debug1) Ignored message DelayResp received from 1 domain
2018-03-13 11:30:56.474914 ptpd[2652].(debug1) Received an empty PTP message (this should not happen) - ignoring
2018-03-13 11:30:56.487185 ptpd[2652].(debug1) Received an empty PTP message (this should not happen) - ignoring

Anyway, thanks.
Szilard

OK, I see what is happening:

  • When using hardware timestamping, each bond member is queried for hardware timestamp capabilities, and hardware timestamping is enabled on the physical bond members, because the bondX interface itself reports no hardware timestamping flags (a sketch of this capability probe follows the list). The clocks on all bond members also have to be synchronised to each other.
  • When using the linuxts transports with software timestamping on a physical interface, all SW timestamping flags are reported and timestamping is initialised correctly (but seems to be broken for you).
  • However, when attempting to use linuxts with software timestamping on a bonded interface, the bond members are not inspected, because the transport will only drive the system clock.
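
To illustrate the first point, here is a hedged sketch (illustrative names, not ptpd's code) of probing an interface's timestamping capabilities with the ETHTOOL_GET_TS_INFO ioctl, which returns the same information ethtool -T prints:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/ethtool.h>
#include <linux/net_tstamp.h>
#include <linux/sockios.h>

static int probe_ts_caps(const char *ifname)
{
    struct ethtool_ts_info info = { .cmd = ETHTOOL_GET_TS_INFO };
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_data = (void *)&info;
    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
        close(fd);
        return -1;
    }
    close(fd);

    /* "Full" software support means SW TX + SW RX + SW reporting. */
    unsigned int sw = SOF_TIMESTAMPING_TX_SOFTWARE |
                      SOF_TIMESTAMPING_RX_SOFTWARE |
                      SOF_TIMESTAMPING_SOFTWARE;
    printf("%s: software timestamping %s, PHC index %d\n", ifname,
           (info.so_timestamping & sw) == sw ? "full" : "partial",
           info.phc_index);
    return 0;
}

On this setup, bond0 would presumably fail the "full" check (matching the "No full SO_TIMESTAMPING software timestamp support" error above), while pri0 passes, as its ethtool -T output shows.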

So basically this is an expected error. However, the socket_udpv4 transport still works OK, so I don't think there really is an issue; you just end up using a different transport. As far as performance is concerned, this only matters when you are using unicast, because of the following:

The only "special" thing the SOF_TIMESTAMPING_xx_SOFTWARE capabilities add over standard SO_TIMESTAMPNS is delivering packet transmission timestamps. When using the socket_xxx transports and multicast, the TX timestamps are instead acquired using SO_MULTICAST_LOOP - socket option that loops sent mcast packets back into the socket. This works very similarly to software TX timestamps. Only for unicast the TX timestamps are manually copied into the socket, which is not great but still delivers. Otherwise you can build ptpd with libpcap and use the pcap_xxx transports for software timestamping.

  • The empty messages are a result of software TX timestamps not being received, so most likely the NIC driver has issues with this. Many do: not many people use SO_TIMESTAMPING in software when hardware timestamping is available. This only confirms that a socket_xxx or pcap_xxx transport is a safer bet in your case, if you absolutely need to do s/w time sync.

Do you have a specific need to use software timestamps vs. hardware timestamps? PTPd will still sync the system clock to the NIC for you. Using h/w timestamping is almost always a better option if available.
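
As a side note, the NIC clock (PHC) can be read directly from userspace, which is what syncing the system clock to the NIC builds on. A small illustrative sketch, assuming the PHC is /dev/ptp0 (as reported by the ethtool -T output above):

#include <stdio.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

#define CLOCKFD 3
#define FD_TO_CLOCKID(fd) ((~(clockid_t)(fd) << 3) | CLOCKFD)

int main(void)
{
    struct timespec phc, sys;
    int fd = open("/dev/ptp0", O_RDONLY); /* PHC 0, per ethtool -T */

    if (fd < 0)
        return 1;
    clock_gettime(FD_TO_CLOCKID(fd), &phc); /* NIC hardware clock */
    clock_gettime(CLOCK_REALTIME, &sys);    /* system clock */
    printf("phc - sys offset: %.9f s\n",
           (double)(phc.tv_sec - sys.tv_sec) +
           (double)(phc.tv_nsec - sys.tv_nsec) / 1e9);
    close(fd);
    return 0;
}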

Thanks,
Wojciech

Hi Wojciech,

The background: the majority of our NICs supported HW timestamping, and I was really satisfied with the hwtest branch (still using it). But some of the old NICs had problems with HW support, so I switched them to SW timestamping using the same hwtest branch, which is where I noticed the above-mentioned high CPU usage after changing the active slave in the bond.

Then I saw you had a possible fix, and I would have liked to try it and confirm it indeed resolves the problem, when I noticed that active development had moved to the libcck branch.

Unfortunately, in the meantime I swapped those old cards out, so I was not able to try libcck on them. That is the only reason I tried using SW timestamping on a card with HW capability: to reproduce with libcck the issue I originally noticed with hwtest.

But I failed to reproduce with libcck the "same" setup I had with hwtest, most likely due to my IXGBE driver. I really do not want to waste your time; I am more than satisfied with the accuracy and stability of your HW feature. And if I ever need SW somewhere again, socket timestamping will also be fine, as I use multicast.

Thanks again for your help.
Szilard