[202405][route]: flush_removing_entries: EntityBulker.flush remove entries failed, number of entries to remove: 1, status: SAI_STATUS_ITEM_NOT_FOUND
Opened this issue · 6 comments
Description
The bug happens when SWSS
is busy and task queue accumulates a several route configuration tasks one of which can cause a configuration failure (e.g., duplicate route).
When that happens, the rest of the good routes are being skipped from configuration, thus are missing in HW.
Further removal attempts of failed routes will lead to SAI errors like SAI_STATUS_ITEM_NOT_FOUND
.
Configuration is applied:
2024 Aug 9 18:09:10.952721 sonic DEBUG bgp#bgpcfgd: Received message : '('PortChannel101', 'SET', (('NULL', 'NULL'),))'
2024 Aug 9 18:09:10.961216 sonic DEBUG bgp#staticroutebfd: Received message : '('PortChannel101', 'SET', (('NULL', 'NULL'),))'
2024 Aug 9 18:09:10.961216 sonic DEBUG bgp#bgpcfgd: Received message : '('PortChannel101|30.1.0.1/24', 'SET', (('NULL', 'NULL'),))'
2024 Aug 9 18:09:10.962915 sonic DEBUG bgp#staticroutebfd: Received message : '('PortChannel101|30.1.0.1/24', 'SET', (('NULL', 'NULL'),))'
2024 Aug 9 18:09:10.962915 sonic DEBUG bgp#staticroutebfd: get ip from intf key: valid True is_ipv4 True, if_name PortChannel101 ip 30.1.0.1
2024 Aug 9 18:09:10.970778 sonic INFO swss#supervisord: orchagent
2024 Aug 9 18:09:10.971098 sonic INFO swss#supervisord: intfmgrd sysctl: cannot stat /proc/sys/net/mpls/conf/PortChannel101/input: No such file or directory
2024 Aug 9 18:09:10.972285 sonic DEBUG bgp#bgpcfgd: Received message : '('PortChannel101', 'SET', (('vrf', ''),))'
2024 Aug 9 18:09:10.980776 sonic DEBUG bgp#bgpcfgd: Received message : '('PortChannel101|30.1.0.1/24', 'SET', (('state', 'ok'),))'
2024 Aug 9 18:09:11.297308 sonic NOTICE swss#swssconfig: :- main: Loading config from JSON file:/ro.json...
2024 Aug 9 18:09:12.390991 sonic NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.61@PortChannel103,10.0.0.63@PortChannel104
2024 Aug 9 18:09:12.393372 sonic NOTICE syncd#SDK: [SAI_NEXT_HOP_GROUP.NOTICE] ./src/mlnx_sai_nexthopgroup.c[5048]- mlnx_remove_next_hop_group_members: Remove NEXT_HOP_GROUP_MEMBER [OID:0x490000002D] [ID:1]
2024 Aug 9 18:09:12.393774 sonic NOTICE syncd#SDK: [SAI_NEXT_HOP_GROUP.NOTICE] ./src/mlnx_sai_nexthopgroup.c[5048]- mlnx_remove_next_hop_group_members: Remove NEXT_HOP_GROUP_MEMBER [OID:0x90000002D] [ID:0]
2024 Aug 9 18:09:12.398391 sonic NOTICE syncd#SDK: [SAI_NEXT_HOP_GROUP.NOTICE] ./src/mlnx_sai_nexthopgroup.c[1502]- mlnx_remove_next_hop_group: Remove NEXT_HOP_GROUP [OID:0x800000005] [ID:0]
2024 Aug 9 18:09:12.418011 sonic INFO swss#supervisord: orchagent
2024 Aug 9 18:09:12.420640 sonic NOTICE swss#orchagent: :- addIp2MeRoute: Create IP2me route ip:30.1.0.1
2024 Aug 9 18:09:12.424444 sonic NOTICE syncd#SDK: [SAI_NEIGHBOR.NOTICE] ./src/mlnx_sai_neighbor.c[154]- mlnx_create_neighbor_entry: Create NEIGHBOR_ENTRY: #0 DST_MAC_ADDRESS=[54:54:00:ad:48:00]
2024 Aug 9 18:09:12.425279 sonic NOTICE syncd#SDK: [SAI_NEIGHBOR.NOTICE] ./src/mlnx_sai_neighbor.c[240]- mlnx_create_neighbor_entry: Created NEIGHBOR_ENTRY 30.1.0.2RIF:0x4100000006
2024 Aug 9 18:09:12.426282 sonic NOTICE swss#orchagent: :- addNeighbor: Created neighbor ip 30.1.0.2, 54:54:00:ad:48:00 on PortChannel101
2024 Aug 9 18:09:12.427578 sonic NOTICE syncd#SDK: [SAI_NEXT_HOP.NOTICE] ./src/mlnx_sai_nexthop.c[1183]- mlnx_create_next_hop: Create NEXT_HOP: #0 TYPE=IP sonic-net/sonic-buildimage#1 IP=30.1.0.2 sonic-net/sonic-buildimage#2 ROUTER_INTERFACE_ID=ROUTER_INTERFACE [OID:0x4100000006] [Type:DEFAULT, ID:1]
2024 Aug 9 18:09:12.428373 sonic NOTICE syncd#SDK: [SAI_NEXT_HOP.NOTICE] ./src/mlnx_sai_nexthop.c[1385]- mlnx_create_next_hop: Created NEXT_HOP [OID:0x900000004] [ecmp:0x9]
2024 Aug 9 18:09:12.429760 sonic NOTICE swss#orchagent: :- addNextHop: Created next hop 30.1.0.2 on PortChannel101
2024 Aug 9 18:09:12.432419 sonic NOTICE swss#orchagent: :- doTask: Get port state change notification id:1000000000029 status:1
2024 Aug 9 18:09:12.432739 sonic NOTICE swss#orchagent: :- updatePortOperStatus: Port Ethernet64 oper state set from down to up
2024 Aug 9 18:09:12.434173 sonic DEBUG lldp#lldpmgrd[35]: Running command: '['lldpcli', 'configure', 'ports', 'Ethernet68', 'lldp', 'portidsubtype', 'local', 'etp18a', 'description', 'ARISTA03T1:Ethernet1']'
2024 Aug 9 18:09:12.434225 sonic DEBUG lldp#lldpmgrd[35]: Running command: '['lldpcli', 'configure', 'ports', 'Ethernet64', 'lldp', 'portidsubtype', 'local', 'etp17a', 'description', 'ARISTA01T1:Ethernet1']'
2024 Aug 9 18:09:12.436481 sonic NOTICE swss#orchagent: :- setHostIntfsOperStatus: Set operation status UP to host interface Ethernet64
2024 Aug 9 18:09:12.436816 sonic NOTICE swss#orchagent: :- publish: EVENT_PUBLISHED: {"sonic-events-swss:if-state":{"ifname":"Ethernet64","status":"up","timestamp":"2024-08-09T15:09:12.436579Z"}}
2024 Aug 9 18:09:12.438549 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[1929]- set_dispatch_attrib_handler: Set OPER_STATUS, key:HOSTIF [OID:0x1C0000000D] [hostif_db[28]], val:true
2024 Aug 9 18:09:12.440293 sonic NOTICE swss#orchagent: :- doTask: Ethernet64 oper speed is 50000
2024 Aug 9 18:09:12.448656 sonic ERR swss#orchagent: :- meta_sai_validate_route_entry: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"10.1.0.32/32","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} already exists
2024 Aug 9 18:09:12.449134 sonic ERR swss#orchagent: :- flush_creating_entries: EntityBulker.flush create entries failed, number of entries to create: 2, status: SAI_STATUS_ITEM_ALREADY_EXISTS
2024 Aug 9 18:09:12.449590 sonic ERR swss#orchagent: :- addRoutePost: Failed to create route 10.1.0.32/32 with next hop(s) 30.1.0.2@PortChannel101
2024 Aug 9 18:09:12.450748 sonic ERR swss#orchagent: :- addRoutePost: Failed to create route 30.1.0.0/24 with next hop(s) 0.0.0.0@PortChannel101
Configuration is removed:
2024 Aug 9 18:10:06.257036 sonic DEBUG bgp#bgpcfgd: Received message : '('PortChannel101|30.1.0.1/24', 'DEL', ())'
2024 Aug 9 18:10:06.257523 sonic DEBUG bgp#staticroutebfd: Received message : '('PortChannel101|30.1.0.1/24', 'DEL', ())'
2024 Aug 9 18:10:06.257872 sonic DEBUG bgp#staticroutebfd: get ip from intf key: valid True is_ipv4 True, if_name PortChannel101 ip 30.1.0.1
2024 Aug 9 18:10:06.285108 sonic DEBUG bgp#bgpcfgd: Received message : '('PortChannel101|30.1.0.1/24', 'DEL', ())'
2024 Aug 9 18:10:06.285492 sonic WARNING bgp#bgpcfgd: Delete key 'PortChannel101|30.1.0.1/24' is not supported for 'zebra set src' templates
2024 Aug 9 18:10:06.290529 sonic INFO systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
2024 Aug 9 18:10:06.298396 sonic INFO systemd[1]: Starting logrotate.service - Rotate log files...
2024 Aug 9 18:10:06.305718 sonic NOTICE syncd#SDK: [SAI_ROUTE.NOTICE] ./src/mlnx_sai_route.c[931]- mlnx_remove_route: Remove ROUTE_ENTRY 30.1.0.1 255.255.255.255
2024 Aug 9 18:10:06.314719 sonic NOTICE swss#orchagent: :- removeIp2MeRoute: Remove packet action trap route ip:30.1.0.1
2024 Aug 9 18:10:06.322976 sonic NOTICE bgp#fpmsyncd: message repeated 29 times: [ :- onRouteMsg: RouteTable del msg for route with only one nh on eth0/docker0: fe80::/64 :: eth0 ]
2024 Aug 9 18:10:06.322976 sonic NOTICE bgp#fpmsyncd: :- onRouteMsg: RouteTable del msg for route with only one nh on eth0/docker0: fd00::/80 :: docker0
2024 Aug 9 18:10:06.332575 sonic INFO systemd[1]: sysstat-collect.service: Deactivated successfully.
2024 Aug 9 18:10:06.332707 sonic INFO systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
2024 Aug 9 18:10:06.478214 sonic ERR swss#orchagent: :- meta_sai_validate_route_entry: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"30.1.0.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} doesn't exist
2024 Aug 9 18:10:06.479278 sonic ERR swss#orchagent: :- flush_removing_entries: EntityBulker.flush remove entries failed, number of entries to remove: 1, status: SAI_STATUS_ITEM_NOT_FOUND
2024 Aug 9 18:10:06.479278 sonic ERR swss#orchagent: :- removeRoutePost: Failed to remove route prefix:30.1.0.0/24
Steps to reproduce the issue:
- Do config reload
config reload -y -f
- Wait for BGP route convergency to start
2024 Aug 9 18:09:04.467477 sonic NOTICE syncd#SDK: :- threadFunction: time span 256 ms for 'bulkset:SAI_OBJECT_TYPE_ROUTE_ENTRY:1000'
2024 Aug 9 18:09:04.755004 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 1000 Routes: 1000 success, 0 not executed, 0 failed
2024 Aug 9 18:09:05.467611 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 1000 Routes: 1000 success, 0 not executed, 0 failed
2024 Aug 9 18:09:05.467707 sonic NOTICE syncd#SDK: :- threadFunction: time span 51 ms for 'bulkset:SAI_OBJECT_TYPE_ROUTE_ENTRY:1000'
2024 Aug 9 18:09:05.493543 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[1929]- set_dispatch_attrib_handler: Set PACKET_ACTION, key:ROUTE_ENTRY 0.0.0.0 0.0.0.0, val:FORWARD
2024 Aug 9 18:09:05.927877 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 1000 Routes: 1000 success, 0 not executed, 0 failed
2024 Aug 9 18:09:06.467724 sonic NOTICE syncd#SDK: :- threadFunction: time span 480 ms for 'bulkset:SAI_OBJECT_TYPE_ROUTE_ENTRY:1000'
2024 Aug 9 18:09:06.577540 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 1000 Routes: 1000 success, 0 not executed, 0 failed
2024 Aug 9 18:09:07.467829 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 1000 Routes: 1000 success, 0 not executed, 0 failed
2024 Aug 9 18:09:07.467915 sonic NOTICE syncd#SDK: :- threadFunction: time span 241 ms for 'bulkset:SAI_OBJECT_TYPE_ROUTE_ENTRY:1000'
2024 Aug 9 18:09:07.844777 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 1000 Routes: 1000 success, 0 not executed, 0 failed
2024 Aug 9 18:09:08.086475 sonic NOTICE syncd#SDK: [SAI_UTILS.NOTICE] ./src/mlnx_sai_utils.c[2337]- mlnx_bulk_statuses_print: [BULK_SET] 128 Routes: 128 success, 0 not executed, 0 failed
2024 Aug 9 18:09:08.328286 sonic NOTICE swss#orchagent: :- addRoutePost: Update Nexthop Group 10.0.0.61@PortChannel103,10.0.0.63@PortChannel104
- Do config
root@sonic:/home/admin# cat config.sh
#!/bin/bash
sudo config interface ip add PortChannel101 30.1.0.1/24
sudo ip neigh replace 30.1.0.2 lladdr 54:54:00:ad:48:00 dev PortChannel101
docker exec -ti swss swssconfig /ro.json
root@sonic:/# cat ro.json
[
{
"ROUTE_TABLE:10.1.0.32": {
"ifname": "PortChannel101",
"nexthop": "30.1.0.2"
},
"OP": "SET"
}
]
- Do clean up
root@sonic:/home/admin# cat cleanup.sh
#!/bin/bash
sudo ip neigh del 30.1.0.2 dev PortChannel101
sudo config interface ip remove PortChannel101 30.1.0.1/24
Describe the results you received:
2024 Aug 9 18:10:06.478214 sonic ERR swss#orchagent: :- meta_sai_validate_route_entry: object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"30.1.0.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000002"} doesn't exist
2024 Aug 9 18:10:06.479278 sonic ERR swss#orchagent: :- flush_removing_entries: EntityBulker.flush remove entries failed, number of entries to remove: 1, status: SAI_STATUS_ITEM_NOT_FOUND
Describe the results you expected:
No errors are expected
Output of show version
:
- N/A
Output of show techsupport
:
- N/A
Additional information you deem important (e.g. issue happens only occasionally):
The bug is reproducible via: sonic-mgmt/tests/route/test_duplicate_route.py
@prsunny, could you please help to check?
@dgsudharsan and @prsunny , issue transferred to sonic-swss
AFAIK, this is as per design that if one failure happens on the bulk, the rest may get skipped. @dgsudharsan , @prabhataravind , do you have any comments? This may be an enhancement item.
@prsunny can you please suggest on how we should proceed? The issue was captured by sonic-mgmt/tests/route/test_duplicate_route.py
. Without a fix, this test will be flaky
AFAIK, this is as per design that if one failure happens on the bulk, the rest may get skipped. @dgsudharsan , @prabhataravind , do you have any comments? This may be an enhancement item.
Yes this is as per bulker day-1 design. I'll see if there's an easy way to address this (i.e still continue with other entries even if one entry in the bulker fails).