sonic-net/sonic-sairedis

Question about SaiDiscovery and ViewTransition

ngoc-do opened this issue · 8 comments

Hello,

I have a basic question:

What is the purpose of SaiDiscovery?

Looks like SaiDiscovery discovers a bunch of rids for switch/ports... and they are then used for applyViewTransition(). rids found during SaiDiscovery will not be removed, but what is applyViewTransition for?

Thanks,
Ngoc

SaiDiscovery purpose is to find all RID values present on device after cold or warm boot, basically we want to know what objects are created by internal vendor when create_switch is execute. Also some new RIDs can pop up when we do a warm boot to new SAI.

applyViewTransition will translate any differences from old view to new view, basically orchagent (OA) compiles a switch configuration and pushes that to syncd, and every restart warm/cold, a new configuration can be compiled, which can be the same or different, and base on comparison logic between both views, difference will be made, and that difference must be execute on asic to match what OA requestes.

purpose is to have minimal dataplane disruption, since reconfiguring switch from scratch can take a long time

Thank you. I got the idea.

I would like to follow up this topic with more questions about sai discovery.

Currently we start supporting new asic type - fabric. Different from NPU asics, fabric doesn't participate in forming VLANs (and many other features). So, after chip init, syncd runs sai discovery and does a bunch of queries on fabric asic. One of the attributes it tries to collect is VLAN_MEMBER_LIST (and some other attributes) causes syncd to crash.

Of course, the problem here is at BCM SAI, but fixing this by BCM is not trivial and requires long time. They don't expect apps would query VLAN attributes or so on fabric asic.

Here one possible solution could be that for fabric asic, we don't do sai discovery until BCM SAI completely supports it.

    if (getSwitchType() != SAI_SWITCH_TYPE_NPU)
    {
        helperDiscover();
    }

What would you think?

This discovery is required for syncd, please fix brcm SAI. SAI specifies if attribute is not supported, then should return SAI_STATUS_NOT_SUPPORTED and not crash.

Thanks for getting back to me.

The problem is more than just using SAI_STATUS_NOT_SUPPORTED to avoid crashing.

More specifically, for example, with attribute SAI_VLAN_ATTR_MEMBER_LIST, SaiDiscovery

sai_status_t status = m_sai->get(mk.objecttype, mk.objectkey.key.object_id, 1, &attr);

will call

sai_status_t sai_metadata_generic_get_SAI_OBJECT_TYPE_VLAN(
    _In_ const sai_object_meta_key_t *meta_key,
    _In_ uint32_t attr_count,
    _Inout_ sai_attribute_t *attr_list)
{
    return sai_metadata_sai_vlan_api->get_vlan_attribute(meta_key->objectkey.key.object_id, attr_count, attr_list);
}

However, sai_metadata_sai_vlan_api is NULL, because BRCM SAI doesn't allocate it for fabric asics.

Look the code below at BRCM SAI.

sai_status_t
_brcm_sai_dm_init(bool wb, char *path)
{
    if (!BRCM_SAI_IS_FEAT_FABRIC_SWITCH())   // Allocate only if asic is NOT FABRIC
    {
        rv = _brcm_sai_alloc_lag();
        rv = _brcm_sai_alloc_switch_gport();
        rv = _brcm_sai_alloc_vlan();             --------> vlan_apis is not allocated for fabric asic
        rv = _brcm_sai_alloc_bridge_info();
        rv = _brcm_sai_alloc_buff_pools();
        rv = _brcm_sai_alloc_port_info();
        rv = _brcm_sai_alloc_system_port_info();
        rv = _brcm_sai_alloc_stp(wb);
        rv = _brcm_sai_alloc_mirror();
        rv = _brcm_sai_alloc_vrf();
        rv = _brcm_sai_alloc_rif();
        rv = _brcm_sai_alloc_wred();
        rv = _brcm_sai_alloc_queue_wred();
        rv = _brcm_sai_alloc_qos_map();
        rv = _brcm_sai_alloc_tunnel();
        rv = _brcm_sai_alloc_nat();
        rv = _brcm_sai_alloc_sched_group_info();
        rv = _brcm_sai_alloc_sched();
        rv = _brcm_sai_alloc_route_info();
        rv = _brcm_sai_alloc_nh_info();
        rv = _brcm_sai_alloc_nbr();
        rv = _brcm_sai_alloc_policer();
        rv = _brcm_sai_alloc_hostif();
        rv = _brcm_sai_alloc_nhg_info();
        rv = _brcm_sai_alloc_udf();
        rv = _brcm_sai_alloc_hash();
        rv = _brcm_sai_alloc_acl();
        rv = _brcm_sai_alloc_tam();
        rv = _brcm_sai_alloc_samplepacket();
        rv = _brcm_sai_alloc_l2mc_lag_info();
        rv = _brcm_sai_alloc_iso_group();
        rv = _brcm_sai_alloc_rpf_group_info();
        rv = _brcm_sai_alloc_ipmc_group_info();
        rv = _brcm_sai_alloc_mlag_isl_info();
        rv = _brcm_sai_alloc_mlag_sag_mirror_info();
        rv = _brcm_sai_alloc_samplepacket_prof_refs();
        rv = _brcm_sai_alloc_debug_counter(wb);
        rv = _brcm_sai_post_alloc_update(wb);
    }
}

From some view, BRCM is right. Why do we need apis for features an asic doesn't have? We should check if sai_metadata_sai_vlan_api == nullptr before we can call it in sai discovery, at least.

But given that there are so many APIs (PORT, HOSTIF, LAG) not allocated by BRCM SAI for fabric asics, does it make sense to do sai discovery for fabric asics?

i guess we could check for null, @lguohan whats your opinion here? we had similar discussion some time ago

OK, thanks.

I made opencomputeproject/SAI#1182 for checking null. Please help review. This needs to go up first before we can do the other supports for fabric asic.

that SAI pr was merged, closing