foss-for-synopsys-dwc-arc-processors/glibc

[glibc] Optimize memcpy to improve EEMBC Network 2.0 ip_reassembly / nat

vineetgarc opened this issue · 0 comments

glibc memcpy shows up as the top hotspot when profiling EEMBC Network 2.0, specifically in two sub-tests (ip_reassembly and nat).

# perf stat   gcc/bin/ip_reassembly.exe -autogo >/tmp/x

 Performance counter stats for 'gcc/bin/ip_reassembly.exe -autogo':

           1137.96 msec task-clock                #    0.965 CPUs utilized          
               229      context-switches          #    0.201 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               733      page-faults               #    0.644 K/sec                  
     1,137,637,340      cycles                    #    1.000 GHz                    
       347,444,844      instructions              #    0.31  insn per cycle         
        30,389,001      branches                  #   26.705 M/sec                  
         5,042,494      branch-misses             #   16.59% of all branches        

       1.179703860 seconds time elapsed

# perf record -c 10000 gcc/bin/ip_reassembly.exe -autogo >/tmp/x
#
# Samples: 117K of event 'cycles'
# Event count (approx.): 1176170000
#
# Overhead       Samples  Command          Shared Object      Symbol            
# ........  ............  ...............  .................  .................. ..................

  61.29%         72084  ip_reassembly.e  libc-2.32.so       [.] _wordcopy_fwd_aligned
  15.95%         18762  ip_reassembly.e  ip_reassembly.exe  [.] ip_input
   9.82%         11551  ip_reassembly.e  ip_reassembly.exe  [.] ip_reass
   2.47%          2905  ip_reassembly.e  ip_reassembly.exe  [.] m_cat
   1.86%          2185  ip_reassembly.e  libc-2.32.so       [.] memmove

The ARC glibc port uses the generic implementations of memcpy/memset, which are already decent but can be optimized for ARC with:

  • unaligned access
  • double (64-bit) load/store
  • any other arch-specific helpers such as clz, etc.
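To make the first two items concrete, here is a minimal C sketch of a forward copy that moves 64 bits at a time without first aligning the pointers. This is not the glibc or ARC implementation: the name `copy_fwd_64` is hypothetical, and the sketch assumes the compiler lowers each fixed-size 8-byte `memcpy` to a single load or store when the target permits unaligned and double-word accesses. A real port would most likely be hand-written assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only, not glibc code.
 * Forward copy of n bytes between non-overlapping buffers, using
 * 64-bit chunks with no alignment prologue. */
static void *copy_fwd_64(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Same contract as memcpy: the regions must not overlap. */
    assert((uintptr_t)d + n <= (uintptr_t)s ||
           (uintptr_t)s + n <= (uintptr_t)d);

    /* Main loop: two 64-bit loads followed by two 64-bit stores.
     * The fixed-size memcpy calls are the portable way to express a
     * possibly unaligned access; whether each becomes one double
     * load/store is an assumption about the compiler and target. */
    while (n >= 16) {
        uint64_t a, b;
        memcpy(&a, s, 8);
        memcpy(&b, s + 8, 8);
        memcpy(d, &a, 8);
        memcpy(d + 8, &b, 8);
        s += 16;
        d += 16;
        n -= 16;
    }

    /* At most one remaining 8-byte chunk. */
    if (n >= 8) {
        uint64_t a;
        memcpy(&a, s, 8);
        memcpy(d, &a, 8);
        s += 8;
        d += 8;
        n -= 8;
    }

    /* Tail: fewer than 8 bytes left, copy byte by byte. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }

    return dst;
}
```

A call such as `copy_fwd_64(dst + 1, src + 3, 60)` exercises the unaligned path, the 8-byte chunk, and the byte tail. Skipping the alignment prologue only pays off if the hardware handles unaligned accesses cheaply, so any real change would need to be measured against the profile above.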