doug-gilbert/sg3_utils

rescan-scsi-bus.sh: function "findremapped" is too slow when there are 1K LUNs on the server; is there any way to solve it?

LiuXing108 opened this issue · 5 comments

There is a nested loop in the function "findremapped".
We use "scsi-rescan -f -u -m" to scan all devices.

while read -r hctl sddev id_serial_old ; do
    remapped=0
    ……
    # If udev events updated the disks already, but the multipath device isn't updated,
    # check for old devices to make sure we found remapped luns
    if [ -n "$mp_enable" ] && [ $remapped -eq 0 ]; then
      findmultipath "$sddev" $id_serial    
      if [ $? -eq 1 ] ; then
        remapped=1
      fi
    fi
    ……
done < $tmpfile

Been thinking about this one but have not found any way to make it substantially faster. I read that in environments using Unicode, adding 'export LC_ALL="C"' at the top of the script can speed up calls to standard Unix string searching utilities (e.g. grep). My locale is en_CA.UTF-8 (i.e. a Unicode locale) and that export made a small improvement. I'm open to ideas from others. Also could you quantify "too slow"?
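A minimal sketch of the locale idea, assuming a POSIX shell and grep. The helper name count_matches and the sample data are hypothetical and not part of rescan-scsi-bus.sh. Prefixing a single command with LC_ALL=C limits the setting to that call instead of exporting it for the whole script; whether it helps at all depends on the grep version and the data.

```shell
# Hypothetical helper: count lines containing a fixed string, with the
# C locale applied to this one grep call only.
count_matches() {
  # $1 = fixed string to search for, $2 = file to search
  LC_ALL=C grep -c -F -- "$1" "$2"
}

# Made-up sample data in "hctl sddev" form.
sample=$(mktemp)
printf '%s\n' '0:0:0:1 sda' '0:0:0:2 sdb' '0:0:1:1 sdc' > "$sample"
count_matches '0:0:0:' "$sample"
rm -f "$sample"
```

Timing the same search with and without the prefix (for example with "time") on a realistic device list would show whether the gain is worth it on a given system.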

We found O(n^2) behaviour in this function, which comes from the "while" loop calling "findmultipath". We run the script as "scsi-rescan -f -u -m", so if each LUN has m paths, the loop count is m * 1K * 1K.
We now use a temporary file to record the multipath information in advance:

getallmultipathinfo()
{
  local mp=
  local uuid=
  local dmtmp=
  local maj_min=
  local ret=

  # Record "mp maj_min dmtmp uuid" for every multipath map, once per scan
  for mp in $($DMSETUP ls --target=multipath | cut -f 1) ; do
    # dmsetup prints "No devices found" when there are no maps
    [ "$mp" = "No" ] && break
    maj_min=$($DMSETUP status "$mp" | cut -d " " -f14)
    if [ ! -L "/dev/mapper/${mp}" ]; then
      echo "softlink /dev/mapper/${mp} not available."
      continue
    fi
    # Assign separately from "local" so that $? reflects readlink's status
    ret=$(readlink "/dev/mapper/$mp" 2>/dev/null)
    if [[ $? -ne 0 || -z "$ret" ]]; then
      echo "readlink /dev/mapper/$mp failed. check multipath status."
      continue
    fi
    dmtmp=$(basename "$ret")
    uuid=$(cut -f2 -d- "/sys/block/$dmtmp/dm/uuid")
    echo "$mp $maj_min $dmtmp $uuid" >> "$TMPLUNINFOFILE"
  done
}

findmultipath(){
  ……
  maj_min=$(cat "/sys/block/$dev/dev")
    mp=$(grep -w "$maj_min" "$TMPLUNINFOFILE" | cut -d " " -f1)
    if [ -n "$mp" ]; then
      if [ -n "$find_mismatch" ] ; then
        uuid=$(grep -w "$maj_min" "$TMPLUNINFOFILE" | cut -d " " -f4)
        if [ "$find_mismatch" != "$uuid" ] ; then
          addmpathtolist "$mp"
          found_dup=1
        fi
      else
        # Normal mode: Find the first multipath with the sdev
        # and add it to the list
        addmpathtolist "$mp"
        return
      fi
    fi
  ……
}

findremapped(){
  ……
    udevadm_settle 2>&1 /dev/null
  echo "Done"

  getallmultipathinfo

  # See what changed and reload the respective multipath device if applicable
  while read -r hctl sddev id_serial_old ; do
  ……
}
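The lookup against the cache file could also be done in a single pass. The following is only a sketch under assumptions: the helper name lookup_mpath and the sample lines are hypothetical, and the file is assumed to use the "mp maj_min dmtmp uuid" layout written by getallmultipathinfo above. It compares the second field exactly rather than matching a word anywhere on the line, and it returns only the first matching line.

```shell
# Hypothetical helper: print "<mp> <uuid>" for the first cache line whose
# second field equals the given major:minor.
lookup_mpath() {
  # $1 = major:minor of the sd device, $2 = cache file
  awk -v key="$1" '$2 == key { print $1, $4; exit }' "$2"
}

cache=$(mktemp)
# Made-up sample lines in the "mp maj_min dmtmp uuid" layout.
printf '%s\n' 'mpatha 8:16 dm-0 36001405aaaa' \
              'mpathb 8:32 dm-1 36001405bbbb' > "$cache"

# Split the single result line into two variables.
read -r mp uuid <<EOF
$(lookup_mpath 8:32 "$cache")
EOF
echo "mp=$mp uuid=$uuid"
rm -f "$cache"
```

This reads the file once per lookup instead of twice, which matters less than removing the per-iteration multipath call but keeps the inner work small.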

“Also could you quantify "too slow"?”: I will e-mail you the comparison results later.

Looks good. I think you need something like 'truncate -s 0 $TMPLUNINFOFILE' at the start of getallmultipathinfo() since rescan-scsi-bus.sh may be called more than once. Look forward to your timings report showing a significant improvement.
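A minimal sketch of that reset step, assuming a POSIX shell. The helper name reset_cache is hypothetical, and TMPLUNINFOFILE is set to a throwaway file here only so the example runs on its own; the real script would define it itself. The ": >" redirection empties the file using shell built-ins and has the same effect here as 'truncate -s 0'.

```shell
# Stand-in for the cache path (assumption for this example only).
TMPLUNINFOFILE=$(mktemp)

reset_cache() {
  # Empty the file, creating it if it does not exist yet.
  : > "$TMPLUNINFOFILE"
}

# Simulate a leftover line from an earlier run, then start a fresh scan.
echo "stale entry from an earlier run" >> "$TMPLUNINFOFILE"
reset_cache
echo "mpatha 8:16 dm-0 36001405aaaa" >> "$TMPLUNINFOFILE"
wc -l < "$TMPLUNINFOFILE"
rm -f "$TMPLUNINFOFILE"
```

Without the reset, entries appended with ">>" would accumulate across runs and lookups could return stale multipath names.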

@doug-gilbert

We tested with “rescan-scsi-bus -f -u -m” on the original and the optimized script. All SCSI devices have been mapped to dm devices.

Here is the comparison of time spent before and after optimization:

| Number of LUNs | Number of paths | Before optimization | After optimization |
|---|---|---|---|
| 128 | 16 | 213m21s | 2m35s |
| 256 | 16 | 967m54s | 5m27s |
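To put the timings above into a single number, the speed-up factor can be worked out with shell arithmetic. This is only an illustrative sketch: the helper names to_seconds and speedup are hypothetical, and the result is rounded down to an integer.

```shell
# Hypothetical helper: convert a duration such as "213m21s" to seconds.
to_seconds() {
  min=${1%%m*}                            # text before the "m"
  sec=${1#*m}                             # text after the "m", e.g. "21s"
  sec=${sec%s}                            # drop the trailing "s"
  case $sec in 0?) sec=${sec#0} ;; esac   # avoid octal parsing of "08"
  echo $(( min * 60 + sec ))
}

# Hypothetical helper: integer speed-up factor (before / after).
speedup() {
  echo $(( $(to_seconds "$1") / $(to_seconds "$2") ))
}

echo "128 LUNs: about $(speedup 213m21s 2m35s)x faster"
echo "256 LUNs: about $(speedup 967m54s 5m27s)x faster"
```

With the reported figures this prints factors of 82 and 177, so the gain grows with the number of LUNs, as expected when a quadratic cost is removed.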

We found that most of the time was spent on these two lines of the original script:

findmultipath()
{
    ……
    mp2=$($MULTIPATH -l "$mp" | egrep -o "dm-[0-9]+")
    mp2=$(cut -f2 -d- "/sys/block/$mp2/dm/uuid")
    ……
}

Each execution of these two lines takes almost 5 seconds with 128 LUNs * 16 paths, and 10 to 15 seconds with 256 LUNs * 16 paths. The outer loop runs about 2K or 4K times, respectively.
After optimization, we do not need these two lines any more.
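As a rough cross-check of those figures, the time spent in the two slow lines can be estimated as LUNs * paths * seconds per execution. The helper name estimate_minutes is hypothetical, and the per-execution times are the approximate values quoted above, so the results are only order-of-magnitude estimates.

```shell
# Hypothetical helper: estimated minutes spent in the slow lines.
# $1 = LUNs, $2 = paths per LUN, $3 = seconds per execution
estimate_minutes() {
  echo $(( $1 * $2 * $3 / 60 ))
}

estimate_minutes 128 16 5     # 2048 iterations at about 5 s each
estimate_minutes 256 16 10    # 4096 iterations, lower bound of 10 s
estimate_minutes 256 16 15    # 4096 iterations, upper bound of 15 s
```

These estimates come to 170, 682 and 1024 minutes, which is in line with the measured 213m21s and 967m54s once the rest of the script's work is added.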

I have placed this patch in svn revision 973, now mirrored. Perhaps you could check whether that has been done accurately and that the impressive speed-up is still present.