COSIMA/access-om3

[0.25deg config] Auto-Generated Mask Table to Exclude Land Masks

Closed this issue · 3 comments

This feature automatically generates a mask table that excludes land-only blocks, so processors whose subdomain contains only land points are masked out.

AUTO_MASKTABLE = True           !   [Boolean] default = False
                                ! Turn on automatic mask table generation to eliminate land blocks.
AUTO_IO_LAYOUT_FAC = 0          !   default = 0
                                ! When AUTO_MASKTABLE is enabled, io layout is calculated by performing integer
                                ! division of the runtime-determined domain layout with this factor. If the factor
                                ! is set to 0 (default), the io layout is set to 1,1.

Comments:

  1. When AUTO_MASKTABLE is enabled, the processor layout is determined automatically and cannot be set manually:
    if (auto_mask_table) then
        if (layout(1) /= 0 .and. layout(1) /= auto_layout(1)) then
          call MOM_error(FATAL, "Cannot set LAYOUT or NIPROC when AUTO_MASKTABLE is enabled.")
        endif
        if (layout(2) /= 0 .and. layout(2) /= auto_layout(2)) then
          call MOM_error(FATAL, "Cannot set LAYOUT or NJPROC when AUTO_MASKTABLE is enabled.")
        endif
        layout(:) = auto_layout(:)
    endif
  1. Tuning the io layout:
...
     if (auto_io_layout_fac>0) then
       io_layout(1) = max(layout(1)/auto_io_layout_fac, 1)
       io_layout(2) = max(layout(2)/auto_io_layout_fac, 1)
...

where layout is the number of processors in the x and y directions.
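As a rough illustration of the arithmetic above, here is a minimal Python sketch that mirrors the quoted Fortran branch. The function name `io_layout` and the layout values in the usage lines are hypothetical, chosen only to show the integer division and the `max(..., 1)` floor. They are not taken from an actual run.

```python
def io_layout(layout, auto_io_layout_fac):
    """Mirror of the quoted Fortran logic for deriving the io layout.

    layout: (nx_procs, ny_procs), the runtime-determined domain layout.
    auto_io_layout_fac: value of AUTO_IO_LAYOUT_FAC.
    """
    if auto_io_layout_fac > 0:
        # Integer division, floored at 1 so the io layout is never 0.
        return (max(layout[0] // auto_io_layout_fac, 1),
                max(layout[1] // auto_io_layout_fac, 1))
    # Factor of 0 (the default): io layout is 1,1 per the parameter description.
    # Negative factors are not covered by the quoted source; treating them
    # like 0 here is an assumption made only to keep the sketch short.
    return (1, 1)


# Hypothetical layout of 32 x 30 processors:
print(io_layout((32, 30), 4))   # (8, 7)
print(io_layout((32, 30), 0))   # (1, 1)
print(io_layout((32, 30), 64))  # (1, 1), because of the max(..., 1) floor
```

A larger factor gives fewer io domains, so fewer and larger output files per field. A factor larger than either layout dimension collapses that dimension to 1.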

  1. With a low core count for the ocn component, there may be no land-only blocks to mask. In that case no land blocks can be eliminated automatically and the run stops with the error below. This won't be an issue for a production run.
  if (num_masked_blocks == 0) then
    call MOM_error(FATAL, "Couldn't auto-eliminate any land blocks. Try to increase the number "//&
        "of MOM6 PEs or set AUTO_MASKTABLE to False.")
  endif
  1. With a core count of 1344 for the ocn component, a performance speedup of around 22% can be achieved.
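To see why a low core count can leave nothing to eliminate, here is a toy Python sketch that counts land-only blocks for a given layout. It is not MOM6's actual algorithm: the function `count_land_only_blocks`, the 4x4 mask, and the requirement that the layout divides the grid evenly are all assumptions made for illustration.

```python
def count_land_only_blocks(mask, layout):
    """Count blocks of a 2D mask that contain no ocean points.

    mask: list of rows, 1 = ocean, 0 = land (toy data).
    layout: (nx_blocks, ny_blocks); assumed to divide the grid evenly.
    """
    ny, nx = len(mask), len(mask[0])
    bx, by = nx // layout[0], ny // layout[1]  # block size in x and y
    land_only = 0
    for j in range(layout[1]):
        for i in range(layout[0]):
            block = [mask[y][x]
                     for y in range(j * by, (j + 1) * by)
                     for x in range(i * bx, (i + 1) * bx)]
            if not any(block):  # no ocean point anywhere in this block
                land_only += 1
    return land_only


# Toy 4x4 mask: left half is land, right half is ocean.
toy_mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

print(count_land_only_blocks(toy_mask, (1, 1)))  # 0: the single block has ocean
print(count_land_only_blocks(toy_mask, (2, 2)))  # 2: both left-hand blocks are land
```

With a single block there is nothing to mask, which corresponds to the `num_masked_blocks == 0` case above. Finer layouts make it more likely that some blocks fall entirely on land.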

Is there a specific reason to only do this for 0.25 degree? I guess the performance difference at 1 degree would be very small, but maybe we still do it for consistency / simplicity?

I haven't run a 1deg configuration, so I can't provide a performance benchmark for it. To maintain consistency with the 0.25deg config, I suggest opening a new issue and a PR to add this feature to the 1deg config.

I ran the 1deg config with this option and it did speed up the run at higher core counts. Nothing too impressive, but also not negligible (I can try to dig up the exact numbers if someone is interested).