Benchmark Adlink Ampere Altra Dev Kit - 64-core 2.2 GHz

Question

Benchmark Adlink Ampere Altra Dev Kit - 64-core 2.2 GHz

geerlingguy opened this issue a year ago · 3 comments

I have been sent an ADLINK Ampere Altra Dev Kit with a Q64-22 CPU for testing.

I've installed 96 GB of Samsung DDR4 3200 ECC RAM, and would like to see how it compares to some of its beefier bretheren (see #10 and #17).

Some interesting notes/observations are documented on the unofficial Ampere Altra Dev Kit wiki.

Answer 1 · 2023-10-09T22:51:15.000Z

Running with this repository's non-Ampere-optimized blis setup:

================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   87236
NB     :     256
PMAP   : Row-major process mapping
P      :       8
Q      :       8
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       87236   256     8     8             734.91             6.0225e+02
HPL_pdgesv() start time Mon Oct  9 22:48:33 2023

HPL_pdgesv() end time   Mon Oct  9 23:00:48 2023

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   2.46720793e-03 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================

Results:

Total system power consumption (average): 140W
HPL Total: 602.25 Gflops
Efficiency: 4.30 Gflops/W

Answer 2 · 2023-10-09T23:08:01.000Z

Also going to try with Ampere's optimized config: https://github.com/AmpereComputing/HPL-on-Ampere-Altra

Answer 3 · 2023-10-10T02:57:47.000Z

Optimized version, 655.9 Gflops, at about 140W again. So 4.685 Gflops/W

================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :  105000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       8 
Q      :       8 
PFACT  :   Right 
NBMIN  :       4 
NDIV   :       2 
RFACT  :   Crout 
BCAST  :  1ringM 
DEPTH  :       1 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4      105000   256     8     8            1176.66             6.5590e+02
HPL_pdgesv() start time Tue Oct 10 02:36:51 2023

HPL_pdgesv() end time   Tue Oct 10 02:56:27 2023

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   2.33469497e-03 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================