Language: C. License: Apache License 2.0.

Multichase - a pointer chaser benchmark
Multiload - a superset of multichase which runs latency, memory bandwidth, and loaded-latency tests

1/ BUILD

   - just type:

     $ make

2/ INSTALL

   - just run from the current directory, or copy the multichase binary wherever you need it

3.1/ RUN Multichase

   - To get help:

     $ multichase -h

   - By default, multichase performs a pointer chase through a 256MB array
     with a stride of 256 bytes for 2.5 seconds on a single thread:

     $ multichase

   - Pointer chase through an array of 4MB with a stride size of 64 bytes:

     $ multichase -m 4m -s 64

   - Pointer chase through an array of 1GB for 10 seconds (-n is the number of 0.5 second samples):

     $ multichase -m 1g -n 20

   - Pointer chase through an array of 256KB with a stride size of 128 bytes on 2 threads.
     Thread 0 accesses every 128th byte; thread 1 accesses every 128th byte, offset by
     sizeof(void*), which is 8 on 64-bit architectures:

     $ multichase -m 256k -s 128 -t 2
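
For readers new to the technique, here is a minimal sketch of what a strided
pointer chase looks like in C. It is not multichase's actual code: the names
build_chain and chase are hypothetical, and the chain is linked in sequential
order for clarity (a real benchmark would typically shuffle the order so that
hardware prefetchers cannot predict the next address). The offset parameter
illustrates the per-thread offset described above, where thread 1 uses slots
shifted by sizeof(void*).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Link slots of `buf` into a single cycle. The slot at byte offset
 * i*stride + offset stores the address of the slot at (i+1)*stride + offset,
 * and the last slot points back to the first. Returns the first slot.
 * `buf` must be aligned for pointer storage.
 */
void **build_chain(char *buf, size_t total_bytes, size_t stride, size_t offset)
{
    size_t n = total_bytes / stride;   /* number of slots in the cycle */

    assert(n > 0);
    assert(stride % sizeof(void *) == 0 && offset % sizeof(void *) == 0);
    assert(offset + sizeof(void *) <= stride);

    for (size_t i = 0; i < n; i++) {
        void **slot = (void **)(buf + i * stride + offset);
        *slot = buf + ((i + 1) % n) * stride + offset;
    }
    return (void **)(buf + offset);
}

/*
 * Follow the chain for `steps` hops. Each load depends on the previous
 * one, so the time per hop approximates the memory access latency.
 */
void **chase(void **p, size_t steps)
{
    while (steps--)
        p = (void **)*p;
    return p;
}
```

Timing a large number of hops and dividing by the hop count gives an average
latency per access. Two chains built with offsets 0 and sizeof(void*) occupy
disjoint slots, so they can share one array without overwriting each other.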

3.2/ RUN Multiload

  - Latency Only (simple pointer chase)
    In this mode, Multiload can run any of the multichase commands above.
    Any "-c" chase argument other than chaseload can be used; if omitted, it defaults to "simple".
    Using "-c chaseload" and/or a "-l" load argument selects a different test mode.

    $ multiload

  - Bandwidth Only
    Multiload can run a memory bandwidth test using the "-l" load argument. The "-c" chase argument MUST NOT be used.
    The command below runs 5 samples (~2.5 seconds each) on 16 threads, using the glibc memcpy() function
    and a 512M buffer per thread.

    $ multiload -n 5 -t 16 -m 512M -l memcpy-libc

  - Loaded Latency
    Multiload can run 1 pointer chaser thread on logical cpu0 with multiple memory bandwidth load threads.
    The "-c chaseload" argument MUST be used, together with "-l" and one of the memory load arguments.
    The command below runs 5 samples (~2.5 seconds each) on 16 threads (1 chase, 15 stream-sum bandwidth loads),
    using a 512M buffer per thread. The chase thread uses a stride of 16.

    $ multiload -s 16 -n 5 -t 16 -m 512M -c chaseload -l stream-sum
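
To make the idea of a memcpy-based bandwidth load concrete, here is a rough
single-threaded sketch in C. It is not multiload's implementation: the
function name memcpy_bandwidth_mib is hypothetical, and the use of clock()
as the timer is an assumption of this sketch (it measures CPU time, which
approximates wall time only for a single busy thread).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Copy `bytes` from src to dst `reps` times and return the observed copy
 * rate in MiB per second, or 0.0 if the timer registered no elapsed time.
 */
double memcpy_bandwidth_mib(unsigned char *dst, unsigned char *src,
                            size_t bytes, int reps)
{
    assert(bytes > 0 && reps > 0);

    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        src[0] = (unsigned char)i;   /* vary the input so no copy is redundant */
        memcpy(dst, src, bytes);
    }
    clock_t end = clock();

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return ((double)bytes * reps / (1024.0 * 1024.0)) / secs;
}
```

The buffers should be much larger than the last-level cache, otherwise the
result reflects cache bandwidth rather than memory bandwidth. In a
loaded-latency run, loops of this kind would occupy the load threads while a
separate thread performs the pointer chase.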

3.3/ RUN Pingpong & fairness

   - Pingpong: measure the latency of exchanging a cache line between cores.
     To run, simply do:

     $ pingpong -u

   - Fairness: measure fairness with N threads competing to increment an atomic variable.
     To run, simply do:

     $ fairness