flame/blis

Cortex-A72 Flops?

jfikar opened this issue · 4 comments

jfikar commented

Hi, I'm running HPL-2.3 on a Cortex-A72 (Raspberry Pi 4) and I'm wondering how fast the Cortex-A72 is supposed to be. Wikipedia says the Cortex-A72 should achieve only 2 Flop per clock per core (FP64).

But on a single core I'm getting about 3.2 Flop per clock per core (BLIS 0.9.0). So maybe Wikipedia is wrong and the Cortex-A72 should actually achieve 4 Flop per clock per core, like the Cortex-A57?

What about the Cortex-A73? Wikipedia again says only 2 here.

Thanks

Model B hits 1.8 GHz so 3.2 GF/s seems more like 2 flop/clock, no?

https://www.raspberrypi.com/products/raspberry-pi-4-model-b/specifications/

jfikar commented

The 3.2 is Flop/cycle/core, not the GFlop/s score reported by HPL itself.

On all four cores I get 17.93 GFlop/s; dividing by 1.8 GHz and by 4 cores gives 2.5 Flop/cycle/core. That is already above 2, which Wikipedia gives as the theoretical maximum.

But the cores are competing for resources, so I also ran HPL on a single core. There HPL shows 5.82 GFlop/s, which divided by 1.8 GHz gives 3.2 Flop/cycle/core, as I reported earlier. This is definitely much larger than Wikipedia suggests.
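
For anyone who wants to redo the arithmetic above, here is a minimal Python sketch. The helper name `flops_per_cycle_per_core` is made up for illustration, and the calculation assumes the cores actually sustain the nominal 1.8 GHz during the run (no throttling), which the HPL output alone does not confirm.

```python
def flops_per_cycle_per_core(gflops: float, clock_ghz: float, cores: int = 1) -> float:
    """Convert a measured GFlop/s score into Flop per cycle per core.

    Assumes every core runs at clock_ghz for the whole benchmark.
    """
    return gflops / clock_ghz / cores


# HPL scores quoted in this thread, at the nominal 1.8 GHz clock.
all_cores = flops_per_cycle_per_core(17.93, 1.8, cores=4)
single_core = flops_per_cycle_per_core(5.82, 1.8)

print(f"4 cores: {all_cores:.2f} Flop/cycle/core")   # 2.49
print(f"1 core:  {single_core:.2f} Flop/cycle/core")  # 3.23
```

If the clock were lower than 1.8 GHz during the run, the per-cycle figures would come out higher, so these values are if anything conservative under that assumption.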

Wikipedia is wrong, or at least confusing. The post on the ARM forum is correct. Perhaps the problem is that Wikipedia is counting instructions rather than floating-point operations, and an FMA instruction counts as 2 flops. Or it's just plain wrong. I don't know. ARM is a more reliable source on this topic than Wikipedia.

jfikar commented

I've changed the Wikipedia page accordingly.