This is a test to see whether it's faster to directly read an atomic bool and decide what to do each time, or to cache a function pointer when possible.
This seems to suggest that they're equally fast, except when the atomic bool needs to be read and a function ptr called based on the result, which is pretty much what I expected.
My results (AMD Threadripper 2950X):
ptr true time: [2.3291 ns 2.3353 ns 2.3425 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
ptr false time: [2.1218 ns 2.1305 ns 2.1410 ns]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
cached ptr true time: [1.2003 ns 1.2055 ns 1.2130 ns]
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
cached ptr false time: [1.1955 ns 1.1995 ns 1.2042 ns]
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
direct true time: [1.2014 ns 1.2039 ns 1.2064 ns]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
direct false time: [1.1814 ns 1.1843 ns 1.1875 ns]
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe