Very minor Performance issue in comparison to CMSIS
Closed this issue · 4 comments
When working on #1 using the following code, I spotted the following;
Differences
-
First up is the read modify write for the
MODER
register which takes 2 more instructions in cppreg vs cmsis for cortex m0plus. This seems to be because for cppreg the address of the register is built via multiple immediate while in CMSIS it re-uses theMODER
address when accessingBSRR
. -
Secondly, it seems cppreg for writing a value to the
BSRR
does a simplerstr r1, [r3]
in cppreg instead of the potentially more expensivestr r2, [r3, #24]
instruction under CMSIS. The Cortex-M0+ Technical Reference Manual says there is no difference, but this may be different for M7 and more complicated architectures.
Potential Cause
I feel the first and second difference are somewhat related because in CMSIS the compiler is informed that MODER
and BSRR
are related via offsets from a base pointer, while in CPPReg they look totally unrelated. This results in the compiler having to "rebuild" the address twice for cppreg, once from immediates for MODER
and another from a hardcoded address stored in the .text section. Furthermore, it seems the two instruction difference is also due to the masking and applying the value
Solution
I view this as being caused by the architecture limitations of Cortex M and the design of cppreg.
-
Regarding the architecture, if you compile this for X86-64 or non thumb ARM then you see the issue goes away (the assembly is identical). I think this is because immediates in those ISA's/ARCH's can be huge due to, well, instructions being allowed to be very large too. This results in all stores/loads being done via full immediates instead of an immediate and offset or shifting immediate to build the address.
-
Regarding cppreg, you cannot specify that multiple registers are just an offset from each other instead of totally unrelated areas (CMSIS does this by placing a huge struct on the address). I do not see an easy way to give the compiler that sort of information either.
Real World Implications
To be frank, this difference in assembly from a performance standpoint is small, very small. Ideally the reason for the discrepancy can be verified/found with cppreg adopting the smaller of the two. But, writing to a register is very rarely a bottle neck unless you are bit banging, in which case you should probably be starting to seriously consider assembly instead.
This is interesting. I did a few modifications (see here).
The cppreg version is now only larger by one instruction compared to the CMSIS code. Here is a thought:
when we use the template form for the MODER
write call this calls the regular write function. Technically because in such a case the value, the offset, and the mask are compile-time constants part of the write implementation could be simplified (this is done in the super_write
of the modified code). That seems to be where the simplification occurs.
Two things:
- It will not be much work to implement such faster
write
when the template form is used and it seems it could bring some additional performance. - Regarding the offset between registers we could probably implement an abstract peripheral type or rather a cluster of register types but that is a bit more work and revisions.
So after careful checking it seems that the only difference in the implementation given in the previous comment is related to what @hak8or mentioned about registers between related through an offset.
I think this is great news because this means we obtain quasi-identical performance to a CMSIS implementation. We could create an issue for a register cluster implementation as mentioned above.
Awesome, in that case:
- May be worthwhile to implement
- Would be fantastic, I will create the issue. It sounds like a decent bit of changes to the API though so maybe it will be worthwhile to clump together the various potential API changes before starting.
In that case, I say we close this after implementing your point 1 with me saying that the grouping will be done in #7.
Agreed. I will however create an issue for the new access policies implementation (already available but needs to be merged).