MLXXXp/Arduboy2

Possible rewrite of 'drawFastVLine'?

Pharap opened this issue · 4 comments

This is more of a matter for the future than an immediate issue, but while doing something Arduboy related I noticed that drawFastVLine is still using the 'dumb' implementation:

Arduboy2/src/Arduboy2.cpp

Lines 547 to 555 in 2374247

void Arduboy2Base::drawFastVLine
(int16_t x, int16_t y, uint8_t h, uint8_t color)
{
int end = y+h;
for (int a = max(0,y); a < min(end,HEIGHT); a++)
{
drawPixel(x,a,color);
}
}

I've been aware for a while that theoretically it should be possible to do a fast version that simply fills the inbetween bytes with 0x00 or 0xFF and handles the ends of the lines specially.

When I have the time to do so, I'd like to have a go at attempting a faster (but potentially larger) implementation (regardless of whether or not it makes it into the library).

It probably won't be any time soon, but if such an implementation proved to be faster, and hypothetically let's assume it's larger and thus increases the library's progmem usage, would such a function be of interest?

I would have suggested adding it as an additional function rather than replacing the existing function, but as the existing function is drawFastVLine, it would be a bit odd for it not to hold the fastest implementation. It's unfortunate that it wasn't simply drawVLine, as then adding a secondary 'faster' version would have been less messy. (Though I suppose drawFasterVLine is always an option? Albeit a very 'quirky' one.)

Just to leave it here, here's what I've thought of so far:

constexpr uint8_t startYLookup[8]
{
	(0xFF >> 0), (0xFF >> 1), (0xFF >> 2), (0xFF >> 3),
	(0xFF >> 4), (0xFF >> 5), (0xFF >> 6), (0xFF >> 7),
};

uint8_t startByte = yLookup[startY % 8];

if(colour != 0)
	sBuffer[index] |= startByte;
else
	sBuffer[index] &= ~startByte;

I have yet to work out the remaining details or check that this does what I think it does, but hopefully it illustrates the idea that it should be possible to derive the start and end bytes from the coordinates and then fill the remaining columns with all on (0xFF) or all off (0x00) bytes.

Whether this is truly faster remains to be seen, but theoretically it should do less work than the drawPixel implementation.

if such an implementation proved to be faster, and hypothetically let's assume it's larger and thus increases the library's progmem usage, would such a function be of interest?

It would depend on how much more code the refactoring ended up producing. I don't like to increase the size of functions that have been commonly used in the past, for fear of a previously working sketch failing to compile due to increased code size.

However, I suspect most sketches using this function likely would 1) be small enough that the increased size would still be under the limit, or 2) still fit due to the fact that other areas of the library have been reduced since the sketch was released, thus offsetting the increased size of this function.

So again, we would have to see just how much larger the function gets and make a decision based on that, also weighing against the actual speed increase obtained. If any size increase ends up to be uncomfortably greater, I'm not opposed to adding a new function named, as proposed, drawFasterVLine() (with the documentation for drawFastVLine() updated to note that the new function should be used for new development).

We also have to take into account that drawFastVLine() is used internally for other rectangle and circle drawing functions. So if, due to size, a new drawFasterVLine() were implemented, would we also create new "fast" functions for all those that could benefit from it?

I likely won't get chance to trial anything for a little while yet, so this is going to be more of a slow-burn issue that might not come to fruition for a while. (And the other issues I raised are more important anyway.)

would we also create new "fast" functions for all those that could benefit from it?

Hypothetically speaking, if there was a notable speed increase, (and the increase were a regular increase rather than a circumstantial one,) then yes, because if drawFasterVLine were to have the potential to take N fewer cycles (compared to drawFastVLine) then functions like fillCircle would have the potential to multiply that saving.

For example, a call to fillCircle with an r is 1 should theoretically call drawFastVLine 5 times, meaning that a hypothetical fillCircleFaster (using drawFasterVLine instead of drawFastVLine) would potentially take 5 * N fewer cycles. (And theoretically the larger the circle drawn, the greater the potential saving would be, assuming that the cycle saving were consistent rather than circumstantial.)

Just now I noticed this issue. I think this is an important problem, so I did a PR #75