learn-more/findpattern-bench

Bug in learn_more pattern

Closed this issue · 3 comments

The +3 increment on both findpattern implementations for learn_more can move beyond the string pattern null terminator. This can cause undesired memory faults or potentially extra loops. I've fixed it by calculating the pattern width dyanmically with added constraint that the ?? style must be used, and not a single ?

inline uint64_t findPattern(const uint64_t rangeStart, size_t len, const char* pattern)
{
	size_t l = strlen(pattern);

	// l = 2 * b + (b - 1) . 2 chars per byte + b - 1 spaces between
	size_t patSize = (l + 1) / 3;
	uint8_t* patt_base = (uint8_t*)_alloca(patSize + 1);
	uint8_t* msk_base = (uint8_t*)_alloca(patSize + 1);
	uint8_t* pat = patt_base;
	uint8_t* msk = msk_base;

	l = 0;
	while (patSize) {
		if (*(uint8_t*)pattern == (uint8_t)'\?') {
			*pat++ = 0;
			*msk++ = '?';
		} else {
			*pat++ = getByte(pattern);
			*msk++ = 'x';
		}
		pattern += 3;
		l++;
		patSize--;
	}

	if (l > len)
		return NULL;

	*msk = 0;
	pat = patt_base;
	msk = msk_base;
	for (size_t n = 0; n < (len - l); ++n)
	{
		if (isMatch((uint8_t*)(rangeStart + n), patt_base, msk_base)) {
			return rangeStart + n;
		}
	}
	return NULL;
}

edit:
I see what you mean.

this is my final modified implementation: https://github.com/stevemk14ebr/PolyHook_2_0/blob/bb97d61aeb9409c9c2146ae10a3fd3590cde5fe7/sources/Misc.cpp#L3

If i disable the cheating-length check thing which uses a pattern style incompatible with these changes I get the following results on my extremely slow laptop:

Running tests on 15 different implementations
===========
Running learn_more
Finding pattern 0 x 1024 took 633.017 ms.
Finding pattern 1 x 1024 took 641.554 ms.
===========
Running learn_more v2
Finding pattern 0 x 1024 took 410.54 ms.
Finding pattern 1 x 1024 took 340.472 ms.
===========
Running fdsasdf
Finding pattern 0 x 1024 took 263.745 ms.
Finding pattern 1 x 1024 took 262.075 ms.
===========
Running DarthTon
Finding pattern 0 x 1024 took 318.062 ms.
Finding pattern 1 x 1024 took 341.577 ms.
===========
Running DarthTon v2
Finding pattern 0 x 1024 took 24.704 ms.
Finding pattern 1 x 1024 took 24.499 ms.
===========
Running Forza (Boyer-Moore Variant)
Finding pattern 0 x 1024 took 64.282 ms.
Finding pattern 1 x 1024 took 32.679 ms.
===========
Running Forza (SIMD With OpenMP)
Finding pattern 0 x 1024 took 20.925 ms.
Finding pattern 1 x 1024 took 23.095 ms.
===========
Running kokole
Finding pattern 0 x 1024 took 425.587 ms.
Finding pattern 1 x 1024 took 376.131 ms.
===========
Running mrexodia
Finding pattern 0 x 1024 took 1015.9 ms.
Finding pattern 1 x 1024 took 1032.42 ms.
===========
Running atom0s
Finding pattern 0 x 1024 took 449.693 ms.
Finding pattern 1 x 1024 took 443.606 ms.
===========
Running atom0s (mrexodia modification)
Finding pattern 0 x 1024 took 941.833 ms.
Finding pattern 1 x 1024 took 948.177 ms.
===========
Running mrexodia (horspool)
Finding pattern 0 x 1024 took 69.67 ms.
Finding pattern 1 x 1024 took 57.086 ms.
===========
Running dom1n1k_Patrick
Finding pattern 0 x 1024 took 422.518 ms.
Finding pattern 1 x 1024 took 424.873 ms.
===========
Running Michael
Finding pattern 0 x 1024 took 192.703 ms.
Finding pattern 1 x 1024 took 182.146 ms.
===========
Running superdoc1234
Finding pattern 0 x 1024 took 259.932 ms.
Finding pattern 1 x 1024 took 255.036 ms.
Done.

Which compared to the orig:

Running learn_more v2
Finding pattern 0 x 1024 took 311.11 ms.
Finding pattern 1 x 1024 took 278.121 ms.

Is slower :p but this probably fixes the long standing bug i discovered where some patterns failed to be found in every case. I assume this was due to the null not being checked and the finder randomly trying to match whatever garbage bytes were after until the next null.

Thanks :)

I do have another implementation that should work.