Endless loop in findPartialMatches, azure plugin code
jweingarten-rv opened this issue · 6 comments
Version
"@nut-tree/nl-matcher": "^3.0.2",
"@nut-tree/nut-js": "^4.3.0",
"@nut-tree/plugin-azure": "^2.1.0",
Short overview
While upgrading plugin-azure from version 1 to 2.1.0 we are running into an endless loop that boils down to a bug in findPartialMatches,js in @nut-tree\plugin-azure\dist\ocr\stringQueryProcessing.function.js.
node version:
v20.11.0
OS type and version:
Windows / Mac
Detailed error description
Full code sample to reproduce
While upgrading plugin-azure from version 1 to 2.1.0 we are running into an endless loop that boils down to a bug in findPartialMatches in @nut-tree\plugin-azure\dist\ocr\stringQueryProcessing.function.js.
During execution the function findPartialMatches is called with the following parameters:
line.text: Edit Actions: Clear All
line.words: ['Edit', 'Actions:', 'Clear', 'All']
config: {partialMatch: true}
searchWords: ['edit', 'action']
pm: []
Note that the line has Edit Actions: and we are searching for edit actions, no colon in the search.
When it loops through the code it ends up in
else {
idx += selectedWords.length;
selectedWords = [];
}
That will reset idx back to 0 and you are in an endless loop.
Here is a fix, not pretty and most likely not how you will fix it, that I am currently using so that I can stay on version 2.1.0:
function findPartialMatches(line, config, searchWords, pm) {
let selectedWords = [];
nut_js_1.providerRegistry.getLogProvider().debug(`Searching through ${line.words.length} words`);
for (let idx = 0; idx < line.words.length;) {
let cnt = -1;
const currentWord = line.words[idx];
const currentWordText = config?.caseSensitive ? currentWord.text : currentWord.text.toLocaleLowerCase();
if (currentWordText === searchWords[0]) {
selectedWords = [currentWord];
for (let i = 1; i < searchWords.length; ++i) {
const nextWord = line.words[idx + i];
const nextWordText = config?.caseSensitive ? nextWord.text : nextWord.text.toLocaleLowerCase();
if (nextWordText !== searchWords[i]) {
cnt = idx + i;
selectedWords = [];
break;
}
else {
selectedWords.push(nextWord);
}
}
if (selectedWords.length === searchWords.length) {
idx += selectedWords.length;
const wordsBoundingBox = (0, determineWordBoundingBox_function_1.determineWordBoundingBoxFunction)(selectedWords);
if (wordsBoundingBox != null) {
pm.push({
...line,
words: selectedWords,
boundingBox: (0, mapBoundingBoxToRegion_function_1.mapRegionToBoundingBox)(wordsBoundingBox)
});
nut_js_1.providerRegistry.getLogProvider().debug(`Found match ${selectedWords.map(word => word.text).join(" ")} at ${JSON.stringify(wordsBoundingBox)}`);
}
}
else {
idx = cnt !== -1 ? cnt : selectedWords.length
selectedWords = [];
}
}
else {
++idx;
}
}
}
Additional content
Please provide any (mandatory) additional data to reproduce the error (Dockerfiles etc.)
Hi @jweingarten-rv 👋
Thank you for this report and the detailed analysis!
I can confirm this bug and I'm currently evaluating possible ways to address this.
What's your expected outcome, given your input data?
Hi @s1hofmann,
well I actually was thinking about that as well a bit. To me the partialMatch setting is a bit unclear. Especially when I see that Azure comes back with an array of strings. Should in this case it pass evenhough there is a : in the text? I honestly am not sure.
The person that wrote the test, that uncovered the issue would most likely expect that Edit Actions would partially match Edit Actions: Clear All.
But does that imply that in case of a textLine search we just make sure that indexOf is >=0 anywhere in the line found? Someone could also expect that if you search for Edit Clear in a string of value Edit Actions: Clear All it should come back as found.
Not sure if there is a good solution without breaking existing use cases.
One option could be to add more options. In conjunction with partialMatch additional settings like ignoreNonAlphanumericCharacters
In my specific use case, I can go either way and make the test pass. Specifically its selecting a menu and I just added the : to the search string to make it pass.