seatgeek/fuzzywuzzy

`process.dedupe()` gives IndexError: list index out of range because of bug in `process.extractWithoutOrder()`

Thijsvandepoll opened this issue · 0 comments

Hi all,

I found a bug in `process.extractWithoutOrder()` that causes `process.dedupe()` to fail unexpectedly. A minimal example:

    from fuzzywuzzy import process
    process.dedupe(["BRITT JEFFREY S", "BRITT JEFFREY S.", "WIEDEMAN SCOTT", "WIEDERMANN SCOTT", "斯科特·维德曼", "杰弗里·S·布里特"])

which results in:

IndexError: list index out of range

The expected result here is:

dict_keys(['BRITT JEFFREY S.', 'WIEDERMANN SCOTT', '斯科特·维德曼', '杰弗里·S·布里特'])

I looked into the source code and I believe the bug is in `process.extractWithoutOrder()`, which applies a different (pre)processor to the query than to the choices. I will create a merge request to fix this issue.
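
To illustrate the kind of mismatch I mean, here is a toy sketch. It does not use fuzzywuzzy's code: `strip_non_ascii`, `keep_unicode`, `score` and `matches_above` are hypothetical helpers, and `difflib` stands in for the real scorer. It only shows how processing the query differently from the choices can leave an item with no match at all, not even itself, so that indexing the first match raises `IndexError`.

```python
from difflib import SequenceMatcher


def strip_non_ascii(s):
    """Hypothetical processor: drop every non-ASCII character."""
    return "".join(ch for ch in s if ord(ch) < 128).strip()


def keep_unicode(s):
    """Hypothetical processor: keep all characters, trim whitespace."""
    return s.strip()


def score(a, b):
    """Stand-in similarity score in the range 0..100 (not fuzzywuzzy's scorer)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def matches_above(query, choices, query_proc, choice_proc, threshold=70):
    """Return the choices whose processed form scores above the threshold."""
    processed_query = query_proc(query)
    return [c for c in choices if score(processed_query, choice_proc(c)) > threshold]


choices = ["WIEDEMAN SCOTT", "斯科特·维德曼"]
item = "斯科特·维德曼"

# Same processor on both sides: the item matches itself.
consistent = matches_above(item, choices, keep_unicode, keep_unicode)

# Different processors: the query is reduced to an empty string, the
# choices are not, so nothing clears the threshold.
mismatched = matches_above(item, choices, strip_non_ascii, keep_unicode)

try:
    mismatched[0]  # a dedupe-style step that assumes at least one match
    failed = False
except IndexError:
    failed = True
```

Using the same processor for the query and the choices guarantees that every item scores 100 against itself, so the list of matches is never empty.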