Calculating Levenshtein Distance for strings takes a long time
Aclrian commented
Describe the bug
When calling package_parser.processing.migration.model._differ.distance_elements
with two lists of characters, it takes a long time to get a result.
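For context, here is a minimal sketch of why character lists are expensive. It assumes distance_elements uses the standard Wagner-Fischer dynamic-programming formulation; its actual implementation is not shown here, and `distance_elements_sketch` is a hypothetical stand-in. Every pair of elements costs one pure-Python loop iteration, so two 1,000-character strings split into characters need 1,000,000 cell updates.

```python
from typing import Any, Sequence


def distance_elements_sketch(a: Sequence[Any], b: Sequence[Any]) -> int:
    """Levenshtein distance between two sequences (Wagner-Fischer).

    Hypothetical stand-in for distance_elements. Runs in O(len(a) * len(b))
    time and keeps only two rows, i.e. O(len(b)) memory.
    """
    # Row 0: turning an empty prefix of `a` into b[:j] takes j insertions.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        # Column 0: turning a[:i] into an empty sequence takes i deletions.
        current = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current[j] = min(
                previous[j] + 1,         # delete x
                current[j - 1] + 1,      # insert y
                previous[j - 1] + cost,  # substitute x -> y (free if equal)
            )
        previous = current
    return previous[len(b)]


print(distance_elements_sketch(list("kitten"), list("sitting")))  # 3
```

Comparing whole tokens instead of single characters shrinks both dimensions of that table.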
To Reproduce
Run the tests in package-parser/tests/processing/migration/test_differ.py
and note how long they take to complete.
Expected behavior
It should be much faster.
Additional Context (optional)
Possible Solutions:
- Avoid passing strings to distance_elements and compute their distance with the Levenshtein package (https://pypi.org/project/Levenshtein/) instead.
- Use threads.
- Instead of splitting strings into single characters, split them at case changes or at non-alphanumeric characters.
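A rough sketch of the third option, using a regular expression that splits at case changes (including acronym boundaries such as `HTTPResponse`) and treats non-alphanumeric characters as separators. The function name `split_tokens` and the exact splitting rules are assumptions for illustration, not existing package_parser code.

```python
import re
from typing import List

# Hypothetical tokenizer for the "split by case changes or non-alphanumeric
# characters" idea. Alternatives, tried left to right at each position:
#   [A-Z]+(?![a-z])  an all-caps run not followed by lowercase (e.g. "HTTP")
#   [A-Z]?[a-z]+     an optional capital followed by lowercase ("get", "Response")
#   [0-9]+           a run of digits
# Anything matched by none of these ("_", "-", ".", non-ASCII letters) is
# skipped and therefore acts as a separator.
_TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_tokens(text: str) -> List[str]:
    """Split an identifier into tokens at case changes and separators."""
    return _TOKEN_RE.findall(text)


print(split_tokens("getHTTPResponse_code2"))
# ['get', 'HTTP', 'Response', 'code', '2']
```

For `getHTTPResponse_code2` this yields 5 tokens instead of 21 characters, so a dynamic-programming table for two such identifiers shrinks from 21 x 21 to 5 x 5 cells. The trade-off is that the resulting distance counts token edits rather than character edits, so any thresholds tuned on character distances would need adjusting.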