Safe-DS/API-Editor

Calculating Levenshtein Distance for strings takes a long time

Status: closed

Describe the bug

Calling package_parser.processing.migration.model._differ.distance_elements with two lists of characters (that is, two strings split into single characters) takes a long time to return a result.
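The source of distance_elements is not included in this issue. Assuming it implements the textbook dynamic-programming Levenshtein algorithm over arbitrary list elements, the minimal sketch below (the function name levenshtein is hypothetical) shows where the time goes. The inner loop runs once for every pair of elements, so two 1,000-character strings split into characters need about a million Python-level comparisons.

```python
from typing import Any, Sequence


def levenshtein(a: Sequence[Any], b: Sequence[Any]) -> int:
    """Edit distance between two sequences (insert, delete, substitute cost 1).

    Hypothetical stand-in for a generic list-based distance. It keeps only two
    rows of the DP table: O(len(a) * len(b)) time, O(len(b)) extra memory.
    """
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute x by y (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein(list("kitten"), list("sitting")))  # 3
```

Because the cost grows with the product of the two lengths, shortening the lists or moving the work out of pure Python helps most.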

To Reproduce

Run the tests in package-parser/tests/processing/migration/test_differ.py and check how long they take.

Expected behavior

Computing the distance between two strings should be much faster.


Additional Context (optional)

Possible Solutions:

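  1. Avoid passing strings to distance_elements and compute their distance with the Levenshtein package (https://pypi.org/project/Levenshtein/) instead.
  2. Use threads to compute several distances in parallel.
  3. Instead of splitting strings into single characters, split them at case changes or at non-alphanumeric characters, which yields much shorter lists.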
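Option 3 can be sketched as follows, assuming the compared strings are identifiers such as function or parameter names. The helpers split_identifier, sequence_distance, and token_distance are hypothetical names, and only ASCII letters and digits are treated as word characters. The distance is then counted in whole tokens, so get_feature_names becomes a list of three elements instead of seventeen.

```python
import re
from typing import Any, List, Sequence


def split_identifier(name: str) -> List[str]:
    """Split a name at non-alphanumeric characters and at case changes.

    Assumption: only ASCII letters and digits count as word characters.
    """
    tokens: List[str] = []
    for part in re.split(r"[^0-9A-Za-z]+", name):
        # Runs of capitals stay together ("HTTPServer" -> "HTTP", "Server");
        # otherwise a new token starts at each upper-case letter or digit run.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part))
    return tokens


def sequence_distance(a: Sequence[Any], b: Sequence[Any]) -> int:
    """Two-row dynamic-programming Levenshtein distance over arbitrary elements."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute x by y (free if equal)
            ))
        previous = current
    return previous[-1]


def token_distance(a: str, b: str) -> int:
    """Edit distance counted in whole tokens instead of characters."""
    return sequence_distance(split_identifier(a), split_identifier(b))


print(split_identifier("get_feature_names"))   # ['get', 'feature', 'names']
print(split_identifier("getFeatureNamesOut"))  # ['get', 'Feature', 'Names', 'Out']
print(token_distance("get_feature_names", "get_feature_names_out"))  # 1
```

A token-level distance is a different number from a character-level distance, so its results are not directly comparable with the character-based values. For option 1, the Levenshtein package provides Levenshtein.distance(s1, s2), which computes the character-level distance of two strings in compiled code.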