timtadh/zhang-shasha

NumPy Integration


Hi,

I'm trying to compute the edit distance between two trees (built from directories and files). This works well with test folders that have little content. However, when I use folders with a lot of content (so that both trees become extremely large), I get a MemoryError.

Do you have any idea how I can fix the problem?
Also, is there any documentation on how to use NumPy for the computation to speed up the library?

You are probably going to have a bad time with this library for "very large trees."

  1. It isn't really designed for that use case.
  2. The Zhang-Shasha algorithm is probably too slow for trees that large.
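As a rough back-of-the-envelope illustration of why large trees hit a MemoryError: Zhang-Shasha needs O(n·m) space for trees with n and m nodes. The sketch below assumes one dense n × m table of 8-byte floats. That layout is an assumption for illustration, and this library's actual internals may differ. The helper name `dense_table_bytes` is hypothetical.

```python
def dense_table_bytes(n_nodes_a: int, n_nodes_b: int, bytes_per_cell: int = 8) -> int:
    """Estimate the memory for one dense n x m distance table.

    Assumption (illustrative only): one float of `bytes_per_cell` bytes
    is stored for every pair of nodes (one from each tree).
    """
    return n_nodes_a * n_nodes_b * bytes_per_cell


if __name__ == "__main__":
    # Grow both trees together and watch the table size grow quadratically.
    for size in (1_000, 10_000, 100_000):
        gb = dense_table_bytes(size, size) / 1e9
        print(f"{size:>7} nodes per tree -> ~{gb:,.3f} GB")
```

Two trees of 1,000 nodes need about 8 MB under this assumption, but two trees of 100,000 nodes need about 80 GB. A big directory tree can easily reach that size.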

I would recommend looking at approximate algorithms such as PQ-Gram: https://doi.org/10.1145/1670243.1670247
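For a rough idea of how the pq-gram approach works, here is a minimal sketch, not part of this library. Each tree is turned into a bag (multiset) of "pq-grams." A pq-gram pairs a node's p ancestors (including the node itself) with a window of q consecutive children, padded with a dummy `*` label. Two trees are then compared by how much their bags overlap, using `1 - 2 * |intersection| / |union of bags|`. The `(label, [children])` tuple tree format and the function names are hypothetical. The defaults p=2, q=3 are only one choice. Note that the result approximates tree edit distance and is not equal to it.

```python
from collections import Counter

STAR = "*"  # dummy label used to pad the ancestor and sibling windows


def pq_gram_profile(tree, p=2, q=3):
    """Return the bag of pq-grams of a (label, [children]) tree as a Counter."""
    profile = Counter()

    def rec(node, anc):
        label, children = node
        anc = anc[1:] + (label,)  # shift the current label into the p ancestors
        sib = (STAR,) * q         # fresh window of q siblings, all dummies
        if not children:
            profile[anc + sib] += 1  # a leaf contributes one all-dummy window
            return
        for child in children:
            sib = sib[1:] + (child[0],)  # slide the window over the next child
            profile[anc + sib] += 1
            rec(child, anc)
        for _ in range(q - 1):           # slide the window past the last child
            sib = sib[1:] + (STAR,)
            profile[anc + sib] += 1

    rec(tree, (STAR,) * p)
    return profile


def pq_gram_distance(t1, t2, p=2, q=3):
    """Normalized pq-gram distance in [0, 1]; 0 means identical profiles."""
    prof1, prof2 = pq_gram_profile(t1, p, q), pq_gram_profile(t2, p, q)
    shared = sum((prof1 & prof2).values())                 # bag intersection size
    total = sum(prof1.values()) + sum(prof2.values())      # bag union size
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    a = ("root", [("src", [("main.py", [])]), ("README", [])])
    b = ("root", [("src", [("main.py", []), ("util.py", [])]), ("README", [])])
    print(pq_gram_distance(a, a), pq_gram_distance(a, b))
```

Each profile has a size linear in the number of nodes, so memory grows with the tree sizes rather than with their product. For directory trees, each node's label could be the file or directory name. Its children would be the directory entries, sorted the same way in both trees so that sibling order is consistent. The recursion here is fine for typical directory depths. Very deep trees would need an iterative traversal.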

Sorry for the slow response, I somehow missed this issue.