Needing Advice: Best algo(s) for distance based on "proportion of shared substrings"
davidmcnabnz commented
Hi there, I'm just getting started with string similarity processing.
In my application, I need to compare short-ish strings of 25-300 characters, and I need a distance metric between any two strings that rewards things like:
- The proportion of each string that is made up of shared substrings, and
- The sizes of the shared substrings, especially relative to the lengths of the strings being compared
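To make those two criteria concrete, here is a rough sketch of the kind of score I have in mind. It uses `difflib.SequenceMatcher` from the Python standard library rather than anything from this package, and the function name `shared_substring_distance`, the `min_len` cutoff, and the squared-length weighting are all just my own assumptions for illustration. Note that `get_matching_blocks()` only returns non-overlapping, in-order matches, so it is an approximation of "all shared substrings".

```python
from difflib import SequenceMatcher


def shared_substring_distance(a: str, b: str, min_len: int = 3) -> float:
    """Hypothetical distance in [0, 1]; 0 means identical, 1 means nothing shared.

    Each shared block contributes the square of its length, so one long
    shared run counts for more than several short ones of the same total
    length. Dividing by len(a) * len(b) makes the score relative to the
    lengths of the strings being compared.
    """
    if not a or not b:
        return 1.0  # assumption: an empty string shares nothing

    # autojunk=False disables difflib's "popular element" heuristic,
    # which only kicks in for sequences of 200+ items anyway.
    matcher = SequenceMatcher(None, a, b, autojunk=False)

    # Keep only blocks of at least min_len characters; this also drops
    # the zero-length sentinel block that difflib appends at the end.
    sizes = [m.size for m in matcher.get_matching_blocks() if m.size >= min_len]

    similarity = sum(size * size for size in sizes) / (len(a) * len(b))
    return 1.0 - similarity


if __name__ == "__main__":
    print(shared_substring_distance("abcdef", "abcxyz"))
    print(shared_substring_distance("the quick brown fox", "a quick brown dog"))
```

Because the blocks do not overlap, the sum of squared block lengths can never exceed `len(a) * len(b)`, so the result stays between 0 and 1.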
Among the wealth of algorithms and modes supported in this package, which would you suggest?
Cheers
David