ashvardanian/StringZilla

Broader benchmarks

ashvardanian opened this issue · 1 comment

Every heuristic has its weaknesses, and the current benchmarks don't do enough to expose them. bench.py should be changed to accept command-line arguments for various patterns; if none are provided, it should by default cover a diverse set of use cases and print the final results to the console.
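
To make the request concrete, here is a minimal sketch of what the command-line handling could look like. It is not the actual bench.py: the flag name `--pattern`, the `DEFAULT_PATTERNS` list, and the helpers `parse_args`, `bench_count`, and `main` are all hypothetical, and the built-in `str.count` stands in for whatever function is really being measured.

```python
import argparse
import time

# Hypothetical defaults, chosen to mix short, common, and absent patterns.
DEFAULT_PATTERNS = ["a", "the", "needle", "zz"]


def parse_args(argv=None):
    """Parse CLI arguments; fall back to DEFAULT_PATTERNS if none are given."""
    parser = argparse.ArgumentParser(description="Search benchmark (sketch)")
    parser.add_argument(
        "--pattern",
        action="append",
        default=None,
        help="pattern to search for; may be repeated",
    )
    args = parser.parse_args(argv)
    if not args.pattern:
        args.pattern = list(DEFAULT_PATTERNS)
    return args


def bench_count(haystack, pattern):
    """Count occurrences of pattern and report the elapsed time in seconds."""
    start = time.perf_counter()
    count = haystack.count(pattern)  # stand-in for the function under test
    elapsed = time.perf_counter() - start
    return count, elapsed


def main(argv=None):
    """Run every requested pattern and print one result line per pattern."""
    args = parse_args(argv)
    haystack = "the quick brown fox jumps over the lazy dog " * 10_000
    for pattern in args.pattern:
        count, elapsed = bench_count(haystack, pattern)
        print(f"{pattern!r}: {count} matches in {elapsed * 1e3:.3f} ms")
```

Calling `main([])` exercises the default set, while `main(["--pattern", "fox", "--pattern", "dog"])` restricts the run to the given patterns.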

I've split the benchmarks into categories: similarity functions, search operations, basic class interfaces, and so on. They will support both user-provided input as a text file (tokenized using the 6 ASCII whitespace characters as delimiters) and synthetic runs.
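
As a rough illustration of the two input modes, the sketch below tokenizes text on six ASCII whitespace characters and generates synthetic tokens as a fallback. It assumes the six delimiters are space, tab, line feed, carriage return, vertical tab, and form feed (the set in Python's `string.whitespace`); the names `tokenize` and `synthetic_tokens` are hypothetical and not taken from the repository.

```python
import random
import re
import string

# Assumed delimiter set: space, \t, \n, \r, vertical tab, form feed.
ASCII_WHITESPACE = re.compile(r"[ \t\n\r\x0b\x0c]+")


def tokenize(text):
    """Split text on runs of ASCII whitespace, dropping empty tokens."""
    return [token for token in ASCII_WHITESPACE.split(text) if token]


def synthetic_tokens(count, length, seed=0):
    """Generate `count` random lowercase tokens of `length` characters each."""
    rng = random.Random(seed)  # seeded, so runs are reproducible
    return [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        for _ in range(count)
    ]
```

A regular expression is used rather than `str.split()` with no arguments, because the latter also splits on non-ASCII whitespace such as the no-break space.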