An Elixir library to generate potential fuzzy matches.

Primary language: Elixir. License: MIT.

Peach

An Elixir library for approximate (fuzzy) string matching.

Installation

If available in Hex, the package can be installed by adding peach to your list of dependencies in mix.exs:

    def deps do
      [
        {:peach, "~> 0.2.0"}
      ]
    end

Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/peach.

Testing

To run the tests, run mix test.

To test with CSV data, create a folder called function_test_data inside the /test/ folder and put the following CSVs in it:

  • normalise_whitespace.csv
  • remove_punc.csv
  • pre_process.csv
  • remove_emojis.csv
  • normalise_text.csv
  • replace_punc.csv
  • get_brief.csv
  • remove_numbers.csv

Then run mix test, or use mix test.watch test/peach_test.exs --max-failures=1 --seed=0 to re-run the tests on every file change.

Using

Below are some examples of how Peach might be used for the kind of fuzzy-matching automation required in the first tier of a menu-centred chatbot.

Menu number and keyword match.

    input = "2.)"
    keyword_set = MapSet.new(["1", "2", "menu"])

    matches =
      Peach.pre_process(input)
      |> Peach.find_exact_match(keyword_set)

    assert matches == "2"

    input = "_menu_"
    keyword_set = MapSet.new(["1", "2", "menu"])

    matches =
      Peach.pre_process(input)
      |> Peach.find_exact_match(keyword_set)

    assert matches == "menu"
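
To make the idea behind the two examples above concrete, here is a minimal, self-contained sketch of "clean the input, then look it up in a set". It does not use Peach and is not Peach's implementation: the module ExactMatchSketch, its functions normalise/1 and exact_match/2, the cleaning rules, and the nil return on a miss are all assumptions made for illustration.

```elixir
defmodule ExactMatchSketch do
  # Hypothetical helper, not part of Peach: lower-case the input, strip
  # Unicode punctuation and symbols, collapse whitespace, and trim.
  def normalise(input) do
    input
    |> String.downcase()
    |> String.replace(~r/[\p{P}\p{S}]/u, "")
    |> String.replace(~r/\s+/u, " ")
    |> String.trim()
  end

  # Return the cleaned input if it is one of the keywords, otherwise nil.
  # (Returning nil on a miss is an assumption of this sketch.)
  def exact_match(input, keyword_set) do
    cleaned = normalise(input)
    if MapSet.member?(keyword_set, cleaned), do: cleaned, else: nil
  end
end
```

With this sketch, ExactMatchSketch.exact_match("2.)", MapSet.new(["1", "2", "menu"])) evaluates to "2", because the full stop and the bracket are stripped before the set lookup.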

Fuzzy keyword match with global threshold.

    input = "menuu"
    keyword_set = MapSet.new(["menu", "optin", "optout"])
    threshold = 1

    matches =
      Peach.pre_process(input)
      |> Peach.find_fuzzy_matches(keyword_set, threshold)

    assert matches == [{"menu", 1}]

Fuzzy keyword match with a threshold per keyword.

    input = "optint"
    keyword_threshold_set = MapSet.new([{"menu", 1}, {"optin", 2}, {"optout", 2}])

    matches =
      Peach.pre_process(input)
      |> Peach.find_fuzzy_matches(keyword_threshold_set)

    assert matches == [{"optin", 1}, {"optout", 2}]
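
The numbers in the two fuzzy examples above are consistent with a Levenshtein edit distance, although this README does not say which metric Peach uses. Under that assumption, the sketch below shows how both threshold styles could work. FuzzySketch and all of its functions are hypothetical names for illustration, not Peach's API, and sorting the results by distance is likewise an assumption.

```elixir
defmodule FuzzySketch do
  # Levenshtein distance, computed one dynamic-programming row at a time.
  def distance(a, b) do
    a_chars = String.graphemes(a)
    b_chars = String.graphemes(b)
    first_row = Enum.to_list(0..length(b_chars))

    a_chars
    |> Enum.with_index(1)
    |> Enum.reduce(first_row, fn {a_char, i}, prev_row ->
      next_row(a_char, b_chars, prev_row, i)
    end)
    |> List.last()
  end

  # Build row i from row i - 1. `diag`, `up` and `left` are the three
  # neighbouring cells of the cell being filled in.
  defp next_row(a_char, b_chars, [diag | rest_prev], i) do
    {row_rev, _, _} =
      b_chars
      |> Enum.zip(rest_prev)
      |> Enum.reduce({[i], diag, i}, fn {b_char, up}, {acc, diag, left} ->
        cost = if a_char == b_char, do: 0, else: 1
        cell = Enum.min([left + 1, up + 1, diag + cost])
        {[cell | acc], up, cell}
      end)

    Enum.reverse(row_rev)
  end

  # Global threshold: keep every keyword within `threshold` edits of the input.
  def fuzzy_matches(input, keywords, threshold) when is_integer(threshold) do
    keywords
    |> Enum.map(fn keyword -> {keyword, distance(input, keyword)} end)
    |> Enum.filter(fn {_keyword, dist} -> dist <= threshold end)
    |> Enum.sort_by(fn {keyword, dist} -> {dist, keyword} end)
  end

  # Per-keyword thresholds: each {keyword, threshold} pair sets its own limit.
  def fuzzy_matches(input, keyword_thresholds) do
    keyword_thresholds
    |> Enum.map(fn {keyword, limit} -> {keyword, distance(input, keyword), limit} end)
    |> Enum.filter(fn {_keyword, dist, limit} -> dist <= limit end)
    |> Enum.map(fn {keyword, dist, _limit} -> {keyword, dist} end)
    |> Enum.sort_by(fn {keyword, dist} -> {dist, keyword} end)
  end
end
```

A per-keyword threshold is useful when short keywords such as "menu" should tolerate fewer typos than longer ones such as "optout", since one edit is a larger share of a short word.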