You are tasked by the business owners on the Marketing team to improve the matching process of a Lead in the system. Currently, a person (lead) is found in the system and considered a "match" if the input (search criteria) First and Last Name
is equal to the system's First and Last Name
.
We need to relax our matching logic slightly and implement fuzzy (approximate) string matching on the first name and last name.
We will be using the Levenshtein Distance
to implement this.
- Read the file,
leads.csv
which is located at the project root and use as your data store. - The application should accept a
first name
andlast name
(or a data structure representing each) as INPUT. - The INPUT will be compared against the Leads in the data store (
leads.csv
) using Levenshtein Distance. - The best match's
row ID
will be returned with theFirst Name
and theLast Name
of the respective match (again, feel free to return 3 values if your language supports it; or a data structure representing the 3 values).
The Marketing team has come up with the following match requirements using this fuzzy matching algorithm.
It is possible that there could be multiple matches that meet the below criteria. You should return the best match (lowest Levenshtein Distance). If there are 2+ matches with the same distance, then simply return one of them.
- The Levenshtein Distance for
First
Name should be no larger than 3. - The Levenshtein Distance for
Last
Name should be no larger than 1. - The total Levenshtein Distance between
First
ANDLast
name should be no larger than 3. - The required Levenshtein Distance for
Last Name
should be configurable so that the business can increase/decrease the tolerance as needed. A change to this configuration should reflect immediately and without restarting the application. How you implement this is up to you. NOTE if theLast Name
tolerance is configurable, you will need to update theTotal
's tolerance appropriately. - Case sensitivity should be ignored and should not contribute to the Levenshtein Distance (ie.
e
==E
).
leads.csv
is normalized to UPPERCASE- There WILL NOT be any surprises in the data, such as First/Last Name in the
FirstName
column and anull
LastName
column. - The only possible characters within
leads.csv
areA-Z
or'
, plus0-9
for the row ID.
- Use a language of your choice.
- NO 3rd party libraries or frameworks.
- Include your solution in a subdirectory that is named appropriately to determine who submitted it.
- Feel free to add names to
leads.csv
. The initial data in the file is meant to provide some examples of input data. - After submissions are in, a more extensive input file will be created and used for the testing/benchmarking.