Names in administrative datasets often vary. A name in one dataset may contain just the first and last name, whereas in another dataset it may also contain a middle initial and/or a suffix. For instance, John Doe in one dataset may be John J. Doe Jr. in another. Names are also often misspelled or spelled differently across datasets, e.g. Stephen as opposed to Steven. These variations become a problem when datasets need to be merged for analysis.
This Python program addresses this problem by matching names using fuzzy string matching. Each target name is paired with its best match in the other dataset. Accuracy tends to be high because the combination of first and last name is often unique, so the correct record is usually the best match among the names in a list. For instance, John Doe will receive a higher matching "score" with John H. Doe Jr. than with John Needle. The algorithm therefore works best with name lists that are heterogeneous rather than homogeneous, that is, lists containing few near-identical names.
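The best-match idea described above can be sketched in a few lines. This is a minimal illustration, not the program's actual code: it assumes the standard-library `difflib.SequenceMatcher` as the similarity measure, and the helper names `normalize`, `score`, and `best_match` are hypothetical. The real program may use a different scoring method or library.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def score(a, b):
    """Return a similarity score between 0.0 and 1.0 for two names.

    Assumption: difflib's ratio stands in for whatever fuzzy score
    the actual program computes.
    """
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(target, candidates):
    """Return (candidate, score) for the highest-scoring candidate.

    Raises ValueError if candidates is empty.
    """
    best = max(candidates, key=lambda c: score(target, c))
    return best, score(target, best)


# Example names taken from the text above
other_dataset = ["John Needle", "John H. Doe Jr.", "Stephen Smith"]
match, match_score = best_match("John Doe", other_dataset)
print(match, round(match_score, 3))
```

With a plain character-based ratio, the gap between a correct match and a near miss can be small, so a practical version would probably also apply a minimum-score threshold and review low-scoring pairs by hand before merging.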