MacHu-GWU/uszipcode-project

Fuzzy match error: by_city_and_state changes the city when it is not ambiguously spelled.

Opened this issue · 4 comments

kosar commented

Describe the bug
Searching for a zip code by city (by_city_and_state) returns a zip code for a city whose name is merely close to the one provided. The substitution is unnecessary, since the city provided is spelled unambiguously.

Searching for city state : 'burien' , 'wa' returns a zip code for 'Burlington, WA'

To Reproduce
by_city_and_state using above test case.

Steps to reproduce the behavior:
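from uszipcode import SearchEngine

# Use the simple zipcode database
searchObject = SearchEngine(simple_zipcode=True)
strCity = 'burien'
strState = 'WA'
# Search by city and state, asking for up to 100 results
res = searchObject.by_city_and_state(strCity, strState, returns=100)
res[0]  # first result: a Burlington zip code instead of a Burien one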
SimpleZipcode(zipcode='98233', zipcode_type='Standard', major_city='Burlington', post_office_city='Burlington, WA', common_city_list=['Burlington'], county='Skagit County', state='WA', lat=48.5, lng=-122.4, timezone='Pacific', radius_in_miles=10.0, area_code_list=['360'], population=14871, population_density=439.0, land_area_in_sqmi=33.85, water_area_in_sqmi=0.26, housing_units=5897, occupied_housing_units=5522, median_home_value=232700, median_household_income=52906, bounds_west=-122.444478, bounds_east=-122.285302, bounds_north=48.620048, bounds_south=48.444658)

Expected behavior
Should match on 'burien' as that is a valid city in WA.

Screenshots
see above, code snippet

Additional context

I love your work, and hope it helps to report this issue. Keep it up.
I am working on a workaround to this issue, to search by state, and then try to find the match myself for the city name, which defeats some of the purpose of this awesome library. Wondering if there is a better pattern or workaround I could consider, if the above is just a side effect of fuzzy matching. Thanks so much.
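A minimal sketch of the workaround described above, under stated assumptions: it takes a list of records that were already fetched some other way (for example from a state-level search) and keeps only those whose city name matches exactly, ignoring case. The helper name `filter_exact_city` is hypothetical and not part of uszipcode. The `major_city` and `common_city_list` attributes are taken from the SimpleZipcode output shown in the report, and the sample records are made-up stand-ins, not real query results.

```python
from collections import namedtuple


def filter_exact_city(records, city):
    """Keep records whose major_city or common_city_list contains `city`.

    Comparison is exact but case-insensitive; no fuzzy matching is applied.
    `filter_exact_city` is a hypothetical helper, not a uszipcode function.
    """
    wanted = city.strip().lower()
    matches = []
    for rec in records:
        # Collect every city name attached to this record.
        names = [rec.major_city or ""] + list(rec.common_city_list or [])
        if any(name.strip().lower() == wanted for name in names):
            matches.append(rec)
    return matches


# Stand-in records for illustration only; "00000" is a placeholder zipcode.
Rec = namedtuple("Rec", ["zipcode", "major_city", "common_city_list"])
sample = [
    Rec("98233", "Burlington", ["Burlington"]),
    Rec("00000", "Burien", ["Burien"]),
]

exact = filter_exact_city(sample, "burien")
print([rec.zipcode for rec in exact])
```

An empty result from such a filter would signal that the name has no exact match in the fetched records, at which point falling back to the library's fuzzy search is a deliberate choice rather than a silent substitution.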

@kosar good catch. I cannot fix it because it depends heavily on the fuzzy match algorithm I am using.

Yossi commented

Perhaps this project could migrate to https://github.com/maxbachmann/RapidFuzz which seems to still be maintained.

see here seatgeek/fuzzywuzzy#318 (comment)

@Yossi it says

On Windows the [Visual C++ 2019 redistributable](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads) is required

This would be too harsh for Windows users. Maybe I can use try ... except ... to let the user choose between fuzzywuzzy and rapidfuzz.
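The try ... except ... idea could look roughly like the sketch below. It is only an illustration, not the project's actual code: it prefers rapidfuzz, falls back to fuzzywuzzy, and, as an extra assumption added here so the snippet runs with neither package installed, finally falls back to the standard library's difflib. The names `BACKEND` and `similarity` are hypothetical.

```python
import difflib

# Prefer rapidfuzz, then fuzzywuzzy; both expose fuzz.ratio(a, b) on a 0-100 scale.
try:
    from rapidfuzz import fuzz
    BACKEND = "rapidfuzz"
except ImportError:
    try:
        from fuzzywuzzy import fuzz
        BACKEND = "fuzzywuzzy"
    except ImportError:
        # Assumption for this sketch: use the standard library as a last resort.
        fuzz = None
        BACKEND = "difflib"


def similarity(a, b):
    """Return a 0-100 similarity score using whichever backend was imported."""
    if fuzz is not None:
        return float(fuzz.ratio(a, b))
    # difflib's ratio is on a 0-1 scale, so rescale it to 0-100.
    return 100.0 * difflib.SequenceMatcher(None, a, b).ratio()


print(BACKEND, similarity("burien", "burlington"))
```

The scores from different backends are not guaranteed to be identical, so any threshold tuned against one backend may need rechecking against the others.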

Actually, this is not the whole truth anymore. If the C++ implementation is not available, it falls back to a pure Python implementation similar to fuzzywuzzy (but without behavior differences between the Python and C++ versions).

So while installing the C++ redistributable is still recommended for performance reasons, it is no longer needed for the library to work.