Country name cleaning fails for "Virgin Islands (British)"
yibenhuang opened this issue · 0 comments
Describe the bug
Hi, I just found that the country name "Virgin Islands (British)" is not cleaned to the correct name; clean_country returns NaN for it.
To Reproduce
import pandas as pd
from dataprep.clean import clean_country
df = pd.DataFrame({"country": ["Virgin Islands (British)", "Virgin Islands (U.S.)"]})
clean_country(df, column="country", output_format="name")
Output:
| | country | country_clean |
|---|---|---|
| 0 | Virgin Islands (British) | NaN |
| 1 | Virgin Islands (U.S.) | United States Virgin Islands |
Expected behavior
country_converter, the project this feature is based on, handles both names correctly, as shown below.
import country_converter as coco
names = ["Virgin Islands (British)", "Virgin Islands (U.S.)"]
cc = coco.CountryConverter()
cc.convert(names=names, to="name_short")
# Output: ['British Virgin Islands', 'United States Virgin Islands']
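Until this is fixed, one possible workaround is to rewrite the failing input before cleaning. The sketch below is only an illustration: `KNOWN_FAILURES` and `patch_country_names` are hypothetical names, not part of dataprep, and it assumes that clean_country recognises the short name "British Virgin Islands", which I have not verified.

```python
# Hypothetical override table: maps inputs known to fail onto the
# spelling that country_converter returns in the example above.
KNOWN_FAILURES = {
    "virgin islands (british)": "British Virgin Islands",
}


def patch_country_names(names):
    """Replace known-problematic country names; leave everything else as-is."""
    patched = []
    for name in names:
        if isinstance(name, str):
            # Match case-insensitively and ignore surrounding whitespace.
            key = name.strip().lower()
            patched.append(KNOWN_FAILURES.get(key, name))
        else:
            # Keep missing values (None, NaN) untouched.
            patched.append(name)
    return patched


names = ["Virgin Islands (British)", "Virgin Islands (U.S.)"]
print(patch_country_names(names))
# Output: ['British Virgin Islands', 'Virgin Islands (U.S.)']
```

The patched list could then be assigned back with `df["country"] = patch_country_names(df["country"])` before calling clean_country as in the reproduction steps.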
Desktop (please complete the following information):
- OS: macOS
- Browser: Chrome
- Platform: Jupyter Notebook
- Platform Version: 6.4.12
- Python Version: 3.10.5
- Dataprep Version: 0.4.5