sfu-db/dataprep

Country name cleaning failed example

yibenhuang opened this issue · 0 comments

Describe the bug
Hi, I just found that the country name "Virgin Islands (British)" fails to clean: clean_country returns NaN for it instead of the correct name.

To Reproduce

import pandas as pd
from dataprep.clean import clean_country

df = pd.DataFrame({"country": ["Virgin Islands (British)", "Virgin Islands (U.S.)"]})
clean_country(df, column="country", output_format="name")

Output:

                    country                 country_clean
0  Virgin Islands (British)                           NaN
1     Virgin Islands (U.S.)  United States Virgin Islands

Expected behavior
The project this is based on, country_converter, handles both names correctly:

import country_converter as coco

names = ["Virgin Islands (British)", "Virgin Islands (U.S.)"]
cc = coco.CountryConverter()

cc.convert(names=names, to="name_short")
# Output: ['British Virgin Islands', 'United States Virgin Islands']
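
Until the lookup is fixed, a possible stopgap is to patch the missing results after cleaning. The following is only a minimal sketch: OVERRIDES, is_missing and fill_missing are hypothetical names made up for illustration (they are not part of dataprep or country_converter), and the single override entry is taken from the country_converter output above.

```python
import math

# Hypothetical override table: raw spellings that came back as NaN,
# mapped to the short name country_converter reports in this issue.
OVERRIDES = {"Virgin Islands (British)": "British Virgin Islands"}


def is_missing(value):
    """Return True for None or a float NaN (the marker shown in the Output above)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_missing(raw_names, cleaned_names, overrides=OVERRIDES):
    """Keep every cleaned value; fill missing ones from overrides where known."""
    return [
        overrides.get(raw, cleaned) if is_missing(cleaned) else cleaned
        for raw, cleaned in zip(raw_names, cleaned_names)
    ]


raw = ["Virgin Islands (British)", "Virgin Islands (U.S.)"]
# Cleaned values copied from the Output table in this report.
cleaned = [float("nan"), "United States Virgin Islands"]
patched = fill_missing(raw, cleaned)
print(patched)
# ['British Virgin Islands', 'United States Virgin Islands']
```

With a DataFrame, the same helper should accept the "country" and "country_clean" columns directly, and the returned list can be assigned back to "country_clean". Names without an override entry stay missing, so this does not hide other failed lookups.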

Desktop (please complete the following information):

  • OS: macOS
  • Browser: Chrome
  • Platform: Jupyter Notebook
  • Platform Version: 6.4.12
  • Python Version: 3.10.5
  • Dataprep Version: 0.4.5