clean_phone module doesn't recognize e.164 extension format
yukewang1 opened this issue · 1 comments
Describe the bug
The E.164 standards state that phone numbers can be written in a format of +<CountryCode><City/AreaCode><LocalNumber>;ext=<ext>. An example could be +19052223333;ext=555. The current clean_phone() function doesn't recognize such numbers because this rule is not specified in the regex at line 16, clean_phone.py.
To Reproduce
Steps to reproduce the behavior:
from dataprep.clean import clean_phone
import pandas as pd
df = pd.DataFrame({
"phone": ["+19052223333;ext=555"]
})
clean_phone(df, "phone", output_format="e164")
Expected behavior
The correct output should have the form +12345678901 ext. 1234 (for the input above, +19052223333 ext. 555), whereas clean_phone() doesn't recognize this format and outputs np.NaN.
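To make the expected conversion concrete, here is a minimal sketch in plain Python. It does not use dataprep and is not the regex from clean_phone.py; the pattern `EXT_PATTERN` and the helper `format_with_extension` are hypothetical names introduced only for illustration, and the pattern assumes the number is already written as `+` followed by digits with an optional `;ext=<digits>` suffix.

```python
import re

# Hypothetical pattern (not the one in clean_phone.py):
# "+" followed by 1 to 15 digits, then an optional ";ext=<digits>" suffix.
EXT_PATTERN = re.compile(r"^\+(?P<number>\d{1,15})(?:;ext=(?P<ext>\d+))?$")


def format_with_extension(raw):
    """Return '+<number> ext. <ext>', or None if the input does not match."""
    match = EXT_PATTERN.match(raw.strip())
    if match is None:
        return None
    number, ext = match.group("number"), match.group("ext")
    # Only append the extension part when one was present in the input.
    return f"+{number} ext. {ext}" if ext else f"+{number}"


print(format_with_extension("+19052223333;ext=555"))  # +19052223333 ext. 555
```

A real fix would presumably extend the existing regex so that the extension is captured alongside the other phone-number parts, rather than adding a separate pattern like this one.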
Desktop (please complete the following information):
- OS: macOS Monterey
- Browser: Chrome
- Platform: Jupyter Notebook
- Platform Version: 6.4.8
- Python Version: 3.9.9
- Dataprep Version: 0.4.2
Additional context
Here's a blog post explaining the E.164 standard, specifically how to specify an extension. Link
Good catch! Thanks for your context, we will fix it soon!