skrub-data/skrub

TableVectorizer raises when a categorical column contains `pd.NA`

Closed · 1 comment

Describe the bug

import pandas as pd
from skrub import TableVectorizer


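# Categorical column built from a nullable "string" Series, so its missing value is pd.NA
a = pd.Series(['one', 'two', None], dtype='string').astype('category')
df = pd.DataFrame(dict(a=a))
tv = TableVectorizer()
tv.fit_transform(df)  # raises the TypeError shown below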

output:

TypeError: Encoders require their input argument must be uniformly strings or numbers. Got ['NAType', 'str']
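
The `['NAType', 'str']` in the message suggests that the encoder sees `pd.NA` mixed in with the string values. As a possible stopgap (a sketch only, not the fix that was merged), the column could be converted to plain `object` dtype with `np.nan` as the missing-value marker before vectorizing. The name `cleaned` is hypothetical, and whether this avoids the error in a given skrub version is an assumption, not something verified here.

```python
import numpy as np
import pandas as pd

a = pd.Series(['one', 'two', None], dtype='string').astype('category')
df = pd.DataFrame(dict(a=a))

# Cast the categorical column to object dtype and put np.nan
# wherever the original column has a missing value.
cleaned = df.assign(a=df['a'].astype(object).where(df['a'].notna(), np.nan))

# Inspect which Python types the cleaned column now contains.
print(sorted({type(v).__name__ for v in cleaned['a']}))
```

`cleaned` can then be passed to `TableVectorizer().fit_transform(...)` in place of `df`. This trades the `category` dtype for `object`, so any category ordering is lost.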

Steps/Code to Reproduce

Run the snippet under "Describe the bug".

Expected Results

`tv.fit_transform(df)` completes without raising an error.

Actual Results

The `TypeError` shown under "Describe the bug" is raised.

Versions

latest scikit-learn & skrub

Fixed by #902.