lenguyenthedat/kaggle-for-fun

Issue with the Cabin feature

Running LabelEncoder on the Cabin feature raises a TypeError:

Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Cabin
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-121-48f3aad5f78e> in <module>()
      4     print(col)
      5     le.fit(list(train[col]) + list(cv[col]))
----> 6     train[col] = le.transform(train[col])
      7     cv[col] = le.transform(cv[col])

/opt/conda/lib/python3.6/site-packages/sklearn/preprocessing/label.py in transform(self, y)
    128         y = column_or_1d(y, warn=True)
    129 
--> 130         classes = np.unique(y)
    131         if len(np.intersect1d(classes, self.classes_)) < len(classes):
    132             diff = np.setdiff1d(classes, self.classes_)

/opt/conda/lib/python3.6/site-packages/numpy/lib/arraysetops.py in unique(ar, return_index, return_inverse, return_counts, axis)
    208     ar = np.asanyarray(ar)
    209     if axis is None:
--> 210         return _unique1d(ar, return_index, return_inverse, return_counts)
    211     if not (-ar.ndim <= axis < ar.ndim):
    212         raise ValueError('Invalid axis kwarg specified for unique')

/opt/conda/lib/python3.6/site-packages/numpy/lib/arraysetops.py in _unique1d(ar, return_index, return_inverse, return_counts)
    275         aux = ar[perm]
    276     else:
--> 277         ar.sort()
    278         aux = ar
    279     flag = np.concatenate(([True], aux[1:] != aux[:-1]))

TypeError: '>' not supported between instances of 'float' and 'str'

It looks like the cause is the missing values in the Cabin feature: they are float NaN while the other entries are strings, so the sort inside np.unique cannot compare them. How did you overcome this?

The same issue comes up for the Embarked feature.
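
One possible workaround, as a minimal sketch rather than what the repo actually does: fill the missing values with a placeholder string before fitting, so that every value in the column is a str. The toy train/cv frames and the "Missing" placeholder below are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-ins for the train/cv frames from the traceback (hypothetical data).
train = pd.DataFrame({"Cabin": ["C85", np.nan, "E46"],
                      "Embarked": ["S", "C", np.nan]})
cv = pd.DataFrame({"Cabin": [np.nan, "C85", "B42"],
                   "Embarked": ["Q", "S", "S"]})

PLACEHOLDER = "Missing"  # assumption: a string not already used as a category

encoders = {}
for col in ["Cabin", "Embarked"]:
    # NaN is a float; replace it so the column holds only strings.
    train[col] = train[col].fillna(PLACEHOLDER).astype(str)
    cv[col] = cv[col].fillna(PLACEHOLDER).astype(str)

    # Fit on the union of both frames, as in the original loop.
    le = LabelEncoder()
    le.fit(train[col].tolist() + cv[col].tolist())
    train[col] = le.transform(train[col].tolist())
    cv[col] = le.transform(cv[col].tolist())
    encoders[col] = le  # keep the encoder to map codes back to labels later
```

With this approach the missing values become their own category. Whether that is appropriate for Cabin and Embarked depends on the model, so treat it as a starting point.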

I don't think it supports Python 3 yet :)
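
For context, Python 2 allowed ordering comparisons between a float and a str, while Python 3 raises a TypeError, which may be why the code ran without this error under Python 2. A small sketch of the Python 3 behaviour, using hypothetical values with NaN standing in for a missing Cabin:

```python
# Hypothetical Cabin values; float("nan") stands in for a missing entry.
values = ["C85", float("nan"), "E46"]

try:
    sorted(values)          # sorting must compare a float with a str
    mixed_sort_ok = True
except TypeError as exc:
    mixed_sort_ok = False   # Python 3 refuses the comparison
    print(exc)
```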