scikit-learn-contrib/category_encoders

Cat_boost encoder

xuanqing94 opened this issue · 2 comments

level_means = ((colmap['sum'] + self._mean) / (colmap['count'] + self.a)).where(level_notunique, self._mean)

That doesn't look like an interpolation between the global mean and the subgroup mean. I think it should be something like:

level_means = ((colmap['sum'] + self._mean * self.a) / (colmap['count'] + self.a)).where(level_notunique, self._mean)
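A minimal pandas sketch of the difference between the two lines. It assumes `self._mean` is the global target mean, `self.a` is the prior weight, and `level_notunique` marks categories seen more than once. The helpers `smoothed_level_means` and `buggy_level_means` are hypothetical and not part of category_encoders. With the factor `a` on the prior, the estimate moves from the category mean (small `a`) to the global mean (large `a`). Without that factor, a large `a` drags every non-unique level toward 0.

```python
import pandas as pd


def smoothed_level_means(x: pd.Series, y: pd.Series, a: float = 1.0) -> pd.Series:
    """Hypothetical helper: per-category target mean shrunk toward the global mean.

    The prior counts as `a` pseudo-observations: (sum + prior * a) / (count + a).
    Categories seen only once fall back to the prior (assumed meaning of
    `level_notunique` in the quoted line).
    """
    prior = y.mean()
    colmap = y.groupby(x).agg(['sum', 'count'])
    level_notunique = colmap['count'] > 1
    return ((colmap['sum'] + prior * a) / (colmap['count'] + a)).where(level_notunique, prior)


def buggy_level_means(x: pd.Series, y: pd.Series, a: float = 1.0) -> pd.Series:
    """Same as above, but the prior is NOT multiplied by `a` (the reported bug)."""
    prior = y.mean()
    colmap = y.groupby(x).agg(['sum', 'count'])
    level_notunique = colmap['count'] > 1
    return ((colmap['sum'] + prior) / (colmap['count'] + a)).where(level_notunique, prior)


x = pd.Series(['a', 'a', 'b', 'b', 'b', 'c'])
y = pd.Series([1, 0, 1, 1, 1, 0])

# With a very large `a`, the fixed version approaches the prior for every level,
# while the buggy version approaches 0 for the levels seen more than once.
print(smoothed_level_means(x, y, a=1e9))
print(buggy_level_means(x, y, a=1e9))
```

The two versions only agree when `a = 1`, which may be why the missing factor is easy to overlook with default settings.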
glevv commented

Yep, this commit forgot to multiply the mean by a. It's a bug, good catch.

Reference paper, eq 1
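For orientation, the smoothed estimate that the corrected line computes can be written as below. This is a sketch in our own notation, not a transcription of the paper's equation: $x_j$ is the category of row $j$, $y_j$ its target, $P$ the prior (here the global target mean, `self._mean`) and $a > 0$ the prior weight (`self.a`).

```latex
\hat{x}_k = \frac{\sum_{j} \mathbb{1}[x_j = x_k]\, y_j + a P}{\sum_{j} \mathbb{1}[x_j = x_k] + a}
```

Setting $a = 0$ gives the raw category mean, and letting $a \to \infty$ gives $P$. Without the factor $a$ on $P$, the second limit is $0$ instead of $P$.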

Could it be that this line also needs a fix?

X[col] = (temp['cumsum'] - y + self._mean) / (temp['cumcount'] + self.a)
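A minimal sketch of what this training-time line would compute if the same correction were applied. It assumes `temp['cumsum']` is the per-category running sum of the target including the current row (hence the `- y`) and `temp['cumcount']` is the 0-based per-category running count, so each row only sees earlier rows of its category. The helper name `ordered_target_stats` is hypothetical.

```python
import pandas as pd


def ordered_target_stats(x: pd.Series, y: pd.Series, a: float = 1.0) -> pd.Series:
    """Hypothetical helper: ordered target statistic with the prior weighted by `a`.

    Each row is encoded using only targets of EARLIER rows with the same category.
    """
    prior = y.mean()
    grouped = y.groupby(x)
    prev_sum = grouped.cumsum() - y   # sum of targets of earlier rows in this category
    prev_count = grouped.cumcount()   # number of earlier rows in this category (0-based)
    return (prev_sum + prior * a) / (prev_count + a)


x = pd.Series(['a', 'a', 'b', 'a'])
y = pd.Series([1, 0, 1, 1])

# The first occurrence of each category has no history, so it gets the prior.
print(ordered_target_stats(x, y, a=1.0))
```

With the line as quoted, the first occurrence of each category would be encoded as `self._mean / self.a` rather than `self._mean`, so the two forms again coincide only when `a = 1`.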

fixed by #339
Thanks :)